* [Qemu-devel] Fwd: Re: Tunneled Migration with Non-Shared Storage
  From: Gary R Hook
  Date: 2014-11-19 20:12 UTC
  To:   qemu-devel

Ugh, I wish I could teach Thunderbird to understand how to reply to a
newsgroup.

Apologies to Paolo for the direct note.

On 11/19/14 4:19 AM, Paolo Bonzini wrote:
> On 19/11/2014 10:35, Dr. David Alan Gilbert wrote:
>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>> On 18/11/2014 21:28, Dr. David Alan Gilbert wrote:
>>>> This seems odd, since as far as I know the tunneling code is quite
>>>> separate from the migration code; I thought the only thing that the
>>>> migration code sees differently is the file descriptors it gets
>>>> passed. (Having said that, again I don't know storage stuff, so if
>>>> this is a storage special there may be something there...)
>>>
>>> Tunnelled migration uses the old block-migration.c code. Non-tunnelled
>>> migration uses the NBD server and block/mirror.c.
>>
>> OK, that explains that. Is that because the tunneling code can't
>> deal with tunneling the NBD server connection?
>>
>>> The main problem with the old code is that it uses a possibly
>>> unbounded amount of memory in mig_save_device_dirty and can have huge
>>> jitter if any serious workload is running in the guest.
>>
>> So that's sending dirty blocks iteratively? Not that I can see
>> when the allocations get freed; but is the amount allocated there
>> related to total disk size (as Gary suggested) or to the amount
>> of dirty blocks?
>
> It should be related to the maximum rate limit (which can be set to
> arbitrarily high values, however).

This makes no sense. The code in block_save_iterate() specifically
attempts to control the rate of transfer. But when
qemu_file_get_rate_limit() returns a number like 922337203685372723
(0xCCCCCCCCCCB3333), I'm under the impression that no bandwidth
constraints are being imposed at this layer. Why, then, would that
transfer be occurring at 20 MB/sec (simple, under-utilized 1 GigE
connection) with no clear bottleneck in CPU or network? What other
relation might exist?

> The reads are started, then the ones that are ready are sent and the
> blocks are freed in flush_blks. The jitter happens when the guest reads
> a lot but only writes a few blocks. In that case, the bdrv_drain_all in
> mig_save_device_dirty can be called relatively often and it can be
> expensive because it also waits for all guest-initiated reads to
> complete.

Pardon my ignorance, but this does not match my observations. What I am
seeing is the process size of the source qemu grow steadily until the
COR completes; during this time the backing file on the destination
system does not change/grow at all, which implies that no blocks are
being transferred. (I have tested this with a 25 GB VM disk, and larger;
no network activity occurs during this period.) Once the COR is done and
the in-memory copy is ready (marked by a "Completed 100%" message from
blk_mig_save_bulked_block()), the transfer begins. At an abysmally slow
rate, I'll add, per the above. Another problem to be investigated.

> The bulk phase is similar, just with different functions (the reads are
> done in mig_save_device_bulk). With a high rate limit, the total
> allocated memory can reach a few gigabytes indeed.

Much, much more than that. It's definitely dependent upon the disk file
size. Tiny VM disks are a nit; big VM disks are a problem.

> Depending on the scenario, a possible disadvantage of NBD migration is
> that it can only throttle each disk separately, while the old code will
> apply a single limit to all migrations.

How about no throttling at all? And just to be very clear, the goal is
fast (NBD-based) migrations of VMs using non-shared storage over an
encrypted channel. Safest, worst-case scenario. Aside from gaining an
understanding of this code.

Thank you for your attention.

--
Gary R Hook
Senior Kernel Engineer
NIMBOXX, Inc
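A note for readers following the throttling argument above: the pattern
being described in block_save_iterate() is essentially a loop that keeps
queueing block reads until a per-iteration byte budget is spent. The
sketch below is a simplified illustration of that pattern only, not QEMU
source; every identifier in it is invented, and the budget parameter
merely stands in for whatever qemu_file_get_rate_limit() reports.

    /* Simplified sketch of a rate-limited save loop; illustration only,
     * not QEMU source.  All identifiers are invented. */
    #include <stdint.h>
    #include <stdbool.h>

    struct blk_queue;                                     /* hypothetical pending-block list */

    extern bool    more_blocks(struct blk_queue *q);      /* hypothetical */
    extern int64_t queue_one_block(struct blk_queue *q);  /* hypothetical: returns bytes queued */

    static void save_iterate_sketch(struct blk_queue *q, int64_t budget_bytes)
    {
        int64_t sent = 0;

        /* Queue block reads until this iteration's byte budget is spent.
         * With a budget on the order of 9.2e17 bytes the second condition
         * effectively never stops the loop, so this layer imposes no
         * throttling; the real ceiling is whatever drains the stream. */
        while (more_blocks(q) && sent < budget_bytes) {
            sent += queue_one_block(q);
        }
    }

On that reading, the observed 20 MB/sec has to come from somewhere below
this layer (the tunnel, the stream flushing, or the destination), not
from the limit itself.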
* Re: [Qemu-devel] Fwd: Re: Tunneled Migration with Non-Shared Storage
  From: Dr. David Alan Gilbert
  Date: 2014-11-20 9:54 UTC
  To:   Gary R Hook
  Cc:   qemu-devel

* Gary R Hook (grhookatwork@gmail.com) wrote:
> Ugh, I wish I could teach Thunderbird to understand how to reply to a
> newsgroup.
>
> Apologies to Paolo for the direct note.
>
> On 11/19/14 4:19 AM, Paolo Bonzini wrote:
> > On 19/11/2014 10:35, Dr. David Alan Gilbert wrote:
> >> * Paolo Bonzini (pbonzini@redhat.com) wrote:
> >>> On 18/11/2014 21:28, Dr. David Alan Gilbert wrote:
> >>>> This seems odd, since as far as I know the tunneling code is quite
> >>>> separate from the migration code; I thought the only thing that the
> >>>> migration code sees differently is the file descriptors it gets
> >>>> passed. (Having said that, again I don't know storage stuff, so if
> >>>> this is a storage special there may be something there...)
> >>>
> >>> Tunnelled migration uses the old block-migration.c code.
> >>> Non-tunnelled migration uses the NBD server and block/mirror.c.
> >>
> >> OK, that explains that. Is that because the tunneling code can't
> >> deal with tunneling the NBD server connection?
> >>
> >>> The main problem with the old code is that it uses a possibly
> >>> unbounded amount of memory in mig_save_device_dirty and can have
> >>> huge jitter if any serious workload is running in the guest.
> >>
> >> So that's sending dirty blocks iteratively? Not that I can see
> >> when the allocations get freed; but is the amount allocated there
> >> related to total disk size (as Gary suggested) or to the amount
> >> of dirty blocks?
> >
> > It should be related to the maximum rate limit (which can be set to
> > arbitrarily high values, however).
>
> This makes no sense. The code in block_save_iterate() specifically
> attempts to control the rate of transfer. But when
> qemu_file_get_rate_limit() returns a number like 922337203685372723
> (0xCCCCCCCCCCB3333), I'm under the impression that no bandwidth
> constraints are being imposed at this layer. Why, then, would that
> transfer be occurring at 20 MB/sec (simple, under-utilized 1 GigE
> connection) with no clear bottleneck in CPU or network? What other
> relation might exist?

Disk IO on the disk that you're trying to transfer?

> > The reads are started, then the ones that are ready are sent and the
> > blocks are freed in flush_blks. The jitter happens when the guest
> > reads a lot but only writes a few blocks. In that case, the
> > bdrv_drain_all in mig_save_device_dirty can be called relatively
> > often and it can be expensive because it also waits for all
> > guest-initiated reads to complete.
>
> Pardon my ignorance, but this does not match my observations. What I am
> seeing is the process size of the source qemu grow steadily until the
> COR completes; during this time the backing file on the destination
> system does not change/grow at all, which implies that no blocks are
> being transferred. (I have tested this with a 25 GB VM disk, and
> larger; no network activity occurs during this period.) Once the COR is
> done and the in-memory copy is ready (marked by a "Completed 100%"
> message from blk_mig_save_bulked_block()), the transfer begins. At an
> abysmally slow rate, I'll add, per the above. Another problem to be
> investigated.

Odd thought; can you try dropping your migration bandwidth limit
(migrate_set_speed) - try something low, like 10M - does the behaviour
stay the same, or does it start transmitting disk data before it's read
the lot?

> > The bulk phase is similar, just with different functions (the reads
> > are done in mig_save_device_bulk). With a high rate limit, the total
> > allocated memory can reach a few gigabytes indeed.
>
> Much, much more than that. It's definitely dependent upon the disk file
> size. Tiny VM disks are a nit; big VM disks are a problem.

Well, if as you say it's not starting transmitting for some reason until
it's read the lot then that would make sense.

> > Depending on the scenario, a possible disadvantage of NBD migration
> > is that it can only throttle each disk separately, while the old code
> > will apply a single limit to all migrations.
>
> How about no throttling at all? And just to be very clear, the goal is
> fast (NBD-based) migrations of VMs using non-shared storage over an
> encrypted channel. Safest, worst-case scenario. Aside from gaining an
> understanding of this code.

There are vague plans to add TLS support for encrypting these streams
internally to qemu; but they're just thoughts at the moment.

> Thank you for your attention.

Dave

> --
> Gary R Hook
> Senior Kernel Engineer
> NIMBOXX, Inc

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
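Dave's read-it-all-before-transmitting theory can be pictured with the
toy model below. Paolo's description (reads are started, and blocks are
freed only in flush_blks once sent) implies that whenever flushing
drains more slowly than reads complete, resident memory grows toward the
size of the whole disk image, which matches the RSS growth Gary reports.
The model is purely illustrative, assumes a fixed 1 MiB block size, and
none of its names correspond to actual QEMU code.

    /* Toy model of the bulk phase as described in the thread: completed
     * reads queue freshly allocated buffers; buffers are freed only when
     * flushed to the stream.  All names are invented; not QEMU code. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define BLOCK_BYTES (1 << 20)             /* assume 1 MiB per queued block */

    struct blk {
        struct blk *next;
        uint8_t data[BLOCK_BYTES];
    };

    static struct blk *head, *tail;
    static size_t resident_bytes;             /* what shows up as qemu RSS growth */

    /* Producer: a completed disk read appends an allocated block. */
    static void read_completed(const uint8_t *buf)
    {
        struct blk *b = malloc(sizeof(*b));
        if (!b) {
            abort();                          /* toy model: no graceful handling */
        }
        memcpy(b->data, buf, BLOCK_BYTES);
        b->next = NULL;
        if (tail) {
            tail->next = b;
        } else {
            head = b;
        }
        tail = b;
        resident_bytes += sizeof(*b);
    }

    /* Consumer: flush at most 'budget' bytes per call (socket/tunnel speed).
     * If this drains less than read_completed() produces, resident_bytes
     * climbs toward the size of the whole image before anything useful
     * reaches the destination. */
    static void flush_some(size_t budget)
    {
        while (head && budget >= sizeof(struct blk)) {
            struct blk *b = head;
            head = b->next;
            if (!head) {
                tail = NULL;
            }
            budget -= sizeof(*b);
            resident_bytes -= sizeof(*b);
            free(b);
        }
    }

If lowering the limit with migrate_set_speed to something like 10M
changes the shape of that growth, it would point at the flush path
rather than at the reads themselves.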
* Re: [Qemu-devel] Fwd: Re: Tunneled Migration with Non-Shared Storage
  From: Gary R Hook
  Date: 2014-11-20 17:04 UTC
  To:   qemu-devel
  Cc:   Dr. David Alan Gilbert

On 11/20/14 3:54 AM, Dr. David Alan Gilbert wrote:
> * Gary R Hook (grhookatwork@gmail.com) wrote:
>> Ugh, I wish I could teach Thunderbird to understand how to reply to a
>> newsgroup.
>>
>> Apologies to Paolo for the direct note.
>>
>> On 11/19/14 4:19 AM, Paolo Bonzini wrote:
>>> On 19/11/2014 10:35, Dr. David Alan Gilbert wrote:
>>>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>>>> On 18/11/2014 21:28, Dr. David Alan Gilbert wrote:
>>>>>> This seems odd, since as far as I know the tunneling code is quite
>>>>>> separate from the migration code; I thought the only thing that
>>>>>> the migration code sees differently is the file descriptors it
>>>>>> gets passed. (Having said that, again I don't know storage stuff,
>>>>>> so if this is a storage special there may be something there...)
>>>>>
>>>>> Tunnelled migration uses the old block-migration.c code.
>>>>> Non-tunnelled migration uses the NBD server and block/mirror.c.
>>>>
>>>> OK, that explains that. Is that because the tunneling code can't
>>>> deal with tunneling the NBD server connection?
>>>>
>>>>> The main problem with the old code is that it uses a possibly
>>>>> unbounded amount of memory in mig_save_device_dirty and can have
>>>>> huge jitter if any serious workload is running in the guest.
>>>>
>>>> So that's sending dirty blocks iteratively? Not that I can see
>>>> when the allocations get freed; but is the amount allocated there
>>>> related to total disk size (as Gary suggested) or to the amount
>>>> of dirty blocks?
>>>
>>> It should be related to the maximum rate limit (which can be set to
>>> arbitrarily high values, however).
>>
>> This makes no sense. The code in block_save_iterate() specifically
>> attempts to control the rate of transfer. But when
>> qemu_file_get_rate_limit() returns a number like 922337203685372723
>> (0xCCCCCCCCCCB3333), I'm under the impression that no bandwidth
>> constraints are being imposed at this layer. Why, then, would that
>> transfer be occurring at 20 MB/sec (simple, under-utilized 1 GigE
>> connection) with no clear bottleneck in CPU or network? What other
>> relation might exist?
>
> Disk IO on the disk that you're trying to transfer?

Well, non-tunneled runs fast enough (120 MB/s) to saturate the network
pipe, so it's evident to me that the blocks can come screaming from the
disk plenty fast. And there's no CPU bottleneck; the VM is really not
doing much of anything at all. So I'll say no.

I shall continue my investigation.

>>> The reads are started, then the ones that are ready are sent and the
>>> blocks are freed in flush_blks. The jitter happens when the guest
>>> reads a lot but only writes a few blocks. In that case, the
>>> bdrv_drain_all in mig_save_device_dirty can be called relatively
>>> often and it can be expensive because it also waits for all
>>> guest-initiated reads to complete.
>>
>> Pardon my ignorance, but this does not match my observations. What I
>> am seeing is the process size of the source qemu grow steadily until
>> the COR completes; during this time the backing file on the
>> destination system does not change/grow at all, which implies that no
>> blocks are being transferred. (I have tested this with a 25 GB VM
>> disk, and larger; no network activity occurs during this period.)
>> Once the COR is done and the in-memory copy is ready (marked by a
>> "Completed 100%" message from blk_mig_save_bulked_block()), the
>> transfer begins. At an abysmally slow rate, I'll add, per the above.
>> Another problem to be investigated.
>
> Odd thought; can you try dropping your migration bandwidth limit
> (migrate_set_speed) - try something low, like 10M - does the behaviour
> stay the same, or does it start transmitting disk data before it's read
> the lot?

Interesting idea. I shall attempt that.

>>> The bulk phase is similar, just with different functions (the reads
>>> are done in mig_save_device_bulk). With a high rate limit, the total
>>> allocated memory can reach a few gigabytes indeed.
>>
>> Much, much more than that. It's definitely dependent upon the disk
>> file size. Tiny VM disks are a nit; big VM disks are a problem.
>
> Well, if as you say it's not starting transmitting for some reason
> until it's read the lot then that would make sense.

Right. I'm just saying that I don't think this works the way people
think it works.

>>> Depending on the scenario, a possible disadvantage of NBD migration
>>> is that it can only throttle each disk separately, while the old code
>>> will apply a single limit to all migrations.
>>
>> How about no throttling at all? And just to be very clear, the goal is
>> fast (NBD-based) migrations of VMs using non-shared storage over an
>> encrypted channel. Safest, worst-case scenario. Aside from gaining an
>> understanding of this code.
>
> There are vague plans to add TLS support for encrypting these streams
> internally to qemu; but they're just thoughts at the moment.

:-(

--
Gary R Hook
Senior Kernel Engineer
NIMBOXX, Inc