* [Qemu-devel] Fwd: Re: Tunneled Migration with Non-Shared Storage
       [not found] <546CE8EC.9090908@gmail.com>
@ 2014-11-19 20:12 ` Gary R Hook
  2014-11-20  9:54   ` Dr. David Alan Gilbert
From: Gary R Hook @ 2014-11-19 20:12 UTC (permalink / raw)
  To: qemu-devel

Ugh, I wish I could teach Thunderbird to understand how to reply to a 
newsgroup.

Apologies to Paolo for the direct note.

On 11/19/14 4:19 AM, Paolo Bonzini wrote:
>
>
> On 19/11/2014 10:35, Dr. David Alan Gilbert wrote:
>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>>
>>>
>>> On 18/11/2014 21:28, Dr. David Alan Gilbert wrote:
>>>> This seems odd, since as far as I know the tunneling code is quite separate
>>>> from the migration code; I thought the only thing that the migration
>>>> code sees differently is the file descriptors it gets passed.
>>>> (Having said that, again I don't know storage stuff, so if this
>>>> is a storage special case there may be something there...)
>>>
>>> Tunnelled migration uses the old block-migration.c code.  Non-tunnelled
>>> migration uses the NBD server and block/mirror.c.
>>
>> OK, that explains that.  Is that because the tunneling code can't
>> deal with tunneling the NBD server connection?
>>
>>> The main problem with
>>> the old code is that it uses a possibly unbounded amount of memory in
>>> mig_save_device_dirty and can have huge jitter if any serious workload
>>> is running in the guest.
>>
>> So that's sending dirty blocks iteratively? Not that I can see
>> when the allocations get freed; but is the amount allocated there
>> related to total disk size (as Gary suggested) or to the amount
>> of dirty blocks?
>
> It should be related to the maximum rate limit (which can be set to
> arbitrarily high values, however).

This makes no sense. The code in block_save_iterate() specifically
attempts to control the rate of transfer. But when
qemu_file_get_rate_limit() returns a number like 922337203685372723
(0xCCCCCCCCCCB3333) I'm under the impression that no bandwidth
constraints are being imposed at this layer. Why, then, would that
transfer be occurring at 20MB/sec (simple, under-utilized 1 gigE
connection) with no clear bottleneck in CPU or network? What other
relation might exist?
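
For context, this is roughly the loop I'm looking at, paraphrased from
block_save_iterate() in block-migration.c and trimmed (error handling
and the other loop-exit checks omitted), so treat it as a sketch of the
logic rather than the verbatim source:

    /* sketch of block_save_iterate(), paraphrased and trimmed */
    ret = flush_blks(f);           /* send, and free, reads that completed */
    blk_mig_reset_dirty_cursor();

    /* "control the rate of transfer": keep queueing new reads while the
     * data submitted-but-not-yet-sent stays under the rate limit */
    while ((block_mig_state.submitted + block_mig_state.read_done)
               * BLOCK_SIZE < qemu_file_get_rate_limit(f)) {
        if (block_mig_state.bulk_completed == 0) {
            /* bulk phase: walk the whole device, one chunk per call */
            if (blk_mig_save_bulked_block(f) == 0) {
                block_mig_state.bulk_completed = 1;
            }
        } else {
            /* afterwards, send dirty blocks iteratively */
            ret = blk_mig_save_dirty_block(f, 1);
        }
    }

    ret = flush_blks(f);

If I'm reading it right, with a limit like the one above that while
condition is effectively never false, so nothing at this layer ever
pushes back on how much gets queued.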


> The reads are started, then the ones that are ready are sent and the
> blocks are freed in flush_blks.  The jitter happens when the guest reads
> a lot but only writes a few blocks.  In that case, the bdrv_drain_all in
> mig_save_device_dirty can be called relatively often and it can be
> expensive because it also waits for all guest-initiated reads to complete.

Pardon my ignorance, but this does not match my observations. What I am
seeing is the process size of the source qemu grow steadily until the
COR completes; during this time the backing file on the destination
system does not change/grow at all, which implies that no blocks are
being transferred. (I have tested this with a 25GB VM disk, and larger;
no network activity occurs during this period.) Once the COR is done and
the in-memory copy is ready (marked by a "Completed 100%" message from
blk_mig_save_bulked_block()) the transfer begins. At an abysmally slow
rate, I'll add, per the above. Another problem to be investigated.


> The bulk phase is similar, just with different functions (the reads are
> done in mig_save_device_bulk).  With a high rate limit, the total
> allocated memory can reach a few gigabytes indeed.

Much, much more than that. It's definitely dependent upon the disk file
size. Tiny VM disks are a nit; big VM disks are a problem.
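
As far as I can tell the per-chunk allocations happen here (again
paraphrased and trimmed from mig_save_device_bulk() and flush_blks() in
block-migration.c; names and details may differ slightly in your tree):

    #define BLOCK_SIZE (1 << 20)             /* 1 MB per queued chunk */

    /* mig_save_device_bulk(): queue one chunk of the device */
    blk = g_new(BlkMigBlock, 1);
    blk->buf = g_malloc(BLOCK_SIZE);         /* allocated per queued chunk */
    blk->bmds = bmds;
    blk->sector = cur_sector;
    blk->nr_sectors = nr_sectors;

    blk->iov.iov_base = blk->buf;
    blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
    qemu_iovec_init_external(&blk->qiov, &blk->iov, 1);

    block_mig_state.submitted++;
    blk->aiocb = bdrv_aio_readv(bs, cur_sector, &blk->qiov, nr_sectors,
                                blk_mig_read_cb, blk);   /* async read */

    /* flush_blks(): only once the read has completed and the stream
     * will take it does the buffer get written out and freed */
    blk_send(f, blk);
    g_free(blk->buf);
    g_free(blk);

If nothing drains that list for the whole bulk pass, those 1 MB buffers
look like exactly the growth I'm watching in the process size.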

> Depending on the scenario, a possible disadvantage of NBD migration is
> that it can only throttle each disk separately, while the old code will
> apply a single limit to all migrations.

How about no throttling at all? And just to be very clear, the goal is
fast (NBD-based) migrations of VMs using non-shared storage over an
encrypted channel. Safest, worst-case scenario. Aside from gaining an
understanding of this code.
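
For reference, this is the sort of invocation I'm comparing (assuming
libvirt is driving the migration; the domain and host names below are
just placeholders):

    # tunnelled: everything rides the encrypted libvirt connection,
    # and qemu uses the old block-migration.c path for the disks
    virsh migrate --live --p2p --tunnelled --copy-storage-all guest1 \
          qemu+ssh://desthost/system

    # non-tunnelled: disks go over a direct NBD/drive-mirror connection
    # between the hosts, fast but not carried over the encrypted channel
    virsh migrate --live --p2p --copy-storage-all guest1 \
          qemu+ssh://desthost/system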

Thank you for your attention.

-- 
Gary R Hook
Senior Kernel Engineer
NIMBOXX, Inc


* Re: [Qemu-devel] Fwd: Re: Tunneled Migration with Non-Shared Storage
  2014-11-19 20:12 ` [Qemu-devel] Fwd: Re: Tunneled Migration with Non-Shared Storage Gary R Hook
@ 2014-11-20  9:54   ` Dr. David Alan Gilbert
  2014-11-20 17:04     ` Gary R Hook
From: Dr. David Alan Gilbert @ 2014-11-20  9:54 UTC (permalink / raw)
  To: Gary R Hook; +Cc: qemu-devel

* Gary R Hook (grhookatwork@gmail.com) wrote:
> Ugh, I wish I could teach Thunderbird to understand how to reply to a
> newsgroup.
> 
> Apologies to Paolo for the direct note.
> 
> On 11/19/14 4:19 AM, Paolo Bonzini wrote:
> >
> >
> >On 19/11/2014 10:35, Dr. David Alan Gilbert wrote:
> >>* Paolo Bonzini (pbonzini@redhat.com) wrote:
> >>>
> >>>
> >>>On 18/11/2014 21:28, Dr. David Alan Gilbert wrote:
> >>>>This seems odd, since as far as I know the tunneling code is quite separate
> >>>>from the migration code; I thought the only thing that the migration
> >>>>code sees differently is the file descriptors it gets passed.
> >>>>(Having said that, again I don't know storage stuff, so if this
> >>>>is a storage special case there may be something there...)
> >>>
> >>>Tunnelled migration uses the old block-migration.c code.  Non-tunnelled
> >>>migration uses the NBD server and block/mirror.c.
> >>
> >>OK, that explains that.  Is that because the tunneling code can't
> >>deal with tunneling the NBD server connection?
> >>
> >>>The main problem with
> >>>the old code is that it uses a possibly unbounded amount of memory in
> >>>mig_save_device_dirty and can have huge jitter if any serious workload
> >>>is running in the guest.
> >>
> >>So that's sending dirty blocks iteratively? Not that I can see
> >>when the allocations get freed; but is the amount allocated there
> >>related to total disk size (as Gary suggested) or to the amount
> >>of dirty blocks?
> >
> >It should be related to the maximum rate limit (which can be set to
> >arbitrarily high values, however).
> 
> This makes no sense. The code in block_save_iterate() specifically
> attempts to control the rate of transfer. But when
> qemu_file_get_rate_limit() returns a number like 922337203685372723
> (0xCCCCCCCCCCB3333) I'm under the impression that no bandwidth
> constraints are being imposed at this layer. Why, then, would that
> transfer be occurring at 20MB/sec (simple, under-utilized 1 gigE
> connection) with no clear bottleneck in CPU or network? What other
> relation might exist?

Disk IO on the disk that you're trying to transfer?

> >The reads are started, then the ones that are ready are sent and the
> >blocks are freed in flush_blks.  The jitter happens when the guest reads
> >a lot but only writes a few blocks.  In that case, the bdrv_drain_all in
> >mig_save_device_dirty can be called relatively often and it can be
> >expensive because it also waits for all guest-initiated reads to complete.
> 
> Pardon my ignorance, but this does not match my observations. What I am
> seeing is the process size of the source qemu grow steadily until the
> COR completes; during this time the backing file on the destination
> system does not change/grow at all, which implies that no blocks are
> being transferred. (I have tested this with a 25GB VM disk, and larger;
> no network activity occurs during this period.) Once the COR is done and
> the in-memory copy is ready (marked by a "Completed 100%" message from
> blk_mig_save_bulked_block()) the transfer begins. At an abysmally slow
> rate, I'll add, per the above. Another problem to be investigated.

Odd thought; can you try dropping your migration bandwidth limit
(migrate_set_speed) - try something low, like 10M - does the behaviour
stay the same, or does it start transmitting disk data before it's read
the lot?
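
(Concretely, something like this from the HMP monitor, or the QMP
equivalent with the value in bytes/sec:

    (qemu) migrate_set_speed 10m

    { "execute": "migrate_set_speed", "arguments": { "value": 10485760 } }

- adjust to however you're driving the monitor.)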

> >The bulk phase is similar, just with different functions (the reads are
> >done in mig_save_device_bulk).  With a high rate limit, the total
> >allocated memory can reach a few gigabytes indeed.
> 
> Much, much more than that. It's definitely dependent upon the disk file
> size. Tiny VM disks are a nit; big VM disks are a problem.

Well, if, as you say, it's not starting to transmit for some reason until
it's read the lot, then that would make sense.

> >Depending on the scenario, a possible disadvantage of NBD migration is
> >that it can only throttle each disk separately, while the old code will
> >apply a single limit to all migrations.
> 
> How about no throttling at all? And just to be very clear, the goal is
> fast (NBD-based) migrations of VMs using non-shared storage over an
> encrypted channel. Safest, worst-case scenario. Aside from gaining an
> understanding of this code.

There are vague plans to add TLS support for encrypting these streams
internally to qemu; but they're just thoughts at the moment.

> Thank you for your attention.

Dave

> 
> -- 
> Gary R Hook
> Senior Kernel Engineer
> NIMBOXX, Inc
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] Fwd: Re: Tunneled Migration with Non-Shared Storage
  2014-11-20  9:54   ` Dr. David Alan Gilbert
@ 2014-11-20 17:04     ` Gary R Hook
From: Gary R Hook @ 2014-11-20 17:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert

On 11/20/14 3:54 AM, Dr. David Alan Gilbert wrote:
> * Gary R Hook (grhookatwork@gmail.com) wrote:
>> Ugh, I wish I could teach Thunderbird to understand how to reply to a
>> newsgroup.
>>
>> Apologies to Paolo for the direct note.
>>
>> On 11/19/14 4:19 AM, Paolo Bonzini wrote:
>>>
>>>
>>> On 19/11/2014 10:35, Dr. David Alan Gilbert wrote:
>>>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>>>>
>>>>>
>>>>> On 18/11/2014 21:28, Dr. David Alan Gilbert wrote:
>>>>>> This seems odd, since as far as I know the tunneling code is quite separate
>>>>>> from the migration code; I thought the only thing that the migration
>>>>>> code sees differently is the file descriptors it gets passed.
>>>>>> (Having said that, again I don't know storage stuff, so if this
>>>>>> is a storage special case there may be something there...)
>>>>>
>>>>> Tunnelled migration uses the old block-migration.c code.  Non-tunnelled
>>>>> migration uses the NBD server and block/mirror.c.
>>>>
>>>> OK, that explains that.  Is that because the tunneling code can't
>>>> deal with tunneling the NBD server connection?
>>>>
>>>>> The main problem with
>>>>> the old code is that it uses a possibly unbounded amount of memory in
>>>>> mig_save_device_dirty and can have huge jitter if any serious workload
>>>>> is running in the guest.
>>>>
>>>> So that's sending dirty blocks iteratively? Not that I can see
>>>> when the allocations get freed; but is the amount allocated there
>>>> related to total disk size (as Gary suggested) or to the amount
>>>> of dirty blocks?
>>>
>>> It should be related to the maximum rate limit (which can be set to
>>> arbitrarily high values, however).
>>
>> This makes no sense. The code in block_save_iterate() specifically
>> attempts to control the rate of transfer. But when
>> qemu_file_get_rate_limit() returns a number like 922337203685372723
>> (0xCCCCCCCCCCB3333) I'm under the impression that no bandwidth
>> constraints are being imposed at this layer. Why, then, would that
>> transfer be occurring at 20MB/sec (simple, under-utilized 1 gigE
>> connection) with no clear bottleneck in CPU or network? What other
>> relation might exist?
>
> Disk IO on the disk that you're trying to transfer?

Well, non-tunneled runs fast enough (120 MB/s) to saturate the network 
pipe, so it's evident to me that the blocks can come screaming from the 
disk plenty fast. And there's no CPU bottleneck; the VM is really not 
doing much of anything at all. So I'll say no. I shall continue my 
investigation.
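
(If one wants to rule the disk out directly, a raw sequential read
check along these lines would do it - the image path is just a
placeholder:

    dd if=/var/lib/libvirt/images/guest1.img of=/dev/null bs=1M \
       count=4096 iflag=direct
)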

>>> The reads are started, then the ones that are ready are sent and the
>>> blocks are freed in flush_blks.  The jitter happens when the guest reads
>>> a lot but only writes a few blocks.  In that case, the bdrv_drain_all in
>>> mig_save_device_dirty can be called relatively often and it can be
>>> expensive because it also waits for all guest-initiated reads to complete.
>>
>> Pardon my ignorance, but this does not match my observations. What I am
>> seeing is the process size of the source qemu grow steadily until the
>> COR completes; during this time the backing file on the destination
>> system does not change/grow at all, which implies that no blocks are
>> being transferred. (I have tested this with a 25GB VM disk, and larger;
>> no network activity occurs during this period.) Once the COR is done and
>> the in-memory copy is ready (marked by a "Completed 100%" message from
>> blk_mig_save_bulked_block()) the transfer begins. At an abysmally slow
>> rate, I'll add, per the above. Another problem to be investigated.
>
> Odd thought; can you try dropping your migration bandwidth limit
> (migrate_set_speed) - try something low, like 10M - does the behaviour
> stay the same, or does it start transmitting disk data before it's read
> the lot?

Interesting idea. I shall attempt that.

>>> The bulk phase is similar, just with different functions (the reads are
>>> done in mig_save_device_bulk).  With a high rate limit, the total
>>> allocated memory can reach a few gigabytes indeed.
>>
>> Much, much more than that. It's definitely dependent upon the disk file
>> size. Tiny VM disks are a nit; big VM disks are a problem.
>
> Well, if, as you say, it's not starting to transmit for some reason until
> it's read the lot, then that would make sense.

Right. I'm just saying that I don't think this works the way people
think it works.

>>> Depending on the scenario, a possible disadvantage of NBD migration is
>>> that it can only throttle each disk separately, while the old code will
>>> apply a single limit to all migrations.
>>
>> How about no throttling at all? And just to be very clear, the goal is
>> fast (NBD-based) migrations of VMs using non-shared storage over an
>> encrypted channel. Safest, worst-case scenario. Aside from gaining an
>> understanding of this code.
>
> There are vague plans to add TLS support for encrypting these streams
> internally to qemu; but they're just thoughts at the moment.

:-(

-- 
Gary R Hook
Senior Kernel Engineer
NIMBOXX, Inc

