Re: [Qemu-devel] [PATCH v6 00/11] rdma: migration support


From: Chegu Vinod <chegu_vinod@hp.com>
To: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
Cc: Karen Noel <knoel@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Juan Jose Quintela Carreira <quintela@redhat.com>,
	qemu-devel qemu-devel <qemu-devel@nongnu.org>,
	Orit Wasserman <owasserm@redhat.com>,
	"Michael R. Hines" <mrhines@us.ibm.com>,
	Anthony Liguori <anthony@codemonkey.ws>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v6 00/11] rdma: migration support
Date: Thu, 09 May 2013 15:20:37 -0700
Message-ID: <518C2135.306@hp.com>
In-Reply-To: <518BDADB.1070705@linux.vnet.ibm.com>


On 5/9/2013 10:20 AM, Michael R. Hines wrote:
> Comments inline. FYI: please CC mrhines@us.ibm.com,
> because it helps me know when to scroll through the bazillion qemu-devel
> emails.
>
> I have things separated out into folders and rules, but a direct CC is 
> better =)
>

Sure will do.

>
> On 05/03/2013 07:28 PM, Chegu Vinod wrote:
>>
>> Hi Michael,
>>
>> I picked up the qemu bits from your github branch and gave it a
>> try. (BTW, the setup I was given temporary access to has a pair of
>> MLX IB QDR cards connected back to back via QSFP cables.)
>>
>> I observed a couple of things and wanted to share them. Perhaps you
>> are already aware of them, or perhaps they are unrelated to your
>> specific changes? (Note: I still haven't finished reviewing your changes.)
>>
>> a) x-rdma-pin-all off case
>>
>> It seems to work sometimes but fails at other times. Here is an
>> example...
>>
>> (qemu) rdma: Accepting rdma connection...
>> rdma: Memory pin all: disabled
>> rdma: verbs context after listen: 0x555556757d50
>> rdma: dest_connect Source GID: fe80::2:c903:9:53a5, Dest GID: fe80::2:c903:9:5855
>> rdma: Accepted migration
>> qemu-system-x86_64: VQ 1 size 0x100 Guest index 0x4d2 inconsistent with Host index 0x4ec: delta 0xffe6
>> qemu: warning: error while loading state for instance 0x0 of device 'virtio-net'
>> load of migration failed
>>
>
> Can you give me more details about the configuration of your VM?

The guest is a 10-vCPU/128GB guest... nothing really fancy with
respect to storage or networking.

It is hosted on a large Westmere-EX box (the target is a similarly
configured Westmere-EX system). There is a shared SAN disk between the two
hosts. Both hosts run a 3.9-rc7 kernel that I pulled from the kvm.git
tree at the time. The guest was also running the same kernel.

Since I was just trying it out, I was not running any workload either.

The qemu command line on the source host:


/usr/local/bin/qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-name vm1 \
-m 131072 -smp 10,sockets=1,cores=10,threads=1 \
-mem-path /dev/hugepages \
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait \
-drive file=/dev/libvirt_lvm3/vm1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-monitor stdio \
-net nic,model=virtio,macaddr=52:54:00:71:01:01,netdev=nic-0 \
-netdev tap,id=nic-0,ifname=tap0,script=no,downscript=no,vhost=on \
-vnc :4


On the destination host, the command line was the same as above, with the
following additional argument:

-incoming x-rdma:<static private ipaddr of the IB>:<port #>


>
>>
>> b) x-rdma-pin-all on case :
>>
>> The guest is not resuming on the target host. i.e. the source host's 
>> qemu states that migration is complete but the guest is not 
>> responsive anymore... (doesn't seem to have crashed but its stuck 
>> somewhere).    Have you seen this behavior before ? Any tips on how I 
>> could extract additional info ?
>
> Is the QEMU monitor still responsive?

They were responsive.

> Can you capture a screenshot of the guest's console to see if there is 
> a panic?

No panic on the guest's console :(

> What kind of storage is attached to the VM?
>

A simple virtio disk backed by a SAN disk (see the qemu command line).

>
>>
>> Besides the noted restrictions/issues around having to pin all of
>> guest memory, if the pinning is done at the start of the migration,
>> it takes a noticeably long time for larger guests. I wonder whether
>> that should be counted as part of the total migration time?
>>
>
> That's a good question: the pin-all option should not be slowing down
> your VM too much, as the VM should still be running before
> migration_thread() actually kicks in and starts the migration.

Well, I had hoped that it would not have any serious impact, but it ended
up freezing the guest...



> I need more information on the configuration of your VM, guest 
> operating system, architecture and so forth.......

Please see above.

> And, as before, whether it is QEMU that is unresponsive or the guest
> that has panicked.......

The guest just freezes... it doesn't panic while the pinning is in progress
(i.e. after I set the capability and start the migration). After the
pinning completes, the guest continues to run and the migration
continues... until it "completes" (as per the source host's qemu)... but I
never see it resume on the target host.
>
>> Also, the act of pinning all the memory seems to "freeze" the guest.
>> For example, larger enterprise-sized guests (say 128GB and higher) are
>> "frozen" for anywhere from nearly a minute (~50 seconds) to
>> multiple minutes as the guest size increases... which IMO kind of
>> defeats the purpose of live guest migration.
>
> That's bad =) There must be a bug somewhere........ the largest VM I 
> can create on my hardware is ~16GB - so let me give that a try and try 
> to track down the problem.

OK. Perhaps running a simple test inside the guest can help observe any
scheduling delays, even when you are pinning a 16GB guest?

>
>>
>> I'd like to hear whether you have already thought about any other
>> alternatives to address this issue. For example, would it be better to
>> pin all of the guest's memory as part of starting the guest itself?
>> Yes, there are restrictions when we pin memory, but it can help with
>> performance.
>
> For such a large VM, I would definitely recommend pinning, because I'm
> assuming you have enough processors or a large enough application to
> actually *use* that much memory. That suggests that even after the
> bulk phase of the migration has completed, your VM is probably going
> to remain pretty busy.
>
> It's just a matter of me tracking down what's causing the freeze and 
> fixing it........ I'll look into it right now on my machine.
>

Ok
>> ---
>> BTW, a different (yet sort of related) topic... recently a patch went
>> upstream that adds an option to qemu to mlock all of the guest's
>> memory:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg03947.html .
>
> I had no idea.......very interesting.
>
>>
>> but when mlocking larger guests, a lot of time is spent bringing
>> each page into the cache, zeroing it, etc.
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg04161.html
>>
>
> Wow, I didn't know that either. Perhaps this is what causes the
> entire QEMU process and its threads to seize up.
>
> It may be necessary to run the pinning command *outside* of QEMU's I/O 
> lock in a separate thread if it's really that much overhead.

I'm not really sure whether the BQL is causing the freeze, but in general,
pinning all memory while the guest is running is perhaps not the best
choice for large enterprise-class guests; i.e. it's better to do it as
part of starting the guest.

>
> Thanks a lot for pointing this out.........
>
>

BTW, a good thing to try is to see whether we can mlock the memory of a
large guest (i.e. in both the source and target qemu) and migrate the
guest using basic TCP over a regular 10Gig NIC.

Thanks,
Vinod
>
>>
>> ----
>>
>> Note: basic TCP-based live guest migration with the same qemu
>> version still works fine on the same hosts over a pair of non-RDMA
>> 10Gb NICs connected back-to-back.
>>
>
> Acknowledged.
>


