From: Chegu Vinod
Reply-To: chegu_vinod@hp.com
To: qemu-devel
Subject: [Qemu-devel] Live Migration of a large guest : guest frozen on the destination host
Date: Mon, 11 Jun 2012 07:02:06 -0700
Message-ID: <4FD5FA5E.1020305@hp.com>

Hello,

I am having some issues trying to live migrate a large guest and would like to get some pointers
on how to go about debugging this. Here is some information on the configuration.

Hardware:
Two DL980s, each with 80 Westmere cores + 1 TB of RAM. Using a 10G NIC private link
(back to back) between the two DL980s.

Host software used:

Host kernel: 3.4.1
Qemu versions used:
  Case 1: upstream qemu (1.1.50) - from qemu.git
  Case 2: 1.0.92 + Juan Quintela's huge_memory changes

Guest:
40 vCPUs + 512 GB of RAM

Guest software used:
RHEL 6.3 RC1 (had some basic boot issues with the 3.4.1 kernel and udevd...)
The guest is booted off an FC LUN (visible to both hosts).

[Note: I am not using virsh/virt-manager etc., just qemu itself to start the guest and the qemu
monitor to drive the live migration. I have set the migration speed to 10G but haven't changed the
downtime (default: 30ms).]
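
For reference, the monitor side of this looks roughly like the following (a sketch only; the
destination address, port, and the guest command line are placeholders rather than what I
actually use):

  # on the destination host: start qemu with the same guest options, waiting for the stream
  qemu-system-x86_64 <same guest options as on the source> -incoming tcp:0:4444

  # on the source host's qemu monitor
  (qemu) migrate_set_speed 10g
  (qemu) migrate -d tcp:<destination-ip>:4444
  (qemu) info migrate        # poll migration progress (transferred / remaining RAM)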


I tried to live migrate this large guest using either of the qemus (i.e. Case 1 or Case 2) and observed
the following:

When this guest was idling, I was able to live migrate it and have the guest come up fine on the
other host, and I was able to interact with the guest on the destination host.

With workloads (e.g. AIM7-compute, SpecJBB, or Google Stress App Test (SAT)) running in the
guest, if we try a live migration we observe that [after a while] the source host claims that the
live migration is complete, but the guest on the destination host is often in a "frozen/hung" state:
I can't really interact with it or ping it. I am still trying to capture more information, but was also
hoping to get some clues/tips from the experts on these mailing lists...
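
In case it helps, the only checks I know to run so far are via the monitor (just a sketch of what
I plan to capture next; suggestions for better ones are welcome):

  (qemu) info migrate    # on the source: confirms the migration really completed
  (qemu) info status     # on the destination: whether the VM is "running" or "paused"
  (qemu) info cpus       # on the destination: per-vcpu state (halted or not) and current pc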

[BTW, is there a way to get a snapshot of the image of the guest on the source host just before
the "downtime" (i.e. the start of stage 3), and compare that with the image of the guest on the
destination host just before it is about to resume? Is such a debugging feature already available?]
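
(The closest thing I'm aware of is directing the migration stream to a file via an "exec:" target,
roughly like the line below, but that saves the whole stream rather than the guest image at the
exact start of stage 3, which is what I'm after.)

  (qemu) migrate "exec:gzip -c > /tmp/guest-state.gz"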

Thanks
Vinod
