From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zhenzhong Duan
Subject: Re: an issue with 'xm save'
Date: Fri, 28 Sep 2012 18:34:51 +0800
Message-ID: <50657D4B.9040303@oracle.com>
References: <505C3647.1030003@oracle.com>
 <20120921143430.GA3522@phenom.dumpdata.com>
 <5062C16A.1020306@oracle.com>
 <20120926123534.GF7356@phenom.dumpdata.com>
 <5063EAFB.1070307@oracle.com>
 <20120927115918.GE8832@phenom.dumpdata.com>
Reply-To: zhenzhong.duan@oracle.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20120927115918.GE8832@phenom.dumpdata.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, Dan Magenheimer, Feng Jin, xen-devel
List-Id: xen-devel@lists.xenproject.org

On 2012-09-27 19:59, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 27, 2012 at 01:58:19PM +0800, Zhenzhong Duan wrote:
>>
>> On 2012-09-26 20:35, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Sep 26, 2012 at 04:48:42PM +0800, Zhenzhong Duan wrote:
>>>> Konrad Rzeszutek Wilk wrote:
>>>>> On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
>>>>>> Hi maintainers,
>>>>>>
>>>>>> I found there is an issue when 'xm save' a pvm guest. See below:
>>>>>>
>>>>>> When I do save then restore once, CPU(%) in xentop showed around 99%.
>>>>>> When I do that a second time, CPU(%) showed 199%.
>>>>>>
>>>>>> top in dom0 showed:
>>>>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
>>>>>> 20946 root  18  -2 10984 1284  964 S 19.8  0.3  0:48.93 block
>>>>>>  4939 root  18  -2 10984 1288  964 S 19.5  0.3  1:34.68 block
>>>>>>
>>>>>> I could kill the block processes, then all looked normal again.
>>>>> What is the 'block' process? If you attach 'perf' to it do you get an idea
>>>>> of what it is spinning at?
>>>> It's /etc/xen/scripts/block.
>>>> I added 'set -x' to /etc/xen/scripts/block and found it blocked at claim_lock.
>>>> When domU was created the first time, claim_lock/release_lock finished quickly;
>>>> when 'xm save' was called, claim_lock spun in its own while loop.
>>>> I can confirm no other domU create/save/etc. happened while I tested.
>>> OK, so how come you have two block processes? Is it b/c you have two
>>> disks attached to the guest? There are multiple claim_lock calls in the shell
>>> script - do you know where each of the two threads is spinning? Are they
>>> spinning on the same function?
>> In the above test, I ran save/restore twice, hence two block processes.
>> In another test with save/restore run once, there was only one block process.
>> After 'xm save', I see the block process spin at line 328:
>> 321 remove)
>> 322   case $t in
>> 323     phy)
>> 324       exit 0
>> 325       ;;
>> 326
>> 327     file)
>> 328       claim_lock "block"
>> 329       node=$(xenstore_read "$XENBUS_PATH/node")
>> 330       losetup -d "$node"
>> 331       release_lock "block"
>> 332       exit 0
>> 333       ;;
> So with the patches in OVM - do they have this fixed? Can they be upstreamed
> or are they dependent on some magic OVM sauce?
After replacing locking.sh with OVM's, it worked. But xen-tools has since
moved to flock-based locking, so we can't simply revert back. It seems
changeset 25595:497e2fe49455 introduced the issue. Finally, I came up with
a small patch that works around the issue.

diff -r d364becfb083 tools/hotplug/Linux/locking.sh
--- a/tools/hotplug/Linux/locking.sh	Thu Sep 20 13:31:19 2012 +0200
+++ b/tools/hotplug/Linux/locking.sh	Fri Sep 28 18:27:31 2012 +0800
@@ -66,6 +66,7 @@
 release_lock()
 {
     _setlockfd $1
+    flock -u $_lockfd
     rm "$_lockfile"
 }
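
To make the one-line change easier to follow, below is a minimal sketch of a
flock-based claim_lock/release_lock pair of the same shape as the hunk above.
It is an illustration only, not the actual tools/hotplug/Linux/locking.sh: the
lock directory, the fixed descriptor number and the retry check in claim_lock
are assumptions made for the example. The relevant point is that release_lock
drops the kernel lock on the open descriptor with 'flock -u' before unlinking
the lock file, instead of relying on the rm alone.

#!/bin/sh
# Simplified, illustrative flock-based locking (NOT the real
# tools/hotplug/Linux/locking.sh; the path, fd number and retry
# check are assumptions).

LOCK_BASEDIR=/var/run/xen-hotplug

_setlockfd()
{
    _lockfile="$LOCK_BASEDIR/$1.lock"
    _lockfd=200
}

claim_lock()
{
    mkdir -p "$LOCK_BASEDIR"
    _setlockfd "$1"
    while true; do
        # Open (or create) the lock file on a fixed descriptor and
        # take an exclusive lock on it.
        eval "exec $_lockfd>>\"$_lockfile\""
        flock -x "$_lockfd" || return 1
        # Re-check that the descriptor we locked still refers to the
        # lock file on disk; if the file was removed or replaced in
        # the meantime, retry.
        if [ "$(stat -L -c '%D.%i' "/proc/self/fd/$_lockfd" 2>/dev/null)" = \
             "$(stat -c '%D.%i' "$_lockfile" 2>/dev/null)" ]; then
            return 0
        fi
    done
}

release_lock()
{
    _setlockfd "$1"
    # Explicitly drop the lock before removing the file; this is the
    # step the workaround above adds to release_lock().
    flock -u "$_lockfd"
    rm -f "$_lockfile"
}

# Usage, mirroring the 'file' remove case of /etc/xen/scripts/block:
#   claim_lock "block"
#   losetup -d "$node"
#   release_lock "block"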