From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zhenzhong Duan
Subject: Re: an issue with 'xm save'
Date: Fri, 28 Sep 2012 18:34:51 +0800
Message-ID: <50657D4B.9040303@oracle.com>
References: <505C3647.1030003@oracle.com>
 <20120921143430.GA3522@phenom.dumpdata.com>
 <5062C16A.1020306@oracle.com>
 <20120926123534.GF7356@phenom.dumpdata.com>
 <5063EAFB.1070307@oracle.com>
 <20120927115918.GE8832@phenom.dumpdata.com>
Reply-To: zhenzhong.duan@oracle.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20120927115918.GE8832@phenom.dumpdata.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, Dan Magenheimer, Feng Jin, xen-devel
List-Id: xen-devel@lists.xenproject.org

On 2012-09-27 19:59, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 27, 2012 at 01:58:19PM +0800, Zhenzhong Duan wrote:
>>
>> On 2012-09-26 20:35, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Sep 26, 2012 at 04:48:42PM +0800, Zhenzhong Duan wrote:
>>>> Konrad Rzeszutek Wilk wrote:
>>>>> On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
>>>>>> Hi maintainers,
>>>>>>
>>>>>> I found there is an issue when 'xm save' a pvm guest. See below:
>>>>>>
>>>>>> When I do save then restore once, CPU(%) in xentop showed around 99%.
>>>>>> When I do that a second time, CPU(%) showed 199%.
>>>>>>
>>>>>> top in dom0 showed:
>>>>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
>>>>>> 20946 root  18  -2 10984 1284  964 S 19.8  0.3  0:48.93 block
>>>>>>  4939 root  18  -2 10984 1288  964 S 19.5  0.3  1:34.68 block
>>>>>>
>>>>>> I could kill the block processes, then all looked normal again.
>>>>> What is the 'block' process? If you attach 'perf' to it do you get an idea
>>>>> of what it is spinning at?
>>>> It's /etc/xen/scripts/block.
>>>> I added 'set -x' to /etc/xen/scripts/block and found it blocked at claim_lock.
>>>> When domU was created the first time, claim_lock/release_lock finished quickly;
>>>> when 'xm save' was called, claim_lock spun in its own while loop.
>>>> I can confirm no other domU create/save/etc. happened while I tested.
>>> OK, so how come you have two block processes? Is it b/c you have two
>>> disks attached to the guest? There are multiple claim_lock calls in the shell
>>> script - do you know where each of the two threads is spinning? Are they
>>> spinning on the same function?
>> In the above test, I ran save/restore twice, hence two block processes.
>> In another test with save/restore run once, there was only one block process.
>> After 'xm save', I see the block process spin at line 328:
>> 321 remove)
>> 322   case $t in
>> 323     phy)
>> 324       exit 0
>> 325       ;;
>> 326
>> 327     file)
>> 328       claim_lock "block"
>> 329       node=$(xenstore_read "$XENBUS_PATH/node")
>> 330       losetup -d "$node"
>> 331       release_lock "block"
>> 332       exit 0
>> 333       ;;
> So with the patches in OVM - do they have this fixed? Can they be upstreamed
> or are they dependent on some magic OVM sauce?
After replacing locking.sh with OVM's, it worked. But xen-tools has since
moved to flock-based locking, so we can't simply revert back. It seems
changeset 25595:497e2fe49455 introduced the issue. Finally, I came up with
a small patch that works around the issue.

diff -r d364becfb083 tools/hotplug/Linux/locking.sh
--- a/tools/hotplug/Linux/locking.sh	Thu Sep 20 13:31:19 2012 +0200
+++ b/tools/hotplug/Linux/locking.sh	Fri Sep 28 18:27:31 2012 +0800
@@ -66,6 +66,7 @@
 release_lock()
 {
     _setlockfd $1
+    flock -u $_lockfd
     rm "$_lockfile"
 }
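
To make the one-line change easier to follow, below is a minimal sketch of a
flock-based claim_lock/release_lock pair of the same shape as the hunk above.
It is an illustration only, not the actual tools/hotplug/Linux/locking.sh: the
lock directory, the fixed descriptor number and the retry check in claim_lock
are assumptions made for the example. The relevant point is that release_lock
drops the kernel lock on the open descriptor with 'flock -u' before unlinking
the lock file, instead of relying on the rm alone.

#!/bin/sh
# Simplified, illustrative flock-based locking (NOT the real
# tools/hotplug/Linux/locking.sh; the path, fd number and retry
# check are assumptions).

LOCK_BASEDIR=/var/run/xen-hotplug

_setlockfd()
{
    _lockfile="$LOCK_BASEDIR/$1.lock"
    _lockfd=200
}

claim_lock()
{
    mkdir -p "$LOCK_BASEDIR"
    _setlockfd "$1"
    while true; do
        # Open (or create) the lock file on a fixed descriptor and
        # take an exclusive lock on it.
        eval "exec $_lockfd>>\"$_lockfile\""
        flock -x "$_lockfd" || return 1
        # Re-check that the descriptor we locked still refers to the
        # lock file on disk; if the file was removed or replaced in
        # the meantime, retry.
        if [ "$(stat -L -c '%D.%i' "/proc/self/fd/$_lockfd" 2>/dev/null)" = \
             "$(stat -c '%D.%i' "$_lockfile" 2>/dev/null)" ]; then
            return 0
        fi
    done
}

release_lock()
{
    _setlockfd "$1"
    # Explicitly drop the lock before removing the file; this is the
    # step the workaround above adds to release_lock().
    flock -u "$_lockfd"
    rm -f "$_lockfile"
}

# Usage, mirroring the 'file' remove case of /etc/xen/scripts/block:
#   claim_lock "block"
#   losetup -d "$node"
#   release_lock "block"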