xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: "Daniel P. Berrange" <berrange@redhat.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Andrew Jones <drjones@redhat.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	xen-devel <xen-devel@lists.xen.org>,
	Don Dutile <ddutile@redhat.com>,
	Kouya Shimura <kouya@jp.fujitsu.com>,
	Zhigang Wang <zhigang.x.wang@oracle.com>
Subject: Re: [PATCH] tools/hotplug: fix locking
Date: Thu, 21 Jun 2012 13:20:22 +0100	[thread overview]
Message-ID: <20120621122022.GO21124@redhat.com> (raw)
In-Reply-To: <1340279364.21872.118.camel@zakaz.uk.xensource.com>

On Thu, Jun 21, 2012 at 12:49:24PM +0100, Ian Campbell wrote:
> On Thu, 2012-06-21 at 12:42 +0100, Ian Campbell wrote:
> > On Wed, 2012-06-20 at 18:53 +0100, Zhigang Wang wrote:
> > > On 06/20/2012 11:34 AM, Ian Campbell wrote:
> > > > On Wed, 2012-06-13 at 18:29 +0100, Zhigang Wang wrote:
> > > >> # HG changeset patch
> > > >> # User Zhigang Wang <zhigang.x.wang@oracle.com>
> > > >> # Date 1339608534 14400
> > > >> # Node ID 650b03f412143c3057abbd0ae0e256e6b3fd2ba8
> > > >> # Parent  32034d1914a607d7b6f1f060352b4cac973600f8
> > > >> tools/hotplug: fix locking
> > > > A better title would be "tools/hotplug: Switch to locking with flock"
> > > > since "fix" is not very descriptive.
> > > Agree.
> > > >
> > > > The commit message should also state why changing this scheme is
> > > > preferable to fixing the current one.
> > > I have two points:
> > > 
> > > 1. No timeout: in the old one, if one process holding the lock more than 100
> > >    seconds, other processes could steal it.
> > > 2. No leftovers: if a process holding this lock is dead, it will close the lock
> > >    file, so it will not affect other processes.
> > > 
> > > >
> > > > Is this flock tool widely available? Does it need to be added to the
> > > > dependencies in the README?
> > > It is widely distributed: it's part of the util-linux package. So I think no
> > > need to document it.
> > > >
> > > >> The current locking implementation would allow two processes get the lock
> > > >> simultaneously:
> > > > [...snip shell trace...]
> > > >
> > > > Can you add a line or two of analysis to explain where in that shell
> > > > spew things are actually going wrong and why?
> > > I didn't spent much time on this complicated implement. But it fails for me and
> > > also for others (from response).
> > > >
> > > >> This patch is ported from Red Hat Enterprise Linux 5.8.
> > > > I think we need a Signed-off-by from the original patch author. (Unless
> > > > DCO clause b somehow applies?)
> > > I'm not sure how to handle this. There's no signed-off-by on the original Red
> > > Hat patch. Could anyone in Red Hat help to get it signed off?
> > 
> > Perhaps Andrew Jones or Don Dutile can point us in the right direction
> > (both CCd) if you identify the precise source of the patch (i.e. the
> > name of the .src.rpm and the name of the patch within it)?
> > 
> > I looked in kernel-2.6.18-308.8.2.el5.src.rpm (which seems fairly
> > recent) but maybe I'm looking in the wrong place.
> 
> Nevermind, I found it in xen-3.0.3-135.el5_8.2.src.rpm (I'd have sworn
> that RH shipped kernel+xen in a single source package, oh well).
> 
> %changelog says:
>         * Fri Sep 14 2007 Daniel P. Berrange <berrange@redhat.com> - 3.0.3-40.el5
>         - Rewrite locking in hotplug scripts to fix timeouts (rhbz #267861)
> 
> Daniel CCd.
> 
> I wonder if there were any followup fixes or changes, especially wrt to
> your point 1 above (the lack of a timeout now) requiring xend (or now
> libxl) changes?

Removing any kind of timeout was in fact the point of this change.
We do not want hotplug scripts stealing each others locks, or
timing out, because the timeouts cause alot of problems when
launching guests on a highly loaded machine where execution time
can be alot longer than a timeout setting may expect.

I expect you can't see the original BZ ticket quoted, since it
has customer information in it.  Here is my description of
what the problem we weere solving was:

[quote]
In the normal case of a single domain booting with one disk, the disk hotplug
script will fire once. In this case taking out the lock will never cause a sleep
because there's no other concurrent invocations. If a domain has 4 disks
configured, then the hotplug script will fire 4 times, all 4 invocations at
pretty much the same time. If there is even a little load on the system, the
locking function in the shell script will sleep for a few seconds - as many as 5
seconds, or potentially even time out & fail completely.

If say 4 or even more domains each with 4 disks start up at once, that's 16
invocations of the hotplug script running at once. There will be alot of sleep's
done & because of the very coarse 1 second granularity the delay can add up
significantly. 

The change to using flock() removes the arbitrary 1 second sleep, so the very
instant once hotplug script finishes, another can start & there is no repeated
attempts & failures to lock which would add more delay.
[/quote]

Usually it was our policy to send all these kind of patches upstream,
so I'm not really sure why this was not already merged. Possibly I
forgot to submit it, or maybe it was rejected - I honestly can't
remember.

Below is the original patch I wrote, to which I apply:

  Signed-off-by: Daniel P. Berrange.

diff -rup xen-3.1.0-src.orig/tools/examples/locking.sh xen-3.1.0-src.new/tools/examples/locking.sh
--- xen-3.1.0-src.orig/tools/examples/locking.sh	2006-10-15 08:22:03.000000000 -0400
+++ xen-3.1.0-src.new/tools/examples/locking.sh	2007-09-12 11:25:20.000000000 -0400
@@ -1,5 +1,6 @@
 #
 # Copyright (c) 2005 XenSource Ltd.
+# Copyright (c) 2007 Red Hat
 #
 # This library is free software; you can redistribute it and/or
 # modify it under the terms of version 2.1 of the GNU Lesser General Public
@@ -19,80 +20,30 @@
 # Serialisation
 #
 
-LOCK_SLEEPTIME=1
-LOCK_SPINNING_RETRIES=5
-LOCK_RETRIES=100
 LOCK_BASEDIR=/var/run/xen-hotplug
 
-
-claim_lock()
-{
-  local lockdir="$LOCK_BASEDIR/$1"
-  mkdir -p "$LOCK_BASEDIR"
-  _claim_lock "$lockdir"
-}
-
-
-release_lock()
-{
-  _release_lock "$LOCK_BASEDIR/$1"
-}
-
-
-_claim_lock()
+_setlockfd()
 {
-  local lockdir="$1"
-  local owner=$(_lock_owner "$lockdir")
-  local retries=0
-
-  while [ $retries -lt $LOCK_RETRIES ]
-  do
-    mkdir "$lockdir" 2>/dev/null && trap "release_lock $1; sigerr" ERR &&
-      _update_lock_info "$lockdir" && return
-
-    local new_owner=$(_lock_owner "$lockdir")
-    if [ "$new_owner" != "$owner" ]
-    then
-      owner="$new_owner"
-      retries=0
-    fi
-
-    if [ $retries -gt $LOCK_SPINNING_RETRIES ]
-    then
-      sleep $LOCK_SLEEPTIME
-    else
-      sleep 0
-    fi
-    retries=$(($retries + 1))
-  done
-  _steal_lock "$lockdir"
+    local i
+    for ((i = 0; i < ${#_lockdict}; i++))
+    do [ -z "${_lockdict[$i]}" -o "${_lockdict[$i]}" = "$1" ] && break
+    done
+    _lockdict[$i]="$1"
+    let _lockfd=200+i
 }
 
 
-_release_lock()
-{
-  trap sigerr ERR
-  rm -rf "$1" 2>/dev/null || true
-}
-
-
-_steal_lock()
-{
-  local lockdir="$1"
-  local owner=$(cat "$lockdir/owner" 2>/dev/null || echo "unknown")
-  log err "Forced to steal lock on $lockdir from $owner!"
-  _release_lock "$lockdir"
-  _claim_lock "$lockdir"
-}
-
-
-_lock_owner()
+claim_lock()
 {
-  cat "$1/owner" 2>/dev/null || echo "unknown"
+    mkdir -p "$LOCK_BASEDIR"
+    _setlockfd $1
+    eval "exec $_lockfd>>$LOCK_BASEDIR/$1"
+    flock -x $_lockfd
 }
 
 
-_update_lock_info()
+release_lock()
 {
-  echo "$$: $0" >"$1/owner"
+    _setlockfd $1
+    flock -u $_lockfd
 }



Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

  parent reply	other threads:[~2012-06-21 12:20 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-13 17:29 [PATCH] tools/hotplug: fix locking Zhigang Wang
2012-06-20 15:34 ` Ian Campbell
2012-06-20 17:53   ` Zhigang Wang
2012-06-21 11:42     ` Ian Campbell
2012-06-21 11:49       ` Ian Campbell
2012-06-21 12:08         ` Zhigang Wang
2012-06-21 12:23           ` Daniel P. Berrange
2012-06-21 13:07             ` Zhigang Wang
2012-06-21 12:20         ` Daniel P. Berrange [this message]
2012-06-26 15:53           ` Ian Campbell
2012-06-26 16:14             ` Daniel P. Berrange
2012-07-04 14:48               ` Ian Campbell
  -- strict thread matches above, loose matches on Subject: below --
2012-06-11 13:59 Zhigang Wang
2012-06-12  6:53 ` Kouya Shimura
2012-06-13  2:25   ` Marek Marczykowski
2012-06-13  8:11 ` Ian Campbell
2012-06-13  8:16   ` Zhigang Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120621122022.GO21124@redhat.com \
    --to=berrange@redhat.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=ddutile@redhat.com \
    --cc=drjones@redhat.com \
    --cc=kouya@jp.fujitsu.com \
    --cc=xen-devel@lists.xen.org \
    --cc=zhigang.x.wang@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).