xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Zhigang Wang <zhigang.x.wang@oracle.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Andrew Jones <drjones@redhat.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	xen-devel <xen-devel@lists.xen.org>,
	Don Dutile <ddutile@redhat.com>,
	Kouya Shimura <kouya@jp.fujitsu.com>
Subject: Re: [PATCH] tools/hotplug: fix locking
Date: Thu, 21 Jun 2012 09:07:00 -0400	[thread overview]
Message-ID: <4FE31C74.7050100@oracle.com> (raw)
In-Reply-To: <20120621122337.GP21124@redhat.com>

On 06/21/2012 08:23 AM, Daniel P. Berrange wrote:
> On Thu, Jun 21, 2012 at 08:08:51AM -0400, Zhigang Wang wrote:
>> On 06/21/2012 07:49 AM, Ian Campbell wrote:
>>> On Thu, 2012-06-21 at 12:42 +0100, Ian Campbell wrote:
>>>> On Wed, 2012-06-20 at 18:53 +0100, Zhigang Wang wrote:
>>>>> On 06/20/2012 11:34 AM, Ian Campbell wrote:
>>>>>> On Wed, 2012-06-13 at 18:29 +0100, Zhigang Wang wrote:
>>>>>>> # HG changeset patch
>>>>>>> # User Zhigang Wang <zhigang.x.wang@oracle.com>
>>>>>>> # Date 1339608534 14400
>>>>>>> # Node ID 650b03f412143c3057abbd0ae0e256e6b3fd2ba8
>>>>>>> # Parent  32034d1914a607d7b6f1f060352b4cac973600f8
>>>>>>> tools/hotplug: fix locking
>>>>>> A better title would be "tools/hotplug: Switch to locking with flock"
>>>>>> since "fix" is not very descriptive.
>>>>> Agree.
>>>>>> The commit message should also state why changing this scheme is
>>>>>> preferable to fixing the current one.
>>>>> I have two points:
>>>>>
>>>>> 1. No timeout: in the old one, if one process holding the lock more than 100
>>>>>    seconds, other processes could steal it.
>>>>> 2. No leftovers: if a process holding this lock is dead, it will close the lock
>>>>>    file, so it will not affect other processes.
>>>>>
>>>>>> Is this flock tool widely available? Does it need to be added to the
>>>>>> dependencies in the README?
>>>>> It is widely distributed: it's part of the util-linux package. So I think no
>>>>> need to document it.
>>>>>>> The current locking implementation would allow two processes get the lock
>>>>>>> simultaneously:
>>>>>> [...snip shell trace...]
>>>>>>
>>>>>> Can you add a line or two of analysis to explain where in that shell
>>>>>> spew things are actually going wrong and why?
>>>>> I didn't spent much time on this complicated implement. But it fails for me and
>>>>> also for others (from response).
>>>>>>> This patch is ported from Red Hat Enterprise Linux 5.8.
>>>>>> I think we need a Signed-off-by from the original patch author. (Unless
>>>>>> DCO clause b somehow applies?)
>>>>> I'm not sure how to handle this. There's no signed-off-by on the original Red
>>>>> Hat patch. Could anyone in Red Hat help to get it signed off?
>>>> Perhaps Andrew Jones or Don Dutile can point us in the right direction
>>>> (both CCd) if you identify the precise source of the patch (i.e. the
>>>> name of the .src.rpm and the name of the patch within it)?
>>>>
>>>> I looked in kernel-2.6.18-308.8.2.el5.src.rpm (which seems fairly
>>>> recent) but maybe I'm looking in the wrong place.
>>> Nevermind, I found it in xen-3.0.3-135.el5_8.2.src.rpm (I'd have sworn
>>> that RH shipped kernel+xen in a single source package, oh well).
>>>
>>> %changelog says:
>>>         * Fri Sep 14 2007 Daniel P. Berrange <berrange@redhat.com> - 3.0.3-40.el5
>>>         - Rewrite locking in hotplug scripts to fix timeouts (rhbz #267861)
>>>
>>> Daniel CCd.
>>>
>>> I wonder if there were any followup fixes or changes, especially wrt to
>>> your point 1 above (the lack of a timeout now) requiring xend (or now
>>> libxl) changes?
>> We have another timeout in xend/xl: if device backend is not setup properly, the
>> device creation will fail.
>>
>> Without any timeout for this locking, all backend uevent scripts will hang if
>> one goes wrong. Just stealing the lock upon timeoutis not a good idea: it may
>> cause error or damage. What about adding a long (10minutes) timeout? (by adding
>> -w/--wait/--timeout 600)
> Timeouts are a really bad idea for this area of code. If you make the
> timeout really long (10 mins as you suggest), then no other guest can
> be started during this entire 10 minute window if something goes wrong.
> IMHO this is unacceptable.  If you make the timeout short enough that
> you don't unduly delay other VM startup operations, then you will often
> hit bogus timeouts under high load.
>
> In practice the scripts should not hang unless something is seriously
> wrong, in which case timing out & pretending every thing is still fine
> for the future is bogus.  If you have real world examples of hangs
> then you should focus efforts on fixing the hangs, rather than papering
> over the problem with a timeout.
>
> Daniel
Thanks Daniel. I agree with your points on timeout.

Zhigang

  reply	other threads:[~2012-06-21 13:07 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-13 17:29 [PATCH] tools/hotplug: fix locking Zhigang Wang
2012-06-20 15:34 ` Ian Campbell
2012-06-20 17:53   ` Zhigang Wang
2012-06-21 11:42     ` Ian Campbell
2012-06-21 11:49       ` Ian Campbell
2012-06-21 12:08         ` Zhigang Wang
2012-06-21 12:23           ` Daniel P. Berrange
2012-06-21 13:07             ` Zhigang Wang [this message]
2012-06-21 12:20         ` Daniel P. Berrange
2012-06-26 15:53           ` Ian Campbell
2012-06-26 16:14             ` Daniel P. Berrange
2012-07-04 14:48               ` Ian Campbell
  -- strict thread matches above, loose matches on Subject: below --
2012-06-11 13:59 Zhigang Wang
2012-06-12  6:53 ` Kouya Shimura
2012-06-13  2:25   ` Marek Marczykowski
2012-06-13  8:11 ` Ian Campbell
2012-06-13  8:16   ` Zhigang Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FE31C74.7050100@oracle.com \
    --to=zhigang.x.wang@oracle.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=berrange@redhat.com \
    --cc=ddutile@redhat.com \
    --cc=drjones@redhat.com \
    --cc=kouya@jp.fujitsu.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).