From: Jim Fehlig <jfehlig@suse.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	M A Young <m.a.young@durham.ac.uk>
Subject: Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
Date: Mon, 24 Feb 2014 22:17:45 -0700
Message-ID: <530C2779.20502@suse.com>
In-Reply-To: <CAFLBxZbOGM4ALC8DD92ZLBcFf0CDFq=aONuaLEcUujutmbyzTA@mail.gmail.com>

George Dunlap wrote:
> On Mon, Feb 24, 2014 at 3:47 PM, George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
>   
>> On Mon, Feb 24, 2014 at 3:17 PM, George Dunlap
>> <george.dunlap@eu.citrix.com> wrote:
>>     
>>> On 02/24/2014 02:19 PM, Ian Jackson wrote:
>>>       
>>>> libxl_postfork_child_noexec would nestedly reacquire the non-recursive
>>>> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
>>>> The result on Linux is that the process always deadlocks before
>>>> returning from this function.
>>>>
>>>> This is used by xl's console child.  So, the ultimate effect is that
>>>> xl with pygrub does not manage to connect to the pygrub console.
>>>> This behaviour was reported by Michael Young in Xen 4.4.0 RC5.
>>>>
>>>> Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
>>>> not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
>>>> documented to suffice if called only on one ctx.  So deregistering the
>>>> ctx it's called on is not sufficient.  Instead, we need a new approach
>>>> which discards the whole sigchld_user list and unconditionally removes
>>>> our SIGCHLD handler if we had one.
>>>>
>>>> Prompted by this, clarify the semantics of
>>>> libxl_postfork_child_noexec.  Specifically, expand on the meaning of
>>>> "quickly" by explaining what operations are not permitted; and
>>>> document the fact that the function doesn't reclaim the resources in
>>>> the ctxs.
>>>>
>>>> And add a comment in libxl_postfork_child_noexec explaining the
>>>> internal concurrency situation.
>>>>
>>>> This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.
>>>>
>>>> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
>>>> Reported-by: M A Young <m.a.young@durham.ac.uk>
>>>> CC: Ian Campbell <Ian.Campbell@citrix.com>
>>>> CC: George Dunlap <george.dunlap@eu.citrix.com>
>>>>         
>>> So it looks like this path gets called from a number of other places in xl:
>>>
>>> libxl_postfork_child_noexec() is called by xl.c:postfork().
>>>
>>> postfork() is called in xl_cmdimpl.c by autoconnect_vncviewer(),
>>> autoconnect_console(), and do_daemonize().
>>>
>>> do_daemonize() is called during "xl create", and "xl devd".
>>>
>>> Was this deadlock not triggered for those, or was it triggered and nobody
>>> noticed?
>>>       
>> In any case, I do think we need to fix this; the main question is, do
>> we need to delay the release a bit further to make sure it gets
>> sufficient testing?
>>     
>
> Also,  it would be nice to get a Tested-by: from someone using it with
> libvirt (before the release at least, if not before the check-in).
>
> Jim / Dario?
>   

I'll update my test system to rc6 tomorrow and restart my tests.

FYI, the tests were running over the weekend on rc5 + libvirt 1.2.2
rc1.  Over 25,000 domains started, shut down, created, saved, restored,
etc. with no problems noted.

Regards,
Jim
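
For anyone reading this thread later: the deadlock in the quoted patch
description comes from one thread taking a non-recursive mutex and then
calling a helper that takes the same mutex again. Below is a minimal
sketch of that pattern, not libxl's actual code. The names no_forking_sketch,
helper_that_locks and nested_acquire_demo are hypothetical, and the sketch
assumes an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so that the nested
acquisition is reported as EDEADLK rather than hanging the way a default
mutex typically does on Linux.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the "no_forking" mutex; not libxl's real object. */
static pthread_mutex_t no_forking_sketch;

/* Hypothetical helper that takes the lock itself, much as the patch
 * description says sigchld_user_remove does. */
static int helper_that_locks(void)
{
    int rc = pthread_mutex_lock(&no_forking_sketch);
    if (rc)
        return rc;              /* EDEADLK when the caller already holds it */
    pthread_mutex_unlock(&no_forking_sketch);
    return 0;
}

/* Takes the lock, then calls the helper while still holding it.
 * Returns the helper's result: the error from the nested acquisition. */
static int nested_acquire_demo(void)
{
    pthread_mutexattr_t attr;
    int rc;

    pthread_mutexattr_init(&attr);
    /* Assumption for the demo: error-checking type, so the nested lock
     * attempt fails instead of blocking forever. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&no_forking_sketch, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&no_forking_sketch);   /* outer acquisition */
    rc = helper_that_locks();                 /* nested acquisition */
    pthread_mutex_unlock(&no_forking_sketch);

    pthread_mutex_destroy(&no_forking_sketch);
    return rc;
}
```

Calling nested_acquire_demo() from a small main() and comparing the result
with EDEADLK shows the nested acquisition being detected. Build with
-pthread.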
