From mboxrd@z Thu Jan  1 00:00:00 1970
From: George Dunlap
Subject: Re: [PATCH 1/3] libxl: Fix libxl_postfork_child_noexec deadlock etc.
Date: Mon, 24 Feb 2014 15:17:54 +0000
Message-ID: <530B62A2.3080901@eu.citrix.com>
References: <1393251555-22418-1-git-send-email-ian.jackson@eu.citrix.com> <1393251555-22418-2-git-send-email-ian.jackson@eu.citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1393251555-22418-2-git-send-email-ian.jackson@eu.citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Jackson, xen-devel@lists.xensource.com
Cc: Ian Campbell, M A Young
List-Id: xen-devel@lists.xenproject.org

On 02/24/2014 02:19 PM, Ian Jackson wrote:
> libxl_postfork_child_noexec would nestedly reacquire the non-recursive
> "no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
> The result on Linux is that the process always deadlocks before
> returning from this function.
>
> This is used by xl's console child.  So, the ultimate effect is that
> xl with pygrub does not manage to connect to the pygrub console.
> This behaviour was reported by Michael Young in Xen 4.4.0 RC5.
>
> Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
> not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
> documented to suffice if called only on one ctx.  So deregistering the
> ctx it's called on is not sufficient.  Instead, we need a new approach
> which discards the whole sigchld_user list and unconditionally removes
> our SIGCHLD handler if we had one.
>
> Prompted by this, clarify the semantics of
> libxl_postfork_child_noexec.  Specifically, expand on the meaning of
> "quickly" by explaining what operations are not permitted; and
> document the fact that the function doesn't reclaim the resources in
> the ctxs.
>
> And add a comment in libxl_postfork_child_noexec explaining the
> internal concurrency situation.
>
> This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.
>
> Signed-off-by: Ian Jackson
> Reported-by: M A Young
> CC: Ian Campbell
> CC: George Dunlap

So it looks like this path gets called from a number of other places in xl:

libxl_postfork_child_noexec() is called by xl.c:postfork().

postfork() is called in xl_cmdimpl.c by autoconnect_vncviewer(),
autoconnect_console(), and do_daemonize().

do_daemonize() is called during "xl create" and "xl devd".

Was this deadlock not triggered for those, or was it triggered and
nobody noticed?

 -George