From: Ian Campbell <ian.campbell@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: xen-users@lists.xenproject.org,
	Andrew Armenia <andrew@asquaredlabs.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	xen-devel <xen-devel@lists.xen.org>
Subject: Re: [Xen-users] "xl restore" leaks a file descriptor?
Date: Wed, 12 Aug 2015 11:04:25 +0100
Message-ID: <1439373865.9747.330.camel@citrix.com>
In-Reply-To: <20150812094918.GY7460@zion.uk.xensource.com>

On Wed, 2015-08-12 at 10:49 +0100, Wei Liu wrote:
> On Wed, Aug 12, 2015 at 09:41:13AM +0100, Ian Campbell wrote:
> > On Tue, 2015-08-11 at 18:07 +0100, Wei Liu wrote:
> > > On Tue, Aug 11, 2015 at 04:48:13PM +0100, Ian Campbell wrote:
> > > > On Tue, 2015-08-11 at 11:13 -0400, Andrew Armenia wrote:
> > > > > It's the checkpoint file - i.e. the command line argument to xl
> > > > > restore - that is being leaked.
> > > > 
> > > > Thanks.
> > > > 
> > > > [...]
> > > > > So the checkpoint file is clearly being leaked.
> > > > 
> > > > Indeed. I confirmed this even with the current development version
> > > > using ls -l /proc/<pid>/fd, which shows an fd open on a deleted file:
> > > > 
> > > > # ps aux| grep xl
> > > > root     20465  0.0  0.2 106036   984 ?        SLsl 15:42   0:00 xl restore save
> > > > # ls -l /proc/20465/fd
> > > > [...]
> > > > lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save
> > > > [...]
> > > > # rm /root/save
> > > > # ls -l /proc/20465/fd
> > > > [...]
> > > > lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save (deleted)
> > > > [...]
> > > > 
> > > > > Its space is not freed until the 'xl restore' process is ended by
> > > > > shutting down the domain:
> > > > [...]
> > > > > 
> > > > > It seems like xl restore should close the checkpoint file as soon as
> > > > > it's done restoring the domain, allowing the space to be freed, but
> > > > > that's clearly not happening.
> > > > 
> > > > Right. In fact xl sets the file to be close-on-exec right after opening
> > > > it, which is before the daemonisation step, so it ought to be closed
> > > > automatically, but isn't for some reason.
> > > > 
> > > > My working theory is that something in the machinery which spawns the
> > > > save helper is defeating the use of CLOEXEC, perhaps by dup2() or
> > > > perhaps by unsetting CLOEXEC.
> > > > 
> > > > Anyway, thanks for reporting. I've copied the devel list and 4.6 RM.
> > > > Wei, this probably ought to be a blocker for 4.6 (and the fix ought
> > > > ultimately to be backported to 4.4 onwards at least).
> > > > 
> > > > NB: This leak seems to be independent of the switch to migration v2.
> > > > 
> > > > Ian.
> > > 
> > > Maybe this is just because we leak a fd.
> > > 
> > > I don't see how CLOEXEC would be of any use if xl doesn't actually exec
> > > anything.
> > 
> > Duh, for some reason I thought daemonize would activate the CLOEXEC, but
> > it's just fork without exec. Silly me.
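
For reference, the fork-without-exec behaviour described here can be
reproduced outside xl. The following is a minimal sketch, assuming a POSIX
system with /dev/null; the function name cloexec_fd_survives_fork() is
hypothetical and is not part of xl or libxl. It marks a descriptor
close-on-exec, forks, and has the child report whether the descriptor is
still open.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo, not xl code. Returns 1 if a close-on-exec descriptor
 * is still open in a forked child that never calls exec, 0 if it has been
 * closed, and -1 if the experiment could not be set up.
 */
int cloexec_fd_survives_fork(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;

    /* Set FD_CLOEXEC after opening, as the thread says xl does. */
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    assert(fcntl(fd, F_GETFD) & FD_CLOEXEC);

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: no exec happens. F_GETFD only fails if fd is closed. */
        _exit(fcntl(fd, F_GETFD) >= 0 ? 0 : 1);
    }

    close(fd);

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A caller would expect the function to return 1: FD_CLOEXEC is only acted on
by the exec family, so a child created by fork alone inherits the descriptor
unchanged.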
> > 
> > > 
> > > Below is a PoC patch which seems to fix the problem for me.
> > > 
> > > ---8<---
> > > commit 7b5f466d5977dc9f41991ca0c2227023ac07709d
> > > Author: Wei Liu <wei.liu2@citrix.com>
> > > Date:   Tue Aug 11 18:02:25 2015 +0100
> > > 
> > >     xl: close restore_fd when we finish with it
> > >     
> > >     Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > 
> > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > > index 499a05c..525cd24 100644
> > > --- a/tools/libxl/xl_cmdimpl.c
> > > +++ b/tools/libxl/xl_cmdimpl.c
> > > @@ -2846,6 +2846,10 @@ start:
> > >          ret = libxl_domain_create_new(ctx, &d_config, &domid,
> > >                                        0, autoconnect_console_how);
> > >      }
> > > +
> > > +    if (migrate_fd < 0)
> > > +        close(restore_fd);
> > 
> > As Andy says I think we want restore_fd in the check, I can't see any
> > reason we wouldn't want to close the socket too.
> > 
> 
> Do you mean migrate_fd when you say "socket"?

In the migrate case we do "restore_fd = migrate_fd;", so yes, indirectly.
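
A minimal sketch of why that aliasing matters, assuming a POSIX system: a
pipe stands in for the migration stream, and read_after_alias_close() is a
hypothetical name, not xl code. Because the assignment copies the descriptor
number rather than calling dup(), closing restore_fd also invalidates
migrate_fd, even with a control message still queued on the stream.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical demo, not xl code. Returns the errno seen when reading from
 * migrate_fd after its alias restore_fd has been closed, 0 if the read
 * unexpectedly succeeded, or -1 if the pipe could not be set up.
 */
int read_after_alias_close(void)
{
    int fds[2];
    char buf[2];

    if (pipe(fds) != 0)
        return -1;

    /* Queue a pending control message, standing in for "GO". */
    if (write(fds[1], "GO", 2) != 2) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    int migrate_fd = fds[0];
    int restore_fd = migrate_fd;   /* same descriptor number, not a dup() */
    assert(restore_fd == migrate_fd);

    close(restore_fd);             /* this closes "migrate_fd" as well */

    errno = 0;
    ssize_t n = read(migrate_fd, buf, sizeof(buf));
    int err = (n < 0) ? errno : 0;

    close(fds[1]);
    return err;
}
```

In a single-threaded caller the function returns EBADF: the queued message
can no longer be read once the shared descriptor has been closed.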


>  I tried that, but that led
> to failure because toolstack still needs to get controlling information
> out of it (the "GO" message).
> 
> Maybe I close this too early.

Right.


>  I will have a closer look today.
> 
> > For reboot handling you would need to reset the fd to < 0, otherwise when
> > we come back around on reboot we will close this again.
> > 
> > Would it be less error prone to put this in the if (restoring) just above,
> > i.e. exactly where restore_fd is used and which already has the reboot
> > logic in place with restoring = 0?
> > 
> 
> Depending on whether we can close migrate_fd.
> 
> Wei.
> 
> > Ian.
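
The reset-on-close idea for the reboot path can be sketched as follows. This
is an illustration under assumptions, not the patch: close_and_reset() and
count_closes_across_reboots() are hypothetical names, and the loop merely
stands in for coming back around to the "start:" label after a guest reboot.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of xl: closes *fd if it is valid and marks
 * it invalid, so a later pass through the same code path is a no-op.
 * Returns 1 if a descriptor was closed, 0 if there was nothing to close.
 */
int close_and_reset(int *fd)
{
    assert(fd != NULL);
    if (*fd < 0)
        return 0;
    close(*fd);
    *fd = -1;
    return 1;
}

/*
 * Stand-in for the create/restore loop: each iteration models one pass
 * through the "start:" label, e.g. after a guest reboot. Returns how many
 * times the descriptor was actually closed.
 */
int count_closes_across_reboots(int restore_fd, int passes)
{
    int closes = 0;

    for (int i = 0; i < passes; i++) {
        /* The restore itself would consume restore_fd on the first pass. */
        closes += close_and_reset(&restore_fd);
    }
    return closes;
}
```

With a valid descriptor and three passes the count is 1. Without the reset
to -1, the second and third passes would call close() on a stale number that
might by then belong to an unrelated descriptor.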
