From: Ian Campbell <ian.campbell@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: xen-users@lists.xenproject.org,
	Andrew Armenia <andrew@asquaredlabs.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	xen-devel <xen-devel@lists.xen.org>
Subject: Re: [Xen-users] "xl restore" leaks a file descriptor?
Date: Wed, 12 Aug 2015 11:04:25 +0100
Message-ID: <1439373865.9747.330.camel@citrix.com>
In-Reply-To: <20150812094918.GY7460@zion.uk.xensource.com>

On Wed, 2015-08-12 at 10:49 +0100, Wei Liu wrote:
> On Wed, Aug 12, 2015 at 09:41:13AM +0100, Ian Campbell wrote:
> > On Tue, 2015-08-11 at 18:07 +0100, Wei Liu wrote:
> > > On Tue, Aug 11, 2015 at 04:48:13PM +0100, Ian Campbell wrote:
> > > > On Tue, 2015-08-11 at 11:13 -0400, Andrew Armenia wrote:
> > > > > It's the checkpoint file - i.e. the command line argument to xl
> > > > > restore - that is being leaked.
> > > > 
> > > > Thanks.
> > > > 
> > > > [...]
> > > > > So the checkpoint file is clearly being leaked.
> > > > 
> > > > Indeed. I confirmed this even with the current development version
> > > > using ls -l /proc/<pid>/fd, which shows an fd open on a deleted file:
> > > > 
> > > > # ps aux | grep xl
> > > > root     20465  0.0  0.2 106036   984 ?        SLsl 15:42   0:00 xl restore save
> > > > # ls -l /proc/20465/fd
> > > > [...]
> > > > lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save
> > > > [...]
> > > > # rm /root/save
> > > > # ls -l /proc/20465/fd
> > > > [...]
> > > > lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save (deleted)
> > > > [...]
> > > > 
> > > > >  Its space is not freed
> > > > > until the 'xl restore' process is ended by shutting down the domain:
> > > > [...]
> > > > > 
> > > > > It seems like xl restore should close the checkpoint file as soon as
> > > > > it's done restoring the domain, allowing the space to be freed, but
> > > > > that's clearly not happening.
> > > > 
> > > > Right. In fact xl sets the file to be close-on-exec right after opening it,
> > > > which is before the daemonisation step, so it ought to be closed
> > > > automatically, but isn't for some reason.
> > > > 
> > > > My working theory is that something in the machinery which spawns the save
> > > > helper is defeating the use of CLOEXEC, perhaps by dup2() or perhaps by
> > > > unsetting CLOEXEC.
> > > > 
> > > > Anyway, thanks for reporting. I've copied the devel list and 4.6 RM. Wei,
> > > > this probably ought to be a blocker for 4.6 (and the fix ought ultimately
> > > > to be backported to 4.4 onwards at least).
> > > > 
> > > > NB: This leak seems to be independent of the switch to migration v2.
> > > > 
> > > > Ian.
> > > 
> > > Maybe this is just because we leak a fd.
> > > 
> > > I don't see how CLOEXEC would be of any use if xl doesn't actually exec
> > > anything.
> > 
> > Duh, for some reason I thought daemonize would activate the CLOEXEC, but
> > it's just fork without exec. Silly me.
> > 
> > > 
> > > Below is a PoC patch which seems to fix the problem for me.
> > > 
> > > ---8<---
> > > commit 7b5f466d5977dc9f41991ca0c2227023ac07709d
> > > Author: Wei Liu <wei.liu2@citrix.com>
> > > Date:   Tue Aug 11 18:02:25 2015 +0100
> > > 
> > >     xl: close restore_fd when we finish with it
> > >     
> > >     Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > 
> > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > > index 499a05c..525cd24 100644
> > > --- a/tools/libxl/xl_cmdimpl.c
> > > +++ b/tools/libxl/xl_cmdimpl.c
> > > @@ -2846,6 +2846,10 @@ start:
> > >          ret = libxl_domain_create_new(ctx, &d_config, &domid,
> > >                                        0, autoconnect_console_how);
> > >      }
> > > +
> > > +    if (migrate_fd < 0)
> > > +        close(restore_fd);
> > 
> > As Andy says I think we want restore_fd in the check, I can't see any
> > reason we wouldn't want to close the socket too.
> > 
> 
> Do you mean migrate_fd when you say "socket"?

In the migrate case we do "restore_fd = migrate_fd;", so yes, indirectly.
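
A minimal sketch of that aliasing, not taken from xl itself: a plain
assignment copies the descriptor number (no dup() is involved), so closing
restore_fd closes the migration socket as well. Here a pipe stands in for
the socket, and close_once() and closing_alias_closes_socket() are
hypothetical names used only for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not part of xl: close *fd at most once and
 * mark it invalid so that a later call becomes a no-op. */
static int close_once(int *fd)
{
    int rc = 0;

    if (*fd >= 0) {
        rc = close(*fd);
        *fd = -1;
    }
    return rc;
}

/* Returns 1 if closing restore_fd also closed migrate_fd, 0 if it did
 * not, and -1 if the pipe could not be created. */
int closing_alias_closes_socket(void)
{
    int pipefd[2];
    int migrate_fd, restore_fd, closed;

    if (pipe(pipefd) < 0)
        return -1;

    migrate_fd = pipefd[0];      /* stand-in for the migration socket */
    restore_fd = migrate_fd;     /* same descriptor number, no dup() */

    close_once(&restore_fd);

    /* migrate_fd still holds the old number, but it is no longer open. */
    closed = (fcntl(migrate_fd, F_GETFD) == -1 && errno == EBADF);

    close_once(&restore_fd);     /* second call: restore_fd is -1, no-op */
    close(pipefd[1]);
    return closed;
}
```

Calling closing_alias_closes_socket() from a small main() should return 1
on a POSIX system. The same reset-to-minus-one pattern is what keeps a
second pass through the code from closing the descriptor again.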


>  I tried that, but that led
> to failure because toolstack still needs to get controlling information
> out of it (the "GO" message).
> 
> Maybe I close this too early.

Right.


>  I will have a closer look today.
> 
> > For reboot handling you would need to reset the fd to < 0, otherwise when we
> > come back around on reboot we will close this again.
> > 
> > Would it be less error prone to put this in the if (restoring) just above,
> > i.e. exactly where restore_fd is used and which already has the reboot
> > logic in place with restoring = 0?
> > 
> 
> Depending on whether we can close migrate_fd.
> 
> Wei.
> 
> > Ian.
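
For reference, the fork-without-exec point quoted above can be checked with
a small standalone sketch. It assumes a POSIX system with /dev/null; the
function name cloexec_fd_survives_fork() is made up for illustration and
nothing here is xl code. The descriptor is marked close-on-exec, the
process forks, and the child (which never calls exec) checks whether the
descriptor is still open.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a close-on-exec descriptor is still open in a forked
 * child that never calls exec, 0 if it is not, and -1 on error. */
int cloexec_fd_survives_fork(void)
{
    int fd, status;
    pid_t pid;

    fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;

    /* Mark the descriptor close-on-exec right after opening it. */
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }

    if (pid == 0) {
        /* Child: no exec has happened, so FD_CLOEXEC has done nothing
         * and the inherited descriptor is still valid. */
        _exit(fcntl(fd, F_GETFD) != -1 ? 0 : 1);
    }

    /* Parent: drop our copy explicitly, then collect the child's verdict. */
    close(fd);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

The child inherits the open descriptor, which is consistent with the
daemonised process keeping the checkpoint file open until something closes
it explicitly.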
