From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [Xen-users] "xl restore" leaks a file descriptor?
Date: Tue, 11 Aug 2015 18:21:18 +0100
Message-ID: <55CA2F0E.5020400@citrix.com>
References: <1438592915.30740.101.camel@citrix.com> <1439283311.9747.193.camel@citrix.com> <1439308093.9747.291.camel@citrix.com> <20150811170725.GU7460@zion.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20150811170725.GU7460@zion.uk.xensource.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wei Liu, Ian Campbell
Cc: xen-devel, Andrew Armenia, Ian Jackson, xen-users@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On 11/08/15 18:07, Wei Liu wrote:
> On Tue, Aug 11, 2015 at 04:48:13PM +0100, Ian Campbell wrote:
>> On Tue, 2015-08-11 at 11:13 -0400, Andrew Armenia wrote:
>>> It's the checkpoint file - i.e. the command line argument to xl
>>> restore - that is being leaked.
>> Thanks.
>>
>> [...]
>>> So the checkpoint file is clearly being leaked.
>> Indeed. I confirmed this even with the current development version using ls
>> -l /proc//fd which shows an fd open on a deleted file:
>>
>> # ps aux| grep xl
>> root 20465 0.0 0.2 106036 984 ? SLsl 15:42 0:00 xl restore save
>> # ls -l /proc/20465/fd
>> [...]
>> lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save
>> [...]
>> # rm /root/save
>> # ls -l /proc/20465/fd
>> [...]
>> lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save (deleted)
>> [...]
>>
>>> Its space is not freed
>>> until the 'xl restore' process is ended by shutting down the domain:
>> [...]
>>> It seems like xl restore should close the checkpoint file as soon as
>>> it's done restoring the domain, allowing the space to be freed, but
>>> that's clearly not happening.
>> Right. In fact xl sets the file to be close-on-exec right after opening it,
>> which is before the daemonisation step, so it ought to be closed
>> automatically, but isn't for some reason.
>>
>> My working theory is that something in the machinery which spawns the save
>> helper is defeating the use of CLOEXEC, perhaps by dup2() or perhaps by
>> unsetting CLOEXEC.
>>
>> Any way, thanks for reporting. I've copied the devel list and 4.6 RM. Wei
>> this probably ought to be a blocker for 4.6 (and the fix ought ultimately
>> to be backported to 4.4 onwards at least).
>>
>> NB: This leak seems to be independent of the switch to migration v2.
>>
>> Ian.
> Maybe this is just because we leak a fd.
>
> I don't see how CLOEXEC would be of any use if xl doesn't actually exec
> anything.
>
> Below is a PoC patch which seems to fix the problem for me.
>
> ---8<---
> commit 7b5f466d5977dc9f41991ca0c2227023ac07709d
> Author: Wei Liu
> Date:   Tue Aug 11 18:02:25 2015 +0100
>
>     xl: close restore_fd when we finish with it
>
>     Signed-off-by: Wei Liu
>
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 499a05c..525cd24 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -2846,6 +2846,10 @@ start:
>          ret = libxl_domain_create_new(ctx, &d_config, &domid,
>                                        0, autoconnect_console_how);
>      }
> +
> +    if (migrate_fd < 0)
> +        close(restore_fd);
> +

You surely need to check for restore_fd >= 0, to avoid a potential EBADF?

~Andrew