From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [Xen-users] "xl restore" leaks a file descriptor?
Date: Tue, 11 Aug 2015 16:56:02 +0100
Message-ID: <55CA1B12.4060700@citrix.com>
References: <1438592915.30740.101.camel@citrix.com> <1439283311.9747.193.camel@citrix.com> <1439308093.9747.291.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1439308093.9747.291.camel@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell, Andrew Armenia, xen-devel, Wei Liu, Ian Jackson
Cc: xen-users@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On 11/08/15 16:48, Ian Campbell wrote:
> On Tue, 2015-08-11 at 11:13 -0400, Andrew Armenia wrote:
>> It's the checkpoint file - i.e. the command line argument to xl
>> restore - that is being leaked.
> Thanks.
>
> [...]
>> So the checkpoint file is clearly being leaked.
> Indeed. I confirmed this even with the current development version using ls
> -l /proc//fd which shows an fd open on a deleted file:
>
> # ps aux| grep xl
> root 20465 0.0 0.2 106036 984 ? SLsl 15:42 0:00 xl restore save
> # ls -l /proc/20465/fd
> [...]
> lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save
> [...]
> # rm /root/save
> # ls -l /proc/20465/fd
> [...]
> lr-x------. 1 root root 64 Aug 11 15:42 7 -> /root/save (deleted)
> [...]
>
>> Its space is not freed
>> until the 'xl restore' process is ended by shutting down the domain:
> [...]
>> It seems like xl restore should close the checkpoint file as soon as
>> it's done restoring the domain, allowing the space to be freed, but
>> that's clearly not happening.
> Right. In fact xl sets the file to be close-on-exec right after opening it,
> which is before the daemonisation step, so it ought to be closed
> automatically, but isn't for some reason.
>
> My working theory is that something in the machinery which spawns the save
> helper is defeating the use of CLOEXEC, perhaps by dup2() or perhaps by
> unsetting CLOEXEC.
>
> Any way, thanks for reporting. I've copied the devel list and 4.6 RM. Wei
> this probably ought to be a blocker for 4.6 (and the fix ought ultimately
> to be backported to 4.4 onwards at least).
>
> NB: This leak seems to be independent of the switch to migration v2.

IIRC, the file descriptor for this is fcntl()'d by at least 3 separate bits of
code (libxl, libxl-save-helper, libxc) once it has been passed into libxl. I
would not be surprised if one of the higher levels accidentally clobbered
CLOEXEC.

~Andrew
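
For illustration only, here is a minimal C sketch of the two general ways CLOEXEC can be lost that are speculated about above: an fcntl(F_SETFD) call that overwrites the descriptor flags instead of doing a read-modify-write, and a dup()/dup2() copy, which POSIX specifies always starts with FD_CLOEXEC clear. The helper names (has_cloexec, set_cloexec, clobber_fd_flags, copy_without_cloexec) are hypothetical and are not taken from libxl, libxl-save-helper or libxc; this shows the POSIX behaviour, not the actual cause of the leak, which has not been identified in this thread.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: 1 if FD_CLOEXEC is set on fd, 0 if clear, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Safe pattern: read the current descriptor flags and OR in FD_CLOEXEC. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/*
 * Buggy pattern: F_SETFD replaces the whole descriptor flag set, so
 * passing 0 silently clears an FD_CLOEXEC that an earlier layer set.
 */
int clobber_fd_flags(int fd)
{
    return fcntl(fd, F_SETFD, 0);
}

/*
 * dup() (like dup2()) returns a descriptor for the same open file, but the
 * new descriptor has FD_CLOEXEC clear regardless of the original's flags.
 */
int copy_without_cloexec(int fd)
{
    return dup(fd);
}
```

Calling has_cloexec() on a descriptor before and after each layer touches it would be one way to narrow down where the flag goes missing, assuming the flag (rather than a dup'd copy) is the problem.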