From: Josef Bacik
Subject: Re: btrfs send 'leaks' open files
Date: Tue, 22 Oct 2013 17:07:05 -0400
Message-ID: <20131022210705.GC10632@localhost.localdomain>
References: <20131022172249.GV13318@ZenIV.linux.org.uk>
 <20131022204105.GA13318@ZenIV.linux.org.uk>
In-Reply-To: <20131022204105.GA13318@ZenIV.linux.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
To: Al Viro
Cc: Phil Davis, Linus Torvalds
Sender: linux-btrfs-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Tue, Oct 22, 2013 at 09:41:05PM +0100, Al Viro wrote:
> On Tue, Oct 22, 2013 at 06:22:49PM +0100, Al Viro wrote:
> > On Sun, Oct 20, 2013 at 11:33:56AM +0100, Phil Davis wrote:
> > >
> > > The reason I think btrfs send is leaking open files is that if you
> > > watch /proc/sys/fs/file-nr you see the number of open files
> > > increasing, but if you kill the btrfs send process the open files
> > > count drops back down.  In fact, suspending the process also
> > > reduces the open file count, but resuming it makes the count start
> > > increasing again.
> >
> > What does lsof show while you are running that?  AFAICS, btrfs_ioctl_send()
> > should be neutral wrt file references - we do fget() on entry and
> > fput() of the result on exit, with nothing else looking relevant in
> > sight...  OTOH, the number of calls btrfs-progs makes to that ioctl()
> > seems to be bounded by the length of the argument list.  So the
> > interesting questions are
> > 	a) how many btrfs send instances are running at the time?
> > 	b) what do their arg lists look like?
> > 	c) who (if anyone) has all those opened files in their
> > 	   descriptor tables?
>
> aaaarrrgh.....  OK, I see what's going on.  We have a normal process doing
> a normal syscall (well, inasmuch as ioctl(2) can be called normal).  It's
> just that this syscall happens to open and close an obscene number of
> struct file instances as it goes.  Which leads to all those struct file
> sitting there and waiting for task_work_run() to do __fput().  In the
> meanwhile we keep allocating new ones (and closing them).  All without
> returning to userland.
>
> Result: O(regular files in snapshot) struct file instances by the time it
> ends.  Of course, once we return to userland (or get killed by OOM),
> we have task_work_run() called and all those suckers go away.
>
> Note that decrementing the opened files counter earlier will do nothing
> to OOM - it *is* caused by the bloody huge pile of struct file / struct
> dentry / struct inode.  We really need to flush that shite somehow - or
> avoid producing it in the first place.
>
> The trouble is, I'm not sure that doing __fput() here is really safe - the
> call chains are long and convoluted and I don't see what the locking
> environment is.  IOW, I'm not sure that it's really deadlock-free with
> fput() done synchronously.  btrfs_release_file() seems to be doing some
> non-trivial work if we had the file truncated by somebody else, so...
>
> Does using vfs_read() in send_write() really make things much simpler for
> us?  That's the only reason to bother with that dentry_open() at all;
> we could bloody well just copy the data from the page cache without
> bothering with struct file, set_fs(), etc.
>
> Comments?

I agree, we should probably just drop the vfs_read() call and do the hard
work ourselves.  I will look into doing this in the near future.  Thanks,

Josef
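
For reference, a simplified sketch of the pattern Al is describing: one
ioctl walks every regular file in the snapshot and opens, reads, and
"closes" each one without ever returning to userland.  The helper name
send_one_file() and its arguments are illustrative only, not the actual
fs/btrfs/send.c code; the APIs shown (dentry_open(), vfs_read(),
set_fs()/KERNEL_DS, fput()) are the circa-3.12 interfaces.

#include <linux/fs.h>
#include <linux/file.h>
#include <linux/path.h>
#include <linux/cred.h>
#include <linux/err.h>
#include <linux/uaccess.h>

/*
 * Illustrative only: roughly the shape of what the send ioctl does for
 * every regular file in the snapshot, all inside one syscall.
 */
static ssize_t send_one_file(struct path *path, char *buf, size_t len)
{
	struct file *filp;
	mm_segment_t old_fs;
	loff_t pos = 0;
	ssize_t ret;

	filp = dentry_open(path, O_RDONLY, current_cred());
	if (IS_ERR(filp))
		return PTR_ERR(filp);

	old_fs = get_fs();
	set_fs(KERNEL_DS);	/* buf is a kernel pointer */
	ret = vfs_read(filp, (char __user *)buf, len, &pos);
	set_fs(old_fs);

	/*
	 * fput() does not release the file here: in process context it
	 * only queues __fput() via task_work_add(), and task_work_run()
	 * is not called until we return to userspace.  Do this for every
	 * file in the snapshot within a single ioctl and the pending
	 * struct file / dentry / inode instances pile up until the
	 * syscall ends.
	 */
	fput(filp);
	return ret;
}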
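
And a rough sketch of the alternative Al suggests - copying the data
straight out of the inode's page cache, with no struct file and no
set_fs() dance.  This is only an assumption about how it could look
(using the generic read_mapping_page() interface), not the change that
was eventually made to send.c.

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/highmem.h>
#include <linux/err.h>

/*
 * Illustrative only: copy len bytes starting at offset out of the
 * inode's page cache, no struct file involved.
 */
static int read_via_page_cache(struct inode *inode, loff_t offset,
			       char *buf, u32 len)
{
	struct address_space *mapping = inode->i_mapping;
	pgoff_t index = offset >> PAGE_CACHE_SHIFT;
	unsigned int pg_off = offset & (PAGE_CACHE_SIZE - 1);
	u32 copied = 0;

	while (copied < len) {
		unsigned int bytes = min_t(u32, len - copied,
					   PAGE_CACHE_SIZE - pg_off);
		struct page *page;
		char *addr;

		/* Brings the page uptodate, reading from disk if needed. */
		page = read_mapping_page(mapping, index, NULL);
		if (IS_ERR(page))
			return PTR_ERR(page);

		addr = kmap(page);
		memcpy(buf + copied, addr + pg_off, bytes);
		kunmap(page);
		page_cache_release(page);

		copied += bytes;
		pg_off = 0;
		index++;
	}
	return 0;
}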