* Re: btrfs send 'leaks' open files
From: Al Viro @ 2013-10-22 20:41 UTC
  To: Phil Davis; +Cc: linux-btrfs, linux-fsdevel, Linus Torvalds

On Tue, Oct 22, 2013 at 06:22:49PM +0100, Al Viro wrote:
> On Sun, Oct 20, 2013 at 11:33:56AM +0100, Phil Davis wrote:
> 
> > The reason I think btrfs send is leaking open files is that if you
> > watch /proc/sys/fs/file-nr you see the number of open files increasing,
> > but if you kill the btrfs send process the open files count drops back
> > down.  In fact, suspending the process also reduces the open file
> > count, and resuming it makes the count start increasing again.
> 
> What does lsof show while you are running that?  AFAICS, btrfs_ioctl_send()
> should be neutral wrt file references - we do fget() on entry and
> fput() of the result on exit, with nothing else looking relevant in
> sight...  OTOH, the number of calls btrfs-progs makes to that ioctl()
> seems to be bounded by the length of its argument list.  So the
> interesting questions are
> 	a) how many btrfs send instances are running at the time?
> 	b) what do their arg lists look like?
> 	c) who (if anyone) has all those opened files in their descriptor
> tables?
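
The pattern in question, roughly - a minimal sketch rather than the
actual btrfs_ioctl_send(), whose argument handling and tree walk are
elided here:

static long btrfs_ioctl_send_sketch(struct file *mnt_file, int send_fd)
{
	struct file *send_filp;
	long ret = 0;

	/* take a reference on the target descriptor on entry */
	send_filp = fget(send_fd);
	if (!send_filp)
		return -EBADF;

	/* ... walk the snapshot and emit the send stream ... */

	/* drop it on exit: neutral wrt file references overall */
	fput(send_filp);
	return ret;
}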


aaaarrrgh.....  OK, I see what's going on.  We have a normal process doing
a normal syscall (well, inasmuch as ioctl(2) can be called normal).  It's
just that this syscall happens to open and close an obscene number of
struct file instances as it goes.  That leaves all those struct file
instances sitting there waiting for task_work_run() to do __fput().  In
the meantime we keep allocating new ones (and closing them), all without
ever returning to userland.

Result: O(regular files in snapshot) struct file instances by the time it
ends.  Of course, once we return to userland (or get killed by OOM),
we have task_work_run() called and all those suckers go away.
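
For anyone unfamiliar with the mechanism: ever since fput() got switched
over to task_work (around v3.6), the final reference drop from process
context doesn't tear the file down on the spot - it queues __fput() to
run on the way back to userland.  A simplified paraphrase of the
fs/file_table.c logic, not verbatim:

void fput(struct file *file)
{
	if (atomic_long_dec_and_test(&file->f_count)) {
		struct task_struct *task = current;

		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
			/* ____fput() -> __fput() runs at return to userland */
			init_task_work(&file->f_u.fu_rcuhead, ____fput);
			if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
				return;
		}
		/* interrupt context / kthreads: punt to a workqueue instead */
		if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
			schedule_delayed_work(&delayed_fput_work, 1);
	}
}

task_work_run() only fires from the return-to-userland and exit paths,
which fits the symptoms above: a syscall doing millions of
dentry_open()/fput() pairs piles up millions of pending __fput()s, and
killing (or, presumably via the signal delivery path, suspending) the
sender drains them.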

Note that decrementing the open files counter earlier will do nothing
about the OOM - it *is* caused by the bloody huge pile of struct file /
struct dentry / struct inode.  We really need to flush that shite
somehow - or avoid producing it in the first place.

The trouble is, I'm not sure that doing __fput() here is really safe - the
call chains are long and convoluted and I don't see what the locking
environment is.  IOW, I'm not sure that it's really deadlock-free with
fput() done synchronously.  btrfs_release_file() seems to be doing some
non-trivial work if we had the file truncated by somebody else, so...

Does using vfs_read() in send_write() really make things much simpler for
us?  That's the only reason to bother with that dentry_open() at all;
we could bloody well just copy the data from the page cache without bothering
with struct file, set_fs(), etc.
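
Something along these lines, say - an untested sketch of the idea only;
send_read_page() is an illustrative name, and the real thing would need
to cope with the tail past i_size, multi-page lengths and so on:

static int send_read_page(struct inode *inode, pgoff_t index,
			  size_t pg_offset, size_t len, char *buf)
{
	struct page *page;
	char *addr;

	/* find (or add) the page in the page cache; comes back locked */
	page = find_or_create_page(inode->i_mapping, index, GFP_NOFS);
	if (!page)
		return -ENOMEM;

	if (!PageUptodate(page)) {
		/*
		 * btrfs_readpage() consumes the page lock; retake it
		 * to wait for the read to finish.
		 */
		btrfs_readpage(NULL, page);
		lock_page(page);
		if (!PageUptodate(page)) {
			unlock_page(page);
			page_cache_release(page);
			return -EIO;
		}
	}

	/* copy straight out of the page: no struct file, no set_fs() */
	addr = kmap(page);
	memcpy(buf, addr + pg_offset, len);
	kunmap(page);

	unlock_page(page);
	page_cache_release(page);
	return 0;
}

That pins the inode's pages only for the duration of the copy and never
manufactures a struct file at all.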

Comments?


* Re: btrfs send 'leaks' open files
From: Josef Bacik @ 2013-10-22 21:07 UTC
  To: Al Viro; +Cc: Phil Davis, linux-btrfs, linux-fsdevel, Linus Torvalds

On Tue, Oct 22, 2013 at 09:41:05PM +0100, Al Viro wrote:
> On Tue, Oct 22, 2013 at 06:22:49PM +0100, Al Viro wrote:
> > On Sun, Oct 20, 2013 at 11:33:56AM +0100, Phil Davis wrote:
> > 
> > > The reason I think btrfs send is leaking open files is that if you
> > > watch /proc/sys/fs/file-nr you see the number of open files increasing,
> > > but if you kill the btrfs send process the open files count drops back
> > > down.  In fact, suspending the process also reduces the open file
> > > count, and resuming it makes the count start increasing again.
> > 
> > What does lsof show while you are running that?  AFAICS, btrfs_ioctl_send()
> > should be neutral wrt file references - we do fget() on entry and
> > fput() of the result on exit, with nothing else looking relevant in
> > sight...  OTOH, the number of calls btrfs-progs makes to that ioctl()
> > seems to be bounded by the length of its argument list.  So the
> > interesting questions are
> > 	a) how many btrfs send instances are running at the time?
> > 	b) what do their arg lists look like?
> > 	c) who (if anyone) has all those opened files in their descriptor
> > tables?
> 
> 
> aaaarrrgh.....  OK, I see what's going on.  We have a normal process doing
> a normal syscall (well, inasmuch as ioctl(2) can be called normal).  It's
> just that this syscall happens to open and close an obscene number of
> struct file instances as it goes.  That leaves all those struct file
> instances sitting there waiting for task_work_run() to do __fput().  In
> the meantime we keep allocating new ones (and closing them), all without
> ever returning to userland.
> 
> Result: O(regular files in snapshot) struct file instances by the time it
> ends.  Of course, once we return to userland (or get killed by OOM),
> we have task_work_run() called and all those suckers go away.
> 
> Note that decrementing the open files counter earlier will do nothing
> about the OOM - it *is* caused by the bloody huge pile of struct file /
> struct dentry / struct inode.  We really need to flush that shite
> somehow - or avoid producing it in the first place.
> 
> The trouble is, I'm not sure that doing __fput() here is really safe - the
> call chains are long and convoluted and I don't see what the locking
> environment is.  IOW, I'm not sure that it's really deadlock-free with
> fput() done synchronously.  btrfs_release_file() seems to be doing some
> non-trivial work if we had the file truncated by somebody else, so...
> 
> Does using vfs_read() in send_write() really make things much simpler for
> us?  That's the only reason to bother with that dentry_open() at all;
> we could bloody well just copy the data from the page cache without bothering
> with struct file, set_fs(), etc.
> 
> Comments?

I agree, we should probably just drop the vfs_read() call and do the hard work
ourselves.  I will look into doing this in the near future.  Thanks,

Josef

