linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ian Kent <raven@themaw.net>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] freeing unliked file indefinitely delayed
Date: Thu, 09 Jul 2015 19:26:44 +0800	[thread overview]
Message-ID: <1436441204.2709.10.camel@pluto.fritz.box> (raw)
In-Reply-To: <1436440655.2709.8.camel@pluto.fritz.box>

On Thu, 2015-07-09 at 19:17 +0800, Ian Kent wrote:
> On Wed, 2015-07-08 at 02:42 +0100, Al Viro wrote:
> > 	Normally opening a file, unlinking it and then closing will have
> > the inode freed upon close() (provided that it's not otherwise busy and
> > has no remaining links, of course).  However, there's one case where that
> > does *not* happen.  Namely, if you open it by fhandle with cold dcache,
> > then unlink() and close().
> > 
> > 	In normal case you get d_delete() in unlink(2) notice that dentry
> > is busy and unhash it; on the final dput() it will be forcibly evicted from
> > dcache, triggering iput() and inode removal.  In this case, though, we end
> > up with *two* dentries - disconnected (created by open-by-fhandle) and
> > regular one (used by unlink()).  The latter will have its reference to inode
> > dropped just fine, but the former will not - it's considered hashed (it
> > is on the ->s_anon list), so it will stay around until the memory pressure
> > will finally do it in.  As the result, we have the final iput() delayed
> > indefinitely.  It's trivial to reproduce -
> > 
> > #define _GNU_SOURCE
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > 
> > void flush_dcache(void)
> > {
> >         system("mount -o remount,rw /");
> > }
> > 
> > static char buf[20 * 1024 * 1024];
> > 
> > main()
> > {
> >         int fd;
> >         union { 
> >                 struct file_handle f;
> >                 char buf[MAX_HANDLE_SZ];
> >         } x;
> >         int m;
> > 
> >         x.f.handle_bytes = sizeof(x);
> >         chdir("/root");
> >         mkdir("foo", 0700);
> >         fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
> >         close(fd);
> >         name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
> >         flush_dcache();
> >         fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
> >         unlink("foo/bar");
> >         write(fd, buf, sizeof(buf));
> >         system("df .");			/* 20Mb eaten */
> >         close(fd);
> >         system("df .");			/* should've freed those 20Mb */
> >         flush_dcache();
> >         system("df .");			/* should be the same as #2 */
> > }
> > 
> > will spit out something like
> > Filesystem     1K-blocks   Used Available Use% Mounted on
> > /dev/root         322023 303843      1131 100% /
> > Filesystem     1K-blocks   Used Available Use% Mounted on
> > /dev/root         322023 303843      1131 100% /
> > Filesystem     1K-blocks   Used Available Use% Mounted on
> > /dev/root         322023 283282     21692  93% /
> > - inode gets freed only when dentry is finally evicted (here we trigger
> > than by remount; normally it would've happened in response to memory
> > pressure hell knows when).
> > 
> > IMO it's a bug.  Between the close() and final flush_dcache() the file has
> > no surviving links, is *not* busy, won't show up in fuser/lsof/whatnot
> > output, and yet it's still not freed.  I'm not saying that it's realistically
> > exploitable (albeit with nfsd it might be), but it's a very unpleasant
> > self-LART.
> > 
> > FWIW, my prefered fix would be simply to have the final dput() treat
> > disconnected dentries as "kill on sight"; checking for i_nlink of the
> > inode, as Bruce suggested several years ago, will *not* work, simply
> > because having another link to that file and unlinking it after close
> > will reproduce the situation - disconnected dentry sticks around in
> > dcache past its final dput() and past the last unlink() of our file.
> > Theoretically it might cause an overhead for nfsd (no_subtree_check v3
> > export might see more d_alloc()/d_free(); icache retention will still
> > prevent constant rereading the inode from disk).  _IF_ that proves to
> > be noticable, we might need to come up with something more elaborate
> > (e.g. have unlink() and rename() kick disconnected aliases out if the link
> > count has reached 0), but it's more complex and needs careful ananlysis
> > of correctness - we need to prove that there's no way to miss the link
> > count reaching 0.  I would prefer to treat all disconnected as unhashed
> > for dcache retention purposes - it's simpler and less brittle.  Comments?
> > I mean something like this:
> 
> Al, help me out here, I'm struggling to understand where these dentrys
> come from (apart from your reproducer).
> 
> For example, on the heavily patched 2.6.32 kernel where this was first
> seen the problem dentry is annoymous, refcount 0, and unhashed.
> 
> But the dentrys that will most likely face summary execution will be
> hashed, such as was the case on that 2.6.32 kernel at dput().
> 
> Doesn't that mean that something dropped the dentry after the dput(),
> that will now also free the dentry, that took the refcount to 0?

Oh wait, think I get it now ... perhaps it's prune_one_dentry() doing
it ...

Ian

  reply	other threads:[~2015-07-09 11:26 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-08  1:42 [RFC] freeing unliked file indefinitely delayed Al Viro
2015-07-08  2:39 ` J. Bruce Fields
2015-07-08 15:41 ` Ben Myers
2015-07-12 15:00   ` [RFC] freeing unlinked " Al Viro
2015-07-13 18:17     ` Ben Myers
2015-07-13 19:56       ` Al Viro
2015-07-14  0:54         ` Ben Myers
2015-07-09 11:17 ` [RFC] freeing unliked " Ian Kent
2015-07-09 11:26   ` Ian Kent [this message]
2015-07-12 15:17     ` [RFC] freeing unlinked " Al Viro
2015-07-13  2:30       ` Ian Kent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1436441204.2709.10.camel@pluto.fritz.box \
    --to=raven@themaw.net \
    --cc=bfields@fieldses.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).