From: "J. Bruce Fields" <bfields@fieldses.org>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] freeing unliked file indefinitely delayed
Date: Tue, 7 Jul 2015 22:39:04 -0400 [thread overview]
Message-ID: <20150708023904.GA13727@fieldses.org> (raw)
In-Reply-To: <20150708014237.GC17109@ZenIV.linux.org.uk>
On Wed, Jul 08, 2015 at 02:42:38AM +0100, Al Viro wrote:
> Normally opening a file, unlinking it and then closing will have
> the inode freed upon close() (provided that it's not otherwise busy and
> has no remaining links, of course). However, there's one case where that
> does *not* happen. Namely, if you open it by fhandle with cold dcache,
> then unlink() and close().
>
> In normal case you get d_delete() in unlink(2) notice that dentry
> is busy and unhash it; on the final dput() it will be forcibly evicted from
> dcache, triggering iput() and inode removal. In this case, though, we end
> up with *two* dentries - disconnected (created by open-by-fhandle) and
> regular one (used by unlink()). The latter will have its reference to inode
> dropped just fine, but the former will not - it's considered hashed (it
> is on the ->s_anon list), so it will stay around until the memory pressure
> will finally do it in. As the result, we have the final iput() delayed
> indefinitely. It's trivial to reproduce -
>
> #define _GNU_SOURCE
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
>
> void flush_dcache(void)
> {
> system("mount -o remount,rw /");
> }
>
> static char buf[20 * 1024 * 1024];
>
> main()
> {
> int fd;
> union {
> struct file_handle f;
> char buf[MAX_HANDLE_SZ];
> } x;
> int m;
>
> x.f.handle_bytes = sizeof(x);
> chdir("/root");
> mkdir("foo", 0700);
> fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
> close(fd);
> name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
> flush_dcache();
> fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
> unlink("foo/bar");
> write(fd, buf, sizeof(buf));
> system("df ."); /* 20Mb eaten */
> close(fd);
> system("df ."); /* should've freed those 20Mb */
> flush_dcache();
> system("df ."); /* should be the same as #2 */
> }
>
> will spit out something like
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/root 322023 303843 1131 100% /
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/root 322023 303843 1131 100% /
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/root 322023 283282 21692 93% /
> - inode gets freed only when dentry is finally evicted (here we trigger
> than by remount; normally it would've happened in response to memory
> pressure hell knows when).
>
> IMO it's a bug.
Definitely agreed.
> Between the close() and final flush_dcache() the file has
> no surviving links, is *not* busy, won't show up in fuser/lsof/whatnot
> output, and yet it's still not freed. I'm not saying that it's realistically
> exploitable (albeit with nfsd it might be)
I'd be surprised if it doesn't happen. This, for example, is a "space
on nfs export not freed when I expected it after an unlink" report that
could easily have the same cause:
https://www.redhat.com/archives/linux-cluster/2015-May/msg00000.html
(Not sure, I'll get back to them and see if they can confirm.)
> but it's a very unpleasant self-LART.
> FWIW, my prefered fix would be simply to have the final dput() treat
> disconnected dentries as "kill on sight"; checking for i_nlink of the
> inode, as Bruce suggested several years ago, will *not* work, simply
> because having another link to that file and unlinking it after close
> will reproduce the situation - disconnected dentry sticks around in
> dcache past its final dput() and past the last unlink() of our file.
> Theoretically it might cause an overhead for nfsd (no_subtree_check v3
> export might see more d_alloc()/d_free(); icache retention will still
> prevent constant rereading the inode from disk). _IF_ that proves to
> be noticable, we might need to come up with something more elaborate
> (e.g. have unlink() and rename() kick disconnected aliases out if the link
> count has reached 0), but it's more complex and needs careful ananlysis
> of correctness - we need to prove that there's no way to miss the link
> count reaching 0. I would prefer to treat all disconnected as unhashed
> for dcache retention purposes - it's simpler and less brittle. Comments?
> I mean something like this:
ACK from me.
--b.
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 7a3f3e5..5c8ea15 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -642,7 +642,7 @@ static inline bool fast_dput(struct dentry *dentry)
>
> /*
> * If we have a d_op->d_delete() operation, we sould not
> - * let the dentry count go to zero, so use "put__or_lock".
> + * let the dentry count go to zero, so use "put_or_lock".
> */
> if (unlikely(dentry->d_flags & DCACHE_OP_DELETE))
> return lockref_put_or_lock(&dentry->d_lockref);
> @@ -697,7 +697,7 @@ static inline bool fast_dput(struct dentry *dentry)
> */
> smp_rmb();
> d_flags = ACCESS_ONCE(dentry->d_flags);
> - d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST;
> + d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | DCACHE_DISCONNECTED;
>
> /* Nothing to do? Dropping the reference was all we needed? */
> if (d_flags == (DCACHE_REFERENCED | DCACHE_LRU_LIST) && !d_unhashed(dentry))
> @@ -776,6 +776,9 @@ repeat:
> if (unlikely(d_unhashed(dentry)))
> goto kill_it;
>
> + if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED))
> + goto kill_it;
> +
> if (unlikely(dentry->d_flags & DCACHE_OP_DELETE)) {
> if (dentry->d_op->d_delete(dentry))
> goto kill_it;
next prev parent reply other threads:[~2015-07-08 2:39 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-08 1:42 [RFC] freeing unliked file indefinitely delayed Al Viro
2015-07-08 2:39 ` J. Bruce Fields [this message]
2015-07-08 15:41 ` Ben Myers
2015-07-12 15:00 ` [RFC] freeing unlinked " Al Viro
2015-07-13 18:17 ` Ben Myers
2015-07-13 19:56 ` Al Viro
2015-07-14 0:54 ` Ben Myers
2015-07-09 11:17 ` [RFC] freeing unliked " Ian Kent
2015-07-09 11:26 ` Ian Kent
2015-07-12 15:17 ` [RFC] freeing unlinked " Al Viro
2015-07-13 2:30 ` Ian Kent
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150708023904.GA13727@fieldses.org \
--to=bfields@fieldses.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.