From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel@redhat.com, <linux-fsdevel@vger.kernel.org>,
Dave Chinner <dchinner@redhat.com>
Cc: linux-kernel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>
Subject: [PATCH 0/2] Per superblock inode reclaim
Date: Fri, 24 Jun 2016 14:50:09 -0500
Message-ID: <1466797811-5873-1-git-send-email-rpeterso@redhat.com>
In July 2011, around commit b0d40c92adafde7c2d81203ce7c1c69275f41140,
Dave Chinner introduced the concept of per-superblock shrinkers.
However, the VFS still uses its own function, prune_icache_sb, which
gives the underlying file system almost no control over inode
eviction.
The trouble is that some file systems (GFS2 in particular) need more
control over the eviction of inodes. When GFS2 evicts an inode, it
may need to do inter-node locking and unlocking, for which it calls
the distributed lock manager, DLM. But DLM may not be able to service
the request immediately. For example, if a cluster node has failed,
DLM may need to wait for that node to be fenced, which may block on a
response from the user-space fence daemon. The fence daemon, in turn,
may block waiting on the memory allocation that caused the shrinker
to be called in the first place. Thus, GFS2 sits forever in an
unrecoverable deadlock.
This is not unique to GFS2: OCFS2 and NFS probably have the same issue.
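To make the circular wait explicit, here is a small userspace model
of the chain described above. It is only an illustration: the actor
names and both functions are hypothetical and are not kernel code.
Each actor waits on at most one other actor, and a deadlock shows up
as a walk that returns to its starting point.

```c
/* Hypothetical labels for the actors in the scenario described above. */
enum actor {
    SHRINKER,      /* memory reclaim calling the inode cache shrinker */
    GFS2_EVICT,    /* GFS2 evicting an inode */
    DLM_REQUEST,   /* lock request handed to the DLM */
    FENCING,       /* DLM waiting for the failed node to be fenced */
    FENCE_DAEMON,  /* user-space fence daemon */
    MEM_ALLOC,     /* memory allocation made by the fence daemon */
    NR_ACTORS
};

#define NO_WAIT (-1)

/* Fill waits_on[] with "who blocks on whom". If defer_eviction is
 * nonzero, the shrinker no longer waits for GFS2 eviction to finish. */
void build_wait_chain(int *waits_on, int defer_eviction)
{
    waits_on[SHRINKER]     = defer_eviction ? NO_WAIT : GFS2_EVICT;
    waits_on[GFS2_EVICT]   = DLM_REQUEST;
    waits_on[DLM_REQUEST]  = FENCING;
    waits_on[FENCING]      = FENCE_DAEMON;
    waits_on[FENCE_DAEMON] = MEM_ALLOC;
    waits_on[MEM_ALLOC]    = SHRINKER;
}

/* Return 1 if following waits_on[] from start leads back to start. */
int waits_in_cycle(const int *waits_on, int nr, int start)
{
    int cur = start;

    for (int step = 0; step < nr; step++) {
        cur = waits_on[cur];
        if (cur == NO_WAIT)
            return 0;   /* chain ends: someone can make progress */
        if (cur == start)
            return 1;   /* walked back to the start: circular wait */
    }
    return 0;
}
```

With defer_eviction set to 0 the walk from SHRINKER comes back to
SHRINKER; with it set to 1 the chain is broken at the shrinker.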
The first patch extends Dave Chinner's idea further, adding a hook
for a filesystem-specific prune_icache_sb function and exporting the
existing one.
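As a rough illustration of the hook idea, the following userspace
model dispatches to a filesystem-provided prune function when one is
set and falls back to the generic one otherwise. All types and names
here (sb_model, sb_ops_model, shrink_inode_cache, capped_prune) are
simplified stand-ins assumed for the sketch; the real signatures are
in the patch itself.

```c
#include <stddef.h>

struct sb_model;

/* Simplified stand-in for a superblock operations table. */
struct sb_ops_model {
    /* Optional hook; NULL means "use the generic pruner". */
    long (*prune_icache_sb)(struct sb_model *sb, long nr_to_scan);
};

/* Simplified stand-in for a superblock with a count of cached inodes. */
struct sb_model {
    const struct sb_ops_model *ops;
    long nr_cached_inodes;
};

/* Stand-in for the generic pruner: frees up to nr_to_scan inodes. */
long generic_prune_icache_sb(struct sb_model *sb, long nr_to_scan)
{
    long freed = nr_to_scan < sb->nr_cached_inodes ?
                 nr_to_scan : sb->nr_cached_inodes;

    sb->nr_cached_inodes -= freed;
    return freed;
}

/* Shrinker entry point: prefer the filesystem hook if one exists. */
long shrink_inode_cache(struct sb_model *sb, long nr_to_scan)
{
    if (sb->ops && sb->ops->prune_icache_sb)
        return sb->ops->prune_icache_sb(sb, nr_to_scan);
    return generic_prune_icache_sb(sb, nr_to_scan);
}

/* Hypothetical filesystem hook: cap each pass at two inodes, then
 * reuse the generic pruner for the actual work. */
long capped_prune(struct sb_model *sb, long nr_to_scan)
{
    if (nr_to_scan > 2)
        nr_to_scan = 2;
    return generic_prune_icache_sb(sb, nr_to_scan);
}
```

Because the generic function stays callable, a filesystem hook can
apply its own policy and still delegate the bulk of the work.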
The second patch changes GFS2 to provide a prune_icache_sb function,
which just tells GFS2's running daemon to shrink the inode slab when
it can, and by how many items. The daemon then calls the VFS
prune_icache_sb to do the majority of the work. Deferring the
shrinker to the GFS2 daemon allows the VFS shrinker to proceed
without blocking on GFS2, so the process that caused the shrink (in
the example above, the fence daemon) is not blocked either.
The daemon can then safely evict the inodes, calling the DLM without
the risk of deadlock.
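The deferral can be pictured with a minimal single-threaded model,
given here as a sketch only. The names fs_model, deferring_prune_hook,
daemon_pass and model_generic_prune are hypothetical, and real code
would need locking or atomics around the shared counter. The hook
records how much to prune and returns at once; the daemon does the
pruning on its next pass.

```c
/* Simplified stand-in for per-filesystem state. */
struct fs_model {
    long nr_cached_inodes;  /* inodes currently in the cache */
    long pending_prune;     /* inodes the shrinker asked to prune */
    int  daemon_woken;      /* set when the daemon has work to do */
};

/* Stand-in for the generic pruner: frees up to nr inodes. */
long model_generic_prune(struct fs_model *fs, long nr)
{
    long freed = nr < fs->nr_cached_inodes ? nr : fs->nr_cached_inodes;

    fs->nr_cached_inodes -= freed;
    return freed;
}

/* Shrinker-side hook: record the request, wake the daemon, and
 * return without evicting anything, so the caller never blocks. */
long deferring_prune_hook(struct fs_model *fs, long nr_to_scan)
{
    fs->pending_prune += nr_to_scan;
    fs->daemon_woken = 1;
    return 0;   /* nothing has been freed yet */
}

/* Daemon-side pass: drain the pending count and do the real work
 * in a context where blocking on the lock manager is acceptable. */
long daemon_pass(struct fs_model *fs)
{
    long nr = fs->pending_prune;

    if (!fs->daemon_woken || nr == 0)
        return 0;
    fs->pending_prune = 0;
    fs->daemon_woken = 0;
    return model_generic_prune(fs, nr);
}
```

In this model the shrinker call returns 0 immediately and the cache
only shrinks once daemon_pass runs, which mirrors the split between
the caller's context and the daemon's context described above.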
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
Bob Peterson (2):
  vfs: Add hooks for filesystem-specific prune_icache_sb
  GFS2: Add a gfs2-specific prune_icache_sb

 Documentation/filesystems/vfs.txt | 15 +++++++++++++++
 fs/gfs2/incore.h                  |  2 ++
 fs/gfs2/ops_fstype.c              |  1 +
 fs/gfs2/quota.c                   | 25 +++++++++++++++++++++++++
 fs/gfs2/super.c                   | 13 +++++++++++++
 fs/inode.c                        |  1 +
 fs/super.c                        |  5 ++++-
 include/linux/fs.h                |  3 +++
 8 files changed, 64 insertions(+), 1 deletion(-)
--
2.5.5
Thread overview: 8+ messages
2016-06-24 19:50 Bob Peterson [this message]
2016-06-24 19:50 ` [PATCH 1/2] vfs: Add hooks for filesystem-specific prune_icache_sb Bob Peterson
2016-06-28 1:10 ` Dave Chinner
2016-06-24 19:50 ` [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb Bob Peterson
2016-06-26 13:18 ` [Cluster-devel] " Steven Whitehouse
2016-06-28 2:08 ` Dave Chinner
2016-06-28 9:13 ` [Cluster-devel] " Steven Whitehouse
2016-06-28 22:47 ` Dave Chinner