cluster-devel.redhat.com archive mirror
* [Cluster-devel] [PATCH 0/2] Per superblock inode reclaim
@ 2016-06-24 19:50 Bob Peterson
  2016-06-24 19:50 ` [Cluster-devel] [PATCH 1/2] vfs: Add hooks for filesystem-specific prune_icache_sb Bob Peterson
  2016-06-24 19:50 ` [Cluster-devel] [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb Bob Peterson
  0 siblings, 2 replies; 8+ messages in thread
From: Bob Peterson @ 2016-06-24 19:50 UTC
  To: cluster-devel.redhat.com

In July 2011, around commit b0d40c92adafde7c2d81203ce7c1c69275f41140,
Dave Chinner introduced the concept of per-superblock shrinkers.
However, the VFS still prunes inodes with its own function,
prune_icache_sb, which gives the underlying file system almost no
control over inode eviction.

The trouble is, some file systems (GFS2 in particular) need more
control over the eviction of inodes. When you evict an inode in GFS2,
it may need to do inter-node locking and unlocking, for which it
calls the distributed lock manager, DLM. But DLM may not be able to
service the request immediately, due to unforeseen circumstances.

For example, if a cluster node has failed, DLM may need to wait for
that node to be fenced which, in turn, may block on a response from the
user space fence daemon. That, in turn, may block waiting on the very
memory allocation that caused the shrinker to be called in the first
place. Thus, GFS2 sits forever in an unrecoverable deadlock.

This is not unique to GFS2: OCFS2 and NFS probably have the same issue.

The first patch extends Dave Chinner's idea further, adding a hook
for a filesystem-specific prune_icache_sb function and exporting the
existing one.
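
A minimal userspace sketch of the dispatch this hook implies, not the
patch itself. The struct layouts, the nr_cached_inodes counter, and the
names generic_prune_icache_sb and shrink_inodes are stand-ins invented
for illustration; the real kernel types and signatures differ.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; all layouts are illustrative. */
struct super_block;

struct super_operations {
	/* Optional hook; NULL means "use the generic pruner". */
	long (*prune_icache_sb)(struct super_block *sb, long nr_to_scan);
};

struct super_block {
	const struct super_operations *s_op;
	long nr_cached_inodes;	/* toy model of the inode LRU length */
};

/* Stand-in for the exported generic pruner: frees up to nr_to_scan. */
long generic_prune_icache_sb(struct super_block *sb, long nr_to_scan)
{
	long freed = nr_to_scan < sb->nr_cached_inodes ?
		     nr_to_scan : sb->nr_cached_inodes;

	sb->nr_cached_inodes -= freed;
	return freed;
}

/* Shrinker-side dispatch: prefer the filesystem hook when one is set. */
long shrink_inodes(struct super_block *sb, long nr_to_scan)
{
	assert(sb != NULL && sb->s_op != NULL);

	if (sb->s_op->prune_icache_sb)
		return sb->s_op->prune_icache_sb(sb, nr_to_scan);
	return generic_prune_icache_sb(sb, nr_to_scan);
}
```

A file system that leaves the hook NULL keeps today's behaviour, which
is why exporting the generic function matters: a hook can still hand
the actual pruning back to it.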

The second patch changes GFS2 to provide a prune_icache_sb function,
which really just tells its running daemon to shrink the inode slab
when it can, and by how many items. The daemon does so by calling the
VFS prune_icache_sb to do the majority of the work. Deferring the
shrink to the GFS2 daemon allows the VFS shrinker to proceed
without blocking on GFS2. The process that caused the shrink (in the
example above, the fence daemon) may then proceed without blocking.
The daemon can then safely evict the inodes, calling the DLM without
the risk of deadlock.
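
The deferral can be sketched in userspace roughly as follows. This is
an illustration under assumptions, not the GFS2 code: the
deferred_prune field, deferring_prune_icache_sb, and daemon_prune_pass
are hypothetical names, returning 0 from the hook is a guess at the
semantics, and a real implementation would need atomic accounting and
a wakeup of the daemon.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in; both fields are illustrative. */
struct super_block {
	long nr_cached_inodes;	/* toy model of the inode LRU length */
	long deferred_prune;	/* hypothetical: inodes the daemon still owes */
};

/* Stand-in for the generic pruner: frees up to nr_to_scan inodes. */
long generic_prune_icache_sb(struct super_block *sb, long nr_to_scan)
{
	long freed = nr_to_scan < sb->nr_cached_inodes ?
		     nr_to_scan : sb->nr_cached_inodes;

	sb->nr_cached_inodes -= freed;
	return freed;
}

/*
 * Hook called from the shrinker. It must not block, so it only
 * records how much work was requested and returns immediately.
 */
long deferring_prune_icache_sb(struct super_block *sb, long nr_to_scan)
{
	assert(sb != NULL);

	sb->deferred_prune += nr_to_scan;
	/* A real implementation would wake the daemon here. */
	return 0;	/* assumption: nothing has been freed yet */
}

/*
 * One pass of the daemon, running outside the reclaim path, where
 * waiting on the lock manager cannot stall the allocator.
 */
long daemon_prune_pass(struct super_block *sb)
{
	long want = sb->deferred_prune;

	sb->deferred_prune = 0;
	return want ? generic_prune_icache_sb(sb, want) : 0;
}
```

The point of the split is that the caller of the hook never waits:
only daemon_prune_pass does work that could block.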

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
Bob Peterson (2):
  vfs: Add hooks for filesystem-specific prune_icache_sb
  GFS2: Add a gfs2-specific prune_icache_sb

 Documentation/filesystems/vfs.txt | 15 +++++++++++++++
 fs/gfs2/incore.h                  |  2 ++
 fs/gfs2/ops_fstype.c              |  1 +
 fs/gfs2/quota.c                   | 25 +++++++++++++++++++++++++
 fs/gfs2/super.c                   | 13 +++++++++++++
 fs/inode.c                        |  1 +
 fs/super.c                        |  5 ++++-
 include/linux/fs.h                |  3 +++
 8 files changed, 64 insertions(+), 1 deletion(-)

-- 
2.5.5




end of thread, other threads:[~2016-06-28 22:47 UTC | newest]

Thread overview: 8+ messages
2016-06-24 19:50 [Cluster-devel] [PATCH 0/2] Per superblock inode reclaim Bob Peterson
2016-06-24 19:50 ` [Cluster-devel] [PATCH 1/2] vfs: Add hooks for filesystem-specific prune_icache_sb Bob Peterson
2016-06-28  1:10   ` Dave Chinner
2016-06-24 19:50 ` [Cluster-devel] [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb Bob Peterson
2016-06-26 13:18   ` Steven Whitehouse
2016-06-28  2:08   ` Dave Chinner
2016-06-28  9:13     ` Steven Whitehouse
2016-06-28 22:47       ` Dave Chinner
