cluster-devel.redhat.com archive mirror
* [Cluster-devel] [GFS2 PATCH 0/2] GFS2: Avoid inode shrinker-related deadlocks
@ 2015-11-19 18:42 Bob Peterson
  2015-11-19 18:42 ` [Cluster-devel] [GFS2 PATCH 1/2] GFS2: Make gfs2_clear_inode() queue the final put Bob Peterson
  2015-11-19 18:42 ` [Cluster-devel] [GFS2 PATCH 2/2] GFS2: Revert 35e478f Flush pending glock work when evicting an inode Bob Peterson
  0 siblings, 2 replies; 18+ messages in thread
From: Bob Peterson @ 2015-11-19 18:42 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

This set of two patches is very simple, even though it's somewhat big.

The problem is that GFS2 can deadlock when the inode shrinker runs.

The failing scenario goes like this:
The inode shrinker tells GFS2 to evict inodes, but in order to evict them,
the evict code needs to unlock their glocks, and calls DLM to do so.
In this particular scenario, the DLM can't unlock the glocks because
it's blocked waiting on a pending fence operation to complete. The fence
operation can't complete because it's blocked waiting to allocate memory.
The allocation of memory can't complete because it's waiting for the
shrinker, which, of course, is waiting for GFS2 to evict inodes.
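
To make the circular wait concrete, here is a small user-space sketch (not
kernel code) that models the scenario as a wait-for chain and checks whether
the chain loops back to its start. The actor names, the arrays and the
waits_on_itself() helper are all illustrative assumptions. The second array is
a hypothetical variant in which the evict step does not block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Actors in the scenario; the names are illustrative only. */
enum actor { SHRINKER, GFS2_EVICT, DLM_UNLOCK, FENCE_OP, MEM_ALLOC, NR_ACTORS };

/* waits_on[a] is the actor that a is blocked on, or -1 if a can run. */
static const int blocking_evict[NR_ACTORS] = {
	[SHRINKER]   = GFS2_EVICT,   /* shrinker waits for GFS2 to evict */
	[GFS2_EVICT] = DLM_UNLOCK,   /* evict waits for DLM to unlock */
	[DLM_UNLOCK] = FENCE_OP,     /* DLM waits for the pending fence */
	[FENCE_OP]   = MEM_ALLOC,    /* fence waits for memory */
	[MEM_ALLOC]  = SHRINKER,     /* allocation waits for the shrinker */
};

/* Hypothetical variant: evict returns without waiting on anything. */
static const int nonblocking_evict[NR_ACTORS] = {
	[SHRINKER]   = GFS2_EVICT,
	[GFS2_EVICT] = -1,
	[DLM_UNLOCK] = FENCE_OP,
	[FENCE_OP]   = MEM_ALLOC,
	[MEM_ALLOC]  = SHRINKER,
};

/* Follow the chain from start; true if it leads back to start. */
bool waits_on_itself(const int waits_on[NR_ACTORS], enum actor start)
{
	int cur = start;

	for (size_t steps = 0; steps < NR_ACTORS; steps++) {
		cur = waits_on[cur];
		if (cur < 0)
			return false;	/* chain ends at a runnable actor */
		if (cur == (int)start)
			return true;	/* back where we began: circular wait */
	}
	return false;			/* no loop through start */
}

/* Self-check for the two chains above. */
void check_chains(void)
{
	assert(waits_on_itself(blocking_evict, SHRINKER));
	assert(!waits_on_itself(nonblocking_evict, SHRINKER));
}
```

Once any one actor in the chain stops waiting, the loop is gone, even though
the other dependencies stay as they are.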

The problem is that it's unsafe for GFS2's evict code to make a call into
DLM. The solution is to queue the final glock put to the glock state
machine. Since the glocks outlive the inode anyway, it's safe for the
evict code to continue and return successfully back to the shrinker,
which can then continue running and satisfy the memory needed for the
fence operation, and so forth.
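
As a rough illustration of the idea (not the code in the patch), the sketch
below defers the final put to a queue so that the evict path never calls the
potentially blocking unlock itself. The types and functions here (toy_glock,
put_queue, toy_evict, toy_run_queue, toy_unlock) are made up for this example
and are not GFS2 or DLM interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a glock: a reference count and a queue link. */
struct toy_glock {
	int refcount;
	bool unlocked;		/* set once the unlock has run */
	struct toy_glock *next;	/* link in the deferred-put queue */
};

/* Singly linked list of glocks whose final put has been deferred. */
struct put_queue {
	struct toy_glock *head;
};

/* Stand-in for the lock-manager unlock, which may block in real life. */
static void toy_unlock(struct toy_glock *gl)
{
	gl->unlocked = true;
}

/* Evict path: hand the final put to the queue and return at once. */
void toy_evict(struct put_queue *q, struct toy_glock *gl)
{
	gl->next = q->head;
	q->head = gl;
}

/*
 * Worker path: runs later, outside the shrinker, where blocking is
 * acceptable. Returns the number of glocks whose last reference dropped.
 */
size_t toy_run_queue(struct put_queue *q)
{
	size_t released = 0;

	while (q->head) {
		struct toy_glock *gl = q->head;

		q->head = gl->next;
		gl->next = NULL;
		if (--gl->refcount == 0) {
			toy_unlock(gl);
			released++;
		}
	}
	return released;
}
```

In this model, toy_evict() only links the glock into the queue. The reference
is dropped and the unlock is issued when toy_run_queue() runs, which stands in
for the glock state machine doing the work later.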

The second patch reverts commit 35e478f, which made the evict code wait
for outstanding work on the glock state machine. That wait is also unsafe
for the same reason: it can block on the state machine, which can block on
DLM. The revert should be safe now, for the reasons stated in that patch.

Bob Peterson (2):
  GFS2: Make gfs2_clear_inode() queue the final put
  GFS2: Revert 35e478f Flush pending glock work when evicting an inode

 fs/gfs2/glock.c | 34 +++++++++++++++++-----------------
 fs/gfs2/glock.h |  1 +
 fs/gfs2/super.c |  6 ++++--
 3 files changed, 22 insertions(+), 19 deletions(-)

-- 
2.5.0




Thread overview: 18+ messages
2015-11-19 18:42 [Cluster-devel] [GFS2 PATCH 0/2] GFS2: Avoid inode shrinker-related deadlocks Bob Peterson
2015-11-19 18:42 ` [Cluster-devel] [GFS2 PATCH 1/2] GFS2: Make gfs2_clear_inode() queue the final put Bob Peterson
2015-11-20 13:33   ` Steven Whitehouse
2015-11-25 14:22     ` Bob Peterson
2015-11-25 14:26       ` Steven Whitehouse
2015-12-01 15:42         ` Bob Peterson
2015-12-02 10:23           ` Steven Whitehouse
2015-12-02 16:42             ` Bob Peterson
2015-12-02 17:41               ` Bob Peterson
2015-12-03 11:18                 ` Steven Whitehouse
2015-12-04 14:51                   ` Bob Peterson
2015-12-04 15:51                     ` David Teigland
2015-12-04 17:38                       ` Bob Peterson
2015-12-08  7:57               ` Dave Chinner
2015-12-08  9:03                 ` Steven Whitehouse
2015-11-19 18:42 ` [Cluster-devel] [GFS2 PATCH 2/2] GFS2: Revert 35e478f Flush pending glock work when evicting an inode Bob Peterson
2015-11-20 13:47   ` Steven Whitehouse
2015-11-25 14:36     ` Bob Peterson
