From: Dave Chinner
Date: Tue, 8 Dec 2015 18:57:37 +1100
Subject: [Cluster-devel] [GFS2 PATCH 1/2] GFS2: Make gfs2_clear_inode() queue the final put
In-Reply-To: <2096888943.25047174.1449074533569.JavaMail.zimbra@redhat.com>
Message-ID: <20151208075737.GA29099@devil.localdomain>
To: cluster-devel.redhat.com

On Wed, Dec 02, 2015 at 11:42:13AM -0500, Bob Peterson wrote:
> ----- Original Message -----
> (snip)
> > Please take a look at this
> > again and figure out what the problematic cycle of events is, and then
> > work out how to avoid that happening in the first place. There is no
> > point in replacing one problem with another one, particularly one which
> > would likely be very tricky to debug.
> >
> > Steve.
>
> The problematic cycle of events is well known:
> gfs2_clear_inode() calls gfs2_glock_put() for the inode's glock,
> but if it's the very last put, it calls into the DLM, which can block,
> and that's where we get into trouble.
>
> The livelock goes like this:
>
> 1. A fence operation needs memory, so it blocks on memory allocation.
> 2. Memory allocation blocks on the slab shrinker.
> 3. The slab shrinker calls into the VFS inode shrinker to free inodes from memory.
....
> 7. The DLM blocks on a pending fence operation. Goto 1.

Therefore, the fence operation should be doing GFP_NOFS allocations
to prevent the shrinker from re-entering the DLM via the filesystem....

Cheers,

Dave.
-- 
Dave Chinner
dchinner at redhat.com