From mboxrd@z Thu Jan  1 00:00:00 1970
From: Bob Peterson <rpeterso@redhat.com>
Date: Thu, 17 Mar 2016 12:01:46 -0400 (EDT)
Subject: [Cluster-devel] [PATCH 4/4] GFS2: Make deletes use regular
 inode_lookup with special code path
In-Reply-To: <56E93EFC.2050207@redhat.com>
References: <1450465350-10060-1-git-send-email-rpeterso@redhat.com>
	<1450465350-10060-5-git-send-email-rpeterso@redhat.com>
	<56CC339F.8010609@redhat.com>
	<1064069915.29336433.1456346534130.JavaMail.zimbra@redhat.com>
	<56CED24F.4030106@redhat.com>
	<862175385.38286170.1458049845115.JavaMail.zimbra@redhat.com>
	<56E93EFC.2050207@redhat.com>
Message-ID: <908777631.39362893.1458230506784.JavaMail.zimbra@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

----- Original Message -----
> > Hi Steve (and all),
> >
> > I've done a lot of analysis of this issue, and run into multiple
> > complications,
> > chief among them that these problems often manifest in unmount after the
> > debugfs files have been destroyed. But I finally have some answers.
> >
> > In almost all cases, what's happening is that delete_work_func sees
> > gl->gl_object and calls gfs2_ilookup, but since the inode is gone at that
> > point,
> > it returns an error. So lookup_by_inum never gets called, and therefore
> > function
> > delete_work_func just exits without doing anything. So the inode is left in
> > an
> > unlinked state.
> >
> > I tried making it call lookup_by_inum as a fallback if gfs2_ilookup doesn't
> > find
> > the inode, but that causes a hang in unmount. This can still happen in
> > today's
> > code, but it's less likely than the first scenario.
> So that bit sounds like there is a race between clearing gl_object and
> the test that the delete_work_func uses. So that is perhaps one thing to
> look at.
> 
> > The hang is caused because gfs2_lookup_by_inum() grabs the inode glock
> > first, then
> > the vfs inode lock. The normal lookup path, gfs2_inode_lookup, gets the vfs
> > inode
> > lock first, then the iopen glock, before everything else. It doesn't really
> > lock
> > the inode glock, but the caller usually does that after it returns from the
> > lookup.
> I'm not sure I follow... which "vfs inode lock" is the problem here?
> Where is it called in/from gfs2_lookup_by_inum ?
> 
> > So in the unmount hang, the delete func is holding the inode glock and
> > waiting for
> > the vfs inode which is stuck in I_FREEING state. That may go back to that
> > patch I
> > reverted which skips over I_FREEING inodes. But if that patch is in place,
> > the
> > I_FREEING inode is skipped, and once again, the inode is left in an
> > UNLINKED state.
> The delete function should not be waiting for I_FREEING inodes, since
> those are already being freed, so there is no need to wait for them - we
> should just ignore them I think, since they will go away anyway.

VFS needs to wait for the I_FREEING inode before it can create another
inode in its place. We need the replacement in order to do our iput
which will cause the gfs2_delete_inode which changes it from unlinked to free.
If we don't, the result is a bunch of inodes that stay in unlinked, which is
the cause of the du vs. df issue.

> > The reason the regular lookup path has no problems is because it grabs the
> > vfs inode
> > lock first, which waits for the I_FREEING inode before going on to lock any
> > glocks.
> >
> > What this means is that NFS very likely has a similar hang, since it also
> > uses
> > lookup_by_inum. So we have to figure out what we're going to do about that.
> > Perhaps you could explain why we wrote lookup_by_inum to begin with?
> The reason is that we have to be very careful when looking up inodes by
> number, because we do not hold the parent directory lock as we would
> during normal lookup, so that the inode can vanish from under us at any
> time. So we have to grab the glock first, and then make several checks
> before we can read in the inode's details.

That makes good sense.
 
Perhaps the crux of the problem is best explained by comparing today's
lookup path by inum (which hangs) versus the regular lookup path, which I
used in this patch, which seemed to fix the problem:

A = glock lock
B = VFS-level inode lock

The original delete lookup path              Normal lookup path:
-----------------------------------------    -------------------------------------
delete_work_func:                            delete_work_func:
{                                            {
   if (gl_object)
      inode = gfs2_ilookup(sdp, no_addr); <-------This one often fails, meaning
                                                  that we don't hang, but we also
                                                  don't transition from unlinked to free.
   else {
      inode = gfs2_lookup_by_inum         <-------This is the problematic code path
A        1. glock_nq inode glock
         2. check block type == unlinked
         3. inode_lookup                      inode_lookup
B              iget_locked (inode_lock)      B   iget_locked (inode_lock)
                  __wait_on_freeing_inode           __wait_on_freeing_inode
               if I_NEW                          if I_NEW
                                                    verify unlinked <------- New section I added
	   				     A         glock nq inode glock
		   				       check block type
					     A         glock dq inode glock
                                                    <----------------------- End of new section
                   unlock_new_inode/i_state         unlock_new_inode/i_state
A        4. glock dq inode glock
   }
B  iput / atomic_dec_and_lock (inode_lock)   B  iput / atomic_dec_and_lock (inode_lock)
      generic_drop_inode                           generic_drop_inode
         generic_delete_inode                         generic_delete_inode
            gfs2_delete_inode                            gfs2_delete_inode
A               gfs2_glock_nq_init(ip->i_gl) A              gfs2_glock_nq_init(ip->i_gl)
A               gfs2_glock_dq_uninit         A              gfs2_glock_dq_uninit
}                                            }

With the lookup_by_inum path, the code gets into an ABAB deadlock where
one process holds the vfs-level lock and requires the GFS2 glock, and the
other does the opposite.

Regards,

Bob Peterson
Red Hat File Systems