[Cluster-devel] [GFS2 PATCH 2/2] GFS2: Revert 35e478f Flush pending glock work when evicting an inode

cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed

From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH 2/2] GFS2: Revert 35e478f Flush pending glock work when evicting an inode
Date: Wed, 25 Nov 2015 09:36:34 -0500 (EST)	[thread overview]
Message-ID: <793601052.17695822.1448462194602.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <564F2458.7090902@redhat.com>

Hi,

----- Original Message -----
> Hi,
> 
> On 19/11/15 18:42, Bob Peterson wrote:
> > Commit 35e478f was added because there were cases where
> > the inode's glock still had pending delayed work.
> > The theory is that the delayed work referenced the inode after it
> > had been deleted from memory by gfs2_evict_inode during its call to
> > filemap_fdatawrite. I'm guessing that's wrong. I'm guessing that
> > the delayed work was referencing the glock after it had been
> > freed due to the call to gfs2_glock_put() in gfs2_evict_inode().
> > The previous patch made the last put be done from the delayed
> > work queue, so that should make 35e478f obsolete. The other thing
> > to consider is that vfs evict() calls invalidate_inode_buffers
> > before calling the file system-specific gfs2_evict_inode(), and
> > therefore we should have no dirty buffers for the inode itself.
> > (We might have dirty buffers for the glock address space, though).
> Not only are there likely to be dirty buffers still in the meta data
> address space, that is a requirement for good performance when unlinking
> lots of inodes.
> 
> > The reason why the call to flush_delayed_work can't be done here
> > is that it can block, waiting like so:
> >
> > kernel: Call Trace:
> > kernel: [<ffffffff815395f5>] schedule_timeout+0x215/0x2e0
> > kernel: [<ffffffff81141f1a>] ? shrink_inactive_list+0x53a/0x830
> > kernel: [<ffffffff81539273>] wait_for_common+0x123/0x180
> > kernel: [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
> > kernel: [<ffffffff8153938d>] wait_for_completion+0x1d/0x20
> > kernel: [<ffffffff8109b3e7>] flush_work+0x77/0xc0
> > kernel: [<ffffffff8109ac50>] ? wq_barrier_func+0x0/0x20
> > kernel: [<ffffffff8109b604>] flush_delayed_work+0x54/0x70
> > kernel: [<ffffffffa05e3552>] gfs2_clear_inode+0x62/0xf0 [gfs2]
> >
> > The delayed work can depend on actions from DLM, and DLM can
> > hang indefinitely on a fencing action.
> >
> > Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> > ---
> >   fs/gfs2/super.c | 1 -
> >   1 file changed, 1 deletion(-)
> >
> > diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> > index 46e5004..4a9b090 100644
> > --- a/fs/gfs2/super.c
> > +++ b/fs/gfs2/super.c
> > @@ -1613,7 +1613,6 @@ out:
> >   	clear_inode(inode);
> >   	gfs2_dir_hash_inval(ip);
> >   	ip->i_gl->gl_object = NULL;
> > -	flush_delayed_work(&ip->i_gl->gl_work);
> >   	gfs2_glock_add_to_lru(ip->i_gl);
> >   	if (queue_delayed_work(gfs2_glock_workqueue,
> >   			       &ip->i_gl->gl_work, 0) == 0)
> 
> I'm not convinced that it is safe to do this, without finding some other
> way to flush the workqueue. All glock work items on the queue hold a ref
> count to the glock, so it should not be possible to have a case where
> the glock is referenced by work queue activity after it has been freed
> - unless I've misunderstood the explanation above,
> 
> Steve.

I'm not convinced it's 100% safe either, however, two things to consider:

1. With the previous patch, we're queueing the final glock put, so when the
   final put is done, the glock delayed work should be done, right?

   By the time the final delayed work is run, gl_object is set to NULL
   (immediately above) so the state machine should not reference the inode
   being deleted.
2. I've asked Barry to re-test the failing scenario with specsfs 2008, and
   he's done so several times and not recreated the problem.

Regards,

Bob Peterson
Red Hat File Systems

     prev parent reply	other threads:[~2015-11-25 14:36 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-19 18:42 [Cluster-devel] [GFS2 PATCH 0/2] GFS2: Avoid inode shrinker-related deadlocks Bob Peterson
2015-11-19 18:42 ` [Cluster-devel] [GFS2 PATCH 1/2] GFS2: Make gfs2_clear_inode() queue the final put Bob Peterson
2015-11-20 13:33   ` Steven Whitehouse
2015-11-25 14:22     ` Bob Peterson
2015-11-25 14:26       ` Steven Whitehouse
2015-12-01 15:42         ` Bob Peterson
2015-12-02 10:23           ` Steven Whitehouse
2015-12-02 16:42             ` Bob Peterson
2015-12-02 17:41               ` Bob Peterson
2015-12-03 11:18                 ` Steven Whitehouse
2015-12-04 14:51                   ` Bob Peterson
2015-12-04 15:51                     ` David Teigland
2015-12-04 17:38                       ` Bob Peterson
2015-12-08  7:57               ` Dave Chinner
2015-12-08  9:03                 ` Steven Whitehouse
2015-11-19 18:42 ` [Cluster-devel] [GFS2 PATCH 2/2] GFS2: Revert 35e478f Flush pending glock work when evicting an inode Bob Peterson
2015-11-20 13:47   ` Steven Whitehouse
2015-11-25 14:36     ` Bob Peterson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=793601052.17695822.1448462194602.JavaMail.zimbra@redhat.com \
    --to=rpeterso@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).