From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steven Whitehouse Date: Fri, 20 Nov 2015 13:47:04 +0000 Subject: [Cluster-devel] [GFS2 PATCH 2/2] GFS2: Revert 35e478f Flush pending glock work when evicting an inode In-Reply-To: <1447958561-2584-3-git-send-email-rpeterso@redhat.com> References: <1447958561-2584-1-git-send-email-rpeterso@redhat.com> <1447958561-2584-3-git-send-email-rpeterso@redhat.com> Message-ID: <564F2458.7090902@redhat.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Hi, On 19/11/15 18:42, Bob Peterson wrote: > Commit 35e478f was added because there were cases where > the inode's glock still had pending delayed work. > The theory is that the delayed work referenced the inode after it > had been deleted from memory by gfs2_evict_inode during its call to > filemap_fdatawrite. I'm guessing that's wrong. I'm guessing that > the delayed work was referencing the glock after it had been > freed due to the call to gfs2_glock_put() in gfs2_evict_inode(). > The previous patch made the last put be done from the delayed > work queue, so that should make 35e478f obsolete. The other thing > to consider is that vfs evict() calls invalidate_inode_buffers > before calling the file system-specific gfs2_evict_inode(), and > therefore we should have no dirty buffers for the inode itself. > (We might have dirty buffers for the glock address space, though). Not only are there likely to be dirty buffers still in the meta data address space, that is a requirement for good performance when unlinking lots of inodes. > The reason why the call to flush_delayed_work can't be done here > is that it can block, waiting like so: > > kernel: Call Trace: > kernel: [] schedule_timeout+0x215/0x2e0 > kernel: [] ? shrink_inactive_list+0x53a/0x830 > kernel: [] wait_for_common+0x123/0x180 > kernel: [] ? default_wake_function+0x0/0x20 > kernel: [] wait_for_completion+0x1d/0x20 > kernel: [] flush_work+0x77/0xc0 > kernel: [] ? wq_barrier_func+0x0/0x20 > kernel: [] flush_delayed_work+0x54/0x70 > kernel: [] gfs2_clear_inode+0x62/0xf0 [gfs2] > > The delayed work can depend on actions from DLM, and DLM can > hang indefinitely on a fencing action. > > Signed-off-by: Bob Peterson > --- > fs/gfs2/super.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c > index 46e5004..4a9b090 100644 > --- a/fs/gfs2/super.c > +++ b/fs/gfs2/super.c > @@ -1613,7 +1613,6 @@ out: > clear_inode(inode); > gfs2_dir_hash_inval(ip); > ip->i_gl->gl_object = NULL; > - flush_delayed_work(&ip->i_gl->gl_work); > gfs2_glock_add_to_lru(ip->i_gl); > if (queue_delayed_work(gfs2_glock_workqueue, > &ip->i_gl->gl_work, 0) == 0) I'm not convinced that it is safe to do this, without finding some other way to flush the workqueue. All glock work items on the queue hold a ref count to the glock, so it should not be possible to have a case where the glock is referenced by work queue activity after it has been freed - unless I've misunderstood the explanation above, Steve.