From mboxrd@z Thu Jan  1 00:00:00 1970
From: Steven Whitehouse <swhiteho@redhat.com>
Date: Fri, 20 Nov 2015 13:47:04 +0000
Subject: [Cluster-devel] [GFS2 PATCH 2/2] GFS2: Revert 35e478f Flush
 pending glock work when evicting an inode
In-Reply-To: <1447958561-2584-3-git-send-email-rpeterso@redhat.com>
References: <1447958561-2584-1-git-send-email-rpeterso@redhat.com>
	<1447958561-2584-3-git-send-email-rpeterso@redhat.com>
Message-ID: <564F2458.7090902@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

On 19/11/15 18:42, Bob Peterson wrote:
> Commit 35e478f was added because there were cases where
> the inode's glock still had pending delayed work.
> The theory is that the delayed work referenced the inode after it
> had been deleted from memory by gfs2_evict_inode during its call to
> filemap_fdatawrite. I'm guessing that's wrong. I'm guessing that
> the delayed work was referencing the glock after it had been
> freed due to the call to gfs2_glock_put() in gfs2_evict_inode().
> The previous patch made the last put be done from the delayed
> work queue, so that should make 35e478f obsolete. The other thing
> to consider is that vfs evict() calls invalidate_inode_buffers
> before calling the file system-specific gfs2_evict_inode(), and
> therefore we should have no dirty buffers for the inode itself.
> (We might have dirty buffers for the glock address space, though).
Not only are there likely to be dirty buffers still in the meta data 
address space, that is a requirement for good performance when unlinking 
lots of inodes.

> The reason why the call to flush_delayed_work can't be done here
> is that it can block, waiting like so:
>
> kernel: Call Trace:
> kernel: [<ffffffff815395f5>] schedule_timeout+0x215/0x2e0
> kernel: [<ffffffff81141f1a>] ? shrink_inactive_list+0x53a/0x830
> kernel: [<ffffffff81539273>] wait_for_common+0x123/0x180
> kernel: [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
> kernel: [<ffffffff8153938d>] wait_for_completion+0x1d/0x20
> kernel: [<ffffffff8109b3e7>] flush_work+0x77/0xc0
> kernel: [<ffffffff8109ac50>] ? wq_barrier_func+0x0/0x20
> kernel: [<ffffffff8109b604>] flush_delayed_work+0x54/0x70
> kernel: [<ffffffffa05e3552>] gfs2_clear_inode+0x62/0xf0 [gfs2]
>
> The delayed work can depend on actions from DLM, and DLM can
> hang indefinitely on a fencing action.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>   fs/gfs2/super.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 46e5004..4a9b090 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1613,7 +1613,6 @@ out:
>   	clear_inode(inode);
>   	gfs2_dir_hash_inval(ip);
>   	ip->i_gl->gl_object = NULL;
> -	flush_delayed_work(&ip->i_gl->gl_work);
>   	gfs2_glock_add_to_lru(ip->i_gl);
>   	if (queue_delayed_work(gfs2_glock_workqueue,
>   			       &ip->i_gl->gl_work, 0) == 0)

I'm not convinced that it is safe to do this, without finding some other 
way to flush the workqueue. All glock work items on the queue hold a ref 
count to the glock, so it should not be possible to have a case where 
the glock is referenced by work queue activity after it has been freed  
- unless I've misunderstood the explanation above,

Steve.