From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:47207 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751951AbcFZNS5
	(ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Sun, 26 Jun 2016 09:18:57 -0400
Subject: Re: [Cluster-devel] [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb
To: Bob Peterson <rpeterso@redhat.com>, cluster-devel@redhat.com,
	linux-fsdevel@vger.kernel.org, Dave Chinner <david@fromorbit.com>
References: <1466797811-5873-1-git-send-email-rpeterso@redhat.com>
	<1466797811-5873-3-git-send-email-rpeterso@redhat.com>
Cc: linux-kernel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>
From: Steven Whitehouse <swhiteho@redhat.com>
Message-ID: <576FD63C.6020805@redhat.com>
Date: Sun, 26 Jun 2016 14:18:52 +0100
MIME-Version: 1.0
In-Reply-To: <1466797811-5873-3-git-send-email-rpeterso@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

Hi,

I think the idea looks good. A couple of comments below, though...

On 24/06/16 20:50, Bob Peterson wrote:
> This patch adds a new prune_icache_sb function for the VFS slab
> shrinker to call. Trying to directly free the inodes from memory
> might deadlock because it evicts inodes, which calls into DLM
> to acquire the glock. The DLM, in turn, may block on a pending
> fence operation, which may already be blocked on memory allocation
> that caused the slab shrinker to be called in the first place.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/incore.h     |  2 ++
>  fs/gfs2/ops_fstype.c |  1 +
>  fs/gfs2/quota.c      | 25 +++++++++++++++++++++++++
>  fs/gfs2/super.c      | 13 +++++++++++++
>  4 files changed, 41 insertions(+)
>
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index a6a3389..a367459 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -757,6 +757,8 @@ struct gfs2_sbd {
>
>  	struct task_struct *sd_logd_process;
>  	struct task_struct *sd_quotad_process;
> +	int sd_iprune;            /* inodes to prune */
> +	spinlock_t sd_shrinkspin;
>
>  	/* Quota stuff */
>
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index 4546360..65a69be 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -95,6 +95,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
>  	spin_lock_init(&sdp->sd_jindex_spin);
>  	mutex_init(&sdp->sd_jindex_mutex);
>  	init_completion(&sdp->sd_journal_ready);
> +	spin_lock_init(&sdp->sd_shrinkspin);
>
>  	INIT_LIST_HEAD(&sdp->sd_quota_list);
>  	mutex_init(&sdp->sd_quota_mutex);
> diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
> index ce7d69a..5810a2c 100644
> --- a/fs/gfs2/quota.c
> +++ b/fs/gfs2/quota.c
> @@ -1528,14 +1528,39 @@ void gfs2_wake_up_statfs(struct gfs2_sbd *sdp) {
>  int gfs2_quotad(void *data)
>  {
>  	struct gfs2_sbd *sdp = data;
> +	struct super_block *sb = sdp->sd_vfs;
>  	struct gfs2_tune *tune = &sdp->sd_tune;
>  	unsigned long statfs_timeo = 0;
>  	unsigned long quotad_timeo = 0;
>  	unsigned long t = 0;
>  	DEFINE_WAIT(wait);
>  	int empty;
> +	int rc;
> +	struct shrink_control sc = {.gfp_mask = GFP_KERNEL, };
>
>  	while (!kthread_should_stop()) {
> +		/* TODO: Deal with shrinking of dcache */
> +		/* Prune any inode cache intended by the shrinker. */
> +		spin_lock(&sdp->sd_shrinkspin);
> +		if (sdp->sd_iprune > 0) {
> +			sc.nr_to_scan = sdp->sd_iprune;
> +			if (sc.nr_to_scan > 1024)
> +				sc.nr_to_scan = 1024;
> +			sdp->sd_iprune -= sc.nr_to_scan;
> +			spin_unlock(&sdp->sd_shrinkspin);
> +			rc = prune_icache_sb(sb, &sc);
> +			if (rc < 0) {
> +				spin_lock(&sdp->sd_shrinkspin);
> +				sdp->sd_iprune = 0;
> +				spin_unlock(&sdp->sd_shrinkspin);
> +			}
> +			if (sdp->sd_iprune) {
> +				cond_resched();
> +				continue;
> +			}
> +		} else {
> +			spin_unlock(&sdp->sd_shrinkspin);
> +		}
>
>  		/* Update the master statfs file */
>  		if (sdp->sd_statfs_force_sync) {
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 9b2ff353..75e8a85 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1667,6 +1667,18 @@ static void gfs2_destroy_inode(struct inode *inode)
>  	call_rcu(&inode->i_rcu, gfs2_i_callback);
>  }
>
> +static long gfs2_prune_icache_sb(struct super_block *sb,
> +				 struct shrink_control *sc)
> +{
> +	struct gfs2_sbd *sdp;
> +
> +	sdp = sb->s_fs_info;
> +	spin_lock(&sdp->sd_shrinkspin);
> +	sdp->sd_iprune = sc->nr_to_scan + 1;
> +	spin_unlock(&sdp->sd_shrinkspin);
> +	return 0;
> +}

This doesn't wake up the thread that will do the reclaim, so there may
be a significant delay between the request to shrink and the actual
shrink. Also, using quotad is not a good plan, since it might itself
block waiting for memory. This should be done by a thread of its own
to avoid any possibility of deadlock here.

There also appears to be a limit of 1024 inodes to scan per run of
quotad, which means it would take a very long time to push out any
significant number, and that figure does seem a bit arbitrary - was
there a reason for selecting it? It would probably be better to simply
yield every now and then if there are a lot of items to process.
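On the first point, I mean something along these lines - an untested
sketch only, and it still assumes quotad stays as the consumer, which
as I said above it probably shouldn't:

static long gfs2_prune_icache_sb(struct super_block *sb,
				 struct shrink_control *sc)
{
	struct gfs2_sbd *sdp = sb->s_fs_info;

	spin_lock(&sdp->sd_shrinkspin);
	sdp->sd_iprune = sc->nr_to_scan + 1;
	spin_unlock(&sdp->sd_shrinkspin);
	/* Kick the worker now, rather than leaving the request to
	 * sit until the thread next times out of its wait. */
	wake_up_process(sdp->sd_quotad_process);
	return 0;
}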
Steve.

> +
>  const struct super_operations gfs2_super_ops = {
>  	.alloc_inode		= gfs2_alloc_inode,
>  	.destroy_inode		= gfs2_destroy_inode,
> @@ -1681,5 +1693,6 @@ const struct super_operations gfs2_super_ops = {
>  	.remount_fs		= gfs2_remount_fs,
>  	.drop_inode		= gfs2_drop_inode,
>  	.show_options		= gfs2_show_options,
> +	.prune_icache_sb	= gfs2_prune_icache_sb,
>  };
>
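PS. On the yield point, the loop in gfs2_quotad() could drain the whole
request in batches, rescheduling between batches, so that 1024 becomes
a per-pass latency bound rather than a cap on the total work done per
run. Again, only an untested sketch, keeping the patch's own error
handling for prune_icache_sb():

	spin_lock(&sdp->sd_shrinkspin);
	while (sdp->sd_iprune > 0) {
		sc.nr_to_scan = min(sdp->sd_iprune, 1024);
		sdp->sd_iprune -= sc.nr_to_scan;
		spin_unlock(&sdp->sd_shrinkspin);
		rc = prune_icache_sb(sb, &sc);
		cond_resched();		/* yield between batches */
		spin_lock(&sdp->sd_shrinkspin);
		if (rc < 0) {		/* as in the patch: give up */
			sdp->sd_iprune = 0;
			break;
		}
	}
	spin_unlock(&sdp->sd_shrinkspin);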