All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb
Date: Sun, 26 Jun 2016 14:18:52 +0100	[thread overview]
Message-ID: <576FD63C.6020805@redhat.com> (raw)
In-Reply-To: <1466797811-5873-3-git-send-email-rpeterso@redhat.com>

Hi,

I think the idea looks good. A couple of comments below though...

On 24/06/16 20:50, Bob Peterson wrote:
> This patch adds a new prune_icache_sb function for the VFS slab
> shrinker to call. Trying to directly free the inodes from memory
> might deadlock because it evicts inodes, which calls into DLM
> to acquire the glock. The DLM, in turn, may block on a pending
> fence operation, which may already be blocked on memory allocation
> that caused the slab shrinker to be called in the first place.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>   fs/gfs2/incore.h     |  2 ++
>   fs/gfs2/ops_fstype.c |  1 +
>   fs/gfs2/quota.c      | 25 +++++++++++++++++++++++++
>   fs/gfs2/super.c      | 13 +++++++++++++
>   4 files changed, 41 insertions(+)
>
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index a6a3389..a367459 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -757,6 +757,8 @@ struct gfs2_sbd {
>   
>   	struct task_struct *sd_logd_process;
>   	struct task_struct *sd_quotad_process;
> +	int sd_iprune; /* inodes to prune */
> +	spinlock_t sd_shrinkspin;
>   
>   	/* Quota stuff */
>   
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index 4546360..65a69be 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -95,6 +95,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
>   	spin_lock_init(&sdp->sd_jindex_spin);
>   	mutex_init(&sdp->sd_jindex_mutex);
>   	init_completion(&sdp->sd_journal_ready);
> +	spin_lock_init(&sdp->sd_shrinkspin);
>   
>   	INIT_LIST_HEAD(&sdp->sd_quota_list);
>   	mutex_init(&sdp->sd_quota_mutex);
> diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
> index ce7d69a..5810a2c 100644
> --- a/fs/gfs2/quota.c
> +++ b/fs/gfs2/quota.c
> @@ -1528,14 +1528,39 @@ void gfs2_wake_up_statfs(struct gfs2_sbd *sdp) {
>   int gfs2_quotad(void *data)
>   {
>   	struct gfs2_sbd *sdp = data;
> +	struct super_block *sb = sdp->sd_vfs;
>   	struct gfs2_tune *tune = &sdp->sd_tune;
>   	unsigned long statfs_timeo = 0;
>   	unsigned long quotad_timeo = 0;
>   	unsigned long t = 0;
>   	DEFINE_WAIT(wait);
>   	int empty;
> +	int rc;
> +	struct shrink_control sc = {.gfp_mask = GFP_KERNEL, };
>   
>   	while (!kthread_should_stop()) {
> +		/* TODO: Deal with shrinking of dcache */
> +		/* Prune any inode cache intended by the shrinker. */
> +		spin_lock(&sdp->sd_shrinkspin);
> +		if (sdp->sd_iprune > 0) {
> +			sc.nr_to_scan = sdp->sd_iprune;
> +			if (sc.nr_to_scan > 1024)
> +				sc.nr_to_scan = 1024;
> +			sdp->sd_iprune -= sc.nr_to_scan;
> +			spin_unlock(&sdp->sd_shrinkspin);
> +			rc = prune_icache_sb(sb, &sc);
> +			if (rc < 0) {
> +				spin_lock(&sdp->sd_shrinkspin);
> +				sdp->sd_iprune = 0;
> +				spin_unlock(&sdp->sd_shrinkspin);
> +			}
> +			if (sdp->sd_iprune) {
> +				cond_resched();
> +				continue;
> +			}
> +		} else {
> +			spin_unlock(&sdp->sd_shrinkspin);
> +		}
>   
>   		/* Update the master statfs file */
>   		if (sdp->sd_statfs_force_sync) {
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 9b2ff353..75e8a85 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1667,6 +1667,18 @@ static void gfs2_destroy_inode(struct inode *inode)
>   	call_rcu(&inode->i_rcu, gfs2_i_callback);
>   }
>   
> +static long gfs2_prune_icache_sb(struct super_block *sb,
> +				 struct shrink_control *sc)
> +{
> +	struct gfs2_sbd *sdp;
> +
> +	sdp = sb->s_fs_info;
> +	spin_lock(&sdp->sd_shrinkspin);
> +	sdp->sd_iprune = sc->nr_to_scan + 1;
> +	spin_unlock(&sdp->sd_shrinkspin);
> +	return 0;
> +}
This doesn't wake up the thread that will do the reclaim, so that there 
may be a significant delay between the request to shrink and the actual 
shrink. Also, using quotad is not a good plan, since it might itself 
block waiting for memory. This should be done by a thread on its own to 
avoid any deadlock possibility here.

There also appears to be a limit of 1024 to scan per run of quotad, 
which means it would take a very long time to push out any significant 
number, and it does seem a bit arbitrary - was there a reason for 
selecting that number? It would probably be better to simply yield every 
now and then if there are a lot of items to process,

Steve.

> +
>   const struct super_operations gfs2_super_ops = {
>   	.alloc_inode		= gfs2_alloc_inode,
>   	.destroy_inode		= gfs2_destroy_inode,
> @@ -1681,5 +1693,6 @@ const struct super_operations gfs2_super_ops = {
>   	.remount_fs		= gfs2_remount_fs,
>   	.drop_inode		= gfs2_drop_inode,
>   	.show_options		= gfs2_show_options,
> +	.prune_icache_sb	= gfs2_prune_icache_sb,
>   };
>   



WARNING: multiple messages have this Message-ID (diff)
From: Steven Whitehouse <swhiteho@redhat.com>
To: Bob Peterson <rpeterso@redhat.com>,
	cluster-devel@redhat.com, linux-fsdevel@vger.kernel.org,
	Dave Chinner <dchinner@redhat.com>
Cc: linux-kernel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [Cluster-devel] [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb
Date: Sun, 26 Jun 2016 14:18:52 +0100	[thread overview]
Message-ID: <576FD63C.6020805@redhat.com> (raw)
In-Reply-To: <1466797811-5873-3-git-send-email-rpeterso@redhat.com>

Hi,

I think the idea looks good. A couple of comments below though...

On 24/06/16 20:50, Bob Peterson wrote:
> This patch adds a new prune_icache_sb function for the VFS slab
> shrinker to call. Trying to directly free the inodes from memory
> might deadlock because it evicts inodes, which calls into DLM
> to acquire the glock. The DLM, in turn, may block on a pending
> fence operation, which may already be blocked on memory allocation
> that caused the slab shrinker to be called in the first place.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>   fs/gfs2/incore.h     |  2 ++
>   fs/gfs2/ops_fstype.c |  1 +
>   fs/gfs2/quota.c      | 25 +++++++++++++++++++++++++
>   fs/gfs2/super.c      | 13 +++++++++++++
>   4 files changed, 41 insertions(+)
>
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index a6a3389..a367459 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -757,6 +757,8 @@ struct gfs2_sbd {
>   
>   	struct task_struct *sd_logd_process;
>   	struct task_struct *sd_quotad_process;
> +	int sd_iprune; /* inodes to prune */
> +	spinlock_t sd_shrinkspin;
>   
>   	/* Quota stuff */
>   
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index 4546360..65a69be 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -95,6 +95,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
>   	spin_lock_init(&sdp->sd_jindex_spin);
>   	mutex_init(&sdp->sd_jindex_mutex);
>   	init_completion(&sdp->sd_journal_ready);
> +	spin_lock_init(&sdp->sd_shrinkspin);
>   
>   	INIT_LIST_HEAD(&sdp->sd_quota_list);
>   	mutex_init(&sdp->sd_quota_mutex);
> diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
> index ce7d69a..5810a2c 100644
> --- a/fs/gfs2/quota.c
> +++ b/fs/gfs2/quota.c
> @@ -1528,14 +1528,39 @@ void gfs2_wake_up_statfs(struct gfs2_sbd *sdp) {
>   int gfs2_quotad(void *data)
>   {
>   	struct gfs2_sbd *sdp = data;
> +	struct super_block *sb = sdp->sd_vfs;
>   	struct gfs2_tune *tune = &sdp->sd_tune;
>   	unsigned long statfs_timeo = 0;
>   	unsigned long quotad_timeo = 0;
>   	unsigned long t = 0;
>   	DEFINE_WAIT(wait);
>   	int empty;
> +	int rc;
> +	struct shrink_control sc = {.gfp_mask = GFP_KERNEL, };
>   
>   	while (!kthread_should_stop()) {
> +		/* TODO: Deal with shrinking of dcache */
> +		/* Prune any inode cache intended by the shrinker. */
> +		spin_lock(&sdp->sd_shrinkspin);
> +		if (sdp->sd_iprune > 0) {
> +			sc.nr_to_scan = sdp->sd_iprune;
> +			if (sc.nr_to_scan > 1024)
> +				sc.nr_to_scan = 1024;
> +			sdp->sd_iprune -= sc.nr_to_scan;
> +			spin_unlock(&sdp->sd_shrinkspin);
> +			rc = prune_icache_sb(sb, &sc);
> +			if (rc < 0) {
> +				spin_lock(&sdp->sd_shrinkspin);
> +				sdp->sd_iprune = 0;
> +				spin_unlock(&sdp->sd_shrinkspin);
> +			}
> +			if (sdp->sd_iprune) {
> +				cond_resched();
> +				continue;
> +			}
> +		} else {
> +			spin_unlock(&sdp->sd_shrinkspin);
> +		}
>   
>   		/* Update the master statfs file */
>   		if (sdp->sd_statfs_force_sync) {
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 9b2ff353..75e8a85 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1667,6 +1667,18 @@ static void gfs2_destroy_inode(struct inode *inode)
>   	call_rcu(&inode->i_rcu, gfs2_i_callback);
>   }
>   
> +static long gfs2_prune_icache_sb(struct super_block *sb,
> +				 struct shrink_control *sc)
> +{
> +	struct gfs2_sbd *sdp;
> +
> +	sdp = sb->s_fs_info;
> +	spin_lock(&sdp->sd_shrinkspin);
> +	sdp->sd_iprune = sc->nr_to_scan + 1;
> +	spin_unlock(&sdp->sd_shrinkspin);
> +	return 0;
> +}
This doesn't wake up the thread that will do the reclaim, so that there 
may be a significant delay between the request to shrink and the actual 
shrink. Also, using quotad is not a good plan, since it might itself 
block waiting for memory. This should be done by a thread on its own to 
avoid any deadlock possibility here.

There also appears to be a limit of 1024 to scan per run of quotad, 
which means it would take a very long time to push out any significant 
number, and it does seem a bit arbitrary - was there a reason for 
selecting that number? It would probably be better to simply yield every 
now and then if there are a lot of items to process,

Steve.

> +
>   const struct super_operations gfs2_super_ops = {
>   	.alloc_inode		= gfs2_alloc_inode,
>   	.destroy_inode		= gfs2_destroy_inode,
> @@ -1681,5 +1693,6 @@ const struct super_operations gfs2_super_ops = {
>   	.remount_fs		= gfs2_remount_fs,
>   	.drop_inode		= gfs2_drop_inode,
>   	.show_options		= gfs2_show_options,
> +	.prune_icache_sb	= gfs2_prune_icache_sb,
>   };
>   


  reply	other threads:[~2016-06-26 13:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-24 19:50 [Cluster-devel] [PATCH 0/2] Per superblock inode reclaim Bob Peterson
2016-06-24 19:50 ` Bob Peterson
2016-06-24 19:50 ` [Cluster-devel] [PATCH 1/2] vfs: Add hooks for filesystem-specific prune_icache_sb Bob Peterson
2016-06-24 19:50   ` Bob Peterson
2016-06-28  1:10   ` [Cluster-devel] " Dave Chinner
2016-06-28  1:10     ` Dave Chinner
2016-06-24 19:50 ` [Cluster-devel] [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb Bob Peterson
2016-06-24 19:50   ` Bob Peterson
2016-06-26 13:18   ` Steven Whitehouse [this message]
2016-06-26 13:18     ` [Cluster-devel] " Steven Whitehouse
2016-06-28  2:08   ` Dave Chinner
2016-06-28  2:08     ` Dave Chinner
2016-06-28  9:13     ` [Cluster-devel] " Steven Whitehouse
2016-06-28  9:13       ` Steven Whitehouse
2016-06-28 22:47       ` Dave Chinner
2016-06-28 22:47         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=576FD63C.6020805@redhat.com \
    --to=swhiteho@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.