cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously
Date: Thu, 1 Jun 2017 16:37:15 +0100	[thread overview]
Message-ID: <13b3fa4d-34ab-cae5-4fba-9cd1f02b0b22@redhat.com> (raw)
In-Reply-To: <1496242992-1607-8-git-send-email-agruenba@redhat.com>

Hi,

comments below...


On 31/05/17 16:03, Andreas Gruenbacher wrote:
> gfs2_evict_inode is called to free inodes under memory pressure.  The
> function calls into DLM when an inode's last cluster-wide reference goes
> away (remote unlink) and to release the glock and associated DLM lock
> before finally destroying the inode.  However, if DLM is blocked on
> memory to become available, calling into DLM again will deadlock.
>
> Avoid that by decoupling releasing glocks from destroying inodes in that
> case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
> in work queue context, when the associated inodes have most likely
> already been destroyed.
>
> With this change, it appears that inodes can end up being unlinked,
> remote-unlink can be triggered, and then the inode can be reallocated
> before all remote-unlink callbacks are processed.  Revalidate the link
> count in gfs2_evict_inode to make sure we're not destroying an
> allocated, referenced inode.
>
> In addition, skip remote unlinks under memory pressure; the next inode
> allocation in the same resource group will take care of destroying
> unlinked inodes.
>
> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> ---
>   fs/gfs2/glock.c | 10 +++++++++-
>   fs/gfs2/glock.h |  2 ++
>   fs/gfs2/super.c | 30 ++++++++++++++++++++++++++++--
>   3 files changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index e6d32d2..4ba53e9 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -170,7 +170,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
>    *
>    */
>   
> -static void gfs2_glock_hold(struct gfs2_glock *gl)
> +void gfs2_glock_hold(struct gfs2_glock *gl)
>   {
>   	GLOCK_BUG_ON(gl, __lockref_is_dead(&gl->gl_lockref));
>   	lockref_get(&gl->gl_lockref);
> @@ -269,6 +269,14 @@ void gfs2_glock_put(struct gfs2_glock *gl)
>   	sdp->sd_lockstruct.ls_ops->lm_put_lock(gl);
>   }
>   
> +/*
> + * Cause the glock to be put in work queue context.
> + */
> +void gfs2_glock_queue_put(struct gfs2_glock *gl)
> +{
> +	gfs2_glock_queue_work(gl, 0);
> +}
> +
This is confusing. If the work queue is already scheduled, then it will 
simply drop the ref count on the glock by one directly and not 
reschedule the work item. That means that if the ref count were to hit 
zero, then that state would potentially by missed, maybe that can't 
happen in this case, but it isn't clear that it can't at first glance.

>   /**
>    * may_grant - check if its ok to grant a new lock
>    * @gl: The glock
> diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
> index 9ad4a6a..33e0511 100644
> --- a/fs/gfs2/glock.h
> +++ b/fs/gfs2/glock.h
> @@ -181,7 +181,9 @@ static inline struct address_space *gfs2_glock2aspace(struct gfs2_glock *gl)
>   extern int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
>   			  const struct gfs2_glock_operations *glops,
>   			  int create, struct gfs2_glock **glp);
> +extern void gfs2_glock_hold(struct gfs2_glock *gl);
>   extern void gfs2_glock_put(struct gfs2_glock *gl);
> +extern void gfs2_glock_queue_put(struct gfs2_glock *gl);
>   extern void gfs2_holder_init(struct gfs2_glock *gl, unsigned int state,
>   			     u16 flags, struct gfs2_holder *gh);
>   extern void gfs2_holder_reinit(unsigned int state, u16 flags,
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index c651983..ace4814 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -1541,6 +1541,16 @@ static void gfs2_evict_inode(struct inode *inode)
>   	if (inode->i_nlink || (sb->s_flags & MS_RDONLY))
>   		goto out;
>   
> +	/*
> +	 * If we are in shrinker context, DLM may depend on us to make
> +	 * progress.  In that case, calling into DLM again would deadlock.  To
> +	 * prevent that from happening, skip deallocating the inode here; it
> +	 * will be deallocated when another inode is allocated in the same
> +	 * resource group.
> +	 */
> +	if (current->flags & PF_MEMALLOC)
> +		goto out;
> +
That is not quick enough, and in fact I'm not sure it is true either. We 
don't search the whole rgrp when we go to allocate new blocks, we only 
look when we are short of space. So the question is what causes the 
inode to be deallocated in this case? We need to be able to do:

rm foo
df

and see that the space taken up by foo has been returned, otherwise we 
are substituting one bug for another one.


>   	/* Must not read inode block until block type has been verified */
>   	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, &gh);
>   	if (unlikely(error)) {
> @@ -1561,6 +1571,12 @@ static void gfs2_evict_inode(struct inode *inode)
>   			goto out_truncate;
>   	}
>   
> +	/*
> +	 * The inode may have been recreated in the meantime.
> +	 */
> +	if (inode->i_nlink)
> +		goto out_truncate;
> +
>   	if (gfs2_holder_initialized(&ip->i_iopen_gh) &&
>   	    test_bit(HIF_HOLDER, &ip->i_iopen_gh.gh_iflags)) {
>   		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
> @@ -1640,12 +1656,22 @@ static void gfs2_evict_inode(struct inode *inode)
>   	glock_set_object(ip->i_gl, NULL);
>   	wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
>   	gfs2_glock_add_to_lru(ip->i_gl);
> -	gfs2_glock_put(ip->i_gl);
> +	if (current->flags & PF_MEMALLOC)
> +		gfs2_glock_queue_put(ip->i_gl);
> +	else
> +		gfs2_glock_put(ip->i_gl);
>   	ip->i_gl = NULL;
>   	if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
> -		glock_set_object(ip->i_iopen_gh.gh_gl, NULL);
> +		struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;
> +
> +		glock_set_object(gl, NULL);
>   		ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
> +		gfs2_glock_hold(gl);
>   		gfs2_glock_dq_uninit(&ip->i_iopen_gh);
> +		if (current->flags & PF_MEMALLOC)
> +			gfs2_glock_queue_put(gl);
> +		else
> +			gfs2_glock_put(ip->i_gl);
>   	}
>   }
>   



  parent reply	other threads:[~2017-06-01 15:37 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-31 15:03 [Cluster-devel] [PATCH 0/8] GFS2 shrinker deadlock Andreas Gruenbacher
2017-05-31 15:03 ` [Cluster-devel] [PATCH 1/8] gfs2: Get rid of flush_delayed_work in gfs2_evict_inode Andreas Gruenbacher
2017-06-01 15:13   ` Steven Whitehouse
2017-05-31 15:03 ` [Cluster-devel] [PATCH 2/8] gfs2: Protect gl->gl_object by spin lock Andreas Gruenbacher
2017-06-01 15:16   ` Steven Whitehouse
2017-05-31 15:03 ` [Cluster-devel] [PATCH 3/8] gfs2: Clean up glock work enqueuing Andreas Gruenbacher
2017-06-01 15:21   ` Steven Whitehouse
2017-06-01 15:43     ` Andreas Gruenbacher
2017-06-01 15:46       ` Steven Whitehouse
2017-06-02  9:19         ` Andreas Gruenbacher
2017-06-02  9:29           ` Steven Whitehouse
2017-05-31 15:03 ` [Cluster-devel] [PATCH 4/8] gfs2: Always check block type in gfs2_evict_inode Andreas Gruenbacher
2017-05-31 15:03 ` [Cluster-devel] [PATCH 5/8] gfs2: gfs2_glock_get: Wait on freeing glocks Andreas Gruenbacher
2017-05-31 15:03 ` [Cluster-devel] [PATCH 6/8] gfs2: Get rid of gfs2_set_nlink Andreas Gruenbacher
2017-05-31 15:03 ` [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously Andreas Gruenbacher
2017-06-01 10:51   ` Andreas Gruenbacher
2017-06-01 15:37   ` Steven Whitehouse [this message]
2017-06-02 13:57     ` Andreas Gruenbacher
2017-05-31 15:03 ` [Cluster-devel] [PATCH 8/8] gfs2: Warn when not deleting inodes under memory pressure Andreas Gruenbacher
2017-06-01 15:39   ` Steven Whitehouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=13b3fa4d-34ab-cae5-4fba-9cd1f02b0b22@redhat.com \
    --to=swhiteho@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).