From: Andreas Gruenbacher <agruenba@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 7/8] gfs2: gfs2_evict_inode: Put glocks asynchronously
Date: Fri, 2 Jun 2017 15:57:23 +0200
Message-ID: <CAHc6FU58aerhgAveS=BBPo-zr9HDvy9hULVgC-K_cjpknm-zHA@mail.gmail.com>
In-Reply-To: <13b3fa4d-34ab-cae5-4fba-9cd1f02b0b22@redhat.com>

On Thu, Jun 1, 2017 at 5:37 PM, Steven Whitehouse <swhiteho@redhat.com> wrote:
> Hi,
>
> comments below...
>
> On 31/05/17 16:03, Andreas Gruenbacher wrote:
>>
>> gfs2_evict_inode is called to free inodes under memory pressure.  The
>> function calls into DLM when an inode's last cluster-wide reference goes
>> away (remote unlink) and to release the glock and associated DLM lock
>> before finally destroying the inode.  However, if DLM is blocked waiting
>> for memory to become available, calling into DLM again will deadlock.
>>
>> Avoid that by decoupling releasing glocks from destroying inodes in that
>> case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
>> in work queue context, when the associated inodes have most likely
>> already been destroyed.
>>
>> With this change, it appears that inodes can end up being unlinked,
>> remote-unlink can be triggered, and then the inode can be reallocated
>> before all remote-unlink callbacks are processed.  Revalidate the link
>> count in gfs2_evict_inode to make sure we're not destroying an
>> allocated, referenced inode.
>>
>> In addition, skip remote unlinks under memory pressure; the next inode
>> allocation in the same resource group will take care of destroying
>> unlinked inodes.
>>
>> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
>> ---
>>   fs/gfs2/glock.c | 10 +++++++++-
>>   fs/gfs2/glock.h |  2 ++
>>   fs/gfs2/super.c | 30 ++++++++++++++++++++++++++++--
>>   3 files changed, 39 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
>> index e6d32d2..4ba53e9 100644
>> --- a/fs/gfs2/glock.c
>> +++ b/fs/gfs2/glock.c
>> @@ -170,7 +170,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
>>    *
>>    */
>>
>> -static void gfs2_glock_hold(struct gfs2_glock *gl)
>> +void gfs2_glock_hold(struct gfs2_glock *gl)
>>   {
>>         GLOCK_BUG_ON(gl, __lockref_is_dead(&gl->gl_lockref));
>>         lockref_get(&gl->gl_lockref);
>> @@ -269,6 +269,14 @@ void gfs2_glock_put(struct gfs2_glock *gl)
>>         sdp->sd_lockstruct.ls_ops->lm_put_lock(gl);
>>   }
>>
>> +/*
>> + * Cause the glock to be put in work queue context.
>> + */
>> +void gfs2_glock_queue_put(struct gfs2_glock *gl)
>> +{
>> +       gfs2_glock_queue_work(gl, 0);
>> +}
>> +
>
> This is confusing. If the work item is already queued, then this will
> simply drop the ref count on the glock by one directly and not reschedule
> the work item. That means that if the ref count were to hit zero, that
> state could potentially be missed. Maybe that can't happen in this case,
> but at first glance it isn't clear that it can't.

gfs2_glock_queue_work is meant to make sure that glock_work_func runs at
least once in the future, and each call always consumes one glock reference.

In case you're wondering why decrementing the ref count in
__gfs2_glock_queue_work is safe: inside __gfs2_glock_queue_work, we
are holding the glock spin lock. glock_work_func grabs that spin lock
before dropping extra references, and it owns at least one reference.
So when queue_delayed_work tells us that the work is already queued,
we know that glock_work_func cannot get to the point of dropping its
references, and therefore the reference we drop in
__gfs2_glock_queue_work cannot be the last one.
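
To make that concrete, here is roughly the shape of the pattern (a sketch
only, not a copy of the patch; it assumes the usual glock_workqueue, gl_work,
and gl_lockref names, and that the caller already holds gl->gl_lockref.lock):

/* Sketch of __gfs2_glock_queue_work; caller holds gl->gl_lockref.lock. */
static void __gfs2_glock_queue_work(struct gfs2_glock *gl, unsigned long delay)
{
        if (!queue_delayed_work(glock_workqueue, &gl->gl_work, delay)) {
                /*
                 * The work is already queued, so glock_work_func is still
                 * guaranteed to run and consume a reference later.  It takes
                 * gl->gl_lockref.lock before dropping any of its references
                 * and owns at least one reference itself, so the reference
                 * we give up here cannot be the last one.
                 */
                gl->gl_lockref.count--;
        }
}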

Thanks,
Andreas



