All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS
Date: Fri, 15 Nov 2019 12:39:41 -0500 (EST)	[thread overview]
Message-ID: <491033255.30283381.1573839581928.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <CAHc6FU5zTfvnqjrhQ13Jid79Vfqrk68X_FusUwRyczgAfu3xMA@mail.gmail.com>

----- Original Message -----
> On Fri, Nov 15, 2019 at 4:45 PM Bob Peterson <rpeterso@redhat.com> wrote:
> >
> > Hi,
> >
> > This patch closes a timing window in which two processes compete
> > and overlap in the execution of do_xmote for the same glock:
> >
> >              Process A                              Process B
> >    ------------------------------------   -----------------------------
> > 1. Grabs gl_lockref and calls do_xmote
> > 2.                                        Grabs gl_lockref but is blocked
> > 3. Sets GLF_INVALIDATE_IN_PROGRESS
> > 4. Unlocks gl_lockref
> > 5.                                        Calls do_xmote
> > 6. Call glops->go_sync
> > 7. test_and_clear_bit GLF_DIRTY
> > 8. Call gfs2_log_flush                    Call glops->go_sync
> > 9. (slow IO, so it blocks a long time)    test_and_clear_bit GLF_DIRTY
> >                                           It's not dirty (step 7) returns
> > 10.                                       Tests GLF_INVALIDATE_IN_PROGRESS
> > 11.                                       Calls go_inval (rgrp_go_inval)
> > 12.                                       gfs2_rgrp_relse does brelse
> > 13.                                       truncate_inode_pages_range
> > 14.                                       Calls lm_lock UN
> >
> > In step 14 we've just told dlm to give the glock to another node
> > when, in fact, process A has not finished the IO and synced all
> > buffer_heads to disk and make sure their revokes are done.
> >
> > This patch fixes the problem by changing the GLF_INVALIDATE_IN_PROGRESS
> > to use test_and_set_bit, and if the bit is already set, process B just
> > ignores it and trusts that process A will do the do_xmote in the proper
> > order.
> >
> > Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> > ---
> >  fs/gfs2/glock.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> > index faa88bd594e2..4a4a390ffd00 100644
> > --- a/fs/gfs2/glock.c
> > +++ b/fs/gfs2/glock.c
> > @@ -558,7 +558,19 @@ __acquires(&gl->gl_lockref.lock)
> >         GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target);
> >         if ((target == LM_ST_UNLOCKED || target == LM_ST_DEFERRED) &&
> >             glops->go_inval) {
> > -               set_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags);
> > +               /*
> > +                * If another process is already doing the invalidate we
> > should
> > +                * not interfere. If we call go_sync and it finds the glock
> > is
> > +                * not dirty, we might call go_inval prematurely before the
> > +                * other go_sync has finished with its revokes. If we then
> > call
> > +                * lm_lock prematurely, we've really screwed up: we cannot
> > tell
> > +                * dlm to give the glock away until we're synced and
> > +                * invalidated. Best thing is to return and trust the other
> > +                * process will finish do_xmote tasks in their proper
> > order.
> > +                */
> 
> That's a bit too much information here. Can we please change that as follows?
> 
>                 * If another process is already doing the invalidate, let
>                 that
>                 * finish first.  The glock state machine will get back to
>                 this
>                 * holder again later.
> 
> > +               if (test_and_set_bit(GLF_INVALIDATE_IN_PROGRESS,
> > +                                    &gl->gl_flags))
> > +                       return;
> >                 do_error(gl, 0); /* Fail queued try locks */
> >         }
> >         gl->gl_req = target;
> >
> 
> Thanks,
> Andreas
> 
> 
Sure. Make it so.

Bob



  reply	other threads:[~2019-11-15 17:39 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1819526286.30262006.1573832637022.JavaMail.zimbra@redhat.com>
2019-11-15 15:45 ` [Cluster-devel] [GFS2 PATCH] gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS Bob Peterson
2019-11-15 17:23   ` Andreas Gruenbacher
2019-11-15 17:39     ` Bob Peterson [this message]
2019-11-18 17:43       ` Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=491033255.30283381.1573839581928.JavaMail.zimbra@redhat.com \
    --to=rpeterso@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.