All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] gfs2: eliminate circular lock dependency in inode.c
Date: Mon, 12 Aug 2019 09:43:35 -0400 (EDT)	[thread overview]
Message-ID: <432645393.7956321.1565617415072.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <17eda7cd-5662-51a5-a6c8-0bbd34b48594@redhat.com>

----- Original Message -----
> > The real problem came with renames, though. Function
> > gfs2_rename(), which locked a series of inode glocks, did so
> > in parent-child order due to that patch. But it was still
> > possible to create circular lock dependencies just by doing the
> > wrong combination of renames on different nodes. For example:
> >
> > Node a: mv /mnt/gfs2/sub /mnt/gfs2/tmp_name (rename sub to tmp_name)
> >
> > a1. Same directory, so rename glock is NOT held
> > a2. /mnt/gfs2 is locked
> > a3. Tries to lock sub for rename, but it is locked on node b
> >
> > Node b: mv /mnt/gfs2/sub /mnt/gfs2/dir1/ (move sub to dir1...
> >          mv /mnt/gfs2/dir1/sub /mnt/gfs2/  ...then move it back)
> >
> > b1. Different directory, so rename glock IS held
> > b2. /mnt/gfs2 is locked
> > b3. dir1 is locked
> > b4. sub is moved to dir1 and everything is unlocked
> > b5. Different directory, so rename glock IS held again
> > b6. dir1 is locked
> > b7. Lock for /mnt/gfs2 is requested, but cannot be granted because
> >      node 1 locked it in step a2.
> 
> If the parents are being locked before the child, as per the correct
> locking order, then this cannot happen. The directory in which the child
> is located should always be locked first, before the child, so that is
> what protects the operation on a from whatever might be going on, on node b.
> 
> When you get to step b7, sub is not locked (since it was unlocked in b4)
> and not locked again. Thus a3 can complete. So this doesn't look like it
> is the right explanation.

Hi,

I guess maybe my explanation is lacking.
It's not so much a relationship between "parent" and "child"
directories as it is "old" and "new" directories.

The comments for function vfs_rename() explain the situations in which
this can happen, and have been prevented on a single node through the
use of s_vfs_rename_mutex. However, that mutex is not cluster-wide,
which means the relationship of which inode is the "old" and which
inode is the "new" can change indiscriminately without notice and
without cluster-wide locking. The whole point of the "a" and "b"
scenarios was to illustrate that one node can lock "old", then "new",
but the other node can reverse the roles of those same inodes (which
is the "old" and which is the "new") and therefore reverse the lock
order without notice.

Since the old-new relationship itself is not protected, we need
some other way to get the lock order correct.

My first attempt to fix this was to extend the "rename" glock to have
a rename-wide reach so it affected both types of renames rather than
today's code which only locks old and new if they're different.
I implemented this with a new i_op called by vfs (vfs_rename) to make
the rename glock serve as a kind of cluster-wide version vfs's
s_vfs_rename_mutex. However, this ended up having a huge performance
penalty for my test.

My second attempt (the patch I posted) was to lock the inodes in
block-number-sort order, because the block number relationships
will never change, regardless of which is old and which is new.
It made no sense to me to reinvent the wheel wrt locking them in
sorted order, so I used gfs2_glock_nq_m which already does that.

Regards,

Bob Peterson



  reply	other threads:[~2019-08-12 13:43 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <655197227.7744188.1565376946008.JavaMail.zimbra@redhat.com>
2019-08-09 18:58 ` [Cluster-devel] [GFS2 PATCH] gfs2: eliminate circular lock dependency in inode.c Bob Peterson
2019-08-12  9:07   ` Steven Whitehouse
2019-08-12 13:43     ` Bob Peterson [this message]
2019-08-12 13:56       ` Steven Whitehouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=432645393.7956321.1565617415072.JavaMail.zimbra@redhat.com \
    --to=rpeterso@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.