From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 4/4] GFS2: Make deletes use regular inode_lookup with special code path
Date: Wed, 23 Mar 2016 12:13:35 -0400 (EDT) [thread overview]
Message-ID: <898570662.42261383.1458749615201.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20160321185654.GD2915@octiron.msp.redhat.com>
Hi Ben,
----- Original Message -----
> On Mon, Mar 21, 2016 at 02:12:43PM -0400, Bob Peterson wrote:
> > Hi Ben,
> >
> > ----- Original Message -----
> > > On Fri, Dec 18, 2015 at 01:02:30PM -0600, Bob Peterson wrote:
> > > > This patch changes the inode reclaiming code, delete_work_func, use
> > > > the normal gfs2_inode_lookup, but with a special parameter, for_del
> > > > which causes additional checks to be made.
> > >
> > > I get what you're doing here, and I can't see anything really wrong with
> > > it, but I haven't gone through all the delete code as carefully as I
> > > could. I do have two questions.
> > >
> > > 1. I don't see any reason why, even if it's just for paranoias sake, we
> > > don't avoid dereferencing the gfs2_inode in verify_unlinked before
> > > locking and checking it. It seems like a simple fix that avoids any
> > > chance that there is a problem due to not holding the directory glock
> > > when we're getting the inode.
> >
> > I'm not sure I understand the question. Today, function verify_unlinked
> > dereferences the gfs2_inode, ip, but so does its only caller,
> > gfs2_inode_lookup. We already know ip is valid at that point, so more
> > references shouldn't matter. Unless you mean the inode's glock?
>
> So before, delete_work_func called gfs2_lookup_by_inum, which locked the
> glock without dereferencing the gfs2_inode pointer, and then called
> gfs2_inode_lookup. Other code paths call gfs2_inode_lookup without the
> glock on the inode itself locked, but with the containing directory's
> inode glock held. But now, we are deferencing the gfs2_inode pointer
> without either holding its glock or the glock of the directory that the
> file is in.
Yes, and the old lock order was the majority of the problem.
It seems to me that a lot of GFS2 touches the inode without holding
the inode's glock. For example, take function gfs2_create_inode().
It locks the inode's directory first, but then it does:
inode = gfs2_dir_search
gfs2_inode_lookup
iget_locked
So it's referencing the inode (of which ip is part of) without its glock.
If the inode isn't found by gfs2_dir_search, it continues on to do:
inode = new_inode(sdp->sd_vfs);
inode = new_inode_pseudo(sb);
struct inode *inode = alloc_inode(sb);
if (inode) {
spin_lock(&inode->i_lock);
...
error = gfs2_glock_get(sdp, ip->i_no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl);
...
error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_SKIP, ghs + 1);
Which means it also accesses the inode before the inode's glock.
So AFAICT, the correct order is: grab a reference to the inode, then the inode's glock.
But gfs2_lookup_by_inum was doing this in the opposite order: It was holding
the inode glock (found by its number) then it grabbed the inode itself.
> Assuming that everything has to hold the inode lock first now, once
Now, as it has always been.
> we've locked the that, and locked the glock and verified the file in
> verify inode, it should be safe to unlock the glock in verify_inode, and
> continue to dereference it the gfs2_inode until we drop the inode lock.
>
> >
> > > 2. gfs2_inode_lookup calls iget_locked, which creates an inode if one
> > > doesn't already exist. If there's any chance of racing and calling
> > > delete_work_func for an inode that has already been removed, won't we
> > > just end up creating a new one here?
> > >
> > > -Ben
> >
> > Yes, and that's exactly what we want. I found cases in which the
> > delete_work_func races with the evict() code. In those cases, we need
> > the inode to be recreated in order to go through another evict() cycle
> > which will end up deleting it, whereas the first evict didn't.
> > For example, the shrinker can arbitrarily call evict() and get rid of
> > the inode while its nlink is still positive, IOW, it's not deleted,
> > but after that point in time, the system gets a callback from another
> > node instructing it to delete the inode because it was unlinked
> > remotely from that other node. In that case, we still need to
> > transition it from unlinked to free, but we need a valid inode
> > in order to do the proper delete. So we recreate it and then it gets
> > evicted again, in the name of deleting it.
>
> O.k. clearly I didn't read through all the delete code as carefully as I
> could have.
Well, the delete code is a maze, to be sure, but the proof is in the
pudding. Instrumentation proved to me that this was happening.
One of the many pitfalls I fell into with the delete code is: not only
did I need to recreate the inode after was freed from memory in these
circumstances, but I also needed to do an inode refresh, which seemed
like a real waste of time, given that the inode was being deleted.
Ideally I'd like to eliminate that need and thereby make it more efficient.
These were both handled by the standard inode lookup path, so that's
why I ditched my original attempt to create a delete-specific function to
do the inode lookup, in favor of using the normal inode lookup path.
And even now, I'm not sure what to do about the NFS lookup path, because
that lookup_by_inum still has the wrong lock order AFAICT.
Regards,
Bob Peterson
Red Hat File Systems
prev parent reply other threads:[~2016-03-23 16:13 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-18 19:02 [Cluster-devel] [PATCH 0/4] patches related to file unlink->delete->new Bob Peterson
2015-12-18 19:02 ` [Cluster-devel] [PATCH 1/4] GFS2: Prevent delete work from occurring on glocks used for create Bob Peterson
2016-02-02 11:40 ` Steven Whitehouse
2015-12-18 19:02 ` [Cluster-devel] [PATCH 2/4] GFS2: Don't filter out I_FREEING inodes anymore Bob Peterson
2016-02-23 10:15 ` Steven Whitehouse
2015-12-18 19:02 ` [Cluster-devel] [PATCH 3/4] GFS2: Eliminate parameter non_block on gfs2_inode_lookup Bob Peterson
2016-02-23 10:17 ` Steven Whitehouse
2015-12-18 19:02 ` [Cluster-devel] [PATCH 4/4] GFS2: Make deletes use regular inode_lookup with special code path Bob Peterson
2016-02-23 10:25 ` Steven Whitehouse
2016-02-24 20:42 ` Bob Peterson
2016-02-25 10:07 ` Steven Whitehouse
2016-03-15 13:50 ` Bob Peterson
2016-03-16 11:09 ` Steven Whitehouse
2016-03-17 16:01 ` Bob Peterson
2016-03-17 16:27 ` Steven Whitehouse
2016-03-21 16:49 ` Benjamin Marzinski
2016-03-21 18:12 ` Bob Peterson
2016-03-21 18:56 ` Benjamin Marzinski
2016-03-23 16:13 ` Bob Peterson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=898570662.42261383.1458749615201.JavaMail.zimbra@redhat.com \
--to=rpeterso@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.