linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: "Drokin, Oleg" <oleg.drokin@intel.com>
Cc: "Dilger, Andreas" <andreas.dilger@intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"<linux-fsdevel@vger.kernel.org>" <linux-fsdevel@vger.kernel.org>,
	Theodore Ts'o <tytso@mit.edu>, Mark Fasheh <mfasheh@suse.com>,
	Sage Weil <sage@redhat.com>
Subject: Re: races in ll_splice_alias() and elsewhere (ext4, ocfs2)
Date: Sat, 12 Mar 2016 17:22:13 +0000	[thread overview]
Message-ID: <20160312172213.GW17997@ZenIV.linux.org.uk> (raw)
In-Reply-To: <6FB275BB-15EA-4996-B59D-A3BDC7C50147@intel.com>

On Fri, Mar 11, 2016 at 03:25:22AM +0000, Drokin, Oleg wrote:

> It is, though it's not like we always need to talk to the server in those
> cases.
> Overhead is overhead no matter how small, and an extra check for the d_init
> d_op would happen for every filesystem, not just ceph and Lustre.
> Overhead also has this bad property of adding up.
> Yes, it's minuscule and CPUs are still getting somewhat faster.
> Is it worth it in the long run? May be, I do not have a strong opinion
> here since always doing ll_d_init after d_splice_alias solves the issue
> for us and does it in a way that does not affect anybody else. And as you
> correctly observed, most of users don't really care.--

Maybe...  Keep in mind that d_set_d_op() called just prior to that point
has already done a bunch of checks for methods being non-NULL, so we are
looking at something pretty much guaranteed to be in cache.  So the cost
of test is pretty much nil; it turns into "fetch (with cache hit), test,
branch (expectedly) not taken" in normal case.

We probably have already spent more cycles discussing that than all processors
will spend executing those instructions ;-)

I'm really curious about the ceph situation; fh_to_dentry turns out to
be irrelevant (ceph refuses to export snapshots and inode it'll be looking
for won't be in snapshot, just as its parent), but I wonder why we have
those 3 variants of ->d_op in the first place.

->d_release() is the same in all 3.
->d_prune() is the same in plain and snapshot variants and absent for
children of .snap
->d_revalidate() is nominally different for all 3, but stuff inside the
snapshots and .snap children have equivalent ones *AND* normal ->d_revalidate
appears to cover all three cases -
        dir = ceph_get_dentry_parent_inode(dentry);  

        /* always trust cached snapped dentries, snapdir dentry */
        if (ceph_snap(dir) != CEPH_NOSNAP) {
                dout("d_revalidate %p '%pd' inode %p is SNAPPED\n", dentry,
                     dentry, d_inode(dentry));
                valid = 1;
        } else if (d_really_is_positive(dentry) &&
                   ceph_snap(d_inode(dentry)) == CEPH_SNAPDIR) {
                valid = 1;
        } else if (dentry_lease_is_valid(dentry) ||
in there is explicitly checking the other two.

Sage, is there any reason not to fold all three dentry_operations variants
in one and be done with d_set_d_op()?  I see that in one place you check
->d_op being the plain one, but AFAICS checking ceph_snap() of the parent
would serve just as well there.  BTW, looks like ceph is too pessimistic
about bailing out of RCU mode on ->d_revalidate(), might be worth looking
into at some point...

  reply	other threads:[~2016-03-12 17:22 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-08 16:05 races in ll_splice_alias() Al Viro
2016-03-08 20:44 ` Drokin, Oleg
2016-03-08 21:11   ` Al Viro
2016-03-08 23:18     ` Drokin, Oleg
2016-03-09  0:34       ` Al Viro
2016-03-09  0:53         ` Drokin, Oleg
2016-03-09  1:26           ` Al Viro
2016-03-09  5:20             ` Drokin, Oleg
2016-03-09 23:47             ` Drokin, Oleg
2016-03-10  2:20               ` races in ll_splice_alias() and elsewhere (ext4, ocfs2) Al Viro
2016-03-10  2:59                 ` Al Viro
2016-03-10 23:55                   ` Theodore Ts'o
2016-03-11  3:18                     ` Al Viro
2016-03-11 15:42                       ` Theodore Ts'o
2016-03-10  3:08                 ` Drokin, Oleg
2016-03-10  3:34                   ` Al Viro
2016-03-10  3:46                     ` Drokin, Oleg
2016-03-10  4:22                       ` Drokin, Oleg
2016-03-10  4:43                       ` Al Viro
2016-03-10  5:15                         ` Al Viro
2016-03-11  3:47                           ` Drokin, Oleg
2016-03-10  5:47                         ` Drokin, Oleg
2016-03-10 19:59                 ` Al Viro
2016-03-10 20:34                   ` do we need that smp_wmb() in __d_alloc()? Al Viro
2016-03-10 21:17                     ` Al Viro
2016-03-10 21:22                   ` races in ll_splice_alias() and elsewhere (ext4, ocfs2) Drokin, Oleg
2016-03-10 23:23                     ` Al Viro
2016-03-11  3:25                       ` Drokin, Oleg
2016-03-12 17:22                         ` Al Viro [this message]
2016-03-13 14:35                           ` Sage Weil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160312172213.GW17997@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=andreas.dilger@intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mfasheh@suse.com \
    --cc=oleg.drokin@intel.com \
    --cc=sage@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).