public inbox for linux-kernel@vger.kernel.org
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <Waiman.Long@hp.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Chandramouleeswaran, Aswin" <aswin@hp.com>,
	"Norton, Scott J" <scott.norton@hp.com>,
	George Spelvin <linux@horizon.com>,
	John Stoffel <john@stoffel.org>
Subject: Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
Date: Sat, 7 Sep 2013 04:01:10 +0100
Message-ID: <20130907030110.GY13318@ZenIV.linux.org.uk>
In-Reply-To: <CA+55aFzwederWB135Ch+PjijjrN-kz9UKw+obC4+M_+xq648PA@mail.gmail.com>

On Fri, Sep 06, 2013 at 05:58:51PM -0700, Linus Torvalds wrote:
> On Fri, Sep 6, 2013 at 5:19 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > (We're bounded in practice by PATH_MAX, so you can't make getcwd()
> > traverse more than about 2000 parents (single character filename plus
> > the slash for each level), and for all I know filesystems might cap it
> > before that, so it's not unbounded, but the difference between "1" and
> > "2000" is pretty damn big)
> 
> .. in particular, it's big enough that one is pretty much guaranteed
> to fit in any reasonable L1 cache (if we have dentry hash chains so
> long that that becomes a problem for traversing a single chain, we're
> screwed anyway), while the other can most likely be a case of "not a
> single L1 cache hit because by the time you fail and go back to the
> start, you've flushed the L1 cache".
> 
> Now, whether 2000 L2 cache misses is long enough to give people a
> chance to run the whole rename system call path in a loop a few times,
> I don't know, but it sure as heck sounds likely.
> 
> Of course, you might still ask "why should we even care?" At least
> without preemption, you might be able to trigger some really excessive
> latencies and possibly a watchdog screaming at you as a result. But
> that said, maybe we wouldn't care. I just think that the solution is
> so simple (what, five extra lines or so) that it's worth avoiding even
> the worry.

We already have that kind of logic - see select_parent() et al. in
mainline or d_walk() in vfs.git#for-linus (pull request will go in
a few minutes).  With this patch we get

	* plain seqretry loop (d_lookup(), is_subdir(), autofs4_getpath(),
ceph_misc_build_path(), [cifs] build_path_from_dentry(), nfs_path(),
[audit] handle_path())
	* try seqretry once, then switch to write_seqlock() (the things
that got unified into d_walk())
	* try seqretry three times, then switch to write_seqlock() (d_path()
and friends)
	* several pure write_seqlock() users (d_move(), d_set_mounted(),
d_materialize_unique())
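
The second and third groups share one shape: run the lockless pass a bounded
number of times, then give up and take the lock exclusively.  A minimal
user-space sketch of that shape follows, assuming a pthread mutex plus a C11
atomic counter as a stand-in for the kernel's seqlock_t.  struct seqlock,
walk_bounded(), demo_walk() and the other names here are hypothetical, not
kernel API, and a real reader would also need properly ordered loads of the
protected data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for seqlock_t. */
struct seqlock {
	atomic_uint seq;	/* even: no writer, odd: write in progress */
	pthread_mutex_t lock;	/* serialises writers */
};
#define SEQLOCK_INIT { 0, PTHREAD_MUTEX_INITIALIZER }

static void write_lock(struct seqlock *sl)
{
	pthread_mutex_lock(&sl->lock);
	atomic_fetch_add(&sl->seq, 1);		/* counter becomes odd */
}

static void write_unlock(struct seqlock *sl)
{
	atomic_fetch_add(&sl->seq, 1);		/* counter becomes even */
	pthread_mutex_unlock(&sl->lock);
}

static unsigned read_begin(struct seqlock *sl)
{
	unsigned s;

	while ((s = atomic_load(&sl->seq)) & 1)
		;				/* wait out an active writer */
	return s;
}

static bool read_retry(struct seqlock *sl, unsigned start)
{
	return atomic_load(&sl->seq) != start;
}

/*
 * Try the walk locklessly up to max_tries times; if every attempt raced
 * with a writer, repeat it once with the lock held exclusively.
 * max_tries == 1 is the "try once" group, max_tries == 3 the d_path() one.
 */
static int walk_bounded(struct seqlock *sl, int max_tries,
			int (*walk)(void *), void *arg, bool *took_lock)
{
	int ret;

	assert(max_tries >= 0);
	for (int i = 0; i < max_tries; i++) {
		unsigned s = read_begin(sl);

		ret = walk(arg);
		if (!read_retry(sl, s)) {
			*took_lock = false;
			return ret;
		}
	}
	write_lock(sl);			/* nobody can invalidate this pass */
	ret = walk(arg);
	write_unlock(sl);
	*took_lock = true;
	return ret;
}

/* Demo walk: simulates a concurrent rename during its first few calls. */
struct demo_ctx {
	struct seqlock *sl;
	int interfere;		/* how many calls get invalidated */
	int calls;		/* how many times the walk ran */
};

static int demo_walk(void *arg)
{
	struct demo_ctx *c = arg;

	c->calls++;
	if (c->interfere > 0) {
		c->interfere--;
		write_lock(c->sl);	/* stand-in for a racing d_move() */
		write_unlock(c->sl);
	}
	return 42;
}
```

With this shape the only difference between the "once" and "thrice" callers
is the max_tries argument, which is why the distinction looks arbitrary.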

The last class is not a problem - these we want as writers.  I really don't
like the way the rest is distributed - if nothing else, nfs_path() and
friends are in exactly the same situation as d_path().  Moreover, why
the distinction between "try once" and "try thrice"?

_If_ we fold the second and the third groups together (and probably have
a bunch from the first one join that), we at least get something
understandable, but then I really wonder if seqlock has the right calling
conventions for that (and at least I'd like to fold the "already got writelock"
flag into seq - we do have a spare bit there).
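
To make the "spare bit" remark concrete: a snapshot taken by a lockless
reader is always even, so bit 0 of the reader's private copy of seq is free
to mean "this pass holds the lock".  The following is only a user-space
sketch under that assumption, with a pthread mutex and a C11 atomic counter
standing in for seqlock_t.  begin_or_lock(), need_retry(), done_retry() and
walk_once_then_lock() are hypothetical names for illustration, not a
proposal for the actual calling conventions.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for seqlock_t. */
struct seqlock {
	atomic_uint seq;	/* even: no writer, odd: write in progress */
	pthread_mutex_t lock;
};
#define SEQLOCK_INIT { 0, PTHREAD_MUTEX_INITIALIZER }

/* Even *seq: take a lockless snapshot.  Odd *seq: take the lock instead. */
static void begin_or_lock(struct seqlock *sl, unsigned *seq)
{
	if (*seq & 1) {
		pthread_mutex_lock(&sl->lock);
		atomic_fetch_add(&sl->seq, 1);
	} else {
		unsigned s;

		while ((s = atomic_load(&sl->seq)) & 1)
			;			/* wait out an active writer */
		*seq = s;
	}
}

/* A locked pass never needs a retry; a lockless one does if seq moved. */
static bool need_retry(struct seqlock *sl, unsigned seq)
{
	return !(seq & 1) && atomic_load(&sl->seq) != seq;
}

/* Drop the lock if (and only if) the low bit says we hold it. */
static void done_retry(struct seqlock *sl, unsigned seq)
{
	if (seq & 1) {
		atomic_fetch_add(&sl->seq, 1);
		pthread_mutex_unlock(&sl->lock);
	}
}

/* One lockless attempt, then a locked one; no separate flag is needed. */
static int walk_once_then_lock(struct seqlock *sl,
			       int (*walk)(void *), void *arg)
{
	unsigned seq = 0;	/* even: start in lockless mode */
	int ret;

	for (;;) {
		begin_or_lock(sl, &seq);
		ret = walk(arg);
		if (!need_retry(sl, seq))
			break;
		seq = 1;	/* odd: next pass takes the lock */
	}
	done_retry(sl, seq);
	return ret;
}

/* Demo walk: simulates a concurrent rename during its first few calls. */
struct demo_ctx {
	struct seqlock *sl;
	int interfere;
	int calls;
};

static int demo_walk(void *arg)
{
	struct demo_ctx *c = arg;

	c->calls++;
	if (c->interfere > 0) {
		c->interfere--;
		pthread_mutex_lock(&c->sl->lock);
		atomic_fetch_add(&c->sl->seq, 2);	/* a complete write */
		pthread_mutex_unlock(&c->sl->lock);
	}
	return 7;
}
```

The caller carries a single unsigned: its value is the snapshot while
running lockless, and its low bit records that the lock is held once the
lockless attempt has failed.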

Comments?
