From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <Waiman.Long@hp.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"Chandramouleeswaran, Aswin" <aswin@hp.com>,
"Norton, Scott J" <scott.norton@hp.com>,
George Spelvin <linux@horizon.com>,
John Stoffel <john@stoffel.org>
Subject: Re: [PATCH v3 1/1] dcache: Translating dentry into pathname without taking rename_lock
Date: Sat, 7 Sep 2013 04:01:10 +0100
Message-ID: <20130907030110.GY13318@ZenIV.linux.org.uk>
In-Reply-To: <CA+55aFzwederWB135Ch+PjijjrN-kz9UKw+obC4+M_+xq648PA@mail.gmail.com>
On Fri, Sep 06, 2013 at 05:58:51PM -0700, Linus Torvalds wrote:
> On Fri, Sep 6, 2013 at 5:19 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > (We're bounded in practice by PATH_MAX, so you can't make getcwd()
> > traverse more than about 2000 parents (single character filename plus
> > the slash for each level), and for all I know filesystems might cap it
> > before that, so it's not unbounded, but the difference between "1" and
> > "2000" is pretty damn big)
>
> .. in particular, it's big enough that one is pretty much guaranteed
> to fit in any reasonable L1 cache (if we have dentry hash chains so
> long that that becomes a problem for traversing a single chain, we're
> screwed anyway), while the other can most likely be a case of "not a
> single L1 cache hit because by the time you fail and go back to the
> start, you've flushed the L1 cache".
>
> Now, whether 2000 L2 cache misses is long enough to give people a
> chance to run the whole rename system call path in a loop a few times,
> I don't know, but it sure as heck sounds likely.
>
> Of course, you might still ask "why should we even care?" At least
> without preemption, you might be able to trigger some really excessive
> latencies and possibly a watchdog screaming at you as a result. But
> that said, maybe we wouldn't care. I just think that the solution is
> so simple (what, five extra lines or so) that it's worth avoiding even
> the worry.
We already have that kind of logic - see select_parent() et al. in
mainline or d_walk() in vfs.git#for-linus (pull request will go out in
a few minutes). With this patch we get
* plain seqretry loop (d_lookup(), is_subdir(), autofs4_getpath(),
ceph_misc_build_path(), [cifs] build_path_from_dentry(), nfs_path(),
[audit] handle_path())
* try seqretry once, then switch to write_seqlock() (the things
that got unified into d_walk())
* try seqretry three times, then switch to write_seqlock() (d_path()
and friends)
* several pure write_seqlock() users (d_move(), d_set_mounted(),
d_materialize_unique())
The last class is not a problem - these we want as writers. I really don't
like the way the rest is distributed - if nothing else, nfs_path() and
friends are in exactly the same situation as d_path(). Moreover, why
the distinction between "try once" and "try thrice"?
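
To make the "try seqretry N times, then switch to the lock" policy concrete, here is a minimal userspace sketch. It is not kernel code: the toy_* names, the pthread mutex standing in for the seqlock's spinlock, and the toy_raced_walk() helper (which fakes a rename racing with the walk) are all assumptions made for illustration. The only thing it tries to show is that "try once" and "try thrice" differ by a single parameter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace seqlock; every name here is hypothetical, not kernel API. */
struct toy_seqlock {
	atomic_uint seq;	/* even: no writer, odd: writer in progress */
	pthread_mutex_t lock;	/* serializes writers */
};
#define TOY_SEQLOCK_INIT { 0, PTHREAD_MUTEX_INITIALIZER }

static void toy_write_lock(struct toy_seqlock *sl)
{
	pthread_mutex_lock(&sl->lock);
	atomic_fetch_add(&sl->seq, 1);		/* count becomes odd */
}

static void toy_write_unlock(struct toy_seqlock *sl)
{
	assert(atomic_load(&sl->seq) & 1);	/* we must be the writer */
	atomic_fetch_add(&sl->seq, 1);		/* count becomes even again */
	pthread_mutex_unlock(&sl->lock);
}

static unsigned toy_read_begin(struct toy_seqlock *sl)
{
	unsigned seq;

	while ((seq = atomic_load(&sl->seq)) & 1)
		;	/* writer active, wait for it to finish */
	return seq;
}

static bool toy_read_retry(struct toy_seqlock *sl, unsigned seq)
{
	return atomic_load(&sl->seq) != seq;
}

/*
 * Run walk() locklessly up to max_tries times, then fall back to the lock.
 * max_tries == 1 is the "try once" policy, max_tries == 3 "try thrice".
 * Returns true if the walk had to take the lock.
 */
static bool toy_walk(struct toy_seqlock *sl, int max_tries,
		     void (*walk)(void *), void *arg)
{
	for (int i = 0; i < max_tries; i++) {
		unsigned seq = toy_read_begin(sl);

		walk(arg);
		if (!toy_read_retry(sl, seq))
			return false;	/* nothing changed under us */
	}
	toy_write_lock(sl);
	walk(arg);			/* writers are excluded now */
	toy_write_unlock(sl);
	return true;
}

/* Demo helper: a walk that is "raced" by a complete write races_left times. */
struct toy_racer {
	struct toy_seqlock *sl;
	int races_left;
	int calls;
};

static void toy_raced_walk(void *arg)
{
	struct toy_racer *r = arg;

	r->calls++;
	if (r->races_left > 0) {
		r->races_left--;
		atomic_fetch_add(&r->sl->seq, 2);	/* a writer came and went */
	}
}
```

With two simulated races, toy_walk(&sl, 3, ...) succeeds locklessly on its third pass, while toy_walk(&sl, 1, ...) with one race gives up after a single failed pass and redoes the walk under the lock. A plain seqretry loop would correspond to an unbounded max_tries.
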
_If_ we fold the second and the third groups together (and probably have
a bunch from the first one join that), we at least get something
understandable, but then I really wonder if seqlock has the right calling
conventions for that (and at least I'd like to fold the "already got writelock"
flag into seq - we do have a spare bit there).
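
One possible reading of "fold the flag into seq" can be sketched as follows. This is a guess at the shape of such an interface, not the calling convention the kernel has or had at the time; the toy_* names are hypothetical and a pthread mutex again stands in for the real lock. It relies on the observation that a sequence value sampled by a lockless reader is always even, so the low bit of the caller's seq variable is free to mean "this pass holds the lock".

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace seqlock; every name here is hypothetical, not kernel API. */
struct toy_seqlock {
	atomic_uint seq;	/* even: no writer, odd: writer in progress */
	pthread_mutex_t lock;	/* serializes writers */
};
#define TOY_SEQLOCK_INIT { 0, PTHREAD_MUTEX_INITIALIZER }

/* Even *seq: sample the count and go lockless. Odd *seq: take the lock. */
static void toy_begin_or_lock(struct toy_seqlock *sl, unsigned *seq)
{
	if (!(*seq & 1)) {
		while ((*seq = atomic_load(&sl->seq)) & 1)
			;	/* writer active, wait for it to finish */
	} else {
		pthread_mutex_lock(&sl->lock);
		atomic_fetch_add(&sl->seq, 1);	/* count becomes odd */
	}
}

/* A locked pass (odd seq) never needs a retry; a lockless one might. */
static bool toy_need_retry(struct toy_seqlock *sl, unsigned seq)
{
	return !(seq & 1) && atomic_load(&sl->seq) != seq;
}

/* Drop the lock if, and only if, the low bit says we took it. */
static void toy_done(struct toy_seqlock *sl, unsigned seq)
{
	if (seq & 1) {
		assert(atomic_load(&sl->seq) & 1);
		atomic_fetch_add(&sl->seq, 1);	/* count becomes even again */
		pthread_mutex_unlock(&sl->lock);
	}
}

/* Caller side: one lockless pass, then a locked one. No separate flag. */
static bool toy_walk_once_then_lock(struct toy_seqlock *sl,
				    void (*walk)(void *), void *arg)
{
	unsigned seq = 0;	/* even: start lockless */
	bool locked;

again:
	toy_begin_or_lock(sl, &seq);
	walk(arg);
	if (toy_need_retry(sl, seq)) {
		seq = 1;	/* odd: the next pass takes the lock */
		goto again;
	}
	locked = seq & 1;
	toy_done(sl, seq);
	return locked;
}

/* Demo helper: a walk that is "raced" by a complete write races_left times. */
struct toy_racer {
	struct toy_seqlock *sl;
	int races_left;
	int calls;
};

static void toy_raced_walk(void *arg)
{
	struct toy_racer *r = arg;

	r->calls++;
	if (r->races_left > 0) {
		r->races_left--;
		atomic_fetch_add(&r->sl->seq, 2);	/* a writer came and went */
	}
}
```

The point of the design is that the caller carries a single variable through begin, retry check and release, so the three helpers cannot disagree about whether the lock is held.
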
Comments?