linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] fs: add predicts based on nd->depth
@ 2025-11-10 16:59 Mateusz Guzik
  2025-11-11  3:32 ` Al Viro
  0 siblings, 1 reply; 3+ messages in thread
From: Mateusz Guzik @ 2025-11-10 16:59 UTC (permalink / raw)
  To: viro; +Cc: brauner, jack, linux-kernel, linux-fsdevel, Mateusz Guzik

Stats from nd->depth usage during the venerable kernel build collected like so:
bpftrace -e 'kprobe:terminate_walk,kprobe:walk_component,kprobe:legitimize_links
{ @[probe] = lhist(((struct nameidata *)arg0)->depth, 0, 8, 1); }'

@[kprobe:legitimize_links]:
[0, 1)           6554906 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1, 2)              3534 |                                                    |

@[kprobe:terminate_walk]:
[0, 1)          12153664 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@[kprobe:walk_component]:
[0, 1)          53075749 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1, 2)            971421 |                                                    |
[2, 3)             84946 |                                                    |

Given these results:
1. terminate_walk() is called towards the end of the lookup. I failed
   run into a case where it has any depth to clean up. For now predict
   it does not.
2. legitimize_links() is also called towards the end of lookup and most
   of the time there s 0 depth. Patch consumers to avoid calling into it
   in that case.
3. walk_component() is typically called with WALK_MORE and zero depth,
   checked in that order. Check depth first and predict it is 0.
4. link_path_walk() predicts not dealing with a symlink, but the other
   part of symlink handling fails to make the same predict. Add it.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---

v4:
- fix backwards predict in link_path_walk

v3:
- more predicts

This obsoletes the previous patch which only took care of
legitimize_links().

 fs/namei.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 2a112b2c0951..11295fcf877c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -785,7 +785,8 @@ static void leave_rcu(struct nameidata *nd)
 
 static void terminate_walk(struct nameidata *nd)
 {
-	drop_links(nd);
+	if (unlikely(nd->depth))
+		drop_links(nd);
 	if (!(nd->flags & LOOKUP_RCU)) {
 		int i;
 		path_put(&nd->path);
@@ -882,7 +883,7 @@ static bool try_to_unlazy(struct nameidata *nd)
 
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
-	if (unlikely(!legitimize_links(nd)))
+	if (unlikely(nd->depth && !legitimize_links(nd)))
 		goto out1;
 	if (unlikely(!legitimize_path(nd, &nd->path, nd->seq)))
 		goto out;
@@ -917,7 +918,7 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 	int res;
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
-	if (unlikely(!legitimize_links(nd)))
+	if (unlikely(nd->depth && !legitimize_links(nd)))
 		goto out2;
 	res = __legitimize_mnt(nd->path.mnt, nd->m_seq);
 	if (unlikely(res)) {
@@ -2179,7 +2180,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 	 * parent relationships.
 	 */
 	if (unlikely(nd->last_type != LAST_NORM)) {
-		if (!(flags & WALK_MORE) && nd->depth)
+		if (unlikely(nd->depth) && !(flags & WALK_MORE))
 			put_link(nd);
 		return handle_dots(nd, nd->last_type);
 	}
@@ -2191,7 +2192,7 @@ static const char *walk_component(struct nameidata *nd, int flags)
 		if (IS_ERR(dentry))
 			return ERR_CAST(dentry);
 	}
-	if (!(flags & WALK_MORE) && nd->depth)
+	if (unlikely(nd->depth) && !(flags & WALK_MORE))
 		put_link(nd);
 	return step_into(nd, flags, dentry);
 }
@@ -2544,7 +2545,7 @@ static int link_path_walk(const char *name, struct nameidata *nd)
 		if (unlikely(!*name)) {
 OK:
 			/* pathname or trailing symlink, done */
-			if (!depth) {
+			if (likely(!depth)) {
 				nd->dir_vfsuid = i_uid_into_vfsuid(idmap, nd->inode);
 				nd->dir_mode = nd->inode->i_mode;
 				nd->flags &= ~LOOKUP_PARENT;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v4] fs: add predicts based on nd->depth
  2025-11-10 16:59 [PATCH v4] fs: add predicts based on nd->depth Mateusz Guzik
@ 2025-11-11  3:32 ` Al Viro
  2025-11-11 18:00   ` Mateusz Guzik
  0 siblings, 1 reply; 3+ messages in thread
From: Al Viro @ 2025-11-11  3:32 UTC (permalink / raw)
  To: Mateusz Guzik; +Cc: brauner, jack, linux-kernel, linux-fsdevel

On Mon, Nov 10, 2025 at 05:59:01PM +0100, Mateusz Guzik wrote:

> Given these results:
> 1. terminate_walk() is called towards the end of the lookup. I failed
>    run into a case where it has any depth to clean up. For now predict
>    it does not.

Easy - just have an error while resolving a nested symlink in the middle
of pathname.  On success it will have zero nd->depth, of course.

> 2. legitimize_links() is also called towards the end of lookup and most
>    of the time there s 0 depth. Patch consumers to avoid calling into it
>    in that case.

On transition from lazy to non-lazy mode on cache miss, ->d_revalidate()
saying "dunno, try in non-lazy mode", etc.

That can happen inside a nested symlink as well as on the top level, but
the latter is more common on most of the loads.

> 3. walk_component() is typically called with WALK_MORE and zero depth,
>    checked in that order. Check depth first and predict it is 0.

Does it give a measurable effect?

> 4. link_path_walk() predicts not dealing with a symlink, but the other
>    part of symlink handling fails to make the same predict. Add it.

Unconvincing, that - one is "we have a component; what are the odds of that
component being a symlink?", another - "we have reached the end of pathname
or the end of nested symlink; what are the odds of the latter being the case?"

I can believe that answers to both questions are fairly low, but they are
not the same.  I'd expect the latter to be considerably higher than the
former.

> -	if (unlikely(!legitimize_links(nd)))
> +	if (unlikely(nd->depth && !legitimize_links(nd)))

I suspect that 
	if (unlikely(nd->depth) && !legitimize_links(nd))
might be better...

> -	if (unlikely(!legitimize_links(nd)))
> +	if (unlikely(nd->depth && !legitimize_links(nd)))
>  		goto out2;

Ditto.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v4] fs: add predicts based on nd->depth
  2025-11-11  3:32 ` Al Viro
@ 2025-11-11 18:00   ` Mateusz Guzik
  0 siblings, 0 replies; 3+ messages in thread
From: Mateusz Guzik @ 2025-11-11 18:00 UTC (permalink / raw)
  To: Al Viro; +Cc: brauner, jack, linux-kernel, linux-fsdevel

On Tue, Nov 11, 2025 at 4:32 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Nov 10, 2025 at 05:59:01PM +0100, Mateusz Guzik wrote:
>
> > Given these results:
> > 1. terminate_walk() is called towards the end of the lookup. I failed
> >    run into a case where it has any depth to clean up. For now predict
> >    it does not.
>
> Easy - just have an error while resolving a nested symlink in the middle
> of pathname.  On success it will have zero nd->depth, of course.
>

Ok, I'll update the commit message.

> > 2. legitimize_links() is also called towards the end of lookup and most
> >    of the time there s 0 depth. Patch consumers to avoid calling into it
> >    in that case.
>
> On transition from lazy to non-lazy mode on cache miss, ->d_revalidate()
> saying "dunno, try in non-lazy mode", etc.
>
> That can happen inside a nested symlink as well as on the top level, but
> the latter is more common on most of the loads.
>

Given the rest of the e-mail I take it you are clarifying when depth >
0 in this case, as opposed to contesting the predict.

> > 3. walk_component() is typically called with WALK_MORE and zero depth,
> >    checked in that order. Check depth first and predict it is 0.
>
> Does it give a measurable effect?

I did not benchmark this patch at all, merely checked for predicts
going the right way with bpftrace. What do I intend to benchmark is
the following: cleanup and inlining of func calls to walk_component()
and step_into(), which happen every time.

The routine got ->depth predicts in this patch because I was already
sprinkling it.

The current asm of walk_component() is penalized by lookup_slow(!)
being inlined and convincing the compiler to spill extra registers.
step_into() also looks like it can be shortened for the common case.

When inlined the compiler should be able to elide branching on WALK_MORE anyway.

You can find profiling info from a kernel running my patches (modulo
this one) here:
https://lore.kernel.org/linux-fsdevel/20251111-brillant-umgegangen-e7c891513bce@brauner/T/#mee42496c2c6495a54c6b0b4da5c4121540ad92d9

walk_component() is hanging out there at over 2.7% and step_into() is
4.8%. I would suspect there is at least 1% total hiding here for
trivial fixups + inlining.

That said, I can drop this bit no problem and add it later to the
patchset which takes care of both walk_component() and step_into()
(which will come with benchmarks).

>
> > 4. link_path_walk() predicts not dealing with a symlink, but the other
> >    part of symlink handling fails to make the same predict. Add it.
>
> Unconvincing, that - one is "we have a component; what are the odds of that
> component being a symlink?", another - "we have reached the end of pathname
> or the end of nested symlink; what are the odds of the latter being the case?"
>

Fair, so I added the following crapper:

diff --git a/fs/dcache.c b/fs/dcache.c
index de3e4e9777ea..2bac37c09bf6 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3268,3 +3268,7 @@ void __init vfs_caches_init(void)
        bdev_cache_init();
        chrdev_init();
 }
+
+
+void link_path_walk_probe(int);
+void link_path_walk_probe(int) { }
diff --git a/fs/namei.c b/fs/namei.c
index caeed986108d..00125c578af4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2470,6 +2470,8 @@ static inline const char *hash_name(struct
nameidata *nd, const char *name, unsi
   #define LAST_WORD_IS_DOTDOT  0x2e2e
 #endif

+void link_path_walk_probe(int);
+
 /*
  * Name resolution.
  * This is the basic name resolution function, turning a pathname into
@@ -2544,6 +2546,7 @@ static int link_path_walk(const char *name,
struct nameidata *nd)
                } while (unlikely(*name == '/'));
                if (unlikely(!*name)) {
 OK:
+                       link_path_walk_probe(depth);
                        /* pathname or trailing symlink, done */
                        if (likely(!depth)) {
                                nd->dir_vfsuid =
i_uid_into_vfsuid(idmap, nd->inode);

then probing over kernel build:
bpftrace -e 'kprobe:link_path_walk_probe { @[probe] = lhist(arg0, 0, 8, 1); }'
@[kprobe:link_path_walk_probe]:
[0, 1)           7528231 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1, 2)            407905 |@@                                                  |

I think that's nicely skewed.

> I can believe that answers to both questions are fairly low, but they are
> not the same.  I'd expect the latter to be considerably higher than the
> former.
>
> > -     if (unlikely(!legitimize_links(nd)))
> > +     if (unlikely(nd->depth && !legitimize_links(nd)))
>
> I suspect that
>         if (unlikely(nd->depth) && !legitimize_links(nd))
> might be better...
>

Well it says it is unlikely there is depth, but when there is and
legitimize_links is called, it is unlikely it fails. I think this
matches the current intent, except avodiing the func call most of the
time.

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-11-11 18:00 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-10 16:59 [PATCH v4] fs: add predicts based on nd->depth Mateusz Guzik
2025-11-11  3:32 ` Al Viro
2025-11-11 18:00   ` Mateusz Guzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).