public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH][RFC] make take_dentry_name_snapshot() lockless
@ 2024-12-09  3:52 Al Viro
  2024-12-09  6:33 ` Mateusz Guzik
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Al Viro @ 2024-12-09  3:52 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Linus Torvalds

There's a bunch of places where we are accessing dentry names without
sufficient protection and where locking environment is not predictable
enough to fix the things that way; take_dentry_name_snapshot() is
one variant of solution.  It does, however, have a problem - copying
is cheap, but bouncing ->d_lock may be nasty on seriously shared dentries.

How about the following (completely untested)?

Use ->d_seq instead of grabbing ->d_lock; in case of shortname dentries
that avoids any stores to shared data objects and in case of long names
we are down to (unavoidable) atomic_inc on the external_name refcount.
    
Makes the thing safer as well - the areas where ->d_seq is held odd are
all nested inside the areas where ->d_lock is held, and the latter are
much more numerous.
    
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/dcache.c b/fs/dcache.c
index b4d5e9e1e43d..78fd7e2a3011 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -329,16 +329,34 @@ static inline int dname_external(const struct dentry *dentry)
 
 void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
 {
-	spin_lock(&dentry->d_lock);
+	unsigned seq;
+
+	rcu_read_lock();
+retry:
+	seq = read_seqcount_begin(&dentry->d_seq);
 	name->name = dentry->d_name;
-	if (unlikely(dname_external(dentry))) {
-		atomic_inc(&external_name(dentry)->u.count);
-	} else {
-		memcpy(name->inline_name, dentry->d_iname,
-		       dentry->d_name.len + 1);
+	if (read_seqcount_retry(&dentry->d_seq, seq))
+		goto retry;
+	// ->name and ->len are at least consistent with each other, so if
+	// ->name points to dentry->d_iname, ->len is below DNAME_INLINE_LEN
+	if (likely(name->name.name == dentry->d_iname)) {
+		memcpy(name->inline_name, dentry->d_iname, name->name.len + 1);
 		name->name.name = name->inline_name;
+		if (read_seqcount_retry(&dentry->d_seq, seq))
+			goto retry;
+	} else {
+		struct external_name *p;
+		p = container_of(name->name.name, struct external_name, name[0]);
+		// get a valid reference
+		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
+			goto retry;
+		if (read_seqcount_retry(&dentry->d_seq, seq)) {
+			if (unlikely(atomic_dec_and_test(&p->u.count)))
+				kfree_rcu(p, u.head);
+			goto retry;
+		}
 	}
-	spin_unlock(&dentry->d_lock);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(take_dentry_name_snapshot);
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09  3:52 [PATCH][RFC] make take_dentry_name_snapshot() lockless Al Viro
@ 2024-12-09  6:33 ` Mateusz Guzik
  2024-12-09  6:58   ` Al Viro
  2024-12-09 18:27 ` Linus Torvalds
  2024-12-09 19:06 ` Linus Torvalds
  2 siblings, 1 reply; 25+ messages in thread
From: Mateusz Guzik @ 2024-12-09  6:33 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds

On Mon, Dec 09, 2024 at 03:52:51AM +0000, Al Viro wrote:
> There's a bunch of places where we are accessing dentry names without
> sufficient protection and where locking environment is not predictable
> enough to fix the things that way; take_dentry_name_snapshot() is
> one variant of solution.  It does, however, have a problem - copying
> is cheap, but bouncing ->d_lock may be nasty on seriously shared dentries.
> 
> How about the following (completely untested)?
> 
> Use ->d_seq instead of grabbing ->d_lock; in case of shortname dentries
> that avoids any stores to shared data objects and in case of long names
> we are down to (unavoidable) atomic_inc on the external_name refcount.
>     
> Makes the thing safer as well - the areas where ->d_seq is held odd are
> all nested inside the areas where ->d_lock is held, and the latter are
> much more numerous.

Is there a problem retaining the lock acquire if things fail?

As in maybe loop 2-3 times, but eventually take the lock to guarantee forward
progress.

I don't think there is a *real* workload where this would be a problem,
but with core counts seen today one may be able to purposefuly introduce
stalls when running this.

>     
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b4d5e9e1e43d..78fd7e2a3011 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -329,16 +329,34 @@ static inline int dname_external(const struct dentry *dentry)
>  
>  void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
>  {
> -	spin_lock(&dentry->d_lock);
> +	unsigned seq;
> +
> +	rcu_read_lock();
> +retry:
> +	seq = read_seqcount_begin(&dentry->d_seq);
>  	name->name = dentry->d_name;
> -	if (unlikely(dname_external(dentry))) {
> -		atomic_inc(&external_name(dentry)->u.count);
> -	} else {
> -		memcpy(name->inline_name, dentry->d_iname,
> -		       dentry->d_name.len + 1);
> +	if (read_seqcount_retry(&dentry->d_seq, seq))
> +		goto retry;
> +	// ->name and ->len are at least consistent with each other, so if
> +	// ->name points to dentry->d_iname, ->len is below DNAME_INLINE_LEN
> +	if (likely(name->name.name == dentry->d_iname)) {
> +		memcpy(name->inline_name, dentry->d_iname, name->name.len + 1);
>  		name->name.name = name->inline_name;
> +		if (read_seqcount_retry(&dentry->d_seq, seq))
> +			goto retry;
> +	} else {
> +		struct external_name *p;
> +		p = container_of(name->name.name, struct external_name, name[0]);
> +		// get a valid reference
> +		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
> +			goto retry;
> +		if (read_seqcount_retry(&dentry->d_seq, seq)) {
> +			if (unlikely(atomic_dec_and_test(&p->u.count)))
> +				kfree_rcu(p, u.head);
> +			goto retry;
> +		}
>  	}
> -	spin_unlock(&dentry->d_lock);
> +	rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(take_dentry_name_snapshot);
>  

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09  6:33 ` Mateusz Guzik
@ 2024-12-09  6:58   ` Al Viro
  2024-12-09  7:18     ` Mateusz Guzik
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2024-12-09  6:58 UTC (permalink / raw)
  To: Mateusz Guzik; +Cc: linux-fsdevel, Linus Torvalds

On Mon, Dec 09, 2024 at 07:33:00AM +0100, Mateusz Guzik wrote:
> On Mon, Dec 09, 2024 at 03:52:51AM +0000, Al Viro wrote:
> > There's a bunch of places where we are accessing dentry names without
> > sufficient protection and where locking environment is not predictable
> > enough to fix the things that way; take_dentry_name_snapshot() is
> > one variant of solution.  It does, however, have a problem - copying
> > is cheap, but bouncing ->d_lock may be nasty on seriously shared dentries.
> > 
> > How about the following (completely untested)?
> > 
> > Use ->d_seq instead of grabbing ->d_lock; in case of shortname dentries
> > that avoids any stores to shared data objects and in case of long names
> > we are down to (unavoidable) atomic_inc on the external_name refcount.
> >     
> > Makes the thing safer as well - the areas where ->d_seq is held odd are
> > all nested inside the areas where ->d_lock is held, and the latter are
> > much more numerous.
> 
> Is there a problem retaining the lock acquire if things fail?
> 
> As in maybe loop 2-3 times, but eventually take the lock to guarantee forward
> progress.
> 
> I don't think there is a *real* workload where this would be a problem,
> but with core counts seen today one may be able to purposefuly introduce
> stalls when running this.

By renaming the poor sucker back and forth in a tight loop?  Would be hard
to trigger on anything short of ramfs...

Hell knows - if anything, I was thinking about a variant that would
*not* loop at all, but take seq as an argument and return whether it
had been successful.  That could be adapted to build such thing -
with "pass ->d_seq sampled value (always even) *or* call it with
the name stabilized in some other way (e.g. ->d_lock, rename_lock or
->s_vfs_rename_mutex held) and pass 1 as argument to suppress checks"
for calling conventions.

The thing is, when its done to a chain of ancestors of some dentry,
with rename_lock retries around the entire thing, running into ->d_seq
change pretty much guarantees that you'll need to retry the whole thing
anyway.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09  6:58   ` Al Viro
@ 2024-12-09  7:18     ` Mateusz Guzik
  2024-12-09  7:41       ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Mateusz Guzik @ 2024-12-09  7:18 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds

On Mon, Dec 9, 2024 at 7:59 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Dec 09, 2024 at 07:33:00AM +0100, Mateusz Guzik wrote:
> > Is there a problem retaining the lock acquire if things fail?
> >
> > As in maybe loop 2-3 times, but eventually take the lock to guarantee forward
> > progress.
> >
> > I don't think there is a *real* workload where this would be a problem,
> > but with core counts seen today one may be able to purposefuly introduce
> > stalls when running this.
>
> By renaming the poor sucker back and forth in a tight loop?  Would be hard
> to trigger on anything short of ramfs...
>
> Hell knows - if anything, I was thinking about a variant that would
> *not* loop at all, but take seq as an argument and return whether it
> had been successful.  That could be adapted to build such thing -
> with "pass ->d_seq sampled value (always even) *or* call it with
> the name stabilized in some other way (e.g. ->d_lock, rename_lock or
> ->s_vfs_rename_mutex held) and pass 1 as argument to suppress checks"
> for calling conventions.
>
> The thing is, when its done to a chain of ancestors of some dentry,
> with rename_lock retries around the entire thing, running into ->d_seq
> change pretty much guarantees that you'll need to retry the whole thing
> anyway.

I'm not caught up on the lists, I presume this came up with grabbing
the name on execve?

Indeed a variant which grabs a seq argument would work nice here,
possibly with a dedicated wrapper handling the locking for standalone
consumers.

The VFS layer is rather inconsistent on the locking front -- as in
some places try a seq-protected op once and resort to locking/erroring
out, others keep looping not having forward progress guarantee at
least in principle.

How much of a problem it is nor is not presumably depends on the
particular case. But my point is if the indefinite looping (however
realistic or hypothetical) can be trivially avoided, it should be.

-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09  7:18     ` Mateusz Guzik
@ 2024-12-09  7:41       ` Al Viro
  0 siblings, 0 replies; 25+ messages in thread
From: Al Viro @ 2024-12-09  7:41 UTC (permalink / raw)
  To: Mateusz Guzik; +Cc: linux-fsdevel, Linus Torvalds

On Mon, Dec 09, 2024 at 08:18:35AM +0100, Mateusz Guzik wrote:

> I'm not caught up on the lists, I presume this came up with grabbing
> the name on execve?

Not at all.  fexecve() nonsense can use either variant (or go fuck itself,
for all I care) - it's not worth optimizing for.

No, the problem is in the things like ceph_mdsc_build_path()...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09  3:52 [PATCH][RFC] make take_dentry_name_snapshot() lockless Al Viro
  2024-12-09  6:33 ` Mateusz Guzik
@ 2024-12-09 18:27 ` Linus Torvalds
  2024-12-09 21:17   ` Al Viro
  2024-12-09 19:06 ` Linus Torvalds
  2 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2024-12-09 18:27 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

On Sun, 8 Dec 2024 at 19:52, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>  void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)

Editing the patch down so that you just see the end result (ie all the
"-" lines are gone):

>  {
> +       unsigned seq;
> +
> +       rcu_read_lock();
> +retry:
> +       seq = read_seqcount_begin(&dentry->d_seq);
>         name->name = dentry->d_name;
> +       if (read_seqcount_retry(&dentry->d_seq, seq))
> +               goto retry;

Ugh. This early retry is cheap on x86, but on a lot of architectures
it will be a fairly expensive read barrier.

I see why you do it, sinc you want 'name.len' and 'name.name' to be
consistent, but I do hate it.

The name consistency issue is really annoying. Do we really need it
here? Because honestly, what you actually *really* care about here is
whether it's inline or not, and you do that test right afterwards:

> +       // ->name and ->len are at least consistent with each other, so if
> +       // ->name points to dentry->d_iname, ->len is below DNAME_INLINE_LEN
> +       if (likely(name->name.name == dentry->d_iname)) {
> +               memcpy(name->inline_name, dentry->d_iname, name->name.len + 1);

and here it would actually be more efficient to just use a
constant-sized memcpy with DNAME_INLINE_LEN, and never care about
'len' at all.

And if the length in name.len isn't right (we'll return it to the
user), the *final* seqcount check will catch it.

And in the other case, all you care about is the ref-count.

End result: I think your early retry is pointless and should just be
avoided. Instead, you should make sure to just read
dentry->d_name.name with a READ_ONCE() (and read the len any way you
want).

Hmm?

            Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09  3:52 [PATCH][RFC] make take_dentry_name_snapshot() lockless Al Viro
  2024-12-09  6:33 ` Mateusz Guzik
  2024-12-09 18:27 ` Linus Torvalds
@ 2024-12-09 19:06 ` Linus Torvalds
  2024-12-09 20:27   ` Al Viro
  2 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2024-12-09 19:06 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

On Sun, 8 Dec 2024 at 19:52, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> +               struct external_name *p;
> +               p = container_of(name->name.name, struct external_name, name[0]);
> +               // get a valid reference
> +               if (unlikely(!atomic_inc_not_zero(&p->u.count)))
> +                       goto retry;

Oh - this is very much *not* safe.

The other comment I had was really about "that's bad for performance".
But this is actually actively buggy.

If the external name ref has gone down to zero, we can *not* do that

    atomic_inc_not_zero(..)

thing any more, because the recount is in a union with the rcu_head
for delaying the free.

In other words: the *name* will exist for the duration of the
rcu_read_lock() we hold, but that "p->u.count" will not. When the
refcount has gone to zero, the refcount is no longer usable.

IOW, you may be happily incrementing what is now a RCU list head
rather than a count.

So NAK. This cannot work.

It's probably easily fixable by just not using a union in struct
external_name, and just having separate fields for the refcount and
the rcu_head, but in the current state your patch is fundamentally and
dangerously buggy.

(Dangerously, because you will never *ever* actually hit that tiny
race - you need to race with a concurrent rename *and* a drop of the
other dentry that got the external ref, so in practice this is likely
entirely impossible to ever hit, but that only makes it more subtle
and dangerous).

            Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09 19:06 ` Linus Torvalds
@ 2024-12-09 20:27   ` Al Viro
  0 siblings, 0 replies; 25+ messages in thread
From: Al Viro @ 2024-12-09 20:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

On Mon, Dec 09, 2024 at 11:06:48AM -0800, Linus Torvalds wrote:
> On Sun, 8 Dec 2024 at 19:52, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > +               struct external_name *p;
> > +               p = container_of(name->name.name, struct external_name, name[0]);
> > +               // get a valid reference
> > +               if (unlikely(!atomic_inc_not_zero(&p->u.count)))
> > +                       goto retry;
> 
> Oh - this is very much *not* safe.
> 
> The other comment I had was really about "that's bad for performance".
> But this is actually actively buggy.
> 
> If the external name ref has gone down to zero, we can *not* do that
> 
>     atomic_inc_not_zero(..)
> 
> thing any more, because the recount is in a union with the rcu_head
> for delaying the free.

D'oh.  Right you are; missed it...

> In other words: the *name* will exist for the duration of the
> rcu_read_lock() we hold, but that "p->u.count" will not. When the
> refcount has gone to zero, the refcount is no longer usable.
> 
> IOW, you may be happily incrementing what is now a RCU list head
> rather than a count.
> 
> So NAK. This cannot work.
>
> It's probably easily fixable by just not using a union in struct
> external_name, and just having separate fields for the refcount and
> the rcu_head, but in the current state your patch is fundamentally and
> dangerously buggy.

Agreed.  And yes, separating the fields (and slapping a comment explaining
why they can not be combined) would be the easiest solution - any attempts
to be clever here would be too brittle for no good reason.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09 18:27 ` Linus Torvalds
@ 2024-12-09 21:17   ` Al Viro
  2024-12-09 22:28     ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2024-12-09 21:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

On Mon, Dec 09, 2024 at 10:27:04AM -0800, Linus Torvalds wrote:

> The name consistency issue is really annoying. Do we really need it
> here? Because honestly, what you actually *really* care about here is
> whether it's inline or not, and you do that test right afterwards:
> 
> > +       // ->name and ->len are at least consistent with each other, so if
> > +       // ->name points to dentry->d_iname, ->len is below DNAME_INLINE_LEN
> > +       if (likely(name->name.name == dentry->d_iname)) {
> > +               memcpy(name->inline_name, dentry->d_iname, name->name.len + 1);
> 
> and here it would actually be more efficient to just use a
> constant-sized memcpy with DNAME_INLINE_LEN, and never care about
> 'len' at all.

Actually, taking a look at what's generated for that memcpy()...  *ow*
amd64 is fine, but anything that doesn't like unaligned accesses is
ending up with really awful code.

gcc does not realize that pointers are word-aligned.  What's more,
even

unsigned long v[5];

void f(unsigned long *w)
{
	memcpu(v, w, sizeof(v));
}

is not enough to convince the damn thing - try it for e.g. alpha and you'll
see arseloads of extq/insq/mskq, all inlined.  

And yes, they are aligned - d_iname follows a pointer, inline_name follows
struct qstr, i.e. u64 + pointer.  How about we add struct inlined_name {
unsigned char name[DNAME_INLINE_LEN];}; and turn d_iname and inline_name
into anon unions with that?  Hell, might even make it an array of unsigned
long and use that to deal with this
                } else {
                        /*
                         * Both are internal.
                         */
                        unsigned int i;
                        BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
                        for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
                                swap(((long *) &dentry->d_iname)[i],
                                     ((long *) &target->d_iname)[i]);
                        }
                }
in swap_names().  With struct assignment in the corresponding case in
copy_name() and in take_dentry_name_snapshot() - that does generate sane
code...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09 21:17   ` Al Viro
@ 2024-12-09 22:28     ` Al Viro
  2024-12-09 22:49       ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2024-12-09 22:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

On Mon, Dec 09, 2024 at 09:17:08PM +0000, Al Viro wrote:

> And yes, they are aligned - d_iname follows a pointer, inline_name follows
> struct qstr, i.e. u64 + pointer.  How about we add struct inlined_name {
> unsigned char name[DNAME_INLINE_LEN];}; and turn d_iname and inline_name
> into anon unions with that?  Hell, might even make it an array of unsigned
> long and use that to deal with this
>                 } else {
>                         /*
>                          * Both are internal.
>                          */
>                         unsigned int i;
>                         BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
>                         for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
>                                 swap(((long *) &dentry->d_iname)[i],
>                                      ((long *) &target->d_iname)[i]);
>                         }
>                 }
> in swap_names().  With struct assignment in the corresponding case in
> copy_name() and in take_dentry_name_snapshot() - that does generate sane
> code...

Do you have any objections to the diff below?  Completely untested and needs
to be split in two...  It does seem to generate decent code; it obviously
will need to be profiled - copy_name() in the common (short-to-short) case
copies the entire thing, all 5 words of it on 64bit, instead of memcpy()
of just the amount that needs to be copied.  That thing may cross the
cacheline boundary, but both cachelines had been freshly brought into
cache and dirtied, so...

diff --git a/fs/dcache.c b/fs/dcache.c
index b4d5e9e1e43d..687558622acf 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -296,10 +296,8 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
 }
 
 struct external_name {
-	union {
-		atomic_t count;
-		struct rcu_head head;
-	} u;
+	atomic_t count;		// ->count and ->head can't be combined
+	struct rcu_head head;	// see take_dentry_name_snapshot()
 	unsigned char name[];
 };
 
@@ -329,16 +327,33 @@ static inline int dname_external(const struct dentry *dentry)
 
 void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
 {
-	spin_lock(&dentry->d_lock);
-	name->name = dentry->d_name;
-	if (unlikely(dname_external(dentry))) {
-		atomic_inc(&external_name(dentry)->u.count);
-	} else {
-		memcpy(name->inline_name, dentry->d_iname,
-		       dentry->d_name.len + 1);
+	unsigned seq;
+	const unsigned char *s;
+
+	rcu_read_lock();
+retry:
+	seq = read_seqcount_begin(&dentry->d_seq);
+	s = READ_ONCE(dentry->d_name.name);
+	name->name.hash_len = dentry->d_name.hash_len;
+	if (likely(s == dentry->d_iname)) {
 		name->name.name = name->inline_name;
+		name->__inline_name = dentry->__d_iname;
+		if (read_seqcount_retry(&dentry->d_seq, seq))
+			goto retry;
+	} else {
+		struct external_name *p;
+		p = container_of(s, struct external_name, name[0]);
+		name->name.name = s;
+		// get a valid reference
+		if (unlikely(!atomic_inc_not_zero(&p->count)))
+			goto retry;
+		if (read_seqcount_retry(&dentry->d_seq, seq)) {
+			if (unlikely(atomic_dec_and_test(&p->count)))
+				kfree_rcu(p, head);
+			goto retry;
+		}
 	}
-	spin_unlock(&dentry->d_lock);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(take_dentry_name_snapshot);
 
@@ -347,8 +362,8 @@ void release_dentry_name_snapshot(struct name_snapshot *name)
 	if (unlikely(name->name.name != name->inline_name)) {
 		struct external_name *p;
 		p = container_of(name->name.name, struct external_name, name[0]);
-		if (unlikely(atomic_dec_and_test(&p->u.count)))
-			kfree_rcu(p, u.head);
+		if (unlikely(atomic_dec_and_test(&p->count)))
+			kfree_rcu(p, head);
 	}
 }
 EXPORT_SYMBOL(release_dentry_name_snapshot);
@@ -386,7 +401,7 @@ static void dentry_free(struct dentry *dentry)
 	WARN_ON(!hlist_unhashed(&dentry->d_u.d_alias));
 	if (unlikely(dname_external(dentry))) {
 		struct external_name *p = external_name(dentry);
-		if (likely(atomic_dec_and_test(&p->u.count))) {
+		if (likely(atomic_dec_and_test(&p->count))) {
 			call_rcu(&dentry->d_u.d_rcu, __d_free_external);
 			return;
 		}
@@ -1667,7 +1682,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 			kmem_cache_free(dentry_cache, dentry); 
 			return NULL;
 		}
-		atomic_set(&p->u.count, 1);
+		atomic_set(&p->count, 1);
 		dname = p->name;
 	} else  {
 		dname = dentry->d_iname;
@@ -2749,11 +2764,9 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			 * Both are internal.
 			 */
 			unsigned int i;
-			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
-			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
-				swap(((long *) &dentry->d_iname)[i],
-				     ((long *) &target->d_iname)[i]);
-			}
+			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++)
+				swap(dentry->__d_iname.name[i],
+				     target->__d_iname.name[i]);
 		}
 	}
 	swap(dentry->d_name.hash_len, target->d_name.hash_len);
@@ -2765,16 +2778,15 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 	if (unlikely(dname_external(dentry)))
 		old_name = external_name(dentry);
 	if (unlikely(dname_external(target))) {
-		atomic_inc(&external_name(target)->u.count);
+		atomic_inc(&external_name(target)->count);
 		dentry->d_name = target->d_name;
 	} else {
-		memcpy(dentry->d_iname, target->d_name.name,
-				target->d_name.len + 1);
+		dentry->__d_iname = target->__d_iname;
 		dentry->d_name.name = dentry->d_iname;
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
-	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
-		kfree_rcu(old_name, u.head);
+	if (old_name && likely(atomic_dec_and_test(&old_name->count)))
+		kfree_rcu(old_name, head);
 }
 
 /*
@@ -3189,6 +3201,7 @@ static void __init dcache_init_early(void)
 
 static void __init dcache_init(void)
 {
+	BUILD_BUG_ON(DNAME_INLINE_LEN != sizeof(struct short_name_store));
 	/*
 	 * A constructor could be added for stable state like the lists,
 	 * but it is probably not worth it because of the cache nature
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 1f28f56fd406..d604a4826765 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -77,6 +77,10 @@ extern const struct qstr dotdot_name;
 # endif
 #endif
 
+struct short_name_store {
+	unsigned long name[DNAME_INLINE_LEN / sizeof(unsigned long)];
+};
+
 #define d_lock	d_lockref.lock
 
 struct dentry {
@@ -88,7 +92,10 @@ struct dentry {
 	struct qstr d_name;
 	struct inode *d_inode;		/* Where the name belongs to - NULL is
 					 * negative */
-	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
+	union {
+		unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
+		struct short_name_store __d_iname;
+	};
 	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
 
 	/* Ref lookup also touches following */
@@ -590,7 +597,10 @@ static inline struct inode *d_real_inode(const struct dentry *dentry)
 
 struct name_snapshot {
 	struct qstr name;
-	unsigned char inline_name[DNAME_INLINE_LEN];
+	union {
+		unsigned char inline_name[DNAME_INLINE_LEN];
+		struct short_name_store __inline_name;
+	};
 };
 void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
 void release_dentry_name_snapshot(struct name_snapshot *);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09 22:28     ` Al Viro
@ 2024-12-09 22:49       ` Linus Torvalds
  2024-12-09 22:55         ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2024-12-09 22:49 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 739 bytes --]

On Mon, 9 Dec 2024 at 14:28, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Do you have any objections to the diff below?

I'd split it up a bit more, and in particular, I think I'd want
DNAME_INLINE_LEN to be defined in terms of the long-words, rather than
doing it the other way around with that

        BUILD_BUG_ON(DNAME_INLINE_LEN != sizeof(struct short_name_store));

IOW, I'd *start* with something like the attached, and then build on that..

(By all means turn that

        unsigned long d_iname_words[DNAME_INLINE_WORDS];

into a struct so that you can then do the struct assignment for it - I
just hate the pattern of starting with a byte size and then doing
"DNAME_INLINE_LEN / sizeof(unsigned long)".

Hmm?

            Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1695 bytes --]

 fs/dcache.c            |  2 +-
 include/linux/dcache.h | 13 +++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b4d5e9e1e43d..d6a451325133 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3196,7 +3196,7 @@ static void __init dcache_init(void)
 	 */
 	dentry_cache = KMEM_CACHE_USERCOPY(dentry,
 		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT,
-		d_iname);
+		d_iname_words);
 
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index bff956f7b2b9..0daaecd53353 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -68,15 +68,17 @@ extern const struct qstr dotdot_name;
  * large memory footprint increase).
  */
 #ifdef CONFIG_64BIT
-# define DNAME_INLINE_LEN 40 /* 192 bytes */
+# define DNAME_INLINE_WORDS 5 /* 192 bytes */
 #else
 # ifdef CONFIG_SMP
-#  define DNAME_INLINE_LEN 36 /* 128 bytes */
+#  define DNAME_INLINE_WORDS 9 /* 128 bytes */
 # else
-#  define DNAME_INLINE_LEN 44 /* 128 bytes */
+#  define DNAME_INLINE_WORDS 11 /* 128 bytes */
 # endif
 #endif
 
+#define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
+
 #define d_lock	d_lockref.lock
 
 struct dentry {
@@ -88,7 +90,10 @@ struct dentry {
 	struct qstr d_name;
 	struct inode *d_inode;		/* Where the name belongs to - NULL is
 					 * negative */
-	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
+	union {				/* small names */
+		unsigned long d_iname_words[DNAME_INLINE_WORDS];
+		DECLARE_FLEX_ARRAY(unsigned char, d_iname);
+	};
 	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
 
 	/* Ref lookup also touches following */

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09 22:49       ` Linus Torvalds
@ 2024-12-09 22:55         ` Linus Torvalds
  2024-12-09 23:12           ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2024-12-09 22:55 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel

On Mon, 9 Dec 2024 at 14:49, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> IOW, I'd *start* with something like the attached, and then build on that..

Key word being "something like". I just checked, and that suggested
patch will cause build issues in lib/test_printf.c, because it does
things like

    .d_iname = "foo"

and it turns out that doesn't work when it's a flex-array.

It doesn't have to be a flex array, of course - it could easily just
continue to use DNAME_INLINE_LEN (now just defined in terms of "words
* sizeof*(unsigned long)").

I did that flex array thing mainly to see if somebody ends up
depending on the array as such. Clearly that test_printf.c code does
exactly that, but looks like nothing else is.

           Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09 22:55         ` Linus Torvalds
@ 2024-12-09 23:12           ` Al Viro
  2024-12-10  2:45             ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2024-12-09 23:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

On Mon, Dec 09, 2024 at 02:55:45PM -0800, Linus Torvalds wrote:
> On Mon, 9 Dec 2024 at 14:49, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > IOW, I'd *start* with something like the attached, and then build on that..
> 
> Key word being "something like". I just checked, and that suggested
> patch will cause build issues in lib/test_printf.c, because it does
> things like
> 
>     .d_iname = "foo"
> 
> and it turns out that doesn't work when it's a flex-array.
> 
> It doesn't have to be a flex array, of course - it could easily just
> continue to use DNAME_INLINE_LEN (now just defined in terms of "words
> * sizeof*(unsigned long)").
> 
> I did that flex array thing mainly to see if somebody ends up
> depending on the array as such. Clearly that test_printf.c code does
> exactly that, but looks like nothing else is.

Actually, grepping for DNAME_INLINE_LEN brings some interesting hits:
drivers/net/ieee802154/adf7242.c:1165:  char debugfs_dir_name[DNAME_INLINE_LEN + 1];
	cargo-culted, AFAICS; would be better off with a constant of their own.

fs/ext4/fast_commit.c:326:              fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
fs/ext4/fast_commit.c:452:      if (dentry->d_name.len > DNAME_INLINE_LEN) {
fs/ext4/fast_commit.c:1332:                     fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
fs/ext4/fast_commit.h:113:      unsigned char fcd_iname[DNAME_INLINE_LEN];      /* Dirent name string */
	Looks like that might want struct name_snapshot, along with
{take,release}_dentry_name_snapshot().

fs/libfs.c:1792:        char strbuf[DNAME_INLINE_LEN];
fs/libfs.c:1819:        if (len <= DNAME_INLINE_LEN - 1) {
	memcpy() in there might very well be better off a struct
assignment.  And that thing probably ought to migrate to fs/dcache.c -
RCU-related considerations in it are much trickier than it is usual for
->d_compare() instances.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-09 23:12           ` Al Viro
@ 2024-12-10  2:45             ` Al Viro
  2024-12-10  2:48               ` [PATCH 1/5] make sure that DCACHE_INLINE_LEN is a multiple of word size Al Viro
  2024-12-23  4:25               ` [PATCH][RFC] make take_dentry_name_snapshot() lockless Al Viro
  0 siblings, 2 replies; 25+ messages in thread
From: Al Viro @ 2024-12-10  2:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, linux-ext4

On Mon, Dec 09, 2024 at 11:12:37PM +0000, Al Viro wrote:
> 
> Actually, grepping for DNAME_INLINE_LEN brings some interesting hits:
> drivers/net/ieee802154/adf7242.c:1165:  char debugfs_dir_name[DNAME_INLINE_LEN + 1];
> 	cargo-culted, AFAICS; would be better off with a constant of their own.
> 
> fs/ext4/fast_commit.c:326:              fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
> fs/ext4/fast_commit.c:452:      if (dentry->d_name.len > DNAME_INLINE_LEN) {
> fs/ext4/fast_commit.c:1332:                     fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
> fs/ext4/fast_commit.h:113:      unsigned char fcd_iname[DNAME_INLINE_LEN];      /* Dirent name string */
> 	Looks like that might want struct name_snapshot, along with
> {take,release}_dentry_name_snapshot().

See viro/vfs.git#work.dcache.  I've thrown ext4/fast_commit conversion
into the end of that pile.  NOTE: that stuff obviously needs profiling.
It does survive light testing (boot/ltp/xfstests), but review and more
testing (including serious profiling) is obviously needed.

Patches in followups...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5] make sure that DCACHE_INLINE_LEN is a multiple of word size
  2024-12-10  2:45             ` Al Viro
@ 2024-12-10  2:48               ` Al Viro
  2024-12-10  2:48                 ` [PATCH 2/5] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
                                   ` (3 more replies)
  2024-12-23  4:25               ` [PATCH][RFC] make take_dentry_name_snapshot() lockless Al Viro
  1 sibling, 4 replies; 25+ messages in thread
From: Al Viro @ 2024-12-10  2:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, linux-ext4

... calling the number of words DCACHE_INLINE_WORDS.

The next step will be to have a structure to hold inline name arrays
(both in dentry and in name_snapshot) and use that to alias the
existing arrays of unsigned char there.  That will allow both
full-structure copies and convenient word-by-word accesses.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c            | 4 +---
 include/linux/dcache.h | 8 +++++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b4d5e9e1e43d..ea0f0bea511b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2748,9 +2748,7 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			/*
 			 * Both are internal.
 			 */
-			unsigned int i;
-			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
-			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
+			for (int i = 0; i < DNAME_INLINE_WORDS; i++) {
 				swap(((long *) &dentry->d_iname)[i],
 				     ((long *) &target->d_iname)[i]);
 			}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index bff956f7b2b9..42dd89beaf4e 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -68,15 +68,17 @@ extern const struct qstr dotdot_name;
  * large memory footprint increase).
  */
 #ifdef CONFIG_64BIT
-# define DNAME_INLINE_LEN 40 /* 192 bytes */
+# define DNAME_INLINE_WORDS 5 /* 192 bytes */
 #else
 # ifdef CONFIG_SMP
-#  define DNAME_INLINE_LEN 36 /* 128 bytes */
+#  define DNAME_INLINE_WORDS 9 /* 128 bytes */
 # else
-#  define DNAME_INLINE_LEN 44 /* 128 bytes */
+#  define DNAME_INLINE_WORDS 11 /* 128 bytes */
 # endif
 #endif
 
+#define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
+
 #define d_lock	d_lockref.lock
 
 struct dentry {
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5] dcache: back inline names with a struct-wrapped array of unsigned long
  2024-12-10  2:48               ` [PATCH 1/5] make sure that DCACHE_INLINE_LEN is a multiple of word size Al Viro
@ 2024-12-10  2:48                 ` Al Viro
  2024-12-10  2:48                 ` [PATCH 3/5] make take_dentry_name_snapshot() lockless Al Viro
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Al Viro @ 2024-12-10  2:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, linux-ext4

... so that they can be copied with struct assignment (which generates
better code) and accessed word-by-word.  swap_names() used to do the
latter already, using casts, etc.; now that can be done cleanly.

Both dentry->d_iname and name_snapshot->inline_name got such treatment.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c            | 19 +++++++------------
 include/linux/dcache.h | 14 ++++++++++++--
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ea0f0bea511b..007e582c3e68 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -334,8 +334,7 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry
 	if (unlikely(dname_external(dentry))) {
 		atomic_inc(&external_name(dentry)->u.count);
 	} else {
-		memcpy(name->inline_name, dentry->d_iname,
-		       dentry->d_name.len + 1);
+		name->inline_name_words = dentry->d_iname_words;
 		name->name.name = name->inline_name;
 	}
 	spin_unlock(&dentry->d_lock);
@@ -2729,9 +2728,8 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			 * dentry:internal, target:external.  Steal target's
 			 * storage and make target internal.
 			 */
-			memcpy(target->d_iname, dentry->d_name.name,
-					dentry->d_name.len + 1);
 			dentry->d_name.name = target->d_name.name;
+			target->d_iname_words = dentry->d_iname_words;
 			target->d_name.name = target->d_iname;
 		}
 	} else {
@@ -2740,18 +2738,16 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			 * dentry:external, target:internal.  Give dentry's
 			 * storage to target and make dentry internal
 			 */
-			memcpy(dentry->d_iname, target->d_name.name,
-					target->d_name.len + 1);
 			target->d_name.name = dentry->d_name.name;
+			dentry->d_iname_words = target->d_iname_words;
 			dentry->d_name.name = dentry->d_iname;
 		} else {
 			/*
 			 * Both are internal.
 			 */
-			for (int i = 0; i < DNAME_INLINE_WORDS; i++) {
-				swap(((long *) &dentry->d_iname)[i],
-				     ((long *) &target->d_iname)[i]);
-			}
+			for (int i = 0; i < DNAME_INLINE_WORDS; i++)
+				swap(dentry->d_iname_words.words[i],
+				     target->d_iname_words.words[i]);
 		}
 	}
 	swap(dentry->d_name.hash_len, target->d_name.hash_len);
@@ -2766,8 +2762,7 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 		atomic_inc(&external_name(target)->u.count);
 		dentry->d_name = target->d_name;
 	} else {
-		memcpy(dentry->d_iname, target->d_name.name,
-				target->d_name.len + 1);
+		dentry->d_iname_words = target->d_iname_words;
 		dentry->d_name.name = dentry->d_iname;
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 42dd89beaf4e..766a9156836e 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -79,6 +79,10 @@ extern const struct qstr dotdot_name;
 
 #define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
 
+struct shortname_store {
+	unsigned long words[DNAME_INLINE_WORDS];
+};
+
 #define d_lock	d_lockref.lock
 
 struct dentry {
@@ -90,7 +94,10 @@ struct dentry {
 	struct qstr d_name;
 	struct inode *d_inode;		/* Where the name belongs to - NULL is
 					 * negative */
-	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
+	union {
+		unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
+		struct shortname_store d_iname_words;
+	};
 	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
 
 	/* Ref lookup also touches following */
@@ -591,7 +598,10 @@ static inline struct inode *d_real_inode(const struct dentry *dentry)
 
 struct name_snapshot {
 	struct qstr name;
-	unsigned char inline_name[DNAME_INLINE_LEN];
+	union {
+		unsigned char inline_name[DNAME_INLINE_LEN];
+		struct shortname_store inline_name_words;
+	};
 };
 void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
 void release_dentry_name_snapshot(struct name_snapshot *);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5] make take_dentry_name_snapshot() lockless
  2024-12-10  2:48               ` [PATCH 1/5] make sure that DCACHE_INLINE_LEN is a multiple of word size Al Viro
  2024-12-10  2:48                 ` [PATCH 2/5] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
@ 2024-12-10  2:48                 ` Al Viro
  2024-12-10  2:48                 ` [PATCH 4/5] dissolve external_name.u into separate members Al Viro
  2024-12-10  2:48                 ` [PATCH 5/5] ext4 fast_commit: make use of name_snapshot primitives Al Viro
  3 siblings, 0 replies; 25+ messages in thread
From: Al Viro @ 2024-12-10  2:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, linux-ext4

Use ->d_seq instead of grabbing ->d_lock; in case of shortname dentries
that avoids any stores to shared data objects and in case of long names
we are down to (unavoidable) atomic_inc on the external_name refcount.

Makes the thing safer as well - the areas where ->d_seq is held odd are
all nested inside the areas where ->d_lock is held, and the latter are
much more numerous.

NOTE: we no longer can have external_name.u.count and external_name.u.head
sharing space, now that we have lockless path that might try to grab
a reference on already doomed instance (kudos to Linus for spotting that).
For now just turn that external_name.u into a struct (instead of union)
to reduce the noise in this commit; the next commit will dissolve it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 007e582c3e68..ae13e89ce7d7 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -296,9 +296,9 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
 }
 
 struct external_name {
-	union {
-		atomic_t count;
-		struct rcu_head head;
+	struct {
+		atomic_t count;		// ->count and ->head can't be combined
+		struct rcu_head head;	// see take_dentry_name_snapshot()
 	} u;
 	unsigned char name[];
 };
@@ -329,15 +329,33 @@ static inline int dname_external(const struct dentry *dentry)
 
 void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
 {
-	spin_lock(&dentry->d_lock);
-	name->name = dentry->d_name;
-	if (unlikely(dname_external(dentry))) {
-		atomic_inc(&external_name(dentry)->u.count);
-	} else {
-		name->inline_name_words = dentry->d_iname_words;
+	unsigned seq;
+	const unsigned char *s;
+
+	rcu_read_lock();
+retry:
+	seq = read_seqcount_begin(&dentry->d_seq);
+	s = READ_ONCE(dentry->d_name.name);
+	name->name.hash_len = dentry->d_name.hash_len;
+	if (likely(s == dentry->d_iname)) {
 		name->name.name = name->inline_name;
+		name->inline_name_words = dentry->d_iname_words;
+		if (read_seqcount_retry(&dentry->d_seq, seq))
+			goto retry;
+	} else {
+		struct external_name *p;
+		p = container_of(s, struct external_name, name[0]);
+		name->name.name = s;
+		// get a valid reference
+		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
+			goto retry;
+		if (read_seqcount_retry(&dentry->d_seq, seq)) {
+			if (unlikely(atomic_dec_and_test(&p->u.count)))
+				kfree_rcu(p, u.head);
+			goto retry;
+		}
 	}
-	spin_unlock(&dentry->d_lock);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(take_dentry_name_snapshot);
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5] dissolve external_name.u into separate members
  2024-12-10  2:48               ` [PATCH 1/5] make sure that DCACHE_INLINE_LEN is a multiple of word size Al Viro
  2024-12-10  2:48                 ` [PATCH 2/5] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
  2024-12-10  2:48                 ` [PATCH 3/5] make take_dentry_name_snapshot() lockless Al Viro
@ 2024-12-10  2:48                 ` Al Viro
  2024-12-10  2:48                 ` [PATCH 5/5] ext4 fast_commit: make use of name_snapshot primitives Al Viro
  3 siblings, 0 replies; 25+ messages in thread
From: Al Viro @ 2024-12-10  2:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, linux-ext4

kept separate from the previous commit to keep the noise separate
from actual changes...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ae13e89ce7d7..163bb0953b4f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -296,10 +296,8 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c
 }
 
 struct external_name {
-	struct {
-		atomic_t count;		// ->count and ->head can't be combined
-		struct rcu_head head;	// see take_dentry_name_snapshot()
-	} u;
+	atomic_t count;		// ->count and ->head can't be combined
+	struct rcu_head head;	// see take_dentry_name_snapshot()
 	unsigned char name[];
 };
 
@@ -347,11 +345,11 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry
 		p = container_of(s, struct external_name, name[0]);
 		name->name.name = s;
 		// get a valid reference
-		if (unlikely(!atomic_inc_not_zero(&p->u.count)))
+		if (unlikely(!atomic_inc_not_zero(&p->count)))
 			goto retry;
 		if (read_seqcount_retry(&dentry->d_seq, seq)) {
-			if (unlikely(atomic_dec_and_test(&p->u.count)))
-				kfree_rcu(p, u.head);
+			if (unlikely(atomic_dec_and_test(&p->count)))
+				kfree_rcu(p, head);
 			goto retry;
 		}
 	}
@@ -364,8 +362,8 @@ void release_dentry_name_snapshot(struct name_snapshot *name)
 	if (unlikely(name->name.name != name->inline_name)) {
 		struct external_name *p;
 		p = container_of(name->name.name, struct external_name, name[0]);
-		if (unlikely(atomic_dec_and_test(&p->u.count)))
-			kfree_rcu(p, u.head);
+		if (unlikely(atomic_dec_and_test(&p->count)))
+			kfree_rcu(p, head);
 	}
 }
 EXPORT_SYMBOL(release_dentry_name_snapshot);
@@ -403,7 +401,7 @@ static void dentry_free(struct dentry *dentry)
 	WARN_ON(!hlist_unhashed(&dentry->d_u.d_alias));
 	if (unlikely(dname_external(dentry))) {
 		struct external_name *p = external_name(dentry);
-		if (likely(atomic_dec_and_test(&p->u.count))) {
+		if (likely(atomic_dec_and_test(&p->count))) {
 			call_rcu(&dentry->d_u.d_rcu, __d_free_external);
 			return;
 		}
@@ -1684,7 +1682,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 			kmem_cache_free(dentry_cache, dentry); 
 			return NULL;
 		}
-		atomic_set(&p->u.count, 1);
+		atomic_set(&p->count, 1);
 		dname = p->name;
 	} else  {
 		dname = dentry->d_iname;
@@ -2777,15 +2775,15 @@ static void copy_name(struct dentry *dentry, struct dentry *target)
 	if (unlikely(dname_external(dentry)))
 		old_name = external_name(dentry);
 	if (unlikely(dname_external(target))) {
-		atomic_inc(&external_name(target)->u.count);
+		atomic_inc(&external_name(target)->count);
 		dentry->d_name = target->d_name;
 	} else {
 		dentry->d_iname_words = target->d_iname_words;
 		dentry->d_name.name = dentry->d_iname;
 		dentry->d_name.hash_len = target->d_name.hash_len;
 	}
-	if (old_name && likely(atomic_dec_and_test(&old_name->u.count)))
-		kfree_rcu(old_name, u.head);
+	if (old_name && likely(atomic_dec_and_test(&old_name->count)))
+		kfree_rcu(old_name, head);
 }
 
 /*
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5] ext4 fast_commit: make use of name_snapshot primitives
  2024-12-10  2:48               ` [PATCH 1/5] make sure that DCACHE_INLINE_LEN is a multiple of word size Al Viro
                                   ` (2 preceding siblings ...)
  2024-12-10  2:48                 ` [PATCH 4/5] dissolve external_name.u into separate members Al Viro
@ 2024-12-10  2:48                 ` Al Viro
  3 siblings, 0 replies; 25+ messages in thread
From: Al Viro @ 2024-12-10  2:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, linux-ext4

... rather than open-coding them (and doing pointless work with extra
allocations, etc. for long names).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ext4/fast_commit.c | 29 +++++------------------------
 fs/ext4/fast_commit.h |  3 +--
 2 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 26c4fc37edcf..da4263a14a20 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -322,9 +322,7 @@ void ext4_fc_del(struct inode *inode)
 	WARN_ON(!list_empty(&ei->i_fc_dilist));
 	spin_unlock(&sbi->s_fc_lock);
 
-	if (fc_dentry->fcd_name.name &&
-		fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
-		kfree(fc_dentry->fcd_name.name);
+	release_dentry_name_snapshot(&fc_dentry->fcd_name);
 	kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
 
 	return;
@@ -449,22 +447,7 @@ static int __track_dentry_update(handle_t *handle, struct inode *inode,
 	node->fcd_op = dentry_update->op;
 	node->fcd_parent = dir->i_ino;
 	node->fcd_ino = inode->i_ino;
-	if (dentry->d_name.len > DNAME_INLINE_LEN) {
-		node->fcd_name.name = kmalloc(dentry->d_name.len, GFP_NOFS);
-		if (!node->fcd_name.name) {
-			kmem_cache_free(ext4_fc_dentry_cachep, node);
-			ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_NOMEM, handle);
-			mutex_lock(&ei->i_fc_lock);
-			return -ENOMEM;
-		}
-		memcpy((u8 *)node->fcd_name.name, dentry->d_name.name,
-			dentry->d_name.len);
-	} else {
-		memcpy(node->fcd_iname, dentry->d_name.name,
-			dentry->d_name.len);
-		node->fcd_name.name = node->fcd_iname;
-	}
-	node->fcd_name.len = dentry->d_name.len;
+	take_dentry_name_snapshot(&node->fcd_name, dentry);
 	INIT_LIST_HEAD(&node->fcd_dilist);
 	spin_lock(&sbi->s_fc_lock);
 	if (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
@@ -832,7 +815,7 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
 {
 	struct ext4_fc_dentry_info fcd;
 	struct ext4_fc_tl tl;
-	int dlen = fc_dentry->fcd_name.len;
+	int dlen = fc_dentry->fcd_name.name.len;
 	u8 *dst = ext4_fc_reserve_space(sb,
 			EXT4_FC_TAG_BASE_LEN + sizeof(fcd) + dlen, crc);
 
@@ -847,7 +830,7 @@ static bool ext4_fc_add_dentry_tlv(struct super_block *sb, u32 *crc,
 	dst += EXT4_FC_TAG_BASE_LEN;
 	memcpy(dst, &fcd, sizeof(fcd));
 	dst += sizeof(fcd);
-	memcpy(dst, fc_dentry->fcd_name.name, dlen);
+	memcpy(dst, fc_dentry->fcd_name.name.name, dlen);
 
 	return true;
 }
@@ -1328,9 +1311,7 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 		list_del_init(&fc_dentry->fcd_dilist);
 		spin_unlock(&sbi->s_fc_lock);
 
-		if (fc_dentry->fcd_name.name &&
-			fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
-			kfree(fc_dentry->fcd_name.name);
+		release_dentry_name_snapshot(&fc_dentry->fcd_name);
 		kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
 		spin_lock(&sbi->s_fc_lock);
 	}
diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
index 2fadb2c4780c..3bd534e4dbbf 100644
--- a/fs/ext4/fast_commit.h
+++ b/fs/ext4/fast_commit.h
@@ -109,8 +109,7 @@ struct ext4_fc_dentry_update {
 	int fcd_op;		/* Type of update create / unlink / link */
 	int fcd_parent;		/* Parent inode number */
 	int fcd_ino;		/* Inode number */
-	struct qstr fcd_name;	/* Dirent name */
-	unsigned char fcd_iname[DNAME_INLINE_LEN];	/* Dirent name string */
+	struct name_snapshot fcd_name;	/* Dirent name */
 	struct list_head fcd_list;
 	struct list_head fcd_dilist;
 };
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-10  2:45             ` Al Viro
  2024-12-10  2:48               ` [PATCH 1/5] make sure that DCACHE_INLINE_LEN is a multiple of word size Al Viro
@ 2024-12-23  4:25               ` Al Viro
  2024-12-23  4:37                 ` Al Viro
  2024-12-23 21:31                 ` Jens Axboe
  1 sibling, 2 replies; 25+ messages in thread
From: Al Viro @ 2024-12-23  4:25 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, Linus Torvalds, Jann Horn

On Tue, Dec 10, 2024 at 02:45:23AM +0000, Al Viro wrote:
> On Mon, Dec 09, 2024 at 11:12:37PM +0000, Al Viro wrote:
> > 
> > Actually, grepping for DNAME_INLINE_LEN brings some interesting hits:
> > drivers/net/ieee802154/adf7242.c:1165:  char debugfs_dir_name[DNAME_INLINE_LEN + 1];
> > 	cargo-culted, AFAICS; would be better off with a constant of their own.
> > 
> > fs/ext4/fast_commit.c:326:              fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
> > fs/ext4/fast_commit.c:452:      if (dentry->d_name.len > DNAME_INLINE_LEN) {
> > fs/ext4/fast_commit.c:1332:                     fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
> > fs/ext4/fast_commit.h:113:      unsigned char fcd_iname[DNAME_INLINE_LEN];      /* Dirent name string */
> > 	Looks like that might want struct name_snapshot, along with
> > {take,release}_dentry_name_snapshot().
> 
> See viro/vfs.git#work.dcache.  I've thrown ext4/fast_commit conversion
> into the end of that pile.  NOTE: that stuff obviously needs profiling.
> It does survive light testing (boot/ltp/xfstests), but review and more
> testing (including serious profiling) is obviously needed.
> 
> Patches in followups...

More fun with ->d_name, ->d_iname and friends:

87ce955b24c9 "io_uring: add ->show_fdinfo() for the io_uring file descriptor"
is playing silly buggers with ->d_iname for some reason.  This
        seq_printf(m, "UserFiles:\t%u\n", ctx->file_table.data.nr);
        for (i = 0; has_lock && i < ctx->file_table.data.nr; i++) {
                struct file *f = NULL;

                if (ctx->file_table.data.nodes[i])
                        f = io_slot_file(ctx->file_table.data.nodes[i]);
                if (f)
                        seq_printf(m, "%5u: %s\n", i, file_dentry(f)->d_iname);
                else
                        seq_printf(m, "%5u: <none>\n", i);
        }
produces user-visible data.  For each slot in io_uring file table you
show either that it's empty (fine) or, for files with short names, the
last component of the name (no quoting, etc. - just a string as-is) or
the last short name that dentry used to have.

And that's a user-visible ABI.  What the hell?

NOTE: file here is may be anything whatsoever.  It may be a pipe,
an arbitrary file in tmpfs, a socket, etc.

How hard an ABI it is?  If it's really used by random userland code
(admin tools, etc.), we have a problem.  If that thing is cast in
stone, we'll have to emulate the current behaviour of that code,
no matter what.  I really hope it can be replaced with something
saner, though.

Incidentally, call your file "<none>"; is the current behaviour
the right thing to do?

What behaviour _is_ actually wanted?  Jens, Jann?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-23  4:25               ` [PATCH][RFC] make take_dentry_name_snapshot() lockless Al Viro
@ 2024-12-23  4:37                 ` Al Viro
  2024-12-23 21:31                 ` Jens Axboe
  1 sibling, 0 replies; 25+ messages in thread
From: Al Viro @ 2024-12-23  4:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, Linus Torvalds, Jann Horn

On Mon, Dec 23, 2024 at 04:25:40AM +0000, Al Viro wrote:

> Incidentally, call your file "<none>"; is the current behaviour
> the right thing to do?
> 
> What behaviour _is_ actually wanted?  Jens, Jann?

While we are at it, what do you want to see if your file happens to
be e.g. /home or /boot/efi?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-23  4:25               ` [PATCH][RFC] make take_dentry_name_snapshot() lockless Al Viro
  2024-12-23  4:37                 ` Al Viro
@ 2024-12-23 21:31                 ` Jens Axboe
  2024-12-24 19:18                   ` Al Viro
  1 sibling, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2024-12-23 21:31 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Jann Horn

On 12/22/24 9:25 PM, Al Viro wrote:
> On Tue, Dec 10, 2024 at 02:45:23AM +0000, Al Viro wrote:
>> On Mon, Dec 09, 2024 at 11:12:37PM +0000, Al Viro wrote:
>>>
>>> Actually, grepping for DNAME_INLINE_LEN brings some interesting hits:
>>> drivers/net/ieee802154/adf7242.c:1165:  char debugfs_dir_name[DNAME_INLINE_LEN + 1];
>>> 	cargo-culted, AFAICS; would be better off with a constant of their own.
>>>
>>> fs/ext4/fast_commit.c:326:              fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
>>> fs/ext4/fast_commit.c:452:      if (dentry->d_name.len > DNAME_INLINE_LEN) {
>>> fs/ext4/fast_commit.c:1332:                     fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
>>> fs/ext4/fast_commit.h:113:      unsigned char fcd_iname[DNAME_INLINE_LEN];      /* Dirent name string */
>>> 	Looks like that might want struct name_snapshot, along with
>>> {take,release}_dentry_name_snapshot().
>>
>> See viro/vfs.git#work.dcache.  I've thrown ext4/fast_commit conversion
>> into the end of that pile.  NOTE: that stuff obviously needs profiling.
>> It does survive light testing (boot/ltp/xfstests), but review and more
>> testing (including serious profiling) is obviously needed.
>>
>> Patches in followups...
> 
> More fun with ->d_name, ->d_iname and friends:
> 
> 87ce955b24c9 "io_uring: add ->show_fdinfo() for the io_uring file descriptor"
> is playing silly buggers with ->d_iname for some reason.  This
>         seq_printf(m, "UserFiles:\t%u\n", ctx->file_table.data.nr);
>         for (i = 0; has_lock && i < ctx->file_table.data.nr; i++) {
>                 struct file *f = NULL;
> 
>                 if (ctx->file_table.data.nodes[i])
>                         f = io_slot_file(ctx->file_table.data.nodes[i]);
>                 if (f)
>                         seq_printf(m, "%5u: %s\n", i, file_dentry(f)->d_iname);
>                 else
>                         seq_printf(m, "%5u: <none>\n", i);
>         }
> produces user-visible data.  For each slot in io_uring file table you
> show either that it's empty (fine) or, for files with short names, the
> last component of the name (no quoting, etc. - just a string as-is) or
> the last short name that dentry used to have.
> 
> And that's a user-visible ABI.  What the hell?
> 
> NOTE: file here is may be anything whatsoever.  It may be a pipe,
> an arbitrary file in tmpfs, a socket, etc.
> 
> How hard an ABI it is?  If it's really used by random userland code
> (admin tools, etc.), we have a problem.  If that thing is cast in
> stone, we'll have to emulate the current behaviour of that code,
> no matter what.  I really hope it can be replaced with something
> saner, though.
> 
> Incidentally, call your file "<none>"; is the current behaviour
> the right thing to do?
> 
> What behaviour _is_ actually wanted?  Jens, Jann?

It's not really API, it's just for debugging purposes. Ideal behavior -
show the file name, if possible, if not it can be anything like "anon
inode" or whatever.

IOW, we can change this however we want.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-23 21:31                 ` Jens Axboe
@ 2024-12-24 19:18                   ` Al Viro
  2024-12-24 19:44                     ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2024-12-24 19:18 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, Linus Torvalds, Jann Horn

On Mon, Dec 23, 2024 at 02:31:38PM -0700, Jens Axboe wrote:

> > And that's a user-visible ABI.  What the hell?
> > 
> > NOTE: file here is may be anything whatsoever.  It may be a pipe,
> > an arbitrary file in tmpfs, a socket, etc.
> > 
> > How hard an ABI it is?  If it's really used by random userland code
> > (admin tools, etc.), we have a problem.  If that thing is cast in
> > stone, we'll have to emulate the current behaviour of that code,
> > no matter what.  I really hope it can be replaced with something
> > saner, though.
> > 
> > Incidentally, call your file "<none>"; is the current behaviour
> > the right thing to do?
> > 
> > What behaviour _is_ actually wanted?  Jens, Jann?
> 
> It's not really API, it's just for debugging purposes.

Famous last words...  Let's hope that change won't bring some deployed
userland tool screaming about breakage; it's been there for almost 5 years...

> Ideal behavior -
> show the file name, if possible, if not it can be anything like "anon
> inode" or whatever.
> 
> IOW, we can change this however we want.

All right, how about
		seq_printf(m, "%5u: ", i);
		if (f)
			seq_file_path(m, f, " \t\n\\<");
		else
			seq_puts(m, "<none>");
		seq_puts(m, "\n");
in io_uring_show_fdinfo(), instead of your
                if (f)
			seq_printf(m, "%5u: %s\n", i, file_dentry(f)->d_iname);
		else   
			seq_printf(m, "%5u: <none>\n", i);
Can you live with that?  Said that, I'm not sure that <none> case is worth printing -
you are dumping the entire table, in order of increasing index, so it's not as if
those lines carried any useful information...  If we do not care about those, it
becomes
		if (f) {
			seq_printf(m, "%5u: ", i);
			seq_file_path(m, f, " \t\n\\");
			seq_puts(m, "\n");
		}
- we only need to escape '<' in names to avoid output clashing with <none>...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-24 19:18                   ` Al Viro
@ 2024-12-24 19:44                     ` Linus Torvalds
  2024-12-24 20:24                       ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2024-12-24 19:44 UTC (permalink / raw)
  To: Al Viro; +Cc: Jens Axboe, linux-fsdevel, Jann Horn

On Tue, 24 Dec 2024 at 11:18, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>                 if (f)
>                         seq_file_path(m, f, " \t\n\\<");
>                 else
>                         seq_puts(m, "<none>");

Maybe seq_file_path() could do the "<none>" thing itself.

Right now it looks like d_path() just oopses with a NULL pointer
dereference if you pass it a NULL path. Our printk() routines -
including '%pd' - tend to do "(null)" for this case (and also handle
ERR_PTR() cases) thanks to the 'check_pointer()' logic.

            Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH][RFC] make take_dentry_name_snapshot() lockless
  2024-12-24 19:44                     ` Linus Torvalds
@ 2024-12-24 20:24                       ` Al Viro
  0 siblings, 0 replies; 25+ messages in thread
From: Al Viro @ 2024-12-24 20:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, linux-fsdevel, Jann Horn

On Tue, Dec 24, 2024 at 11:44:55AM -0800, Linus Torvalds wrote:
> On Tue, 24 Dec 2024 at 11:18, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> >                 if (f)
> >                         seq_file_path(m, f, " \t\n\\<");
> >                 else
> >                         seq_puts(m, "<none>");
> 
> Maybe seq_file_path() could do the "<none>" thing itself.
> 
> Right now it looks like d_path() just oopses with a NULL pointer
> dereference if you pass it a NULL path. Our printk() routines -
> including '%pd' - tend to do "(null)" for this case (and also handle
> ERR_PTR() cases) thanks to the 'check_pointer()' logic.

	Umm...  What about the escape character set?  If we have
seq_file_path() produce "<none>" for a NULL file reference, we'd
need to make sure that file called "<none>" won't get mixed with
that...

	OTOH, there will be leading / in all cases except for
pipes/sockets/anon files/anything that has ->d_dname() and
it looks like the output of all instances has either '/' or
':' in it...  Feels brittle, though.

	Anyway, I'm not at all sure that those <none> lines are
actually useful for anything - we get something like
UserFiles:    512
followed by 512 lines of form index:name or index:<none>, with
indices going from 0 to 511.

	We are dumping an io_uring private descriptor table - all
of it in one call of ->show().  And each line is prepended with
the slot number, so it's not obvious that dumping empty slots
makes sense.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-12-24 20:24 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-09  3:52 [PATCH][RFC] make take_dentry_name_snapshot() lockless Al Viro
2024-12-09  6:33 ` Mateusz Guzik
2024-12-09  6:58   ` Al Viro
2024-12-09  7:18     ` Mateusz Guzik
2024-12-09  7:41       ` Al Viro
2024-12-09 18:27 ` Linus Torvalds
2024-12-09 21:17   ` Al Viro
2024-12-09 22:28     ` Al Viro
2024-12-09 22:49       ` Linus Torvalds
2024-12-09 22:55         ` Linus Torvalds
2024-12-09 23:12           ` Al Viro
2024-12-10  2:45             ` Al Viro
2024-12-10  2:48               ` [PATCH 1/5] make sure that DCACHE_INLINE_LEN is a multiple of word size Al Viro
2024-12-10  2:48                 ` [PATCH 2/5] dcache: back inline names with a struct-wrapped array of unsigned long Al Viro
2024-12-10  2:48                 ` [PATCH 3/5] make take_dentry_name_snapshot() lockless Al Viro
2024-12-10  2:48                 ` [PATCH 4/5] dissolve external_name.u into separate members Al Viro
2024-12-10  2:48                 ` [PATCH 5/5] ext4 fast_commit: make use of name_snapshot primitives Al Viro
2024-12-23  4:25               ` [PATCH][RFC] make take_dentry_name_snapshot() lockless Al Viro
2024-12-23  4:37                 ` Al Viro
2024-12-23 21:31                 ` Jens Axboe
2024-12-24 19:18                   ` Al Viro
2024-12-24 19:44                     ` Linus Torvalds
2024-12-24 20:24                       ` Al Viro
2024-12-09 19:06 ` Linus Torvalds
2024-12-09 20:27   ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox