All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
       [not found] <GV0P278MB07187F9B0E7B576AD0B362B485802@GV0P278MB0718.CHEP278.PROD.OUTLOOK.COM>
@ 2024-08-19 16:04 ` Bernd Schubert
  2024-08-22 12:41 ` Miklos Szeredi
  2024-09-25 12:20 ` Miklos Szeredi
  2 siblings, 0 replies; 12+ messages in thread
From: Bernd Schubert @ 2024-08-19 16:04 UTC (permalink / raw)
  To: Laura Promberger, fuse-devel@lists.sourceforge.net,
	linux-fsdevel@vger.kernel.org

Hi Laura,

it is better to use linux-fsdevel if you suspect kernel issues. The
sourceforge list is badly spammed and I'm not sure there are many people
who read it. Personally I get it sorted into special folder.
We should probably ask for a new fuse list on  a kernel server...

On 8/15/24 16:11, Laura Promberger wrote:
> Hi Miklos, and fuse developers,
> 
> I want to say thanks again for implementing the fuse_expire_entry functionality [0]. With this being able to do symlink caching is a well-received feature. But like [1], we also notice that in certain cases corrupted symlinks are returned on `readlink` (new value with old length).
> 
> TL;DR
> Symlink corruption (new value, old length) can occur if the target of a symlink changes; symlink inode stays the same; expire is used for dentries and inval for inodes.
> It is most likely a kernel issue as userland returns the correct new symlink.
> - In healthy calls (kernel) `fuse_change_attributes()` and `fuse_change_attributes_common()` are called. 
> - But for corrupted symlinks `fuse_change_attributes()` exits before `fuse_change_attributes_common()` is called and as such the length stays the old one.
> 
> Why it is able to exit early is not clear to me.
> 
> 
> Nicer formatted version: https://github.com/cvmfs/cvmfs/issues/3626#issuecomment-2291308284
> 
> --------------------------------------------------------
> Before continuing to describe the problem, here a short overview of our file system:
> --------------------------------------------------------
> 
> CernVM-FS [2] is a snapshot-based read-only, distributed file system. The file system does not have genuine inodes but "cvmfs inodes" that are artificially issued by the fuse module to allow keep track of and serve correctly open files during different generations of snapshots (an open file from snapshot A should be still working on the snapshot A data, while a new request should work on the current snapshot D).
> 
> When a new snapshot becomes available the following steps are performed:
> 1. Stop Kernel caching (new dentries timeout = 0)
> 2. Asynchronously evict kernel caches (inval inodes, expire dentries - we have trackers for it to know what the kernel has in its caches)
> 3. Apply new snapshot
>    - this is a critical section and will stop any execution of critical sections within all posix functions (open, readlink, getatt, lookup, read, ...)
>    - mutex to stop and drain readlink requests (wait for drain, and have an extra stop within readlink before the critical section)
>    - evict inodes again
>    - apply new snapshot
>    - restart readlink requests
>    - end critical section
> 4. Turn back on kernel caching
> 
> During all those steps, requests from the user can and will come in. As such, we do not want to fail lookup/open/readlink requests if they are valid requests on the new snapshot (even if the inode is "old"). For open/read etc we have mechanics to figure out if to work on stale cvmfs inodes or replace them with new ones.
> 
> --------------------------------------------------------
> 
> While the event itself is randomly happening, we have a reproducer for our file system that can reliably reproduce it within a couple of minutes. It performs as quickly as possible a readlink on `my-symlink` while new snapshots are applied that change the target `my-symlink` points to. The targets have different length and point to valid files.

Do you have description how to run that reproducer? Would it be possible
to run it in a VM?

> 
> The mutex around the readlink decreases the likelyhood of returning a corrupted symlink but does not fully solve it. Adding sleeps did not change anything. 
> 
> Note:
> - Using no kernel symlink caching will not trigger the bug
> - We must expire dentries and cannot invalidate them due to invalidate destroying mount-on-top paths, which is heavily used by containers
> - We use the lowlevel libfuse API
> - The (kernel) inode of the symlink does not change, even though the target location changes and the snapshot changes
> 
> --------------------------------------------------------
> Debugging results
> --------------------------------------------------------
> 0) This problem only occurs if readlink-calls are performed in a very tight loop during application of a new snapshot. (e.g. when adding a `sleep(1)` inside the loop i do not seem to be able to trigger this issue)
> 1) On the userland everything seems to work correctly:
> - after the new snapshot, cvmfs_readlink()and the subsequent call to cvmfs_getattr() return the correct new value and new length of the symlink. --> But this is ignored. (This can also be seen in fuse_change_attributes() parameter fuse_attr having the correct values)
> 2) It seems to be a kernel space issue:
> - libfuse expire dentry triggers in the fuse kernel module: fuse_dentry_revalidate (see the description below [A] )
> - fuse_dentry_revalidate() calls fuse_change_attributes()
> - fuse_change_attributes() parmeters include `inode` which has the current value, and `fuse_attr` that contains the new values (with correct new length)
> - fuse_change_attributes() calls fuse_change_attributes_common() which updates most of the attribute values of `inode` to the values given in `attr`. Afterwards fuse_change_attributes() continues to update the length.
> 3) In corrupted symlink cases fuse_change_attributes() exits before fuse_change_attributes_common() is called
> 
> For the logs: 
> - Notes/Comments are put in << some comment >>
> - `SYMLINK CHANGED` output is from the user program `corrupt_symlink` what the readlink() returns
> - In all cases the readlink is refered to by inode 270
> - Compared to the raw log output, I did a bit of formatting to make it (hopefully) easier to read
> 
> --------------------------------------------------------
>  Good log
> --------------------------------------------------------
>    08/15/24 11:35:34.959253 fuse_reverse_inval_inode:    func cvmfs2             ino 270 offset 0 len 0
>    08/15/24 11:35:34.959258 fuse_invalidate_attr:        func cvmfs2             inode-inode 270 inode-size 22
> 
> << stop caching here>>
>    08/15/24 11:35:35.984689 fuse_dentry_settime:         func cvmfs2             d_time 0 ino 270 new_time 0
>    08/15/24 11:35:35.984784 fuse_reverse_inval_inode:    func cvmfs2             ino 270 offset 0 len 0
>    08/15/24 11:35:35.984791 fuse_invalidate_attr:        func cvmfs2             inode-inode 270 inode-size 22
>    08/15/24 11:35:35.984855 fuse_reverse_inval_inode:    func cvmfs2             ino 270 offset 0 len 0
>    08/15/24 11:35:35.984859 fuse_invalidate_attr:        func cvmfs2             inode-inode 270 inode-size 22
>    08/15/24 11:35:36.086263 fuse_dentry_revalidate:      func corrupt_symlink    d_time 0 ino 270 size 22 flags 16448
>    08/15/24 11:35:36.086284 fuse_dentry_revalidate:      func corrupt_symlink    d_time 0 ino 270 size 22 flags 16384
>    08/15/24 11:35:36.086566 fuse_invalid_attr:           func corrupt_symlink    ino 270 size 16 blksize 4096
> 
>    08/15/24 11:35:36.086579 fuse_change_attributes:  func corrupt_symlink \
>                                                       inode: ino 270, size 22, version.counter 0, state 0 \
>                                                       fuse_attr: ino 270, size 16, blksize 4096, \
>                                                       fuse_statx: ino 0, size 0, blksize 0, \
>                                                       fuse_inode: ino 0, orig_ino 0, attr_version 0 state 0 \
>                                                       attr_valid 98, attr_version 0
> 
>    08/15/24 11:35:36.086584 fuse_change_attributes_common: func corrupt_symlink 
>                                                             inode: ino 270, i_size 22, \
>                                                             fuse_attr: ino 270, size 16, blksize 4096, \
>                                                             fuse_statx: ino 0, size 0, blksize 0, \
>                                                             attr_valid 0, cache_mask 0
> 
> << restart caching here >>
>    08/15/24 11:35:36.086588 fuse_dentry_settime:      func corrupt_symlink  d_time 0 ino 270 new_time 68641802
>    08/15/24 11:35:36.086598 fuse_get_link:            func corrupt_symlink  dentry-inode 270 dentry-size 16 inode-inode 270 inode-size 16
>    08/15/24 11:35:36.086612 fuse_readlink_page:       func corrupt_symlink  ino 0 inode 270 size 16
> 
> Current time: 08/15/24 09:35:36.086833735 UTC
> SYMLINK CHANGED:  1402844     size 16     symlinktest/1a1b
> << correct value and size >>
> 
>    08/15/24 11:35:36.087078 fuse_dentry_revalidate:   func corrupt_symlink  d_time 0 ino 270 size 16 flags 16448
>    08/15/24 11:35:36.087085 fuse_get_link:            func corrupt_symlink  dentry-inode 270 dentry-size 16 inode-inode 270 inode-size 16
> 
> --------------------------------------------------------
> Bad log - old length, new symlink too long
> --------------------------------------------------------
>    08/15/24 11:35:11.883895 fuse_get_link:            func corrupt_symlink    dentry-inode 270 dentry-size 22 inode-inode 270 inode-size 22
>    08/15/24 11:35:11.884372 fuse_dentry_revalidate:   func corrupt_symlink    d_time 0 ino 270 size 22 flags 16384
> 
> << stop caching here>>
>    08/15/24 11:35:11.884238 fuse_dentry_settime:      func cvmfs2             d_time 0 ino 270 new_time 0
>    08/15/24 11:35:11.884334 fuse_reverse_inval_inode: func cvmfs2             ino 270 offset 0 len 0
>    08/15/24 11:35:11.884338 fuse_invalidate_attr:     func cvmfs2             inode-inode 270 inode-size 22
>    08/15/24 11:35:11.880132 fuse_dentry_revalidate:   func corrupt_symlink    d_time 0 ino 270 size 22 flags 16448
>    08/15/24 11:35:11.880142 fuse_get_link:            func corrupt_symlink    dentry-inode 270 dentry-size 22 inode-inode 270 inode-size 22
>    08/15/24 11:35:11.884041 fuse_dentry_revalidate:   func corrupt_symlink    d_time 0 ino 270 size 22 flags 16448
>    08/15/24 11:35:11.884052 fuse_get_link:            func corrupt_symlink    dentry-inode 270 dentry-size 22 inode-inode 270 inode-size 22
>    08/15/24 11:35:11.884169 fuse_dentry_revalidate:   func corrupt_symlink    d_time 0 ino 270 size 22 flags 16448
>    08/15/24 11:35:11.884174 fuse_get_link:            func corrupt_symlink    dentry-inode 270 dentry-size 22 inode-inode 270 inode-size 22
>    08/15/24 11:35:12.904539 fuse_reverse_inval_inode: func cvmfs2             ino 270 offset 0 len 0
>    08/15/24 11:35:12.904554 fuse_invalidate_attr:     func cvmfs2             inode-inode 270 inode-size 22
>    08/15/24 11:35:12.904592 fuse_reverse_inval_inode: func cvmfs2             ino 270 offset 0 len 0
>    08/15/24 11:35:12.904595 fuse_invalidate_attr:     func cvmfs2             inode-inode 270 inode-size 22
>    08/15/24 11:35:13.004875 fuse_invalid_attr:        func corrupt_symlink    ino 270 size 36 blksize 4096
> 
>    08/15/24 11:35:13.004891 fuse_change_attributes:  func corrupt_symlink \
>                                                       inode: ino 270, size 22, version.counter 0, state 0 \
>                                                       fuse_attr: ino 270, size 36, blksize 4096, \
>                                                       fuse_statx: ino 0, size 0, blksize 0, \
>                                                       fuse_inode: ino 0, orig_ino 0, attr_version 0, state 0 \
>                                                       attr_valid 74, attr_version 0
> 
> <<<< fuse_change_attributes_common is missing >>>>
> 
> << restart caching here >>
>    08/15/24 11:35:13.004897 fuse_dentry_settime:      func corrupt_symlink    d_time 0 ino 270 new_time 68636032
>    08/15/24 11:35:13.004907 fuse_get_link:            func corrupt_symlink    dentry-inode 270 dentry-size 22 inode-inode 270 inode-size 22
>    08/15/24 11:35:13.004920 fuse_readlink_page:       func corrupt_symlink    ino 0 inode 270 size 22
> 
> Current time: 08/15/24 11:35:13.005221851 UTC
> SYMLINK CHANGED:  936856      size 22     symlinktest/10bbbbbbbb
> << wrong value: symlink should point to symlinktest/10bbbbbbbbbb10cccccccccc with length 36 >>
> 
>    08/15/24 11:35:13.007225 fuse_dentry_revalidate:   func corrupt_symlink     d_time 0 ino 270 size 22 flags 16448
>    08/15/24 11:35:13.007244 fuse_get_link:            func corrupt_symlink     dentry-inode 270 dentry-size 22 inode-inode 270 inode-size 22
>    08/15/24 11:35:13.007363 fuse_dentry_revalidate:   func corrupt_symlink     d_time 0 ino 270 size 22 flags 16448
> 
> --------------------------------------------------------
> Bad log - old length, new symlink too short (user will not see that mismatch due to null terminator)
> --------------------------------------------------------
>    08/15/24 11:35:00.338840 fuse_get_link:            func corrupt_symlink    dentry-inode 270 dentry-size 22 inode-inode 270 inode-size 22
>    08/15/24 11:35:00.338888 fuse_dentry_revalidate:   func corrupt_symlink    d_time 0 ino 270 size 22 flags 16448
>    08/15/24 11:35:00.338891 fuse_get_link:            func corrupt_symlink    dentry-inode 270 dentry-size 22 inode-inode 270 inode-size 22
> 
> << stop caching here>>
>    08/15/24 11:35:00.339018 fuse_dentry_settime:      func cvmfs2             d_time 0 ino 270 new_time 0
>    08/15/24 11:35:00.339076 fuse_reverse_inval_inode: func cvmfs2             ino 270 offset 0 len 0
>    08/15/24 11:35:00.339078 fuse_invalidate_attr:     func cvmfs2             inode-inode 270 inode-size 22
>    08/15/24 11:35:00.339110 fuse_dentry_revalidate:   func corrupt_symlink    d_time 0 ino 270 size 22 flags 16384
>    08/15/24 11:35:01.364445 fuse_reverse_inval_inode: func cvmfs2             ino 270 offset 0 len 0
>    08/15/24 11:35:01.364457 fuse_invalidate_attr:     func cvmfs2             inode-inode 270 inode-size 22
>    08/15/24 11:35:01.364484 fuse_reverse_inval_inode: func cvmfs2             ino 270 offset 0 len 0
>    08/15/24 11:35:01.364488 fuse_invalidate_attr:     func cvmfs2             inode-inode 270 inode-size 22
>    08/15/24 11:35:01.464772 fuse_invalid_attr:        func corrupt_symlink    ino 270 size 14 blksize 4096
> 
>    08/15/24 11:35:01.464796 fuse_change_attributes:  func corrupt_symlink \
>                                                       inode: ino 270, size 22, version.counter 0, state 0, \
>                                                       fuse_attr: ino 270, size 14, blksize 4096, \
>                                                       fuse_statx: ino 0, size 0, blksize 0, \
>                                                       fuse_inode: ino 0, orig_ino 0, attr_version 0, state 0 \
>                                                       attr_valid 65, attr_version 0
> 
> <<<< fuse_change_attributes_common is missing >>>>
> 
> << restart caching here >>
>    08/15/24 11:35:01.464803 fuse_dentry_settime:      func corrupt_symlink    d_time 0 ino 270 new_time 68633147
>    08/15/24 11:35:01.464815 fuse_get_link:            func corrupt_symlink    dentry-inode 270 dentry-size 22 inode-inode 270 inode-size 22
>    08/15/24 11:35:01.464836 fuse_readlink_page:       func corrupt_symlink    ino 0 inode 270 size 22
> 
> Current time: 08/15/24 11:35:01.465255963 UTC
> SYMLINK CHANGED:  703167      size 14     symlinktest/fg
> << correct value but kernel works with too large size 22 instead of 14 >>
> 
>    08/15/24 11:35:01.466162 fuse_dentry_revalidate:   func corrupt_symlink    d_time 0 ino 270 size 22 flags 16448
>    08/15/24 11:35:01.466174 fuse_get_link:            func corrupt_symlink    dentry-inode 270 dentry-size 22 inode-inode 270 inode-size 22
>    08/15/24 11:35:01.466447 fuse_dentry_revalidate:   func corrupt_symlink    d_time 0 ino 270 size 22 flags 16448
> 
> --------------------------------------------------------
> My understanding from the logs
> --------------------------------------------------------
> Looking at the good and bad logs, the only difference is fuse_change_attributes_common() not being called from fuse_change_attributes(). This is only possible if the following branch is being taken:
> if ((attr_version != 0 && fi->attr_version > attr_version) ||
>           test_bit(FUSE_I_SIZE_UNSTABLE, &fi->state)) {
>             spin_unlock(&fi->lock);
>             return;
>       }
> However, looking at bpftrace (see code being used below [B]), I do not understand how this branch can be taken as in the logs it is stated: 
> `attr_version == 0` and `&fi->state == 0` with `struct fuse_inode *fi = get_fuse_inode(inode);`


fuse_reverse_inval_inode() has

	fi = get_fuse_inode(inode);
	spin_lock(&fi->lock);
	fi->attr_version = atomic64_inc_return(&fc->attr_version);
	spin_unlock(&fi->lock);


But that is not in fuse_reverse_inval_entry(). Could you test a patch?

> 
> I also have looked where `set_bit(FUSE_I_SIZE_UNSTABLE,` is called and attached to the kprobes of it. But none of them got triggered.
> - fuse_do_setattr
> - fuse_perform_write
> - fuse_file_fallocate
> - __fuse_copy_file_range
> 
> The complete logs and bpftrace script can be found here: https://cernbox.cern.ch/s/EBVedOKe6ng4O0H 
> A nicer formated version can be found here: https://github.com/cvmfs/cvmfs/issues/3626#issuecomment-2291308284 
> 
> If you need any more information or have any idea how to fix it, please let me know.
> 
> Thanks
> Laura
> 
> 
> [0] https://sourceforge.net/p/fuse/mailman/fuse-devel/thread/0158a70b-3d78-ce18-f38f-b8563d9efcdd@cern.ch/
> [1] https://sourceforge.net/p/fuse/mailman/fuse-devel/thread/fbb2f999-6b37-5875-deca-989a75ce3b5b@spawn.link/
> [2] https://github.com/cvmfs/cvmfs
> 
> [A]
> /*
>  * Check whether the dentry is still valid
>  *
>  * If the entry validity timeout has expired and the dentry is
>  * positive, try to redo the lookup.  If the lookup results in a
>  * different inode, then let the VFS invalidate the dentry and redo
>  * the lookup once more.  If the lookup results in the same inode,
>  * then refresh the attributes, timeouts and mark the dentry valid.
>  */
> static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
> 
> [B] (bpftrace version v0.14.0, 6.5.0-26-generic #26~22.04.1-Ubuntu )
> kprobe:fuse_change_attributes
> / ((struct inode *) arg0)->i_ino == 270 / {
>   $x = nsecs;
>   printf("%s fuse_change_attributes:\tfunc %s inode_ino %ld inode_size %ld, inode_version.counter %ld, inode_state %lu fuse_attr ino %ld, fuse_attr size, fuse_attr %ld blksize %ld, fuse_statx ino %ld, fuse_statx size %ld fuse_statx blksize %ld, fuse_inode ino %ld,  fuse_inode orig_ino %lu, fuse_inode attr_version %lu fuse_inode state %ld -- attr_valid %lu, attr_version %lu\n",
>           strftime("%D %T.%f", $x), comm,
>           ((struct inode *) arg0)->i_ino,
>           ((struct inode *) arg0)->i_size,
>           ((struct inode *) arg0)->i_version.counter,
>           ((struct inode *) arg0)->i_state,
>           ((struct fuse_attr*)arg1)->ino,
>           ((struct fuse_attr*)arg1)->size,
>           ((struct fuse_attr*)arg1)->blksize,
>           ((struct fuse_statx*)arg2)->ino,
>           ((struct fuse_statx*)arg2)->size,
>           ((struct fuse_statx*)arg2)->blksize,
>           ((struct fuse_inode *)((struct inode *) arg0)->i_private)->nodeid,
>           ((struct fuse_inode *)((struct inode *) arg0)->i_private)->orig_ino,
>           ((struct fuse_inode *)((struct inode *) arg0)->i_private)->attr_version,
>           ((struct fuse_inode *)((struct inode *) arg0)->i_private)->state,
>           arg3, arg4
>         );
> }

Nice analysis!


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
       [not found] <GV0P278MB07187F9B0E7B576AD0B362B485802@GV0P278MB0718.CHEP278.PROD.OUTLOOK.COM>
  2024-08-19 16:04 ` [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue? Bernd Schubert
@ 2024-08-22 12:41 ` Miklos Szeredi
  2024-09-25 12:20 ` Miklos Szeredi
  2 siblings, 0 replies; 12+ messages in thread
From: Miklos Szeredi @ 2024-08-22 12:41 UTC (permalink / raw)
  To: Laura Promberger; +Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel

On Thu, 15 Aug 2024 at 16:45, Laura Promberger <laura.promberger@cern.ch> wrote:

>    08/15/24 11:35:36.086579 fuse_change_attributes:  func corrupt_symlink \
>                                                       inode: ino 270, size 22, version.counter 0, state 0 \
>                                                       fuse_attr: ino 270, size 16, blksize 4096, \
>                                                       fuse_statx: ino 0, size 0, blksize 0, \
>                                                       fuse_inode: ino 0, orig_ino 0, attr_version 0 state 0 \
>                                                       attr_valid 98, attr_version 0

Something's wrong with the debug output since attr_valid should be a
large number (current jiffies + timeout).

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
       [not found] <GV0P278MB07187F9B0E7B576AD0B362B485802@GV0P278MB0718.CHEP278.PROD.OUTLOOK.COM>
  2024-08-19 16:04 ` [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue? Bernd Schubert
  2024-08-22 12:41 ` Miklos Szeredi
@ 2024-09-25 12:20 ` Miklos Szeredi
  2024-09-25 14:07   ` Bernd Schubert
  2025-02-19 19:54   ` Sam Lewis
  2 siblings, 2 replies; 12+ messages in thread
From: Miklos Szeredi @ 2024-09-25 12:20 UTC (permalink / raw)
  To: Laura Promberger
  Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel, Bernd Schubert

[-- Attachment #1: Type: text/plain, Size: 1419 bytes --]

On Thu, 15 Aug 2024 at 16:45, Laura Promberger <laura.promberger@cern.ch> wrote:

> - But for corrupted symlinks `fuse_change_attributes()` exits before `fuse_change_attributes_common()` is called and as such the length stays the old one.

The reason is that the attr_version check fails.  The trace logs show
a zero attr_version value, which suggests that the check can not fail.
But we know that fuse_dentry_revalidate() supplies a non-zero
attr_version to fuse_change_attributes() and if there's a racing
fuse_reverse_inval_inode() which updates the fuse_inode's
attr_version, then it would result in fuse_change_attributes() exiting
before updating the cached attributes, which is what you observe.

This is probably okay, as the cached attributes remain invalid and the
next call to fuse_change_attributes() will likely update the inode
with the correct values.

The reason this causes problems is that cached symlinks will be
returned through page_get_link(), which truncates the symlink to
inode->i_size.  This is correct for filesystems that don't mutate
symlinks, but for cvmfs it causes problems.

My proposed solution would be to just remove this truncation.  This
can cause a regression in a filesystem that relies on supplying a
symlink larger than the file size, but this is unlikely.   If that
happens we'd need to make this behavior conditional.

Can you please try the  attached patch?

Thanks,
Miklos

[-- Attachment #2: fuse-fix-cached-symlink-size-limiting.patch --]
[-- Type: text/x-patch, Size: 2398 bytes --]

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 54104dd48af7..70fb57714f79 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1632,7 +1632,7 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
 		goto out_err;
 
 	if (fc->cache_symlinks)
-		return page_get_link(dentry, inode, callback);
+		return page_get_link_raw(dentry, inode, callback);
 
 	err = -ECHILD;
 	if (!dentry)
diff --git a/fs/namei.c b/fs/namei.c
index 4a4a22a08ac2..6795600c5738 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5300,10 +5300,9 @@ const char *vfs_get_link(struct dentry *dentry, struct delayed_call *done)
 EXPORT_SYMBOL(vfs_get_link);
 
 /* get the link contents into pagecache */
-const char *page_get_link(struct dentry *dentry, struct inode *inode,
-			  struct delayed_call *callback)
+static char *__page_get_link(struct dentry *dentry, struct inode *inode,
+			     struct delayed_call *callback)
 {
-	char *kaddr;
 	struct page *page;
 	struct address_space *mapping = inode->i_mapping;
 
@@ -5322,8 +5321,23 @@ const char *page_get_link(struct dentry *dentry, struct inode *inode,
 	}
 	set_delayed_call(callback, page_put_link, page);
 	BUG_ON(mapping_gfp_mask(mapping) & __GFP_HIGHMEM);
-	kaddr = page_address(page);
-	nd_terminate_link(kaddr, inode->i_size, PAGE_SIZE - 1);
+	return page_address(page);
+}
+
+const char *page_get_link_raw(struct dentry *dentry, struct inode *inode,
+			      struct delayed_call *callback)
+{
+	return __page_get_link(dentry, inode, callback);
+}
+EXPORT_SYMBOL_GPL(page_get_link_raw);
+
+const char *page_get_link(struct dentry *dentry, struct inode *inode,
+					struct delayed_call *callback)
+{
+	char *kaddr = __page_get_link(dentry, inode, callback);
+
+	if (!IS_ERR(kaddr))
+		nd_terminate_link(kaddr, inode->i_size, PAGE_SIZE - 1);
 	return kaddr;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index eae5b67e4a15..fc90d1f6e8c7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3316,6 +3316,8 @@ extern const struct file_operations generic_ro_fops;
 
 extern int readlink_copy(char __user *, int, const char *);
 extern int page_readlink(struct dentry *, char __user *, int);
+extern const char *page_get_link_raw(struct dentry *, struct inode *,
+				     struct delayed_call *);
 extern const char *page_get_link(struct dentry *, struct inode *,
 				 struct delayed_call *);
 extern void page_put_link(void *);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
  2024-09-25 12:20 ` Miklos Szeredi
@ 2024-09-25 14:07   ` Bernd Schubert
  2024-09-25 14:25     ` Miklos Szeredi
  2025-02-19 19:54   ` Sam Lewis
  1 sibling, 1 reply; 12+ messages in thread
From: Bernd Schubert @ 2024-09-25 14:07 UTC (permalink / raw)
  To: Miklos Szeredi, Laura Promberger
  Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel

Hi Miklos,

On 9/25/24 14:20, Miklos Szeredi wrote:
> On Thu, 15 Aug 2024 at 16:45, Laura Promberger <laura.promberger@cern.ch> wrote:
> 
>> - But for corrupted symlinks `fuse_change_attributes()` exits before `fuse_change_attributes_common()` is called and as such the length stays the old one.
> 
> The reason is that the attr_version check fails.  The trace logs show
> a zero attr_version value, which suggests that the check can not fail.
> But we know that fuse_dentry_revalidate() supplies a non-zero
> attr_version to fuse_change_attributes() and if there's a racing
> fuse_reverse_inval_inode() which updates the fuse_inode's
> attr_version, then it would result in fuse_change_attributes() exiting
> before updating the cached attributes, which is what you observe.


I'm a bit confused by this, especially due to "fuse_reverse_inval_inode()",
isn't this about FUSE_NOTIFY_INVAL_ENTRY and the additional flag
FUSE_EXPIRE_ONLY? I.e. the used code path is fuse_reverse_inval_entry()?
And that path doesn't change the attr_version? Which I'm also confused 
about.


> 
> This is probably okay, as the cached attributes remain invalid and the
> next call to fuse_change_attributes() will likely update the inode
> with the correct values.
> 
> The reason this causes problems is that cached symlinks will be
> returned through page_get_link(), which truncates the symlink to
> inode->i_size.  This is correct for filesystems that don't mutate
> symlinks, but for cvmfs it causes problems.
> 
> My proposed solution would be to just remove this truncation.  This
> can cause a regression in a filesystem that relies on supplying a
> symlink larger than the file size, but this is unlikely.   If that
> happens we'd need to make this behavior conditional.

I wonder if we can just repeat operations if we detect changes in the
middle. Hard started to work on a patch, but got distracted and I 
first would like to create a passthrough reproducer.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
  2024-09-25 14:07   ` Bernd Schubert
@ 2024-09-25 14:25     ` Miklos Szeredi
  2024-09-25 21:42       ` Bernd Schubert
  0 siblings, 1 reply; 12+ messages in thread
From: Miklos Szeredi @ 2024-09-25 14:25 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Laura Promberger, fuse-devel@lists.sourceforge.net, linux-fsdevel

On Wed, 25 Sept 2024 at 16:07, Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> Hi Miklos,
>
> On 9/25/24 14:20, Miklos Szeredi wrote:
> > On Thu, 15 Aug 2024 at 16:45, Laura Promberger <laura.promberger@cern.ch> wrote:
> >
> >> - But for corrupted symlinks `fuse_change_attributes()` exits before `fuse_change_attributes_common()` is called and as such the length stays the old one.
> >
> > The reason is that the attr_version check fails.  The trace logs show
> > a zero attr_version value, which suggests that the check can not fail.
> > But we know that fuse_dentry_revalidate() supplies a non-zero
> > attr_version to fuse_change_attributes() and if there's a racing
> > fuse_reverse_inval_inode() which updates the fuse_inode's
> > attr_version, then it would result in fuse_change_attributes() exiting
> > before updating the cached attributes, which is what you observe.
>
>
> I'm a bit confused by this, especially due to "fuse_reverse_inval_inode()",
> isn't this about FUSE_NOTIFY_INVAL_ENTRY and the additional flag
> FUSE_EXPIRE_ONLY? I.e. the used code path is fuse_reverse_inval_entry()?
> And that path doesn't change the attr_version? Which I'm also confused
> about.

The trace does have several fuse_reverse_inval_inode() calls, which
made me conclude that this was the cause.

> > This is probably okay, as the cached attributes remain invalid and the
> > next call to fuse_change_attributes() will likely update the inode
> > with the correct values.
> >
> > The reason this causes problems is that cached symlinks will be
> > returned through page_get_link(), which truncates the symlink to
> > inode->i_size.  This is correct for filesystems that don't mutate
> > symlinks, but for cvmfs it causes problems.
> >
> > My proposed solution would be to just remove this truncation.  This
> > can cause a regression in a filesystem that relies on supplying a
> > symlink larger than the file size, but this is unlikely.   If that
> > happens we'd need to make this behavior conditional.
>
> I wonder if we can just repeat operations if we detect changes in the
> middle. Hard started to work on a patch, but got distracted and I
> first would like to create a passthrough reproducer.

I think in this case it's much cleaner to just ignore the file size.
Old, non-cached readlink code never did anything with i_size, why
should the cached one care about it?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
  2024-09-25 14:25     ` Miklos Szeredi
@ 2024-09-25 21:42       ` Bernd Schubert
  0 siblings, 0 replies; 12+ messages in thread
From: Bernd Schubert @ 2024-09-25 21:42 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Laura Promberger, fuse-devel@lists.sourceforge.net, linux-fsdevel



On 9/25/24 16:25, Miklos Szeredi wrote:
> On Wed, 25 Sept 2024 at 16:07, Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>> Hi Miklos,
>>
>> On 9/25/24 14:20, Miklos Szeredi wrote:
>>> On Thu, 15 Aug 2024 at 16:45, Laura Promberger <laura.promberger@cern.ch> wrote:
>>>
>>>> - But for corrupted symlinks `fuse_change_attributes()` exits before `fuse_change_attributes_common()` is called and as such the length stays the old one.
>>>
>>> The reason is that the attr_version check fails.  The trace logs show
>>> a zero attr_version value, which suggests that the check can not fail.
>>> But we know that fuse_dentry_revalidate() supplies a non-zero
>>> attr_version to fuse_change_attributes() and if there's a racing
>>> fuse_reverse_inval_inode() which updates the fuse_inode's
>>> attr_version, then it would result in fuse_change_attributes() exiting
>>> before updating the cached attributes, which is what you observe.
>>
>>
>> I'm a bit confused by this, especially due to "fuse_reverse_inval_inode()",
>> isn't this about FUSE_NOTIFY_INVAL_ENTRY and the additional flag
>> FUSE_EXPIRE_ONLY? I.e. the used code path is fuse_reverse_inval_entry()?
>> And that path doesn't change the attr_version? Which I'm also confused
>> about.
> 
> The trace does have several fuse_reverse_inval_inode() calls, which
> made me conclude that this was the cause.

Yeah, you are right, I checked cvmfs and it uses both.

> 
>>> This is probably okay, as the cached attributes remain invalid and the
>>> next call to fuse_change_attributes() will likely update the inode
>>> with the correct values.
>>>
>>> The reason this causes problems is that cached symlinks will be
>>> returned through page_get_link(), which truncates the symlink to
>>> inode->i_size.  This is correct for filesystems that don't mutate
>>> symlinks, but for cvmfs it causes problems.
>>>
>>> My proposed solution would be to just remove this truncation.  This
>>> can cause a regression in a filesystem that relies on supplying a
>>> symlink larger than the file size, but this is unlikely.   If that
>>> happens we'd need to make this behavior conditional.
>>
>> I wonder if we can just repeat operations if we detect changes in the
>> middle. Hard started to work on a patch, but got distracted and I
>> first would like to create a passthrough reproducer.
> 
> I think in this case it's much cleaner to just ignore the file size.
> Old, non-cached readlink code never did anything with i_size, why
> should the cached one care about it?

Yeah, I see your point. (Probably just my too long out-of-tree habit
to avoid vfs changes whenever possible).

Thanks,
Bernd


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
  2024-09-25 12:20 ` Miklos Szeredi
  2024-09-25 14:07   ` Bernd Schubert
@ 2025-02-19 19:54   ` Sam Lewis
  2025-02-19 20:22     ` Bernd Schubert
  2025-02-20  9:37     ` Luis Henriques
  1 sibling, 2 replies; 12+ messages in thread
From: Sam Lewis @ 2025-02-19 19:54 UTC (permalink / raw)
  To: miklos; +Cc: bernd.schubert, fuse-devel, laura.promberger, linux-fsdevel

Hi Miklos.

I work at Google on the Android team, and we have a build system that would benefit greatly from the kernel symlink cache. In my testing, I can easily reproduce the truncation using the steps outlined by Laura. I tested your patch and have confirmed it fixes the bug.

What steps need to be taken to merge your fix? Can I help in any way?

Thanks,
Sam Lewis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
  2025-02-19 19:54   ` Sam Lewis
@ 2025-02-19 20:22     ` Bernd Schubert
  2025-02-20  9:57       ` Miklos Szeredi
  2025-02-20  9:37     ` Luis Henriques
  1 sibling, 1 reply; 12+ messages in thread
From: Bernd Schubert @ 2025-02-19 20:22 UTC (permalink / raw)
  To: Sam Lewis, miklos; +Cc: fuse-devel, laura.promberger, linux-fsdevel

Hi Sam,

On 2/19/25 20:54, Sam Lewis wrote:
> Hi Miklos.
> 
> I work at Google on the Android team, and we have a build system that would benefit greatly from the kernel symlink cache. In my testing, I can easily reproduce the truncation using the steps outlined by Laura. I tested your patch and have confirmed it fixes the bug.
> 
> What steps need to be taken to merge your fix? Can I help in any way?

I think we should write tests for all of these fuse specific operations,
ideally probably as part of xfstests.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
  2025-02-19 19:54   ` Sam Lewis
  2025-02-19 20:22     ` Bernd Schubert
@ 2025-02-20  9:37     ` Luis Henriques
  1 sibling, 0 replies; 12+ messages in thread
From: Luis Henriques @ 2025-02-20  9:37 UTC (permalink / raw)
  To: Sam Lewis via fuse-devel; +Cc: miklos, Sam Lewis, bernd.schubert, linux-fsdevel

On Wed, Feb 19 2025, Sam Lewis via fuse-devel wrote:

> Hi Miklos.
>
> I work at Google on the Android team, and we have a build system that would
> benefit greatly from the kernel symlink cache. In my testing, I can easily
> reproduce the truncation using the steps outlined by Laura. I tested your patch
> and have confirmed it fixes the bug.

Oh, wow!  I've tried to reproduce the issue myself and I've been failing
miserably.  Are you using CVMFS as well, or can you reproduce it with some
synthetic fuse server?  I've tried both approaches but no luck so far.

Cheers,
-- 
Luís

> What steps need to be taken to merge your fix? Can I help in any way?
>
> Thanks,
> Sam Lewis
>
>
> -- 
> fuse-devel mailing list
> To unsubscribe or subscribe, visit https://lists.sourceforge.net/lists/listinfo/fuse-devel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
  2025-02-19 20:22     ` Bernd Schubert
@ 2025-02-20  9:57       ` Miklos Szeredi
  2025-02-20 10:00         ` Bernd Schubert
  0 siblings, 1 reply; 12+ messages in thread
From: Miklos Szeredi @ 2025-02-20  9:57 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: Sam Lewis, fuse-devel, laura.promberger, linux-fsdevel

On Wed, 19 Feb 2025 at 21:22, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:

> I think we should write tests for all of these fuse specific operations,
> ideally probably as part of xfstests.

That's a good idea, but for now the above Tested-by should be
sufficient.   I'll post a proper patch.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
  2025-02-20  9:57       ` Miklos Szeredi
@ 2025-02-20 10:00         ` Bernd Schubert
  2025-02-20 15:50           ` Sam Lewis
  0 siblings, 1 reply; 12+ messages in thread
From: Bernd Schubert @ 2025-02-20 10:00 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Sam Lewis, fuse-devel, laura.promberger, linux-fsdevel



On 2/20/25 10:57, Miklos Szeredi wrote:
> On Wed, 19 Feb 2025 at 21:22, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
> 
>> I think we should write tests for all of these fuse specific operations,
>> ideally probably as part of xfstests.
> 
> That's a good idea, but for now the above Tested-by should be
> sufficient.   I'll post a proper patch.

I have it on my TODO list. We will need tests for all notify operations.
Having it in libfuse is not sufficient, as that less likely to be run
by kernel devs.
If we are very luckily, I can make someone else from DDN to work on
these tests, but not promised.

@Sam could you please describe your reproducer?

Thanks,
Bernd

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue?
  2025-02-20 10:00         ` Bernd Schubert
@ 2025-02-20 15:50           ` Sam Lewis
  0 siblings, 0 replies; 12+ messages in thread
From: Sam Lewis @ 2025-02-20 15:50 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Miklos Szeredi, fuse-devel, laura.promberger, linux-fsdevel

On Thu, Feb 20, 2025 at 5:00 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
> @Sam could you please describe your reproducer?

Absolutely. We have an internal networked filesystem that implements
the FUSE interface – not CVMFS. So stat, readlink, etc end up as RPCs
to another backend.

We need to avoid stale readlink calls, so we clear the kernel symlink
cache whenever we receive a new snapshot from the network, and this is
where the race condition comes in.

I reproduced the bug by interacting with the same filesystem location
on two different machines. On the first machine, we have a C for loop
that calls readlink and prints the destination whenever it changes[1].
On the second machine, I manually switched the symlink back and forth
between two destinations of different lengths using `ln -sf`.

When the kernel cache was enabled, changing the link destination from
"dest" to "longerdest" would result in the first machine printing
"long". It happened very consistently, usually immediately or with 1
or 2 tries. Here are the things that fixed the bug:
- Disabling the kernel cache
- Applying Miklos' patch to a custom kernel
- Uncommenting the 1 second sleep in [1] to make the race condition
very unlikely

I hope that helps!

[1] basically the script seen here:
https://github.com/cvmfs/cvmfs/issues/3626#issue-2390818866

Sam

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-02-20 15:50 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <GV0P278MB07187F9B0E7B576AD0B362B485802@GV0P278MB0718.CHEP278.PROD.OUTLOOK.COM>
2024-08-19 16:04 ` [fuse-devel] Symlink caching: Updating the target can result in corrupted symlinks - kernel issue? Bernd Schubert
2024-08-22 12:41 ` Miklos Szeredi
2024-09-25 12:20 ` Miklos Szeredi
2024-09-25 14:07   ` Bernd Schubert
2024-09-25 14:25     ` Miklos Szeredi
2024-09-25 21:42       ` Bernd Schubert
2025-02-19 19:54   ` Sam Lewis
2025-02-19 20:22     ` Bernd Schubert
2025-02-20  9:57       ` Miklos Szeredi
2025-02-20 10:00         ` Bernd Schubert
2025-02-20 15:50           ` Sam Lewis
2025-02-20  9:37     ` Luis Henriques

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.