* Re: [fuse-devel] Cross-host entry caching and file open/create [not found] <DM6PR12MB33857B8B2E49DF0DD0C4F950DD5D0@DM6PR12MB3385.namprd12.prod.outlook.com> @ 2020-08-21 15:49 ` Miklos Szeredi 2020-08-26 20:28 ` Ken Schalk 0 siblings, 1 reply; 7+ messages in thread From: Miklos Szeredi @ 2020-08-21 15:49 UTC (permalink / raw) To: Ken Schalk; +Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 751 bytes --] On Thu, Aug 20, 2020 at 12:24 AM Ken Schalk <kschalk@nvidia.com> wrote: > > If the open flags include O_EXCL, then we're seeing a failure with > EEXIST without any call to our FUSE filesystem's create operation (or > any other FUSE operations). The kernel makes this failure decision > based on its cached information about the previously accessed (now > deleted) file. If the open flags do not include O_EXCL, then we're > seeing a failure with ENOENT from our open operation (because the file > does not actually exist anymore), with no call to our create operation > (because the kernel believed that the file existed, causing it to make > a FUSE open request rather than a FUSE create request). Does the attached patch fix it? Thanks, Miklos [-- Attachment #2: fuse-reval-exclusive-create.patch --] [-- Type: text/x-patch, Size: 508 bytes --] diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 26f028bc760b..ec5552ff9826 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -204,7 +204,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) if (inode && is_bad_inode(inode)) goto invalid; else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) || - (flags & LOOKUP_REVAL)) { + (flags & (LOOKUP_EXCL | LOOKUP_REVAL))) { struct fuse_entry_out outarg; FUSE_ARGS(args); struct fuse_forget_link *forget; ^ permalink raw reply related [flat|nested] 7+ messages in thread
* RE: [fuse-devel] Cross-host entry caching and file open/create 2020-08-21 15:49 ` [fuse-devel] Cross-host entry caching and file open/create Miklos Szeredi @ 2020-08-26 20:28 ` Ken Schalk 2020-08-28 21:01 ` Ken Schalk 0 siblings, 1 reply; 7+ messages in thread From: Ken Schalk @ 2020-08-26 20:28 UTC (permalink / raw) To: Miklos Szeredi Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1430 bytes --] On Aug 21, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote: > On Thu, Aug 20, 2020 at 12:24 AM Ken Schalk <kschalk@nvidia.com> wrote: > > If the open flags include O_EXCL, then we're seeing a failure with > > EEXIST without any call to our FUSE filesystem's create operation (or > > any other FUSE operations). The kernel makes this failure decision > > based on its cached information about the previously accessed (now > > deleted) file. If the open flags do not include O_EXCL, then we're > > seeing a failure with ENOENT from our open operation (because the file > > does not actually exist anymore), with no call to our create operation > > (because the kernel believed that the file existed, causing it to make > > a FUSE open request rather than a FUSE create request). > Does the attached patch fix it? Thanks very much for your help. The patch you provided does solve the problem in the O_CREAT|O_EXCL case (by making a lookup call to re-validate the entry of the since deleted file), but not in the O_CREAT case. (In that case the kernel still winds up making a FUSE open request rather than a FUSE create request.) I'd like to suggest the slightly different attached patch instead, which triggers re-validation in both cases. I wonder if maybe re-validation should be triggered on LOOKUP_OPEN as well, but I suspect that might have more implications I haven't discovered. --Ken Schalk [-- Attachment #2: fuse-reval-create.patch --] [-- Type: application/octet-stream, Size: 513 bytes --] diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 26f028b..1565e19 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -204,7 +204,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) if (inode && is_bad_inode(inode)) goto invalid; else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) || - (flags & LOOKUP_REVAL)) { + (flags & (LOOKUP_CREATE | LOOKUP_REVAL))) { struct fuse_entry_out outarg; FUSE_ARGS(args); struct fuse_forget_link *forget; ^ permalink raw reply related [flat|nested] 7+ messages in thread
* RE: [fuse-devel] Cross-host entry caching and file open/create 2020-08-26 20:28 ` Ken Schalk @ 2020-08-28 21:01 ` Ken Schalk 2020-10-01 18:25 ` Miklos Szeredi 0 siblings, 1 reply; 7+ messages in thread From: Ken Schalk @ 2020-08-28 21:01 UTC (permalink / raw) To: Miklos Szeredi Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 2251 bytes --] On Aug 26, 2020 Ken Schalk <kschalk@nvidia.com> wrote: > On Aug 21, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Thu, Aug 20, 2020 at 12:24 AM Ken Schalk <kschalk@nvidia.com> wrote: > > > If the open flags include O_EXCL, then we're seeing a failure > > > with EEXIST without any call to our FUSE filesystem's create > > > operation (or any other FUSE operations). The kernel makes this > > > failure decision based on its cached information about the > > > previously accessed (now deleted) file. If the open flags do > > > not include O_EXCL, then we're seeing a failure with ENOENT from > > > our open operation (because the file does not actually exist > > > anymore), with no call to our create operation (because the > > > kernel believed that the file existed, causing it to make a FUSE > > > open request rather than a FUSE create request). > > Does the attached patch fix it? > Thanks very much for your help. The patch you provided does solve > the problem in the O_CREAT|O_EXCL case (by making a lookup call to > re-validate the entry of the since deleted file), but not in the > O_CREAT case. (In that case the kernel still winds up making a FUSE > open request rather than a FUSE create request.) I'd like to > suggest the slightly different attached patch instead, which > triggers re-validation in both cases. I'm going to make one additional suggestion here, included in the attached patch: always re-validate a negative entry for file open. This is sort of the dual of the problem of a file recently unlinked on a remote host, as it involves a file recently created on a remote host. If the kernel has cached a negative dentry, it will fail an open system call without any FUSE requests up to the user-space end. This patch gives the user-space side the opportunity to report that a file that previously did not exist has recently been created. (In distributed build automation workloads, opening a file recently created through a mount on a peer host is pretty common.) I would say that this brings the open system call behavior under FUSE into line with what we observe on NFS for cases where a file has been recently created or unlinked on a remote host. --Ken Schalk [-- Attachment #2: fuse-reval-create-or-open-negative.patch --] [-- Type: application/octet-stream, Size: 567 bytes --] diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 26f028b..c4d768c 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -204,7 +204,8 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags) if (inode && is_bad_inode(inode)) goto invalid; else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) || - (flags & LOOKUP_REVAL)) { + (!inode && (flags & LOOKUP_OPEN)) || + (flags & (LOOKUP_CREATE | LOOKUP_REVAL))) { struct fuse_entry_out outarg; FUSE_ARGS(args); struct fuse_forget_link *forget; ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [fuse-devel] Cross-host entry caching and file open/create 2020-08-28 21:01 ` Ken Schalk @ 2020-10-01 18:25 ` Miklos Szeredi 2020-11-23 21:07 ` Ken Schalk 0 siblings, 1 reply; 7+ messages in thread From: Miklos Szeredi @ 2020-10-01 18:25 UTC (permalink / raw) To: Ken Schalk Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org On Fri, Aug 28, 2020 at 11:01 PM Ken Schalk <kschalk@nvidia.com> wrote: > > On Aug 26, 2020 Ken Schalk <kschalk@nvidia.com> wrote: > > On Aug 21, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote: > > > On Thu, Aug 20, 2020 at 12:24 AM Ken Schalk <kschalk@nvidia.com> wrote: > > > > If the open flags include O_EXCL, then we're seeing a failure > > > > with EEXIST without any call to our FUSE filesystem's create > > > > operation (or any other FUSE operations). The kernel makes this > > > > failure decision based on its cached information about the > > > > previously accessed (now deleted) file. If the open flags do > > > > not include O_EXCL, then we're seeing a failure with ENOENT from > > > > our open operation (because the file does not actually exist > > > > anymore), with no call to our create operation (because the > > > > kernel believed that the file existed, causing it to make a FUSE > > > > open request rather than a FUSE create request). > > > > Does the attached patch fix it? > > > Thanks very much for your help. The patch you provided does solve > > the problem in the O_CREAT|O_EXCL case (by making a lookup call to > > re-validate the entry of the since deleted file), but not in the > > O_CREAT case. (In that case the kernel still winds up making a FUSE > > open request rather than a FUSE create request.) I'd like to > > suggest the slightly different attached patch instead, which > > triggers re-validation in both cases. Which is a problem, because that makes O_CREAT on existing files (a fairly common case) add a new synchronous request, possibly resulting in a performance regression. I don't see an easy way this can be fixed, and I'm not sure this needs to be fixed. Are you seeing a real issue with just O_CREAT? Thanks, Miklos ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [fuse-devel] Cross-host entry caching and file open/create 2020-10-01 18:25 ` Miklos Szeredi @ 2020-11-23 21:07 ` Ken Schalk 2020-11-24 8:16 ` Miklos Szeredi 0 siblings, 1 reply; 7+ messages in thread From: Ken Schalk @ 2020-11-23 21:07 UTC (permalink / raw) To: Miklos Szeredi Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org On Thu, Oct 1, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Aug 28, 2020 at 11:01 PM Ken Schalk <kschalk@nvidia.com> wrote: > > > Thanks very much for your help. The patch you provided does solve > > > the problem in the O_CREAT|O_EXCL case (by making a lookup call to > > > re-validate the entry of the since deleted file), but not in the > > > O_CREAT case. (In that case the kernel still winds up making a FUSE > > > open request rather than a FUSE create request.) I'd like to > > > suggest the slightly different attached patch instead, which > > > triggers re-validation in both cases. > Which is a problem, because that makes O_CREAT on existing files (a > fairly common case) add a new synchronous request, possibly > resulting in a performance regression. > I don't see an easy way this can be fixed, and I'm not sure this > needs to be fixed. > Are you seeing a real issue with just O_CREAT? Yes, we definitely do see issues with just O_CREAT. The specific sequence that involves O_CREAT without O_EXCL is: 1. A file exists and is accessed through our FUSE distributed filesystem on host X. The kernel on host X caches the directory entry for the file. 2. The file is unlinked through our FUSE distributed filesystem on host Y. 3. An open(2) call with O_CREAT for the file occurs on host X. Because the kernel has a cached dentry for the now deleted file, it makes a FUSE open request to our filesystem (rather than a FUSE create request). 4. Our filesystem's open handler finds that the file does not exist (because it was unlinked in step 2) and replies to the open request with ENOENT. (The FUSE open handler cannot tell that O_CREAT was specified in the flags of the syscall as that bit is not passed through in the flags in the FUSE open request, so it can't automatically handle this case by creating the file.) 5. The kernel passes the ENOENT error code on as the result of the open(2) system call, so the file is not created and not opened. To me this seems clearly incorrect in terms of observable behavior. The file does not exist at the point of the open(2) syscall with O_CREAT in step 3 (although the kernel on host X has not become aware of its deletion). The file should be created and opened. An open(2) syscall with O_CREAT shouldn't fail with ENOENT because the file does not exist (which is what happens in this situation currently). I don't see how to avoid this without some kernel-level change with acceptable performance. (We could make the unlink on host Y synchronously perform an entry invalidation across all other hosts where our FUSE daemon is running, but that would be a huge performance problem and a significant addition of complexity in our FUSE daemon.) I believe that there are at least two other ways to resolve this without adding the synchronous lookup request on every open syscall with O_CREAT: - Preserve the O_CREAT bit in the flags passed through in the FUSE open request. (I believe the place where this bit is maked out is in fuse_send_open in fs/fuse/file.c.) That would allow the FUSE open handler to know that file creation was requested and to perform the creation in this case. (I'll mention that this would be similar to a behavior we've implemented in our FUSE create handler to open an existing file when O_EXCL is not in the flags, which allows handling cases where a file was recently created on a remote host.) - If a FUSE open call fails with ENOENT when O_CREAT is used, have the kernel drop the cached dentry and then make a FUSE create request. Thanks. --Ken Schalk ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [fuse-devel] Cross-host entry caching and file open/create 2020-11-23 21:07 ` Ken Schalk @ 2020-11-24 8:16 ` Miklos Szeredi 2020-12-18 4:39 ` Ken Schalk 0 siblings, 1 reply; 7+ messages in thread From: Miklos Szeredi @ 2020-11-24 8:16 UTC (permalink / raw) To: Ken Schalk Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org On Mon, Nov 23, 2020 at 10:07 PM Ken Schalk <kschalk@nvidia.com> wrote: > > On Thu, Oct 1, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Fri, Aug 28, 2020 at 11:01 PM Ken Schalk <kschalk@nvidia.com> wrote: > > > > Thanks very much for your help. The patch you provided does solve > > > > the problem in the O_CREAT|O_EXCL case (by making a lookup call to > > > > re-validate the entry of the since deleted file), but not in the > > > > O_CREAT case. (In that case the kernel still winds up making a FUSE > > > > open request rather than a FUSE create request.) I'd like to > > > > suggest the slightly different attached patch instead, which > > > > triggers re-validation in both cases. > > > Which is a problem, because that makes O_CREAT on existing files (a > > fairly common case) add a new synchronous request, possibly > > resulting in a performance regression. > > > I don't see an easy way this can be fixed, and I'm not sure this > > needs to be fixed. > > > Are you seeing a real issue with just O_CREAT? > > Yes, we definitely do see issues with just O_CREAT. The specific > sequence that involves O_CREAT without O_EXCL is: > > 1. A file exists and is accessed through our FUSE distributed > filesystem on host X. The kernel on host X caches the directory > entry for the file. > > 2. The file is unlinked through our FUSE distributed filesystem on > host Y. > > 3. An open(2) call with O_CREAT for the file occurs on host X. > Because the kernel has a cached dentry for the now deleted file, it > makes a FUSE open request to our filesystem (rather than a FUSE > create request). > > 4. Our filesystem's open handler finds that the file does not exist > (because it was unlinked in step 2) and replies to the open request > with ENOENT. ESTALE is the correct error value here, not ENOENT. I agree this is subtle and not well documented, but it's quite logical: the cache contains stale lookup information and hence open finds the file non-existent. In no case shall the OPEN request return ENOENT, that's up to the LOOKUP request. Exclusive create is a different matter. It must not open an existing file, and so it must never send an OPEN request. Thanks, Miklos ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [fuse-devel] Cross-host entry caching and file open/create 2020-11-24 8:16 ` Miklos Szeredi @ 2020-12-18 4:39 ` Ken Schalk 0 siblings, 0 replies; 7+ messages in thread From: Ken Schalk @ 2020-12-18 4:39 UTC (permalink / raw) To: Miklos Szeredi Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org On Tue, Nov 24, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote: > ESTALE is the correct error value here, not ENOENT. I agree this is > subtle and not well documented, but it's quite logical: the cache > contains stale lookup information and hence open finds the file > non-existent. Thank you for explaining the use of ESTALE here. It definitely does resolve my O_CREAT without O_EXCL issue in a better way. I would agree that this is not well documented, as other than the kernel source itself the only reference I can find to this is in Documentation/filesystems/path-lookup.rst (which I admit I had not read prior to this). Do you have any recommendation on the portion of my patch relating to negative directory entries? This is another area where currently we don't seem to be able to get precisely the same behavior under FUSE as under NFS. Specifically, a file open under NFS when there is a negative directory entry in the kernel's cache always seems to result in a re-validating GETATTR RPC to the NFS server (which, if the file has been recently created, allows the client to become aware of its existence). Is there a better way we can achieve this with FUSE than the change in the other portion of my patch (forcing a lookup call on any open when there is a negative directory entry for the target of the open)? To again be specific, the sequence of events that I'm trying to get to work differently is: 1. A lookup occurs on host X for name which does not exist, and our FUSE daemon's response causes the kernel to cache a negative directory entry. (I know that we can disable kernel-level caching of negative lookup results, but we would rather not do so for performance.) 2. The name is created as a file through our through our FUSE distributed filesystem on host Y. 3. An open(2) call for the recently created file occurs on host X. Because the kernel has a cached negative dentry for the file (which now exists), the open(2) call fails with ENOENT without any FUSE requests. This comes up in the context of a distributed build system where the work of building an intermediate file may be dispatched to a remote host. Under NFS, the open(2) consults the NFS server and discovers that the name now exists. --Ken Schalk ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-12-18 4:40 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <DM6PR12MB33857B8B2E49DF0DD0C4F950DD5D0@DM6PR12MB3385.namprd12.prod.outlook.com> 2020-08-21 15:49 ` [fuse-devel] Cross-host entry caching and file open/create Miklos Szeredi 2020-08-26 20:28 ` Ken Schalk 2020-08-28 21:01 ` Ken Schalk 2020-10-01 18:25 ` Miklos Szeredi 2020-11-23 21:07 ` Ken Schalk 2020-11-24 8:16 ` Miklos Szeredi 2020-12-18 4:39 ` Ken Schalk
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).