linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
@ 2023-04-11 12:40 Amir Goldstein
  2023-04-12 18:43 ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2023-04-11 12:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, Christian Brauner, linux-fsdevel, linux-api

If kernel supports FAN_REPORT_ANY_FID, use this flag to allow testing
also filesystems that do not support fsid or NFS file handles (e.g. fuse).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jan,

I wanted to run an idea by you.

My motivation is to close functional gaps between fanotify and inotify.

One of the largest gaps right now is that FAN_REPORT_FID is limited
to a subset of local filesystems.

The idea is to report fid's that are "good enough" and that there
is no need to require that fid's can be used by open_by_handle_at()
because that is a non-requirement for most use cases, unpriv listener
in particular.

I chose a rather generic name for the flag to opt-in for "good enough"
fid's.  At first, I was going to make those fid's self describing the
fact that they are not NFS file handles, but in the name of simplicity
to the API, I decided that this is not needed.

The patch below is from the LTP test [1] that verifies reported fid's.
I am posting it because I think that the function fanotify_get_fid()
demonstrates well, how a would-be fanotify library could be used to get
a canonical fid.

That would-be routine can easily return the source of the fid values
for a given filesystem and that information is constant for all objects
on a given filesystem instance.

The choise to encode an actual file_handle of type FILEID_INO64 may
seem controversial at first, but it simplifies things so much, that I
grew very fond of it.

The LTP patch also demonstrated how terribly trivial it would be to
adapt any existing fanotify programs to support any fs.

Kernel patches [2] are pretty simple IMO and
man page patch [3] demonstrates that the API changes are minimal.

Thoughts?

Amir.

P.S.: Apropos closing gaps to inotify, I have WIP patches to add
      FAN_UNMOUNT and possibly FAN_IGNORED events.

[1] https://github.com/amir73il/ltp/commits/fan_report_any_fid
[2] https://github.com/amir73il/linux/commits/fan_report_any_fid
[3] https://github.com/amir73il/man-pages/commits/fan_report_any_fid

 include/lapi/fanotify.h                       |  3 ++
 testcases/kernel/syscalls/fanotify/fanotify.h | 32 +++++++++++++++----
 .../kernel/syscalls/fanotify/fanotify13.c     | 16 +++++++---
 3 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/include/lapi/fanotify.h b/include/lapi/fanotify.h
index 4bd1a113c..511b35bbd 100644
--- a/include/lapi/fanotify.h
+++ b/include/lapi/fanotify.h
@@ -32,6 +32,9 @@
 #define FAN_REPORT_DFID_NAME_TARGET (FAN_REPORT_DFID_NAME | \
 				     FAN_REPORT_FID | FAN_REPORT_TARGET_FID)
 #endif
+#ifndef FAN_REPORT_ANY_FID
+#define FAN_REPORT_ANY_FID	0x00002000
+#endif
 
 /* Non-uapi convenience macros */
 #ifndef FAN_REPORT_DFID_NAME_FID
diff --git a/testcases/kernel/syscalls/fanotify/fanotify.h b/testcases/kernel/syscalls/fanotify/fanotify.h
index 51078103e..b02295138 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify.h
+++ b/testcases/kernel/syscalls/fanotify/fanotify.h
@@ -72,6 +72,10 @@ static inline int safe_fanotify_mark(const char *file, const int lineno,
 #define MAX_HANDLE_SZ		128
 #endif
 
+#ifndef FILEID_INO64
+#define FILEID_INO64		0x80
+#endif
+
 /*
  * Helper function used to obtain fsid and file_handle for a given path.
  * Used by test files correlated to FAN_REPORT_FID functionality.
@@ -80,21 +84,37 @@ static inline void fanotify_get_fid(const char *path, __kernel_fsid_t *fsid,
 				    struct file_handle *handle)
 {
 	int mount_id;
+	struct statx stx;
 	struct statfs stats;
 
+	if (statx(AT_FDCWD, path, 0, 0, &stx) == -1)
+		tst_brk(TBROK | TERRNO,
+			"statx(%s, ...) failed", path);
 	if (statfs(path, &stats) == -1)
 		tst_brk(TBROK | TERRNO,
 			"statfs(%s, ...) failed", path);
 	memcpy(fsid, &stats.f_fsid, sizeof(stats.f_fsid));
 
+	if (!fsid->val[0] && !fsid->val[1]) {
+		/* Fallback to fsid encoded from stx_dev */
+		fsid->val[0] = stx.stx_dev_major;
+		fsid->val[1] = stx.stx_dev_minor;
+	}
+
 	if (name_to_handle_at(AT_FDCWD, path, handle, &mount_id, 0) == -1) {
-		if (errno == EOPNOTSUPP) {
-			tst_brk(TCONF,
-				"filesystem %s does not support file handles",
-				tst_device->fs_type);
+		if (errno != EOPNOTSUPP) {
+			tst_brk(TBROK | TERRNO,
+				"name_to_handle_at(AT_FDCWD, %s, ...) failed", path);
 		}
-		tst_brk(TBROK | TERRNO,
-			"name_to_handle_at(AT_FDCWD, %s, ...) failed", path);
+
+		tst_res(TINFO,
+			"filesystem %s does not support file handles - using ino as fid",
+			tst_device->fs_type);
+
+		/* Fallback to fid encoded from stx_ino */
+		handle->handle_type = FILEID_INO64;
+		handle->handle_bytes = sizeof(stx.stx_ino);
+		memcpy(handle->f_handle, (void *)&stx.stx_ino, sizeof(stx.stx_ino));
 	}
 }
 
diff --git a/testcases/kernel/syscalls/fanotify/fanotify13.c b/testcases/kernel/syscalls/fanotify/fanotify13.c
index c3daaf3a2..e50ad0f75 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify13.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify13.c
@@ -90,6 +90,7 @@ static struct test_case_t {
 
 static int nofid_fd;
 static int fanotify_fd;
+static int fanotify_init_flags;
 static int filesystem_mark_unsupported;
 static char events_buf[BUF_SIZE];
 static struct event_t event_set[EVENT_MAX];
@@ -143,15 +144,16 @@ static void do_test(unsigned int number)
 	struct fanotify_mark_type *mark = &tc->mark;
 
 	tst_res(TINFO,
-		"Test #%d: FAN_REPORT_FID with mark flag: %s",
-		number, mark->name);
+		"Test #%d: %s with mark flag: %s", number,
+		(fanotify_init_flags & FAN_REPORT_ANY_FID) ?
+			"FAN_REPORT_ANY_FID" : "FAN_REPORT_FID", mark->name);
 
 	if (filesystem_mark_unsupported && mark->flag & FAN_MARK_FILESYSTEM) {
 		tst_res(TCONF, "FAN_MARK_FILESYSTEM not supported in kernel?");
 		return;
 	}
 
-	fanotify_fd = SAFE_FANOTIFY_INIT(FAN_CLASS_NOTIF | FAN_REPORT_FID, O_RDONLY);
+	fanotify_fd = SAFE_FANOTIFY_INIT(FAN_CLASS_NOTIF | fanotify_init_flags, O_RDONLY);
 
 	/*
 	 * Place marks on a set of objects and setup the expected masks
@@ -261,7 +263,13 @@ out:
 
 static void do_setup(void)
 {
-	REQUIRE_FANOTIFY_INIT_FLAGS_SUPPORTED_ON_FS(FAN_REPORT_FID, MOUNT_PATH);
+	/* Try FAN_REPORT_ANY_FID */
+	fanotify_init_flags = FAN_REPORT_FID | FAN_REPORT_ANY_FID;
+	if (fanotify_init_flags_supported_on_fs(fanotify_init_flags, MOUNT_PATH)) {
+		/* Fallback to FAN_REPORT_FID */
+		fanotify_init_flags = FAN_REPORT_FID;
+		REQUIRE_FANOTIFY_INIT_FLAGS_SUPPORTED_ON_FS(fanotify_init_flags, MOUNT_PATH);
+	}
 
 	filesystem_mark_unsupported = fanotify_mark_supported_by_kernel(FAN_MARK_FILESYSTEM);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-04-11 12:40 [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types Amir Goldstein
@ 2023-04-12 18:43 ` Jan Kara
  2023-04-13  9:25   ` Amir Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2023-04-12 18:43 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Bobrowski, Christian Brauner, linux-fsdevel,
	linux-api

Hi Amir!

On Tue 11-04-23 15:40:37, Amir Goldstein wrote:
> If kernel supports FAN_REPORT_ANY_FID, use this flag to allow testing
> also filesystems that do not support fsid or NFS file handles (e.g. fuse).
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Jan,
> 
> I wanted to run an idea by you.
> 
> My motivation is to close functional gaps between fanotify and inotify.
> 
> One of the largest gaps right now is that FAN_REPORT_FID is limited
> to a subset of local filesystems.
> 
> The idea is to report fid's that are "good enough" and that there
> is no need to require that fid's can be used by open_by_handle_at()
> because that is a non-requirement for most use cases, unpriv listener
> in particular.

OK. I'd note that if you report only inode number, you are prone to the
problem that some inode gets freed (file deleted) and then reallocated (new
file created) and the resulting identifier is the same. It can be
problematic for a listener to detect these cases and deal with them.
Inotify does not have this problem at least for some cases because 'wd'
uniquely identifies the marked inode. For other cases (like watching dirs)
inotify has similar sort of problems. I'm muttering over this because in
theory filesystems not having i_generation counter on disk could approach
the problem in a similar way as FAT and then we could just use
FILEID_INO64_GEN for the file handle.

Also I have noticed your workaround with using st_dev for fsid. As I've
checked, there are actually very few filesystems that don't set fsid these
days. So maybe we could just get away with still refusing to report on
filesystems without fsid and possibly fixup filesystems which don't set
fsid yet and are used enough so that users complain?

> I chose a rather generic name for the flag to opt-in for "good enough"
> fid's.  At first, I was going to make those fid's self describing the
> fact that they are not NFS file handles, but in the name of simplicity
> to the API, I decided that this is not needed.

I'd like to discuss a bit about the meaning of the flag. On the first look
it is a bit strange to have a flag that says "give me a fh, if you don't
have it, give me ino". It would seem cleaner to have "give me fh" kind of
interface (FAN_REPORT_FID) and "give me ino" kind of interface (new
FAN_REPORT_* flag). I suspect you've chosen the more complex meaning
because you want to allow a usecase where watches of filesystems which
don't support filehandles are mixed with watches of filesystems which do
support filehandles in one notification group and getting filehandles is
actually prefered over getting just inode numbers? Do you see real benefit
in getting file handles when userspace has to implement fallback for
getting just inode numbers anyway?

> The patch below is from the LTP test [1] that verifies reported fid's.
> I am posting it because I think that the function fanotify_get_fid()
> demonstrates well, how a would-be fanotify library could be used to get
> a canonical fid.
> 
> That would-be routine can easily return the source of the fid values
> for a given filesystem and that information is constant for all objects
> on a given filesystem instance.
> 
> The choise to encode an actual file_handle of type FILEID_INO64 may
> seem controversial at first, but it simplifies things so much, that I
> grew very fond of it.

FILEID_INO64 is a bit of a hack in particular because it's difficult to
pretend FILEID_INO64 can be used for NFS. But I agree it is very convenient
:). If we were to do this cleanly we'd have to introduce a new info
structure with ino instead of handle and three new FAN_EVENT_INFO_TYPE_*
types. As I wrote above, we might be able to actually fill-in
FILEID_INO64_GEN which would be less controversial then I suppose.

> The LTP patch also demonstrated how terribly trivial it would be to
> adapt any existing fanotify programs to support any fs.
> 
> Kernel patches [2] are pretty simple IMO and
> man page patch [3] demonstrates that the API changes are minimal.
> 
> Thoughts?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-04-12 18:43 ` Jan Kara
@ 2023-04-13  9:25   ` Amir Goldstein
  2023-04-17 16:27     ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2023-04-13  9:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, Christian Brauner, linux-fsdevel, linux-api

On Wed, Apr 12, 2023 at 9:44 PM Jan Kara <jack@suse.cz> wrote:
>
> Hi Amir!
>
> On Tue 11-04-23 15:40:37, Amir Goldstein wrote:
> > If kernel supports FAN_REPORT_ANY_FID, use this flag to allow testing
> > also filesystems that do not support fsid or NFS file handles (e.g. fuse).
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >
> > Jan,
> >
> > I wanted to run an idea by you.
> >
> > My motivation is to close functional gaps between fanotify and inotify.
> >
> > One of the largest gaps right now is that FAN_REPORT_FID is limited
> > to a subset of local filesystems.
> >
> > The idea is to report fid's that are "good enough" and that there
> > is no need to require that fid's can be used by open_by_handle_at()
> > because that is a non-requirement for most use cases, unpriv listener
> > in particular.
>
> OK. I'd note that if you report only inode number, you are prone to the
> problem that some inode gets freed (file deleted) and then reallocated (new
> file created) and the resulting identifier is the same. It can be
> problematic for a listener to detect these cases and deal with them.
> Inotify does not have this problem at least for some cases because 'wd'
> uniquely identifies the marked inode. For other cases (like watching dirs)
> inotify has similar sort of problems. I'm muttering over this because in
> theory filesystems not having i_generation counter on disk could approach
> the problem in a similar way as FAT and then we could just use
> FILEID_INO64_GEN for the file handle.

Yes, of course we could.
The problem with that is that user space needs to be able to query the fid
regardless of fanotify.

The fanotify equivalent of wd is the answer to that query.

If any fs would export i_generation via statx, then FILEID_INO64_GEN
would have been my choice.

But if we are going to change some other API for that, I would not change
statx(), I would change name_to_handle_at(...., AT_HANDLE_FID)

This AT_ flag would relax this check in name_to_handle_at():

        /*
         * We need to make sure whether the file system
         * support decoding of the file handle
         */
        if (!path->dentry->d_sb->s_export_op ||
            !path->dentry->d_sb->s_export_op->fh_to_dentry)
                return -EOPNOTSUPP;

And allow the call to proceed to the default export_encode_fh() implementation.
Alas, the default implementation encodes FILEID_INO32_GEN.

I think we can get away with a default implementation for FILEID_INO64_GEN
as long as the former (INO32) is used for fs with export ops without ->encode_fh
(e.g. ext*) and the latter (INO64) is used for fs with no export ops.

>
> Also I have noticed your workaround with using st_dev for fsid. As I've
> checked, there are actually very few filesystems that don't set fsid these
> days. So maybe we could just get away with still refusing to report on
> filesystems without fsid and possibly fixup filesystems which don't set
> fsid yet and are used enough so that users complain?
>

I started going down this path to close the gap with inotify.
inotify is capable of watching all fs including pseudo fs, so I would
like to have this feature parity.

If we can get away with fallback to s_dev as fsid in vfs_statfs()
I have no problem with that, but just to point out - functionally
it is equivalent to do this fallback in userspace library as the
fanotify_get_fid() LTP helper does.

> > I chose a rather generic name for the flag to opt-in for "good enough"
> > fid's.  At first, I was going to make those fid's self describing the
> > fact that they are not NFS file handles, but in the name of simplicity
> > to the API, I decided that this is not needed.
>
> I'd like to discuss a bit about the meaning of the flag. On the first look
> it is a bit strange to have a flag that says "give me a fh, if you don't
> have it, give me ino". It would seem cleaner to have "give me fh" kind of
> interface (FAN_REPORT_FID) and "give me ino" kind of interface (new
> FAN_REPORT_* flag). I suspect you've chosen the more complex meaning
> because you want to allow a usecase where watches of filesystems which
> don't support filehandles are mixed with watches of filesystems which do
> support filehandles in one notification group and getting filehandles is
> actually prefered over getting just inode numbers? Do you see real benefit
> in getting file handles when userspace has to implement fallback for
> getting just inode numbers anyway?
>

Yes, there is a benefit, because a real fhandle has no-reuse guarantee.

Even if we implement the kernel fallback to FILEID_INO64_GEN, it does
not serve as a statement from the filesystem that i_generation is useful
and in fact, i_generation will often be zero in simple fs and ino will be
reusable.

Also, I wanted to have a design where a given fs/object always returns
the same FID regardless of the init flags.

Your question implies that if
"userspace has to implement fallback for getting just inode numbers",
then it doesn't matter if we report fhandle or inode, but it is not accurate.

The fanotify_get_fid() LTP helper always gets a consistent FID for a
given fs/object. You do not need to feed it the fanotify init flags to
provide a consistent answer.

For all the reasons above, I think that a "give me ino'' flag is not useful.
IMO, the flag just needs better marketing.
This is a "I do not need/intend to open_by_handle flag".
Suggestions for a better name are welcome.

For all I care, we do not need to add an opt-in flag at all.
We could simply start to support fs that were not supported before.
This sort of API change is very common and acceptable.

There is no risk if the user tries to call open_by_handle_at() with the
fanotify encoded FID, because in this case the fs is guaranteed to
return ESTALE, because fs does not support file handles.

This is especially true, if we can get away with seamless change
of behavior for vfs_statfs(), because that seamless change would
cause FAN_REPORT_FID to start working on fs like fuse that
support file handles and have zero fsid.

> > The patch below is from the LTP test [1] that verifies reported fid's.
> > I am posting it because I think that the function fanotify_get_fid()
> > demonstrates well, how a would-be fanotify library could be used to get
> > a canonical fid.
> >
> > That would-be routine can easily return the source of the fid values
> > for a given filesystem and that information is constant for all objects
> > on a given filesystem instance.
> >
> > The choise to encode an actual file_handle of type FILEID_INO64 may
> > seem controversial at first, but it simplifies things so much, that I
> > grew very fond of it.
>
> FILEID_INO64 is a bit of a hack in particular because it's difficult to
> pretend FILEID_INO64 can be used for NFS. But I agree it is very convenient
> :). If we were to do this cleanly we'd have to introduce a new info
> structure with ino instead of handle and three new FAN_EVENT_INFO_TYPE_*

Alas, there are more than three:
/* Special info types for FAN_RENAME */
#define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME       10
/* Reserved for FAN_EVENT_INFO_TYPE_OLD_DFID    11 */
#define FAN_EVENT_INFO_TYPE_NEW_DFID_NAME       12
/* Reserved for FAN_EVENT_INFO_TYPE_NEW_DFID    13 */

and I *really* prefer to avoid duplicating all of them.

> types. As I wrote above, we might be able to actually fill-in
> FILEID_INO64_GEN which would be less controversial then I suppose.
>

Yes, that would definitely be better.

Please let me know what you think about the AT_HANDLE_FID
idea and about the seamless fallback in vfs_statfs().

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-04-13  9:25   ` Amir Goldstein
@ 2023-04-17 16:27     ` Jan Kara
  2023-09-20  8:26       ` Amir Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2023-04-17 16:27 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Bobrowski, Christian Brauner, linux-fsdevel,
	linux-api

On Thu 13-04-23 12:25:41, Amir Goldstein wrote:
> On Wed, Apr 12, 2023 at 9:44 PM Jan Kara <jack@suse.cz> wrote:
> >
> > Hi Amir!
> >
> > On Tue 11-04-23 15:40:37, Amir Goldstein wrote:
> > > If kernel supports FAN_REPORT_ANY_FID, use this flag to allow testing
> > > also filesystems that do not support fsid or NFS file handles (e.g. fuse).
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >
> > > Jan,
> > >
> > > I wanted to run an idea by you.
> > >
> > > My motivation is to close functional gaps between fanotify and inotify.
> > >
> > > One of the largest gaps right now is that FAN_REPORT_FID is limited
> > > to a subset of local filesystems.
> > >
> > > The idea is to report fid's that are "good enough" and that there
> > > is no need to require that fid's can be used by open_by_handle_at()
> > > because that is a non-requirement for most use cases, unpriv listener
> > > in particular.
> >
> > OK. I'd note that if you report only inode number, you are prone to the
> > problem that some inode gets freed (file deleted) and then reallocated (new
> > file created) and the resulting identifier is the same. It can be
> > problematic for a listener to detect these cases and deal with them.
> > Inotify does not have this problem at least for some cases because 'wd'
> > uniquely identifies the marked inode. For other cases (like watching dirs)
> > inotify has similar sort of problems. I'm muttering over this because in
> > theory filesystems not having i_generation counter on disk could approach
> > the problem in a similar way as FAT and then we could just use
> > FILEID_INO64_GEN for the file handle.
> 
> Yes, of course we could.
> The problem with that is that user space needs to be able to query the fid
> regardless of fanotify.
> 
> The fanotify equivalent of wd is the answer to that query.
> 
> If any fs would export i_generation via statx, then FILEID_INO64_GEN
> would have been my choice.

One problem with making up i_generation (like FAT does it) is that when
inode gets reclaimed and then refetched from the disk FILEID_INO64_GEN will
change because it's going to have different i_generation. For NFS this is
annoying but I suppose it mostly does not happen since client's accesses
tend to keep the inode in memory. For fanotify it could be more likely to
happen if watching say the whole filesystem and I suppose the watching
application will get confused by this. So I'm not convinced faking
i_generation is a good thing to do. But still I want to brainstorm a bit
about it :)

> But if we are going to change some other API for that, I would not change
> statx(), I would change name_to_handle_at(...., AT_HANDLE_FID)
> 
> This AT_ flag would relax this check in name_to_handle_at():
> 
>         /*
>          * We need to make sure whether the file system
>          * support decoding of the file handle
>          */
>         if (!path->dentry->d_sb->s_export_op ||
>             !path->dentry->d_sb->s_export_op->fh_to_dentry)
>                 return -EOPNOTSUPP;
> 
> And allow the call to proceed to the default export_encode_fh() implementation.
> Alas, the default implementation encodes FILEID_INO32_GEN.
> 
> I think we can get away with a default implementation for FILEID_INO64_GEN
> as long as the former (INO32) is used for fs with export ops without ->encode_fh
> (e.g. ext*) and the latter (INO64) is used for fs with no export ops.

These default calls seem a bit too subtle to me. I'd rather add explicit
handlers to filesystems that really want FILEID_INO32_GEN encoding and then
have a fallback function for filesystems not having s_export_op at all.

But otherwise the proposal to make name_to_handle_at() work even for
filesystems not exportable through NFS makes sense to me. But I guess we
need some buy-in from VFS maintainers for this.

> > Also I have noticed your workaround with using st_dev for fsid. As I've
> > checked, there are actually very few filesystems that don't set fsid these
> > days. So maybe we could just get away with still refusing to report on
> > filesystems without fsid and possibly fixup filesystems which don't set
> > fsid yet and are used enough so that users complain?
> 
> I started going down this path to close the gap with inotify.
> inotify is capable of watching all fs including pseudo fs, so I would
> like to have this feature parity.

Well, but with pseudo filesystems (similarly as with FUSE) the notification
was always unreliable. As in: some cases worked but others did not. I'm not
sure that is something we should try to replicate :)

So still I'd be interested to know which filesystems we are exactly
interested to support and whether we are not better off to explicitely add
fsid support to them like we did for tmpfs.

> If we can get away with fallback to s_dev as fsid in vfs_statfs()
> I have no problem with that, but just to point out - functionally
> it is equivalent to do this fallback in userspace library as the
> fanotify_get_fid() LTP helper does.

Yes, userspace can workaround this but I was more thinking about avoiding
adding these workarounds into fanotify in kernel *and* to userspace.

> > > I chose a rather generic name for the flag to opt-in for "good enough"
> > > fid's.  At first, I was going to make those fid's self describing the
> > > fact that they are not NFS file handles, but in the name of simplicity
> > > to the API, I decided that this is not needed.
> >
> > I'd like to discuss a bit about the meaning of the flag. On the first look
> > it is a bit strange to have a flag that says "give me a fh, if you don't
> > have it, give me ino". It would seem cleaner to have "give me fh" kind of
> > interface (FAN_REPORT_FID) and "give me ino" kind of interface (new
> > FAN_REPORT_* flag). I suspect you've chosen the more complex meaning
> > because you want to allow a usecase where watches of filesystems which
> > don't support filehandles are mixed with watches of filesystems which do
> > support filehandles in one notification group and getting filehandles is
> > actually prefered over getting just inode numbers? Do you see real benefit
> > in getting file handles when userspace has to implement fallback for
> > getting just inode numbers anyway?
> >
> 
> Yes, there is a benefit, because a real fhandle has no-reuse guarantee.
> 
> Even if we implement the kernel fallback to FILEID_INO64_GEN, it does
> not serve as a statement from the filesystem that i_generation is useful
> and in fact, i_generation will often be zero in simple fs and ino will be
> reusable.
> 
> Also, I wanted to have a design where a given fs/object always returns
> the same FID regardless of the init flags.
> 
> Your question implies that if
> "userspace has to implement fallback for getting just inode numbers",
> then it doesn't matter if we report fhandle or inode, but it is not accurate.
> 
> The fanotify_get_fid() LTP helper always gets a consistent FID for a
> given fs/object. You do not need to feed it the fanotify init flags to
> provide a consistent answer.
> 
> For all the reasons above, I think that a "give me ino'' flag is not useful.
> IMO, the flag just needs better marketing.
> This is a "I do not need/intend to open_by_handle flag".
> Suggestions for a better name are welcome.

I see, yes, these reasons make sense.

> For all I care, we do not need to add an opt-in flag at all.
> We could simply start to support fs that were not supported before.
> This sort of API change is very common and acceptable.
> 
> There is no risk if the user tries to call open_by_handle_at() with the
> fanotify encoded FID, because in this case the fs is guaranteed to
> return ESTALE, because fs does not support file handles.
> 
> This is especially true, if we can get away with seamless change
> of behavior for vfs_statfs(), because that seamless change would
> cause FAN_REPORT_FID to start working on fs like fuse that
> support file handles and have zero fsid.

Yeah. Actually I like the idea of a seamless change to start reporting fsid
and also to start reporting "fake" handles. In the past we've already
enabled tmpfs like this...

> > > The patch below is from the LTP test [1] that verifies reported fid's.
> > > I am posting it because I think that the function fanotify_get_fid()
> > > demonstrates well, how a would-be fanotify library could be used to get
> > > a canonical fid.
> > >
> > > That would-be routine can easily return the source of the fid values
> > > for a given filesystem and that information is constant for all objects
> > > on a given filesystem instance.
> > >
> > > The choise to encode an actual file_handle of type FILEID_INO64 may
> > > seem controversial at first, but it simplifies things so much, that I
> > > grew very fond of it.
> >
> > FILEID_INO64 is a bit of a hack in particular because it's difficult to
> > pretend FILEID_INO64 can be used for NFS. But I agree it is very convenient
> > :). If we were to do this cleanly we'd have to introduce a new info
> > structure with ino instead of handle and three new FAN_EVENT_INFO_TYPE_*
> 
> Alas, there are more than three:
> /* Special info types for FAN_RENAME */
> #define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME       10
> /* Reserved for FAN_EVENT_INFO_TYPE_OLD_DFID    11 */
> #define FAN_EVENT_INFO_TYPE_NEW_DFID_NAME       12
> /* Reserved for FAN_EVENT_INFO_TYPE_NEW_DFID    13 */
> 
> and I *really* prefer to avoid duplicating all of them.

Yeah, I was just thinking loud :).

> > types. As I wrote above, we might be able to actually fill-in
> > FILEID_INO64_GEN which would be less controversial then I suppose.
> 
> Yes, that would definitely be better.


								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-04-17 16:27     ` Jan Kara
@ 2023-09-20  8:26       ` Amir Goldstein
  2023-09-20 10:27         ` Jeff Layton
  2023-09-20 11:04         ` Jan Kara
  0 siblings, 2 replies; 13+ messages in thread
From: Amir Goldstein @ 2023-09-20  8:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, linux-fsdevel, linux-api, Jeff Layton,
	Chuck Lever

On Mon, Apr 17, 2023 at 7:27 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 13-04-23 12:25:41, Amir Goldstein wrote:
> > On Wed, Apr 12, 2023 at 9:44 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > Hi Amir!
> > >
> > > On Tue 11-04-23 15:40:37, Amir Goldstein wrote:
> > > > If kernel supports FAN_REPORT_ANY_FID, use this flag to allow testing
> > > > also filesystems that do not support fsid or NFS file handles (e.g. fuse).
> > > >
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > ---
> > > >
> > > > Jan,
> > > >
> > > > I wanted to run an idea by you.
> > > >
> > > > My motivation is to close functional gaps between fanotify and inotify.
> > > >
> > > > One of the largest gaps right now is that FAN_REPORT_FID is limited
> > > > to a subset of local filesystems.
> > > >
> > > > The idea is to report fid's that are "good enough" and that there
> > > > is no need to require that fid's can be used by open_by_handle_at()
> > > > because that is a non-requirement for most use cases, unpriv listener
> > > > in particular.
> > >
> > > OK. I'd note that if you report only inode number, you are prone to the
> > > problem that some inode gets freed (file deleted) and then reallocated (new
> > > file created) and the resulting identifier is the same. It can be
> > > problematic for a listener to detect these cases and deal with them.
> > > Inotify does not have this problem at least for some cases because 'wd'
> > > uniquely identifies the marked inode. For other cases (like watching dirs)
> > > inotify has similar sort of problems. I'm muttering over this because in
> > > theory filesystems not having i_generation counter on disk could approach
> > > the problem in a similar way as FAT and then we could just use
> > > FILEID_INO64_GEN for the file handle.
> >
> > Yes, of course we could.
> > The problem with that is that user space needs to be able to query the fid
> > regardless of fanotify.
> >
> > The fanotify equivalent of wd is the answer to that query.
> >
> > If any fs would export i_generation via statx, then FILEID_INO64_GEN
> > would have been my choice.
>
> One problem with making up i_generation (like FAT does it) is that when
> inode gets reclaimed and then refetched from the disk FILEID_INO64_GEN will
> change because it's going to have different i_generation. For NFS this is
> annoying but I suppose it mostly does not happen since client's accesses
> tend to keep the inode in memory. For fanotify it could be more likely to
> happen if watching say the whole filesystem and I suppose the watching
> application will get confused by this. So I'm not convinced faking
> i_generation is a good thing to do. But still I want to brainstorm a bit
> about it :)
>
> > But if we are going to change some other API for that, I would not change
> > statx(), I would change name_to_handle_at(...., AT_HANDLE_FID)
> >
> > This AT_ flag would relax this check in name_to_handle_at():
> >
> >         /*
> >          * We need to make sure whether the file system
> >          * support decoding of the file handle
> >          */
> >         if (!path->dentry->d_sb->s_export_op ||
> >             !path->dentry->d_sb->s_export_op->fh_to_dentry)
> >                 return -EOPNOTSUPP;
> >
> > And allow the call to proceed to the default export_encode_fh() implementation.
> > Alas, the default implementation encodes FILEID_INO32_GEN.
> >
> > I think we can get away with a default implementation for FILEID_INO64_GEN
> > as long as the former (INO32) is used for fs with export ops without ->encode_fh
> > (e.g. ext*) and the latter (INO64) is used for fs with no export ops.
>
> These default calls seem a bit too subtle to me. I'd rather add explicit
> handlers to filesystems that really want FILEID_INO32_GEN encoding and then
> have a fallback function for filesystems not having s_export_op at all.
>
> But otherwise the proposal to make name_to_handle_at() work even for
> filesystems not exportable through NFS makes sense to me. But I guess we
> need some buy-in from VFS maintainers for this.
>

Hi Jan,

I seem to have dropped the ball on this after implementing AT_HANDLE_FID.
It was step one in a larger plan.

Christian, Jeff,

Do you have an objection to this plan:
1. Convert all "legacy" FILEID_INO32_GEN fs with non-empty
    s_export_op and no explicit ->encode_fh() to use an explicit
    generic_encode_ino32_gen_fh()
2. Relax requirement of non-empty s_export_op for AT_HANDLE_FID
    to support encoding a (non-NFS) file id on all fs
3. For fs with empty s_export_op, allow fallback of AT_HANDLE_FID
    in exportfs_encode_inode_fh() to encode FILEID_INO64_GEN


> > > Also I have noticed your workaround with using st_dev for fsid. As I've
> > > checked, there are actually very few filesystems that don't set fsid these
> > > days. So maybe we could just get away with still refusing to report on
> > > filesystems without fsid and possibly fixup filesystems which don't set
> > > fsid yet and are used enough so that users complain?
> >
> > I started going down this path to close the gap with inotify.
> > inotify is capable of watching all fs including pseudo fs, so I would
> > like to have this feature parity.
>
> Well, but with pseudo filesystems (similarly as with FUSE) the notification
> was always unreliable. As in: some cases worked but others did not. I'm not
> sure that is something we should try to replicate :)
>
> So still I'd be interested to know which filesystems we are exactly
> interested to support and whether we are not better off to explicitly add
> fsid support to them like we did for tmpfs.
>

Since this email, kernfs derivative fs gained fsid as well.
Quoting your audit of remaining fs from another thread:

> ...As far as I remember
> fanotify should be now able to handle anything that provides f_fsid in its
> statfs(2) call. And as I'm checking filesystems not setting fsid currently are:
>
> afs, coda, nfs - networking filesystems where inotify and fanotify have
>   dubious value anyway

Be that as it may, there may be users that use inotify on network fs
and it even makes a lot of sense in controlled environments with
single NFS client per NFS export (e.g. home dirs), so I think we will
need to support those fs as well.

Maybe the wise thing to do is to opt-in to monitor those fs after all?
Maybe with explicit opt-in to watch a single fs, fanotify group will
limit itself to marks on a specific sb and then a null fsid won't matter?

>
> configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore,
> ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is
>   quite limited. But some special cases could be useful. Adding fsid support
>   is the same amount of trouble as for kernfs - a few LOC. In fact, we
>   could perhaps add a fstype flag to indicate that this is a filesystem
>   without persistent identification and so uuid should be autogenerated on
>   mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid.
>   This way we could handle all these filesystems with trivial amount of
>   effort.
>

Christian,

I recall that you may have had reservations on initializing s_uuid
and f_fsid in vfs code?
Does an opt-in fstype flag address your concerns?
Will you be ok with doing the tmpfs/kernfs trick for every fs
that opted-in with fstype flag in generic vfs code?

> freevxfs - the only real filesystem without f_fsid. Trivial to handle one
>   way or the other.
>

Last but not least, btrfs subvolumes.
They do have an fsid, but it is different from the sb fsid,
so we disallow (even inode) fanotify marks.

I am not sure how to solve this one,
but if we choose to implement the opt-in fanotify flag for
"watch single fs", we can make this problem go away, along
with the problem of network fs fsid and other odd fs that we
do not want to have to deal with.

On top of everything, it is a fast solution and it doesn't
involve vfs and changing any fs at all.

> > If we can get away with fallback to s_dev as fsid in vfs_statfs()
> > I have no problem with that, but just to point out - functionally
> > it is equivalent to do this fallback in userspace library as the
> > fanotify_get_fid() LTP helper does.
>
> Yes, userspace can workaround this but I was more thinking about avoiding
> adding these workarounds into fanotify in kernel *and* to userspace.
>
> > > > I chose a rather generic name for the flag to opt-in for "good enough"
> > > > fid's.  At first, I was going to make those fid's self describing the
> > > > fact that they are not NFS file handles, but in the name of simplicity
> > > > to the API, I decided that this is not needed.
> > >
> > > I'd like to discuss a bit about the meaning of the flag. On the first look
> > > it is a bit strange to have a flag that says "give me a fh, if you don't
> > > have it, give me ino". It would seem cleaner to have "give me fh" kind of
> > > interface (FAN_REPORT_FID) and "give me ino" kind of interface (new
> > > FAN_REPORT_* flag). I suspect you've chosen the more complex meaning
> > > because you want to allow a usecase where watches of filesystems which
> > > don't support filehandles are mixed with watches of filesystems which do
> > > support filehandles in one notification group and getting filehandles is
> > > actually prefered over getting just inode numbers? Do you see real benefit
> > > in getting file handles when userspace has to implement fallback for
> > > getting just inode numbers anyway?
> > >
> >
> > Yes, there is a benefit, because a real fhandle has no-reuse guarantee.
> >
> > Even if we implement the kernel fallback to FILEID_INO64_GEN, it does
> > not serve as a statement from the filesystem that i_generation is useful
> > and in fact, i_generation will often be zero in simple fs and ino will be
> > reusable.
> >
> > Also, I wanted to have a design where a given fs/object always returns
> > the same FID regardless of the init flags.
> >
> > Your question implies that if
> > "userspace has to implement fallback for getting just inode numbers",
> > then it doesn't matter if we report fhandle or inode, but it is not accurate.
> >
> > The fanotify_get_fid() LTP helper always gets a consistent FID for a
> > given fs/object. You do not need to feed it the fanotify init flags to
> > provide a consistent answer.
> >
> > For all the reasons above, I think that a "give me ino'' flag is not useful.
> > IMO, the flag just needs better marketing.
> > This is a "I do not need/intend to open_by_handle flag".
> > Suggestions for a better name are welcome.
>
> I see, yes, these reasons make sense.
>
> > For all I care, we do not need to add an opt-in flag at all.
> > We could simply start to support fs that were not supported before.
> > This sort of API change is very common and acceptable.
> >
> > There is no risk if the user tries to call open_by_handle_at() with the
> > fanotify encoded FID, because in this case the fs is guaranteed to
> > return ESTALE, because fs does not support file handles.
> >
> > This is especially true, if we can get away with seamless change
> > of behavior for vfs_statfs(), because that seamless change would
> > cause FAN_REPORT_FID to start working on fs like fuse that
> > support file handles and have zero fsid.
>
> Yeah. Actually I like the idea of a seamless change to start reporting fsid
> and also to start reporting "fake" handles. In the past we've already
> enabled tmpfs like this...
>

I am now leaning towards a combination of:
1. Seamless change of behavior for vfs_statfs() and
    name_to_handle_at(..., AT_HANDLE_FID) for the simple cases
    using an opt-in fstype flag
AND
2. Simple interim fallback for other fs with an opt-in fanotify flag (*)

Thoughts?

Thanks,
Amir.

(*) We did some similar opt-in games in overlayfs to support lower
     fs with null uuid - in the default configuration, overlayfs allows a
     single lower layer with null uuid, but not multi lower layers with
     null uuid. When advances features are enabled, even single
     lower layer with null uuid is not allowed

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-09-20  8:26       ` Amir Goldstein
@ 2023-09-20 10:27         ` Jeff Layton
  2023-09-20 11:04         ` Jan Kara
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2023-09-20 10:27 UTC (permalink / raw)
  To: Amir Goldstein, Jan Kara
  Cc: Christian Brauner, linux-fsdevel, linux-api, Chuck Lever

On Wed, 2023-09-20 at 11:26 +0300, Amir Goldstein wrote:
> On Mon, Apr 17, 2023 at 7:27 PM Jan Kara <jack@suse.cz> wrote:
> > 
> > On Thu 13-04-23 12:25:41, Amir Goldstein wrote:
> > > On Wed, Apr 12, 2023 at 9:44 PM Jan Kara <jack@suse.cz> wrote:
> > > > 
> > > > Hi Amir!
> > > > 
> > > > On Tue 11-04-23 15:40:37, Amir Goldstein wrote:
> > > > > If kernel supports FAN_REPORT_ANY_FID, use this flag to allow testing
> > > > > also filesystems that do not support fsid or NFS file handles (e.g. fuse).
> > > > > 
> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > ---
> > > > > 
> > > > > Jan,
> > > > > 
> > > > > I wanted to run an idea by you.
> > > > > 
> > > > > My motivation is to close functional gaps between fanotify and inotify.
> > > > > 
> > > > > One of the largest gaps right now is that FAN_REPORT_FID is limited
> > > > > to a subset of local filesystems.
> > > > > 
> > > > > The idea is to report fid's that are "good enough" and that there
> > > > > is no need to require that fid's can be used by open_by_handle_at()
> > > > > because that is a non-requirement for most use cases, unpriv listener
> > > > > in particular.
> > > > 
> > > > OK. I'd note that if you report only inode number, you are prone to the
> > > > problem that some inode gets freed (file deleted) and then reallocated (new
> > > > file created) and the resulting identifier is the same. It can be
> > > > problematic for a listener to detect these cases and deal with them.
> > > > Inotify does not have this problem at least for some cases because 'wd'
> > > > uniquely identifies the marked inode. For other cases (like watching dirs)
> > > > inotify has similar sort of problems. I'm muttering over this because in
> > > > theory filesystems not having i_generation counter on disk could approach
> > > > the problem in a similar way as FAT and then we could just use
> > > > FILEID_INO64_GEN for the file handle.
> > > 
> > > Yes, of course we could.
> > > The problem with that is that user space needs to be able to query the fid
> > > regardless of fanotify.
> > > 
> > > The fanotify equivalent of wd is the answer to that query.
> > > 
> > > If any fs would export i_generation via statx, then FILEID_INO64_GEN
> > > would have been my choice.
> > 
> > One problem with making up i_generation (like FAT does it) is that when
> > inode gets reclaimed and then refetched from the disk FILEID_INO64_GEN will
> > change because it's going to have different i_generation. For NFS this is
> > annoying but I suppose it mostly does not happen since client's accesses
> > tend to keep the inode in memory. For fanotify it could be more likely to
> > happen if watching say the whole filesystem and I suppose the watching
> > application will get confused by this. So I'm not convinced faking
> > i_generation is a good thing to do. But still I want to brainstorm a bit
> > about it :)
> > 
> > > But if we are going to change some other API for that, I would not change
> > > statx(), I would change name_to_handle_at(...., AT_HANDLE_FID)
> > > 
> > > This AT_ flag would relax this check in name_to_handle_at():
> > > 
> > >         /*
> > >          * We need to make sure whether the file system
> > >          * support decoding of the file handle
> > >          */
> > >         if (!path->dentry->d_sb->s_export_op ||
> > >             !path->dentry->d_sb->s_export_op->fh_to_dentry)
> > >                 return -EOPNOTSUPP;
> > > 
> > > And allow the call to proceed to the default export_encode_fh() implementation.
> > > Alas, the default implementation encodes FILEID_INO32_GEN.
> > > 
> > > I think we can get away with a default implementation for FILEID_INO64_GEN
> > > as long as the former (INO32) is used for fs with export ops without ->encode_fh
> > > (e.g. ext*) and the latter (INO64) is used for fs with no export ops.
> > 
> > These default calls seem a bit too subtle to me. I'd rather add explicit
> > handlers to filesystems that really want FILEID_INO32_GEN encoding and then
> > have a fallback function for filesystems not having s_export_op at all.
> > 
> > But otherwise the proposal to make name_to_handle_at() work even for
> > filesystems not exportable through NFS makes sense to me. But I guess we
> > need some buy-in from VFS maintainers for this.
> > 
> 
> Hi Jan,
> 
> I seem to have dropped the ball on this after implementing AT_HANDLE_FID.
> It was step one in a larger plan.
> 
> Christian, Jeff,
> 
> Do you have an objection to this plan:
> 1. Convert all "legacy" FILEID_INO32_GEN fs with non-empty
>     s_export_op and no explicit ->encode_fh() to use an explicit
>     generic_encode_ino32_gen_fh()
> 2. Relax requirement of non-empty s_export_op for AT_HANDLE_FID
>     to support encoding a (non-NFS) file id on all fs
> 3. For fs with empty s_export_op, allow fallback of AT_HANDLE_FID
>     in exportfs_encode_inode_fh() to encode FILEID_INO64_GEN
> 
> 

This plan sounds reasonable.

> > > > Also I have noticed your workaround with using st_dev for fsid. As I've
> > > > checked, there are actually very few filesystems that don't set fsid these
> > > > days. So maybe we could just get away with still refusing to report on
> > > > filesystems without fsid and possibly fixup filesystems which don't set
> > > > fsid yet and are used enough so that users complain?
> > > 
> > > I started going down this path to close the gap with inotify.
> > > inotify is capable of watching all fs including pseudo fs, so I would
> > > like to have this feature parity.
> > 
> > Well, but with pseudo filesystems (similarly as with FUSE) the notification
> > was always unreliable. As in: some cases worked but others did not. I'm not
> > sure that is something we should try to replicate :)
> > 
> > So still I'd be interested to know which filesystems we are exactly
> > interested to support and whether we are not better off to explicitly add
> > fsid support to them like we did for tmpfs.
> > 
> 
> Since this email, kernfs derivative fs gained fsid as well.
> Quoting your audit of remaining fs from another thread:
> 
> > ...As far as I remember
> > fanotify should be now able to handle anything that provides f_fsid in its
> > statfs(2) call. And as I'm checking filesystems not setting fsid currently are:
> > 
> > afs, coda, nfs - networking filesystems where inotify and fanotify have
> >   dubious value anyway
> 
> Be that as it may, there may be users that use inotify on network fs
> and it even makes a lot of sense in controlled environments with
> single NFS client per NFS export (e.g. home dirs), so I think we will
> need to support those fs as well.
> 
> Maybe the wise thing to do is to opt-in to monitor those fs after all?
> Maybe with explicit opt-in to watch a single fs, fanotify group will
> limit itself to marks on a specific sb and then a null fsid won't matter?
> 

Caution here. Most of these filesystems don't have protocol support for
anything like inotify (the known exception being SMB). You can monitor
such a network filesystem, but you won't get events for things that
happen on remote hosts.

This is confusing for users, and we've always rejected supporting the
notify interfaces on NFS for this reason.

> > 
> > configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore,
> > ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is
> >   quite limited. But some special cases could be useful. Adding fsid support
> >   is the same amount of trouble as for kernfs - a few LOC. In fact, we
> >   could perhaps add a fstype flag to indicate that this is a filesystem
> >   without persistent identification and so uuid should be autogenerated on
> >   mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid.
> >   This way we could handle all these filesystems with trivial amount of
> >   effort.
> > 
> 
> Christian,
> 
> I recall that you may have had reservations on initializing s_uuid
> and f_fsid in vfs code?
> Does an opt-in fstype flag address your concerns?
> Will you be ok with doing the tmpfs/kernfs trick for every fs
> that opted-in with fstype flag in generic vfs code?
> 
> > freevxfs - the only real filesystem without f_fsid. Trivial to handle one
> >   way or the other.
> > 
> 
> Last but not least, btrfs subvolumes.
> They do have an fsid, but it is different from the sb fsid,
> so we disallow (even inode) fanotify marks.
> 
> I am not sure how to solve this one,
> but if we choose to implement the opt-in fanotify flag for
> "watch single fs", we can make this problem go away, along
> with the problem of network fs fsid and other odd fs that we
> do not want to have to deal with.
> 
> On top of everything, it is a fast solution and it doesn't
> involve vfs and changing any fs at all.
> 

FWIW, we have a long standing bug open on dealing with btrfs subvolumes
with nfsd:

    https://bugzilla.linux-nfs.org/show_bug.cgi?id=389

You may want to look over that before you dive in to solutions here.

> > > If we can get away with fallback to s_dev as fsid in vfs_statfs()
> > > I have no problem with that, but just to point out - functionally
> > > it is equivalent to do this fallback in userspace library as the
> > > fanotify_get_fid() LTP helper does.
> > 
> > Yes, userspace can workaround this but I was more thinking about avoiding
> > adding these workarounds into fanotify in kernel *and* to userspace.
> > 
> > > > > I chose a rather generic name for the flag to opt-in for "good enough"
> > > > > fid's.  At first, I was going to make those fid's self describing the
> > > > > fact that they are not NFS file handles, but in the name of simplicity
> > > > > to the API, I decided that this is not needed.
> > > > 
> > > > I'd like to discuss a bit about the meaning of the flag. On the first look
> > > > it is a bit strange to have a flag that says "give me a fh, if you don't
> > > > have it, give me ino". It would seem cleaner to have "give me fh" kind of
> > > > interface (FAN_REPORT_FID) and "give me ino" kind of interface (new
> > > > FAN_REPORT_* flag). I suspect you've chosen the more complex meaning
> > > > because you want to allow a usecase where watches of filesystems which
> > > > don't support filehandles are mixed with watches of filesystems which do
> > > > support filehandles in one notification group and getting filehandles is
> > > > actually prefered over getting just inode numbers? Do you see real benefit
> > > > in getting file handles when userspace has to implement fallback for
> > > > getting just inode numbers anyway?
> > > > 
> > > 
> > > Yes, there is a benefit, because a real fhandle has no-reuse guarantee.
> > > 
> > > Even if we implement the kernel fallback to FILEID_INO64_GEN, it does
> > > not serve as a statement from the filesystem that i_generation is useful
> > > and in fact, i_generation will often be zero in simple fs and ino will be
> > > reusable.
> > > 
> > > Also, I wanted to have a design where a given fs/object always returns
> > > the same FID regardless of the init flags.
> > > 
> > > Your question implies that if
> > > "userspace has to implement fallback for getting just inode numbers",
> > > then it doesn't matter if we report fhandle or inode, but it is not accurate.
> > > 
> > > The fanotify_get_fid() LTP helper always gets a consistent FID for a
> > > given fs/object. You do not need to feed it the fanotify init flags to
> > > provide a consistent answer.
> > > 
> > > For all the reasons above, I think that a "give me ino'' flag is not useful.
> > > IMO, the flag just needs better marketing.
> > > This is a "I do not need/intend to open_by_handle flag".
> > > Suggestions for a better name are welcome.
> > 
> > I see, yes, these reasons make sense.
> > 
> > > For all I care, we do not need to add an opt-in flag at all.
> > > We could simply start to support fs that were not supported before.
> > > This sort of API change is very common and acceptable.
> > > 
> > > There is no risk if the user tries to call open_by_handle_at() with the
> > > fanotify encoded FID, because in this case the fs is guaranteed to
> > > return ESTALE, because fs does not support file handles.
> > > 
> > > This is especially true, if we can get away with seamless change
> > > of behavior for vfs_statfs(), because that seamless change would
> > > cause FAN_REPORT_FID to start working on fs like fuse that
> > > support file handles and have zero fsid.
> > 
> > Yeah. Actually I like the idea of a seamless change to start reporting fsid
> > and also to start reporting "fake" handles. In the past we've already
> > enabled tmpfs like this...
> > 
> 
> I am now leaning towards a combination of:
> 1. Seamless change of behavior for vfs_statfs() and
>     name_to_handle_at(..., AT_HANDLE_FID) for the simple cases
>     using an opt-in fstype flag
> AND
> 2. Simple interim fallback for other fs with an opt-in fanotify flag (*)
> 
> Thoughts?
> 
> Thanks,
> Amir.
> 
> (*) We did some similar opt-in games in overlayfs to support lower
>      fs with null uuid - in the default configuration, overlayfs allows a
>      single lower layer with null uuid, but not multi lower layers with
>      null uuid. When advances features are enabled, even single
>      lower layer with null uuid is not allowed

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-09-20  8:26       ` Amir Goldstein
  2023-09-20 10:27         ` Jeff Layton
@ 2023-09-20 11:04         ` Jan Kara
  2023-09-20 12:41           ` Amir Goldstein
  2023-10-25 14:11           ` Amir Goldstein
  1 sibling, 2 replies; 13+ messages in thread
From: Jan Kara @ 2023-09-20 11:04 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Christian Brauner, linux-fsdevel, linux-api,
	Jeff Layton, Chuck Lever

On Wed 20-09-23 11:26:38, Amir Goldstein wrote:
> On Mon, Apr 17, 2023 at 7:27 PM Jan Kara <jack@suse.cz> wrote:
> > > > > My motivation is to close functional gaps between fanotify and inotify.
> > > > >
> > > > > One of the largest gaps right now is that FAN_REPORT_FID is limited
> > > > > to a subset of local filesystems.
> > > > >
> > > > > The idea is to report fid's that are "good enough" and that there
> > > > > is no need to require that fid's can be used by open_by_handle_at()
> > > > > because that is a non-requirement for most use cases, unpriv listener
> > > > > in particular.
> > > >
> > > > OK. I'd note that if you report only inode number, you are prone to the
> > > > problem that some inode gets freed (file deleted) and then reallocated (new
> > > > file created) and the resulting identifier is the same. It can be
> > > > problematic for a listener to detect these cases and deal with them.
> > > > Inotify does not have this problem at least for some cases because 'wd'
> > > > uniquely identifies the marked inode. For other cases (like watching dirs)
> > > > inotify has similar sort of problems. I'm muttering over this because in
> > > > theory filesystems not having i_generation counter on disk could approach
> > > > the problem in a similar way as FAT and then we could just use
> > > > FILEID_INO64_GEN for the file handle.
> > >
> > > Yes, of course we could.
> > > The problem with that is that user space needs to be able to query the fid
> > > regardless of fanotify.
> > >
> > > The fanotify equivalent of wd is the answer to that query.
> > >
> > > If any fs would export i_generation via statx, then FILEID_INO64_GEN
> > > would have been my choice.
> >
> > One problem with making up i_generation (like FAT does it) is that when
> > inode gets reclaimed and then refetched from the disk FILEID_INO64_GEN will
> > change because it's going to have different i_generation. For NFS this is
> > annoying but I suppose it mostly does not happen since client's accesses
> > tend to keep the inode in memory. For fanotify it could be more likely to
> > happen if watching say the whole filesystem and I suppose the watching
> > application will get confused by this. So I'm not convinced faking
> > i_generation is a good thing to do. But still I want to brainstorm a bit
> > about it :)
> >
> > > But if we are going to change some other API for that, I would not change
> > > statx(), I would change name_to_handle_at(...., AT_HANDLE_FID)
> > >
> > > This AT_ flag would relax this check in name_to_handle_at():
> > >
> > >         /*
> > >          * We need to make sure whether the file system
> > >          * support decoding of the file handle
> > >          */
> > >         if (!path->dentry->d_sb->s_export_op ||
> > >             !path->dentry->d_sb->s_export_op->fh_to_dentry)
> > >                 return -EOPNOTSUPP;
> > >
> > > And allow the call to proceed to the default export_encode_fh() implementation.
> > > Alas, the default implementation encodes FILEID_INO32_GEN.
> > >
> > > I think we can get away with a default implementation for FILEID_INO64_GEN
> > > as long as the former (INO32) is used for fs with export ops without ->encode_fh
> > > (e.g. ext*) and the latter (INO64) is used for fs with no export ops.
> >
> > These default calls seem a bit too subtle to me. I'd rather add explicit
> > handlers to filesystems that really want FILEID_INO32_GEN encoding and then
> > have a fallback function for filesystems not having s_export_op at all.
> >
> > But otherwise the proposal to make name_to_handle_at() work even for
> > filesystems not exportable through NFS makes sense to me. But I guess we
> > need some buy-in from VFS maintainers for this.
> >
> 
> Hi Jan,
> 
> I seem to have dropped the ball on this after implementing AT_HANDLE_FID.
> It was step one in a larger plan.

No problem, I forgot about this as well :)

> Christian, Jeff,
> 
> Do you have an objection to this plan:
> 1. Convert all "legacy" FILEID_INO32_GEN fs with non-empty
>     s_export_op and no explicit ->encode_fh() to use an explicit
>     generic_encode_ino32_gen_fh()
> 2. Relax requirement of non-empty s_export_op for AT_HANDLE_FID
>     to support encoding a (non-NFS) file id on all fs
> 3. For fs with empty s_export_op, allow fallback of AT_HANDLE_FID
>     in exportfs_encode_inode_fh() to encode FILEID_INO64_GEN
> 
> 
> > > > Also I have noticed your workaround with using st_dev for fsid. As I've
> > > > checked, there are actually very few filesystems that don't set fsid these
> > > > days. So maybe we could just get away with still refusing to report on
> > > > filesystems without fsid and possibly fixup filesystems which don't set
> > > > fsid yet and are used enough so that users complain?
> > >
> > > I started going down this path to close the gap with inotify.
> > > inotify is capable of watching all fs including pseudo fs, so I would
> > > like to have this feature parity.
> >
> > Well, but with pseudo filesystems (similarly as with FUSE) the notification
> > was always unreliable. As in: some cases worked but others did not. I'm not
> > sure that is something we should try to replicate :)
> >
> > So still I'd be interested to know which filesystems we are exactly
> > interested to support and whether we are not better off to explicitly add
> > fsid support to them like we did for tmpfs.
> >
> 
> Since this email, kernfs derivative fs gained fsid as well.
> Quoting your audit of remaining fs from another thread:
> 
> > ...As far as I remember
> > fanotify should be now able to handle anything that provides f_fsid in its
> > statfs(2) call. And as I'm checking filesystems not setting fsid currently are:
> >
> > afs, coda, nfs - networking filesystems where inotify and fanotify have
> >   dubious value anyway
> 
> Be that as it may, there may be users that use inotify on network fs
> and it even makes a lot of sense in controlled environments with
> single NFS client per NFS export (e.g. home dirs), so I think we will
> need to support those fs as well.

Fair enough.

> Maybe the wise thing to do is to opt-in to monitor those fs after all?
> Maybe with explicit opt-in to watch a single fs, fanotify group will
> limit itself to marks on a specific sb and then a null fsid won't matter?
 
We have virtual filesystems with all sorts of missing or weird notification
functionality and we don't flag that in any way. So making a special flag
for network filesystems seems a bit arbitrary. I'd just make them provide
fsid and be done with it.

> > configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore,
> > ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is
> >   quite limited. But some special cases could be useful. Adding fsid support
> >   is the same amount of trouble as for kernfs - a few LOC. In fact, we
> >   could perhaps add a fstype flag to indicate that this is a filesystem
> >   without persistent identification and so uuid should be autogenerated on
> >   mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid.
> >   This way we could handle all these filesystems with trivial amount of
> >   effort.
> >
> 
> Christian,
> 
> I recall that you may have had reservations on initializing s_uuid
> and f_fsid in vfs code?
> Does an opt-in fstype flag address your concerns?
> Will you be ok with doing the tmpfs/kernfs trick for every fs
> that opted-in with fstype flag in generic vfs code?
> 
> > freevxfs - the only real filesystem without f_fsid. Trivial to handle one
> >   way or the other.
> >
> 
> Last but not least, btrfs subvolumes.
> They do have an fsid, but it is different from the sb fsid,
> so we disallow (even inode) fanotify marks.
>
> I am not sure how to solve this one,
> but if we choose to implement the opt-in fanotify flag for
> "watch single fs", we can make this problem go away, along
> with the problem of network fs fsid and other odd fs that we
> do not want to have to deal with.
> 
> On top of everything, it is a fast solution and it doesn't
> involve vfs and changing any fs at all.

Yeah, right, forgot about this one. Thanks for reminding me. But this is
mostly a kernel internal implementation issue and doesn't seem to be a
principial problem so I'd prefer not to complicate the uAPI for this. We
could for example mandate a special super_operation for fetching fsid for a
dentry for filesystems which don't have uniform fsids over the whole
filesystem (i.e., btrfs) and call this when generating event for such
filesystems. Or am I missing some other complication?
 
> > > If we can get away with fallback to s_dev as fsid in vfs_statfs()
> > > I have no problem with that, but just to point out - functionally
> > > it is equivalent to do this fallback in userspace library as the
> > > fanotify_get_fid() LTP helper does.
> >
> > Yes, userspace can workaround this but I was more thinking about avoiding
> > adding these workarounds into fanotify in kernel *and* to userspace.
> >
> > > > > I chose a rather generic name for the flag to opt-in for "good enough"
> > > > > fid's.  At first, I was going to make those fid's self describing the
> > > > > fact that they are not NFS file handles, but in the name of simplicity
> > > > > to the API, I decided that this is not needed.
> > > >
> > > > I'd like to discuss a bit about the meaning of the flag. On the first look
> > > > it is a bit strange to have a flag that says "give me a fh, if you don't
> > > > have it, give me ino". It would seem cleaner to have "give me fh" kind of
> > > > interface (FAN_REPORT_FID) and "give me ino" kind of interface (new
> > > > FAN_REPORT_* flag). I suspect you've chosen the more complex meaning
> > > > because you want to allow a usecase where watches of filesystems which
> > > > don't support filehandles are mixed with watches of filesystems which do
> > > > support filehandles in one notification group and getting filehandles is
> > > > actually prefered over getting just inode numbers? Do you see real benefit
> > > > in getting file handles when userspace has to implement fallback for
> > > > getting just inode numbers anyway?
> > > >
> > >
> > > Yes, there is a benefit, because a real fhandle has no-reuse guarantee.
> > >
> > > Even if we implement the kernel fallback to FILEID_INO64_GEN, it does
> > > not serve as a statement from the filesystem that i_generation is useful
> > > and in fact, i_generation will often be zero in simple fs and ino will be
> > > reusable.
> > >
> > > Also, I wanted to have a design where a given fs/object always returns
> > > the same FID regardless of the init flags.
> > >
> > > Your question implies that if
> > > "userspace has to implement fallback for getting just inode numbers",
> > > then it doesn't matter if we report fhandle or inode, but it is not accurate.
> > >
> > > The fanotify_get_fid() LTP helper always gets a consistent FID for a
> > > given fs/object. You do not need to feed it the fanotify init flags to
> > > provide a consistent answer.
> > >
> > > For all the reasons above, I think that a "give me ino'' flag is not useful.
> > > IMO, the flag just needs better marketing.
> > > This is a "I do not need/intend to open_by_handle flag".
> > > Suggestions for a better name are welcome.
> >
> > I see, yes, these reasons make sense.
> >
> > > For all I care, we do not need to add an opt-in flag at all.
> > > We could simply start to support fs that were not supported before.
> > > This sort of API change is very common and acceptable.
> > >
> > > There is no risk if the user tries to call open_by_handle_at() with the
> > > fanotify encoded FID, because in this case the fs is guaranteed to
> > > return ESTALE, because fs does not support file handles.
> > >
> > > This is especially true, if we can get away with seamless change
> > > of behavior for vfs_statfs(), because that seamless change would
> > > cause FAN_REPORT_FID to start working on fs like fuse that
> > > support file handles and have zero fsid.
> >
> > Yeah. Actually I like the idea of a seamless change to start reporting fsid
> > and also to start reporting "fake" handles. In the past we've already
> > enabled tmpfs like this...
> >
> 
> I am now leaning towards a combination of:
> 1. Seamless change of behavior for vfs_statfs() and
>     name_to_handle_at(..., AT_HANDLE_FID) for the simple cases
>     using an opt-in fstype flag

Ack.

> AND
> 2. Simple interim fallback for other fs with an opt-in fanotify flag (*)

I'm not sold on the special flag yet (see above) ;).

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-09-20 11:04         ` Jan Kara
@ 2023-09-20 12:41           ` Amir Goldstein
  2023-09-20 13:48             ` Jan Kara
  2023-10-25 14:11           ` Amir Goldstein
  1 sibling, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2023-09-20 12:41 UTC (permalink / raw)
  To: Jan Kara, Jeff Layton
  Cc: Christian Brauner, linux-fsdevel, linux-api, Chuck Lever

On Wed, Sep 20, 2023 at 2:04 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 20-09-23 11:26:38, Amir Goldstein wrote:
> > On Mon, Apr 17, 2023 at 7:27 PM Jan Kara <jack@suse.cz> wrote:
> > > > > > My motivation is to close functional gaps between fanotify and inotify.
> > > > > >
> > > > > > One of the largest gaps right now is that FAN_REPORT_FID is limited
> > > > > > to a subset of local filesystems.
> > > > > >
> > > > > > The idea is to report fid's that are "good enough" and that there
> > > > > > is no need to require that fid's can be used by open_by_handle_at()
> > > > > > because that is a non-requirement for most use cases, unpriv listener
> > > > > > in particular.
> > > > >
> > > > > OK. I'd note that if you report only inode number, you are prone to the
> > > > > problem that some inode gets freed (file deleted) and then reallocated (new
> > > > > file created) and the resulting identifier is the same. It can be
> > > > > problematic for a listener to detect these cases and deal with them.
> > > > > Inotify does not have this problem at least for some cases because 'wd'
> > > > > uniquely identifies the marked inode. For other cases (like watching dirs)
> > > > > inotify has similar sort of problems. I'm muttering over this because in
> > > > > theory filesystems not having i_generation counter on disk could approach
> > > > > the problem in a similar way as FAT and then we could just use
> > > > > FILEID_INO64_GEN for the file handle.
> > > >
> > > > Yes, of course we could.
> > > > The problem with that is that user space needs to be able to query the fid
> > > > regardless of fanotify.
> > > >
> > > > The fanotify equivalent of wd is the answer to that query.
> > > >
> > > > If any fs would export i_generation via statx, then FILEID_INO64_GEN
> > > > would have been my choice.
> > >
> > > One problem with making up i_generation (like FAT does it) is that when
> > > inode gets reclaimed and then refetched from the disk FILEID_INO64_GEN will
> > > change because it's going to have different i_generation. For NFS this is
> > > annoying but I suppose it mostly does not happen since client's accesses
> > > tend to keep the inode in memory. For fanotify it could be more likely to
> > > happen if watching say the whole filesystem and I suppose the watching
> > > application will get confused by this. So I'm not convinced faking
> > > i_generation is a good thing to do. But still I want to brainstorm a bit
> > > about it :)
> > >
> > > > But if we are going to change some other API for that, I would not change
> > > > statx(), I would change name_to_handle_at(...., AT_HANDLE_FID)
> > > >
> > > > This AT_ flag would relax this check in name_to_handle_at():
> > > >
> > > >         /*
> > > >          * We need to make sure whether the file system
> > > >          * support decoding of the file handle
> > > >          */
> > > >         if (!path->dentry->d_sb->s_export_op ||
> > > >             !path->dentry->d_sb->s_export_op->fh_to_dentry)
> > > >                 return -EOPNOTSUPP;
> > > >
> > > > And allow the call to proceed to the default export_encode_fh() implementation.
> > > > Alas, the default implementation encodes FILEID_INO32_GEN.
> > > >
> > > > I think we can get away with a default implementation for FILEID_INO64_GEN
> > > > as long as the former (INO32) is used for fs with export ops without ->encode_fh
> > > > (e.g. ext*) and the latter (INO64) is used for fs with no export ops.
> > >
> > > These default calls seem a bit too subtle to me. I'd rather add explicit
> > > handlers to filesystems that really want FILEID_INO32_GEN encoding and then
> > > have a fallback function for filesystems not having s_export_op at all.
> > >
> > > But otherwise the proposal to make name_to_handle_at() work even for
> > > filesystems not exportable through NFS makes sense to me. But I guess we
> > > need some buy-in from VFS maintainers for this.
> > >
> >
> > Hi Jan,
> >
> > I seem to have dropped the ball on this after implementing AT_HANDLE_FID.
> > It was step one in a larger plan.
>
> No problem, I forgot about this as well :)
>
> > Christian, Jeff,
> >
> > Do you have an objection to this plan:
> > 1. Convert all "legacy" FILEID_INO32_GEN fs with non-empty
> >     s_export_op and no explicit ->encode_fh() to use an explicit
> >     generic_encode_ino32_gen_fh()
> > 2. Relax requirement of non-empty s_export_op for AT_HANDLE_FID
> >     to support encoding a (non-NFS) file id on all fs
> > 3. For fs with empty s_export_op, allow fallback of AT_HANDLE_FID
> >     in exportfs_encode_inode_fh() to encode FILEID_INO64_GEN
> >
> >
> > > > > Also I have noticed your workaround with using st_dev for fsid. As I've
> > > > > checked, there are actually very few filesystems that don't set fsid these
> > > > > days. So maybe we could just get away with still refusing to report on
> > > > > filesystems without fsid and possibly fixup filesystems which don't set
> > > > > fsid yet and are used enough so that users complain?
> > > >
> > > > I started going down this path to close the gap with inotify.
> > > > inotify is capable of watching all fs including pseudo fs, so I would
> > > > like to have this feature parity.
> > >
> > > Well, but with pseudo filesystems (similarly as with FUSE) the notification
> > > was always unreliable. As in: some cases worked but others did not. I'm not
> > > sure that is something we should try to replicate :)
> > >
> > > So still I'd be interested to know which filesystems we are exactly
> > > interested to support and whether we are not better off to explicitly add
> > > fsid support to them like we did for tmpfs.
> > >
> >
> > Since this email, kernfs derivative fs gained fsid as well.
> > Quoting your audit of remaining fs from another thread:
> >
> > > ...As far as I remember
> > > fanotify should be now able to handle anything that provides f_fsid in its
> > > statfs(2) call. And as I'm checking filesystems not setting fsid currently are:
> > >
> > > afs, coda, nfs - networking filesystems where inotify and fanotify have
> > >   dubious value anyway
> >
> > Be that as it may, there may be users that use inotify on network fs
> > and it even makes a lot of sense in controlled environments with
> > single NFS client per NFS export (e.g. home dirs), so I think we will
> > need to support those fs as well.
>
> Fair enough.
>
> > Maybe the wise thing to do is to opt-in to monitor those fs after all?
> > Maybe with explicit opt-in to watch a single fs, fanotify group will
> > limit itself to marks on a specific sb and then a null fsid won't matter?
>
> We have virtual filesystems with all sorts of missing or weird notification
> functionality and we don't flag that in any way. So making a special flag
> for network filesystems seems a bit arbitrary. I'd just make them provide
> fsid and be done with it.
>

OK. I will try.

However, in reply to Jeff's comment:

> Caution here. Most of these filesystems don't have protocol support for
> anything like inotify (the known exception being SMB). You can monitor
> such a network filesystem, but you won't get events for things that
> happen on remote hosts.

Regardless of the fsid question, when we discussed remote notifications
support for FUSE/SMB, we raised the point that which notifications the
user gets (local or remote) are ambiguous and one suggestion was to
be explicit about requesting LOCAL or REMOTE notifications (or both).

Among the filesystems that currently support fanotify, except for the
most recent kernfs family, I think all of them are "purely local".
For the purpose of this discussion I consider debugfs and such to have
REMOTE notifications, which are not triggered from user vfs syscalls.

The one exception is smb, but only with config CIFS_NFSD_EXPORT
and that one depends on BROKEN.

If we did want to require an explicit FAN_LOCAL_NOTIF flag to allow
setting marks on fs which may have changes not via local syscalls,
it may be a good idea to flag those fs and disallow them without explicit
opt-in, before we add fanotify support to fs with missing notifications?
Perhaps before the official release of 6.6?

> > > configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore,
> > > ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is
> > >   quite limited. But some special cases could be useful. Adding fsid support
> > >   is the same amount of trouble as for kernfs - a few LOC. In fact, we
> > >   could perhaps add a fstype flag to indicate that this is a filesystem
> > >   without persistent identification and so uuid should be autogenerated on
> > >   mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid.
> > >   This way we could handle all these filesystems with trivial amount of
> > >   effort.
> > >
> >
> > Christian,
> >
> > I recall that you may have had reservations on initializing s_uuid
> > and f_fsid in vfs code?
> > Does an opt-in fstype flag address your concerns?
> > Will you be ok with doing the tmpfs/kernfs trick for every fs
> > that opted-in with fstype flag in generic vfs code?
> >
> > > freevxfs - the only real filesystem without f_fsid. Trivial to handle one
> > >   way or the other.
> > >
> >
> > Last but not least, btrfs subvolumes.
> > They do have an fsid, but it is different from the sb fsid,
> > so we disallow (even inode) fanotify marks.
> >
> > I am not sure how to solve this one,
> > but if we choose to implement the opt-in fanotify flag for
> > "watch single fs", we can make this problem go away, along
> > with the problem of network fs fsid and other odd fs that we
> > do not want to have to deal with.
> >
> > On top of everything, it is a fast solution and it doesn't
> > involve vfs and changing any fs at all.
>
> Yeah, right, forgot about this one. Thanks for reminding me. But this is
> mostly a kernel internal implementation issue and doesn't seem to be a
> principial problem so I'd prefer not to complicate the uAPI for this. We
> could for example mandate a special super_operation for fetching fsid for a
> dentry for filesystems which don't have uniform fsids over the whole
> filesystem (i.e., btrfs) and call this when generating event for such
> filesystems. Or am I missing some other complication?
>

The problem is the other way around :)
btrfs_statfs() takes a dentry and returns the fsid of the subvolume.
That is the fsid that users will get when querying the path to be marked.

If users had a flag to statfs() to request the "btrfs root volume fsid",
then fanotify could also report the root fsid and everyone will be happy
because the btrfs file handle already contains the subvolume root
object id (FILEID_BTRFS_WITH_PARENT_ROOT), but that is not
what users get for statfs() and that is not what fanotify documentation
says about how to query fsid.

We could report the subvolume fsid for marked inode/mount
that is not a problem - we just cache the subvol fsid in inode/mount
connector, but that fsid will be inconsistent with the fsid in the sb
connector, so the same object (in subvolume) can get events
with different fsid (e.g. if one event is in mask of sb and another
event is in mask of inode).

As Jeff said, nfsd also have issues with exporting btrfs subvolumes,
because of these oddities and I have no desire to solve those issues.

I think we could relax the EXDEV case for unpriv fanotify, because
inode marks should not have this problem?

That would be an easy way to get feature parity with inotify wrt btrfs.

> > > > If we can get away with fallback to s_dev as fsid in vfs_statfs()
> > > > I have no problem with that, but just to point out - functionally
> > > > it is equivalent to do this fallback in userspace library as the
> > > > fanotify_get_fid() LTP helper does.
> > >
> > > Yes, userspace can workaround this but I was more thinking about avoiding
> > > adding these workarounds into fanotify in kernel *and* to userspace.
> > >
> > > > > > I chose a rather generic name for the flag to opt-in for "good enough"
> > > > > > fid's.  At first, I was going to make those fid's self describing the
> > > > > > fact that they are not NFS file handles, but in the name of simplicity
> > > > > > to the API, I decided that this is not needed.
> > > > >
> > > > > I'd like to discuss a bit about the meaning of the flag. On the first look
> > > > > it is a bit strange to have a flag that says "give me a fh, if you don't
> > > > > have it, give me ino". It would seem cleaner to have "give me fh" kind of
> > > > > interface (FAN_REPORT_FID) and "give me ino" kind of interface (new
> > > > > FAN_REPORT_* flag). I suspect you've chosen the more complex meaning
> > > > > because you want to allow a usecase where watches of filesystems which
> > > > > don't support filehandles are mixed with watches of filesystems which do
> > > > > support filehandles in one notification group and getting filehandles is
> > > > > actually prefered over getting just inode numbers? Do you see real benefit
> > > > > in getting file handles when userspace has to implement fallback for
> > > > > getting just inode numbers anyway?
> > > > >
> > > >
> > > > Yes, there is a benefit, because a real fhandle has no-reuse guarantee.
> > > >
> > > > Even if we implement the kernel fallback to FILEID_INO64_GEN, it does
> > > > not serve as a statement from the filesystem that i_generation is useful
> > > > and in fact, i_generation will often be zero in simple fs and ino will be
> > > > reusable.
> > > >
> > > > Also, I wanted to have a design where a given fs/object always returns
> > > > the same FID regardless of the init flags.
> > > >
> > > > Your question implies that if
> > > > "userspace has to implement fallback for getting just inode numbers",
> > > > then it doesn't matter if we report fhandle or inode, but it is not accurate.
> > > >
> > > > The fanotify_get_fid() LTP helper always gets a consistent FID for a
> > > > given fs/object. You do not need to feed it the fanotify init flags to
> > > > provide a consistent answer.
> > > >
> > > > For all the reasons above, I think that a "give me ino'' flag is not useful.
> > > > IMO, the flag just needs better marketing.
> > > > This is a "I do not need/intend to open_by_handle flag".
> > > > Suggestions for a better name are welcome.
> > >
> > > I see, yes, these reasons make sense.
> > >
> > > > For all I care, we do not need to add an opt-in flag at all.
> > > > We could simply start to support fs that were not supported before.
> > > > This sort of API change is very common and acceptable.
> > > >
> > > > There is no risk if the user tries to call open_by_handle_at() with the
> > > > fanotify encoded FID, because in this case the fs is guaranteed to
> > > > return ESTALE, because fs does not support file handles.
> > > >
> > > > This is especially true, if we can get away with seamless change
> > > > of behavior for vfs_statfs(), because that seamless change would
> > > > cause FAN_REPORT_FID to start working on fs like fuse that
> > > > support file handles and have zero fsid.
> > >
> > > Yeah. Actually I like the idea of a seamless change to start reporting fsid
> > > and also to start reporting "fake" handles. In the past we've already
> > > enabled tmpfs like this...
> > >
> >
> > I am now leaning towards a combination of:
> > 1. Seamless change of behavior for vfs_statfs() and
> >     name_to_handle_at(..., AT_HANDLE_FID) for the simple cases
> >     using an opt-in fstype flag
>
> Ack.
>
> > AND
> > 2. Simple interim fallback for other fs with an opt-in fanotify flag (*)
>
> I'm not sold on the special flag yet (see above) ;).

OK, will try to make progress without an opt-in flag.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-09-20 12:41           ` Amir Goldstein
@ 2023-09-20 13:48             ` Jan Kara
  2023-09-20 15:12               ` Amir Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2023-09-20 13:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Jeff Layton, Christian Brauner, linux-fsdevel,
	linux-api, Chuck Lever

On Wed 20-09-23 15:41:06, Amir Goldstein wrote:
> On Wed, Sep 20, 2023 at 2:04 PM Jan Kara <jack@suse.cz> wrote:
> > On Wed 20-09-23 11:26:38, Amir Goldstein wrote:
> > > Be that as it may, there may be users that use inotify on network fs
> > > and it even makes a lot of sense in controlled environments with
> > > single NFS client per NFS export (e.g. home dirs), so I think we will
> > > need to support those fs as well.
> >
> > Fair enough.
> >
> > > Maybe the wise thing to do is to opt-in to monitor those fs after all?
> > > Maybe with explicit opt-in to watch a single fs, fanotify group will
> > > limit itself to marks on a specific sb and then a null fsid won't matter?
> >
> > We have virtual filesystems with all sorts of missing or weird notification
> > functionality and we don't flag that in any way. So making a special flag
> > for network filesystems seems a bit arbitrary. I'd just make them provide
> > fsid and be done with it.
> >
> 
> OK. I will try.
> 
> However, in reply to Jeff's comment:
> 
> > Caution here. Most of these filesystems don't have protocol support for
> > anything like inotify (the known exception being SMB). You can monitor
> > such a network filesystem, but you won't get events for things that
> > happen on remote hosts.
> 
> Regardless of the fsid question, when we discussed remote notifications
> support for FUSE/SMB, we raised the point that which notifications the
> user gets (local or remote) are ambiguous and one suggestion was to
> be explicit about requesting LOCAL or REMOTE notifications (or both).
> 
> Among the filesystems that currently support fanotify, except for the
> most recent kernfs family, I think all of them are "purely local".
> For the purpose of this discussion I consider debugfs and such to have
> REMOTE notifications, which are not triggered from user vfs syscalls.

Well, now you are speaking of FAN_REPORT_FID mode. There I agree we
currently support only filesystems with a sane behavior. But there are
definitely existing users of fanotify in "standard" file-descriptor mode
for filesystems such as sysfs, proc, etc. So the new flag would have to
change behavior only to FAN_REPORT_FID groups and that's why I think it's a
bit odd.

> The one exception is smb, but only with config CIFS_NFSD_EXPORT
> and that one depends on BROKEN.
> 
> If we did want to require an explicit FAN_LOCAL_NOTIF flag to allow
> setting marks on fs which may have changes not via local syscalls,
> it may be a good idea to flag those fs and disallow them without explicit
> opt-in, before we add fanotify support to fs with missing notifications?
> Perhaps before the official release of 6.6?

Yeah, overall I agree it would be nice to differentiate filesystems where
we aren't going to generate all the events. But as I write above, there are
already existing users for non-FID mode so we cannot have that flag in the
general setting.

> > > > configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore,
> > > > ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is
> > > >   quite limited. But some special cases could be useful. Adding fsid support
> > > >   is the same amount of trouble as for kernfs - a few LOC. In fact, we
> > > >   could perhaps add a fstype flag to indicate that this is a filesystem
> > > >   without persistent identification and so uuid should be autogenerated on
> > > >   mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid.
> > > >   This way we could handle all these filesystems with trivial amount of
> > > >   effort.
> > > >
> > >
> > > Christian,
> > >
> > > I recall that you may have had reservations on initializing s_uuid
> > > and f_fsid in vfs code?
> > > Does an opt-in fstype flag address your concerns?
> > > Will you be ok with doing the tmpfs/kernfs trick for every fs
> > > that opted-in with fstype flag in generic vfs code?
> > >
> > > > freevxfs - the only real filesystem without f_fsid. Trivial to handle one
> > > >   way or the other.
> > > >
> > >
> > > Last but not least, btrfs subvolumes.
> > > They do have an fsid, but it is different from the sb fsid,
> > > so we disallow (even inode) fanotify marks.
> > >
> > > I am not sure how to solve this one,
> > > but if we choose to implement the opt-in fanotify flag for
> > > "watch single fs", we can make this problem go away, along
> > > with the problem of network fs fsid and other odd fs that we
> > > do not want to have to deal with.
> > >
> > > On top of everything, it is a fast solution and it doesn't
> > > involve vfs and changing any fs at all.
> >
> > Yeah, right, forgot about this one. Thanks for reminding me. But this is
> > mostly a kernel internal implementation issue and doesn't seem to be a
> > principial problem so I'd prefer not to complicate the uAPI for this. We
> > could for example mandate a special super_operation for fetching fsid for a
> > dentry for filesystems which don't have uniform fsids over the whole
> > filesystem (i.e., btrfs) and call this when generating event for such
> > filesystems. Or am I missing some other complication?
> >
> 
> The problem is the other way around :)
> btrfs_statfs() takes a dentry and returns the fsid of the subvolume.
> That is the fsid that users will get when querying the path to be marked.

Yup.

> If users had a flag to statfs() to request the "btrfs root volume fsid",
> then fanotify could also report the root fsid and everyone will be happy
> because the btrfs file handle already contains the subvolume root
> object id (FILEID_BTRFS_WITH_PARENT_ROOT), but that is not
> what users get for statfs() and that is not what fanotify documentation
> says about how to query fsid.
> 
> We could report the subvolume fsid for marked inode/mount
> that is not a problem - we just cache the subvol fsid in inode/mount
> connector, but that fsid will be inconsistent with the fsid in the sb
> connector, so the same object (in subvolume) can get events
> with different fsid (e.g. if one event is in mask of sb and another
> event is in mask of inode).

Yes. I'm sorry I didn't describe all the details. My idea was to report
even on a dentry with the fsid statfs(2) would return on it. We don't want
to call dentry_statfs() on each event (it's costly and we don't always have
the dentry available) but we can have a special callback into the
filesystem to get us just the fsid (which is very cheap) and call *that* on
the inode on which the event happens to get fsid for the event. So yes, the
sb mark would be returning events with different fsids for btrfs. Or we
could compare the obtained fsid with the one in the root volume and ignore
the event if they mismatch (that would be more like the different subvolume
=> different filesystem point of view and would require some more work on
fanotify side to remember fsid in the sb mark and not in the sb connector).

> As Jeff said, nfsd also have issues with exporting btrfs subvolumes,
> because of these oddities and I have no desire to solve those issues.
> 
> I think we could relax the EXDEV case for unpriv fanotify, because
> inode marks should not have this problem?

Yes, inode marks definitely don't have the problem so enabling them is a
no-brainer.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-09-20 13:48             ` Jan Kara
@ 2023-09-20 15:12               ` Amir Goldstein
  2023-09-20 16:36                 ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2023-09-20 15:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Layton, Christian Brauner, linux-fsdevel, linux-api,
	Chuck Lever

On Wed, Sep 20, 2023 at 4:48 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 20-09-23 15:41:06, Amir Goldstein wrote:
> > On Wed, Sep 20, 2023 at 2:04 PM Jan Kara <jack@suse.cz> wrote:
> > > On Wed 20-09-23 11:26:38, Amir Goldstein wrote:
> > > > Be that as it may, there may be users that use inotify on network fs
> > > > and it even makes a lot of sense in controlled environments with
> > > > single NFS client per NFS export (e.g. home dirs), so I think we will
> > > > need to support those fs as well.
> > >
> > > Fair enough.
> > >
> > > > Maybe the wise thing to do is to opt-in to monitor those fs after all?
> > > > Maybe with explicit opt-in to watch a single fs, fanotify group will
> > > > limit itself to marks on a specific sb and then a null fsid won't matter?
> > >
> > > We have virtual filesystems with all sorts of missing or weird notification
> > > functionality and we don't flag that in any way. So making a special flag
> > > for network filesystems seems a bit arbitrary. I'd just make them provide
> > > fsid and be done with it.
> > >
> >
> > OK. I will try.
> >
> > However, in reply to Jeff's comment:
> >
> > > Caution here. Most of these filesystems don't have protocol support for
> > > anything like inotify (the known exception being SMB). You can monitor
> > > such a network filesystem, but you won't get events for things that
> > > happen on remote hosts.
> >
> > Regardless of the fsid question, when we discussed remote notifications
> > support for FUSE/SMB, we raised the point that which notifications the
> > user gets (local or remote) are ambiguous and one suggestion was to
> > be explicit about requesting LOCAL or REMOTE notifications (or both).
> >
> > Among the filesystems that currently support fanotify, except for the
> > most recent kernfs family, I think all of them are "purely local".
> > For the purpose of this discussion I consider debugfs and such to have
> > REMOTE notifications, which are not triggered from user vfs syscalls.
>
> Well, now you are speaking of FAN_REPORT_FID mode. There I agree we
> currently support only filesystems with a sane behavior. But there are
> definitely existing users of fanotify in "standard" file-descriptor mode
> for filesystems such as sysfs, proc, etc. So the new flag would have to
> change behavior only to FAN_REPORT_FID groups and that's why I think it's a
> bit odd.
>

No flag is fine by me.

> > The one exception is smb, but only with config CIFS_NFSD_EXPORT
> > and that one depends on BROKEN.
> >
> > If we did want to require an explicit FAN_LOCAL_NOTIF flag to allow
> > setting marks on fs which may have changes not via local syscalls,
> > it may be a good idea to flag those fs and disallow them without explicit
> > opt-in, before we add fanotify support to fs with missing notifications?
> > Perhaps before the official release of 6.6?
>
> Yeah, overall I agree it would be nice to differentiate filesystems where
> we aren't going to generate all the events. But as I write above, there are
> already existing users for non-FID mode so we cannot have that flag in the
> general setting.
>
> > > > > configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore,
> > > > > ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is
> > > > >   quite limited. But some special cases could be useful. Adding fsid support
> > > > >   is the same amount of trouble as for kernfs - a few LOC. In fact, we
> > > > >   could perhaps add a fstype flag to indicate that this is a filesystem
> > > > >   without persistent identification and so uuid should be autogenerated on
> > > > >   mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid.
> > > > >   This way we could handle all these filesystems with trivial amount of
> > > > >   effort.
> > > > >
> > > >
> > > > Christian,
> > > >
> > > > I recall that you may have had reservations on initializing s_uuid
> > > > and f_fsid in vfs code?
> > > > Does an opt-in fstype flag address your concerns?
> > > > Will you be ok with doing the tmpfs/kernfs trick for every fs
> > > > that opted-in with fstype flag in generic vfs code?
> > > >
> > > > > freevxfs - the only real filesystem without f_fsid. Trivial to handle one
> > > > >   way or the other.
> > > > >
> > > >
> > > > Last but not least, btrfs subvolumes.
> > > > They do have an fsid, but it is different from the sb fsid,
> > > > so we disallow (even inode) fanotify marks.
> > > >
> > > > I am not sure how to solve this one,
> > > > but if we choose to implement the opt-in fanotify flag for
> > > > "watch single fs", we can make this problem go away, along
> > > > with the problem of network fs fsid and other odd fs that we
> > > > do not want to have to deal with.
> > > >
> > > > On top of everything, it is a fast solution and it doesn't
> > > > involve vfs and changing any fs at all.
> > >
> > > Yeah, right, forgot about this one. Thanks for reminding me. But this is
> > > mostly a kernel internal implementation issue and doesn't seem to be a
> > > principial problem so I'd prefer not to complicate the uAPI for this. We
> > > could for example mandate a special super_operation for fetching fsid for a
> > > dentry for filesystems which don't have uniform fsids over the whole
> > > filesystem (i.e., btrfs) and call this when generating event for such
> > > filesystems. Or am I missing some other complication?
> > >
> >
> > The problem is the other way around :)
> > btrfs_statfs() takes a dentry and returns the fsid of the subvolume.
> > That is the fsid that users will get when querying the path to be marked.
>
> Yup.
>
> > If users had a flag to statfs() to request the "btrfs root volume fsid",
> > then fanotify could also report the root fsid and everyone will be happy
> > because the btrfs file handle already contains the subvolume root
> > object id (FILEID_BTRFS_WITH_PARENT_ROOT), but that is not
> > what users get for statfs() and that is not what fanotify documentation
> > says about how to query fsid.
> >
> > We could report the subvolume fsid for marked inode/mount
> > that is not a problem - we just cache the subvol fsid in inode/mount
> > connector, but that fsid will be inconsistent with the fsid in the sb
> > connector, so the same object (in subvolume) can get events
> > with different fsid (e.g. if one event is in mask of sb and another
> > event is in mask of inode).
>
> Yes. I'm sorry I didn't describe all the details. My idea was to report
> even on a dentry with the fsid statfs(2) would return on it. We don't want
> to call dentry_statfs() on each event (it's costly and we don't always have
> the dentry available) but we can have a special callback into the
> filesystem to get us just the fsid (which is very cheap) and call *that* on
> the inode on which the event happens to get fsid for the event. So yes, the
> sb mark would be returning events with different fsids for btrfs. Or we
> could compare the obtained fsid with the one in the root volume and ignore
> the event if they mismatch (that would be more like the different subvolume
> => different filesystem point of view and would require some more work on
> fanotify side to remember fsid in the sb mark and not in the sb connector).
>

It sounds like a big project.
I am not sure it is really needed.

On second thought, maybe getting different events on the
same subvol with different fsid is not that bad, because for
btrfs, it is possible to resolve the path of an fid in subvol
from either the root mount or the subvol mount.
IOW, subvol_fsid+fid and root_fsid+fid are two ways to
describe the same unique object.

Remember that we have two use cases for fsid+fid:
1. (unpriv/priv) User queries fsid+fid, sets an inode mark on path,
    stores fsid+fid<->path in a map to match events to path later
2. (priv-only) User queries fsid, sets a sb/mount mark on path,
    stores fsid<->path to match event to mntfd and
    resolves path by handle from mntfd+fid

Suppose we only allow to set sb marks on btrfs root volume,
but we relax the rule to allow btrfs inode/mount mark on subvol.

Use case #1 is not a problem.
If we prefer inode conn->fsid, to sb/mount conn->fsid, whichever
conn->fsid that is chosen, reports an fsid+fid that user has
recorded the path for, even if sb/mount marks also exist.

In use case #2,
- If the mark was on root sb, then the user has
  the root volume mntfd mapped to root fsid
- If the mark was on subvol mount, then the user has
  the subvol mntfd mapped to subvol fsid
- If the user has both marks it also has both mntfds,
  so it does not matter if events carry the root fsid
  (event is in the root sb mark mask) or if they carry the
  subvol fsid (event is in the subvol mount mark mask)

This needs a POC, but I think it should work.
In any case I can fix the btrfs inode marks.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-09-20 15:12               ` Amir Goldstein
@ 2023-09-20 16:36                 ` Jan Kara
  2023-10-18 18:35                   ` Amir Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2023-09-20 16:36 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Jeff Layton, Christian Brauner, linux-fsdevel,
	linux-api, Chuck Lever

On Wed 20-09-23 18:12:00, Amir Goldstein wrote:
> On Wed, Sep 20, 2023 at 4:48 PM Jan Kara <jack@suse.cz> wrote:
> > > If users had a flag to statfs() to request the "btrfs root volume fsid",
> > > then fanotify could also report the root fsid and everyone will be happy
> > > because the btrfs file handle already contains the subvolume root
> > > object id (FILEID_BTRFS_WITH_PARENT_ROOT), but that is not
> > > what users get for statfs() and that is not what fanotify documentation
> > > says about how to query fsid.
> > >
> > > We could report the subvolume fsid for marked inode/mount
> > > that is not a problem - we just cache the subvol fsid in inode/mount
> > > connector, but that fsid will be inconsistent with the fsid in the sb
> > > connector, so the same object (in subvolume) can get events
> > > with different fsid (e.g. if one event is in mask of sb and another
> > > event is in mask of inode).
> >
> > Yes. I'm sorry I didn't describe all the details. My idea was to report
> > even on a dentry with the fsid statfs(2) would return on it. We don't want
> > to call dentry_statfs() on each event (it's costly and we don't always have
> > the dentry available) but we can have a special callback into the
> > filesystem to get us just the fsid (which is very cheap) and call *that* on
> > the inode on which the event happens to get fsid for the event. So yes, the
> > sb mark would be returning events with different fsids for btrfs. Or we
> > could compare the obtained fsid with the one in the root volume and ignore
> > the event if they mismatch (that would be more like the different subvolume
> > => different filesystem point of view and would require some more work on
> > fanotify side to remember fsid in the sb mark and not in the sb connector).
> >
> 
> It sounds like a big project.

Actually it should be pretty simple as I imagine it. Maybe I can quickly
hack a POC.

> I am not sure it is really needed.
>
> On second thought, maybe getting different events on the
> same subvol with different fsid is not that bad, because for
> btrfs, it is possible to resolve the path of an fid in subvol
> from either the root mount or the subvol mount.
> IOW, subvol_fsid+fid and root_fsid+fid are two ways to
> describe the same unique object.
> 
> Remember that we have two use cases for fsid+fid:
> 1. (unpriv/priv) User queries fsid+fid, sets an inode mark on path,
>     stores fsid+fid<->path in a map to match events to path later
> 2. (priv-only) User queries fsid, sets a sb/mount mark on path,
>     stores fsid<->path to match event to mntfd and
>     resolves path by handle from mntfd+fid

You're right that for open_by_handle_at() the fsid is actually only good
for getting *any* path on the superblock where file handle can be used so
any of the fsids provided by btrfs is OK. What will be a slight catch is
that if you would be using name_to_handle_at() to match what you've got
from fanotify you will never be able to identify some fids. I'm not sure
how serious that would be...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-09-20 16:36                 ` Jan Kara
@ 2023-10-18 18:35                   ` Amir Goldstein
  0 siblings, 0 replies; 13+ messages in thread
From: Amir Goldstein @ 2023-10-18 18:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Layton, Christian Brauner, linux-fsdevel, linux-api,
	Chuck Lever

On Wed, Sep 20, 2023 at 7:36 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 20-09-23 18:12:00, Amir Goldstein wrote:
> > On Wed, Sep 20, 2023 at 4:48 PM Jan Kara <jack@suse.cz> wrote:
> > > > If users had a flag to statfs() to request the "btrfs root volume fsid",
> > > > then fanotify could also report the root fsid and everyone will be happy
> > > > because the btrfs file handle already contains the subvolume root
> > > > object id (FILEID_BTRFS_WITH_PARENT_ROOT), but that is not
> > > > what users get for statfs() and that is not what fanotify documentation
> > > > says about how to query fsid.
> > > >
> > > > We could report the subvolume fsid for marked inode/mount
> > > > that is not a problem - we just cache the subvol fsid in inode/mount
> > > > connector, but that fsid will be inconsistent with the fsid in the sb
> > > > connector, so the same object (in subvolume) can get events
> > > > with different fsid (e.g. if one event is in mask of sb and another
> > > > event is in mask of inode).
> > >
> > > Yes. I'm sorry I didn't describe all the details. My idea was to report
> > > even on a dentry with the fsid statfs(2) would return on it. We don't want
> > > to call dentry_statfs() on each event (it's costly and we don't always have
> > > the dentry available) but we can have a special callback into the
> > > filesystem to get us just the fsid (which is very cheap) and call *that* on
> > > the inode on which the event happens to get fsid for the event. So yes, the
> > > sb mark would be returning events with different fsids for btrfs. Or we
> > > could compare the obtained fsid with the one in the root volume and ignore
> > > the event if they mismatch (that would be more like the different subvolume
> > > => different filesystem point of view and would require some more work on
> > > fanotify side to remember fsid in the sb mark and not in the sb connector).
> > >
> >
> > It sounds like a big project.
>
> Actually it should be pretty simple as I imagine it. Maybe I can quickly
> hack a POC.
>

I think I get what you mean.
Pushed POC (only compile tested so far) to branch inode_fsid on my github [1]
and posted the patches for seamless support for non-decodable fid.

Let me know if that is what you meant.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/inode_fsid

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types
  2023-09-20 11:04         ` Jan Kara
  2023-09-20 12:41           ` Amir Goldstein
@ 2023-10-25 14:11           ` Amir Goldstein
  1 sibling, 0 replies; 13+ messages in thread
From: Amir Goldstein @ 2023-10-25 14:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, linux-fsdevel, linux-api, Jeff Layton,
	Chuck Lever

> > Hi Jan,
> >
> > I seem to have dropped the ball on this after implementing AT_HANDLE_FID.
> > It was step one in a larger plan.
>
> No problem, I forgot about this as well :)
>

Following up...

> > Christian, Jeff,
> >
> > Do you have an objection to this plan:
> > 1. Convert all "legacy" FILEID_INO32_GEN fs with non-empty
> >     s_export_op and no explicit ->encode_fh() to use an explicit
> >     generic_encode_ino32_gen_fh()
> > 2. Relax requirement of non-empty s_export_op for AT_HANDLE_FID
> >     to support encoding a (non-NFS) file id on all fs
> > 3. For fs with empty s_export_op, allow fallback of AT_HANDLE_FID
> >     in exportfs_encode_inode_fh() to encode FILEID_INO64_GEN
> >

These are now queued on Christian's vfs.f_fsid branch.

> >
> > > > > Also I have noticed your workaround with using st_dev for fsid. As I've
> > > > > checked, there are actually very few filesystems that don't set fsid these
> > > > > days. So maybe we could just get away with still refusing to report on
> > > > > filesystems without fsid and possibly fixup filesystems which don't set
> > > > > fsid yet and are used enough so that users complain?
> > > >
> > > > I started going down this path to close the gap with inotify.
> > > > inotify is capable of watching all fs including pseudo fs, so I would
> > > > like to have this feature parity.
> > >
> > > Well, but with pseudo filesystems (similarly as with FUSE) the notification
> > > was always unreliable. As in: some cases worked but others did not. I'm not
> > > sure that is something we should try to replicate :)
> > >
> > > So still I'd be interested to know which filesystems we are exactly
> > > interested to support and whether we are not better off to explicitly add
> > > fsid support to them like we did for tmpfs.
> > >
> >
> > Since this email, kernfs derivative fs gained fsid as well.
> > Quoting your audit of remaining fs from another thread:
> >
> > > ...As far as I remember
> > > fanotify should be now able to handle anything that provides f_fsid in its
> > > statfs(2) call. And as I'm checking filesystems not setting fsid currently are:
> > >
> > > afs, coda, nfs - networking filesystems where inotify and fanotify have
> > >   dubious value anyway
> >
> > Be that as it may, there may be users that use inotify on network fs
> > and it even makes a lot of sense in controlled environments with
> > single NFS client per NFS export (e.g. home dirs), so I think we will
> > need to support those fs as well.
>
> Fair enough.

I have sent an fsid patch for NFS and for FUSE.
I am not going to deal with afs and coda unless there is explicit demand.

>
> > Maybe the wise thing to do is to opt-in to monitor those fs after all?
> > Maybe with explicit opt-in to watch a single fs, fanotify group will
> > limit itself to marks on a specific sb and then a null fsid won't matter?
>
> We have virtual filesystems with all sorts of missing or weird notification
> functionality and we don't flag that in any way. So making a special flag
> for network filesystems seems a bit arbitrary. I'd just make them provide
> fsid and be done with it.
>
> > > configfs, debugfs, devpts, efivarfs, hugetlbfs, openpromfs, proc, pstore,
> > > ramfs, sysfs, tracefs - virtual filesystems where fsnotify functionality is
> > >   quite limited. But some special cases could be useful. Adding fsid support
> > >   is the same amount of trouble as for kernfs - a few LOC. In fact, we
> > >   could perhaps add a fstype flag to indicate that this is a filesystem
> > >   without persistent identification and so uuid should be autogenerated on
> > >   mount (likely in alloc_super()) and f_fsid generated from sb->s_uuid.
> > >   This way we could handle all these filesystems with trivial amount of
> > >   effort.
> > >

Patch for simple fs fsid also queued on Christian's vfs.f_fsid branch.

> >
> > Christian,
> >
> > I recall that you may have had reservations on initializing s_uuid
> > and f_fsid in vfs code?
> > Does an opt-in fstype flag address your concerns?
> > Will you be ok with doing the tmpfs/kernfs trick for every fs
> > that opted-in with fstype flag in generic vfs code?
> >
> > > freevxfs - the only real filesystem without f_fsid. Trivial to handle one
> > >   way or the other.
> > >

fsid patch was posted for freevxfs.
fsid patch for gfs2 was posted and applied by the maintainer.

> >
> > Last but not least, btrfs subvolumes.
> > They do have an fsid, but it is different from the sb fsid,
> > so we disallow (even inode) fanotify marks.
> >
> > I am not sure how to solve this one,
> > but if we choose to implement the opt-in fanotify flag for
> > "watch single fs", we can make this problem go away, along
> > with the problem of network fs fsid and other odd fs that we
> > do not want to have to deal with.
> >
> > On top of everything, it is a fast solution and it doesn't
> > involve vfs and changing any fs at all.
>
> Yeah, right, forgot about this one. Thanks for reminding me. But this is
> mostly a kernel internal implementation issue and doesn't seem to be a
> principial problem so I'd prefer not to complicate the uAPI for this. We
> could for example mandate a special super_operation for fetching fsid for a
> dentry for filesystems which don't have uniform fsids over the whole
> filesystem (i.e., btrfs) and call this when generating event for such
> filesystems. Or am I missing some other complication?
>

No complication AFAICS.
btrfs fsid patches posted for review.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-10-25 14:12 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-11 12:40 [RFC][PATCH] fanotify: Enable FAN_REPORT_FID on more filesystem types Amir Goldstein
2023-04-12 18:43 ` Jan Kara
2023-04-13  9:25   ` Amir Goldstein
2023-04-17 16:27     ` Jan Kara
2023-09-20  8:26       ` Amir Goldstein
2023-09-20 10:27         ` Jeff Layton
2023-09-20 11:04         ` Jan Kara
2023-09-20 12:41           ` Amir Goldstein
2023-09-20 13:48             ` Jan Kara
2023-09-20 15:12               ` Amir Goldstein
2023-09-20 16:36                 ` Jan Kara
2023-10-18 18:35                   ` Amir Goldstein
2023-10-25 14:11           ` Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).