linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/10] fanotify: add pre-content hooks
@ 2024-07-25 18:19 Josef Bacik
  2024-07-25 18:19 ` [PATCH 01/10] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
                   ` (9 more replies)
  0 siblings, 10 replies; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

Hello,

These are the patches for the bare bones pre-content fanotify support.  The
majority of this work is Amir's, my contribution to this has solely been around
adding the page fault hooks, testing and validating everything.  I'm sending it
because Amir is traveling a bunch, and I touched it last so I'm going to take
all the hate and he can take all the credit.

There is a PoC that I've been using to validate this work, you can find the git
repo here

https://github.com/josefbacik/remote-fetch

This consists of 3 different tools.

1. populate.  This just creates all the stub files in the directory from the
   source directory.  Just run ./populate ~/linux ~/hsm-linux and it'll
   recursively create all of the stub files and directories.
2. remote-fetch.  This is the actual PoC, you just point it at the source and
   destination directory and then you can do whatever.  ./remote-fetch ~/linux
   ~/hsm-linux.
3. mmap-validate.  This was to validate the pagefault thing, this is likely what
   will be turned into the selftest with remote-fetch.  It creates a file and
   then you can validate the file matches the right pattern with both normal
   reads and mmap.  Normally I do something like

   ./mmap-validate create ~/src/foo
   ./populate ~/src ~/dst
   ./rmeote-fetch ~/src ~/dst
   ./mmap-validate validate ~/dst/foo

I did a bunch of testing, I also got some performance numbers.  I copied a
kernel tree, and then did remote-fetch, and then make -j4

Normal
real    9m49.709s
user    28m11.372s
sys     4m57.304s

HSM
real    10m6.454s
user    29m10.517s
sys     5m2.617s

So ~17 seconds more to build with HSM.  I then did a make mrproper on both trees
to see the size

[root@fedora ~]# du -hs /src/linux
1.6G    /src/linux
[root@fedora ~]# du -hs dst
125M    dst

This mirrors the sort of savings we've seen in production.

Meta has had these patches (minus the page fault patch) deployed in production
for almost a year with our own utility for doing on-demand package fetching.
The savings from this has been pretty significant.

The page-fault hooks are necessary for the last thing we need, which is
on-demand range fetching of executables.  Some of our binaries are several gigs
large, having the ability to remote fetch them on demand is a huge win for us
not only with space savings, but with startup time of containers.

There will be tests for this going into LTP once we're satisfied with the
patches and they're on their way upstream.  Thanks,

Josef

Amir Goldstein (8):
  fsnotify: introduce pre-content permission event
  fsnotify: generate pre-content permission event on open
  fanotify: introduce FAN_PRE_ACCESS permission event
  fanotify: introduce FAN_PRE_MODIFY permission event
  fanotify: pass optional file access range in pre-content event
  fanotify: rename a misnamed constant
  fanotify: report file range info with pre-content events
  fanotify: allow to set errno in FAN_DENY permission response

Josef Bacik (2):
  fanotify: don't skip extra event info if no info_mode is set
  fsnotify: generate pre-content permission event on page fault

 fs/namei.c                         |   9 +++
 fs/notify/fanotify/fanotify.c      |  29 ++++++--
 fs/notify/fanotify/fanotify.h      |  10 +++
 fs/notify/fanotify/fanotify_user.c | 111 +++++++++++++++++++++++------
 fs/notify/fsnotify.c               |  15 +++-
 include/linux/fanotify.h           |  24 +++++--
 include/linux/fsnotify.h           |  54 ++++++++++++--
 include/linux/fsnotify_backend.h   |  59 ++++++++++++++-
 include/uapi/linux/fanotify.h      |  17 +++++
 mm/filemap.c                       |  50 +++++++++++--
 security/selinux/hooks.c           |   3 +-
 11 files changed, 335 insertions(+), 46 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 01/10] fanotify: don't skip extra event info if no info_mode is set
  2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
@ 2024-07-25 18:19 ` Josef Bacik
  2024-08-01 17:59   ` Jan Kara
  2024-07-25 18:19 ` [PATCH 02/10] fsnotify: introduce pre-content permission event Josef Bacik
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

Previously we would only include optional information if you requested
it via an FAN_ flag at fanotify_init time (FAN_REPORT_FID for example).
However this isn't necessary as the event length is encoded in the
metadata, and if the user doesn't want to consume the information they
don't have to.  With the PRE_ACCESS events we will always generate range
information, so drop this check in order to allow this extra
information to be exported without needing to have another flag.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/notify/fanotify/fanotify_user.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9ec313e9f6e1..2e2fba8a9d20 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -160,9 +160,6 @@ static size_t fanotify_event_len(unsigned int info_mode,
 	int fh_len;
 	int dot_len = 0;
 
-	if (!info_mode)
-		return event_len;
-
 	if (fanotify_is_error_event(event->mask))
 		event_len += FANOTIFY_ERROR_INFO_LEN;
 
@@ -740,12 +737,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	if (fanotify_is_perm_event(event->mask))
 		FANOTIFY_PERM(event)->fd = fd;
 
-	if (info_mode) {
-		ret = copy_info_records_to_user(event, info, info_mode, pidfd,
-						buf, count);
-		if (ret < 0)
-			goto out_close_fd;
-	}
+	ret = copy_info_records_to_user(event, info, info_mode, pidfd,
+					buf, count);
+	if (ret < 0)
+		goto out_close_fd;
 
 	if (f)
 		fd_install(fd, f);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 02/10] fsnotify: introduce pre-content permission event
  2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
  2024-07-25 18:19 ` [PATCH 01/10] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
@ 2024-07-25 18:19 ` Josef Bacik
  2024-08-01 16:31   ` Jan Kara
  2024-07-25 18:19 ` [PATCH 03/10] fsnotify: generate pre-content permission event on open Josef Bacik
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

From: Amir Goldstein <amir73il@gmail.com>

The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
but it meant for a different use case of filling file content before
access to a file range, so it has slightly different semantics.

Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.

FS_PRE_MODIFY is a new permission event, with similar semantics as
FS_PRE_ACCESS, which is called before a file is modified.

FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
pre-content events are only reported for regular files and dirs.

The pre-content events are meant to be used by hierarchical storage
managers that want to fill the content of files on first access.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fsnotify.c             |  2 +-
 include/linux/fsnotify.h         | 27 ++++++++++++++++++++++++---
 include/linux/fsnotify_backend.h | 13 +++++++++++--
 security/selinux/hooks.c         |  3 ++-
 4 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 272c8a1dab3c..1ca4a8da7f29 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -621,7 +621,7 @@ static __init int fsnotify_init(void)
 {
 	int ret;
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 23);
+	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
 
 	ret = init_srcu_struct(&fsnotify_mark_srcu);
 	if (ret)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 278620e063ab..028ce807805a 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -133,12 +133,13 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
 
 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
 /*
- * fsnotify_file_area_perm - permission hook before access to file range
+ * fsnotify_file_area_perm - permission hook before access/modify of file range
  */
 static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 					  const loff_t *ppos, size_t count)
 {
-	__u32 fsnotify_mask = FS_ACCESS_PERM;
+	struct inode *inode = file_inode(file);
+	__u32 fsnotify_mask;
 
 	/*
 	 * filesystem may be modified in the context of permission events
@@ -147,7 +148,27 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 	 */
 	lockdep_assert_once(file_write_not_started(file));
 
-	if (!(perm_mask & MAY_READ))
+	/*
+	 * Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events.
+	 */
+	if (perm_mask & MAY_READ) {
+		int ret = fsnotify_file(file, FS_ACCESS_PERM);
+
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * Pre-content events are only reported for regular files and dirs.
+	 */
+	if (!S_ISDIR(inode->i_mode) && !S_ISREG(inode->i_mode))
+		return 0;
+
+	if (perm_mask & MAY_WRITE)
+		fsnotify_mask = FS_PRE_MODIFY;
+	else if (perm_mask & MAY_READ)
+		fsnotify_mask = FS_PRE_ACCESS;
+	else
 		return 0;
 
 	return fsnotify_file(file, fsnotify_mask);
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 8be029bc50b1..21e72b837ec5 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -56,6 +56,9 @@
 #define FS_ACCESS_PERM		0x00020000	/* access event in a permissions hook */
 #define FS_OPEN_EXEC_PERM	0x00040000	/* open/exec event in a permission hook */
 
+#define FS_PRE_ACCESS		0x00100000	/* Pre-content access hook */
+#define FS_PRE_MODIFY		0x00200000	/* Pre-content modify hook */
+
 /*
  * Set on inode mark that cares about things that happen to its children.
  * Always set for dnotify and inotify.
@@ -77,8 +80,14 @@
  */
 #define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME)
 
-#define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
-				  FS_OPEN_EXEC_PERM)
+/* Content events can be used to inspect file content */
+#define FSNOTIFY_CONTENT_PERM_EVENTS (FS_OPEN_PERM | FS_OPEN_EXEC_PERM | \
+				      FS_ACCESS_PERM)
+/* Pre-content events can be used to fill file content */
+#define FSNOTIFY_PRE_CONTENT_EVENTS  (FS_PRE_ACCESS | FS_PRE_MODIFY)
+
+#define ALL_FSNOTIFY_PERM_EVENTS (FSNOTIFY_CONTENT_PERM_EVENTS | \
+				  FSNOTIFY_PRE_CONTENT_EVENTS)
 
 /*
  * This is a list of all events that may get sent to a parent that is watching
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 55c78c318ccd..2997edf3e7cd 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3406,7 +3406,8 @@ static int selinux_path_notify(const struct path *path, u64 mask,
 		perm |= FILE__WATCH_WITH_PERM;
 
 	/* watches on read-like events need the file:watch_reads permission */
-	if (mask & (FS_ACCESS | FS_ACCESS_PERM | FS_CLOSE_NOWRITE))
+	if (mask & (FS_ACCESS | FS_ACCESS_PERM | FS_PRE_ACCESS |
+		    FS_CLOSE_NOWRITE))
 		perm |= FILE__WATCH_READS;
 
 	return path_has_perm(current_cred(), path, perm);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 03/10] fsnotify: generate pre-content permission event on open
  2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
  2024-07-25 18:19 ` [PATCH 01/10] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
  2024-07-25 18:19 ` [PATCH 02/10] fsnotify: introduce pre-content permission event Josef Bacik
@ 2024-07-25 18:19 ` Josef Bacik
  2024-08-01 17:01   ` Jan Kara
  2024-07-25 18:19 ` [PATCH 04/10] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

From: Amir Goldstein <amir73il@gmail.com>

FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on open depending on
file open mode.  The pre-content event will be generated in addition to
FS_OPEN_PERM, but without sb_writers held and after file was truncated
in case file was opened with O_CREAT and/or O_TRUNC.

The event will have a range info of (0..0) to provide an opportunity
to fill entire file content on open.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/namei.c               |  9 +++++++++
 include/linux/fsnotify.h | 10 +++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3a4c40e12f78..c16487e3742d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3735,6 +3735,15 @@ static int do_open(struct nameidata *nd,
 	}
 	if (do_truncate)
 		mnt_drop_write(nd->path.mnt);
+
+	/*
+	 * This permission hook is different than fsnotify_open_perm() hook.
+	 * This is a pre-content hook that is called without sb_writers held
+	 * and after the file was truncated.
+	 */
+	if (!error)
+		error = fsnotify_file_perm(file, MAY_OPEN);
+
 	return error;
 }
 
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 028ce807805a..4103dd797477 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -168,6 +168,10 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 		fsnotify_mask = FS_PRE_MODIFY;
 	else if (perm_mask & MAY_READ)
 		fsnotify_mask = FS_PRE_ACCESS;
+	else if (perm_mask & MAY_OPEN && file->f_mode & FMODE_WRITER)
+		fsnotify_mask = FS_PRE_MODIFY;
+	else if (perm_mask & MAY_OPEN)
+		fsnotify_mask = FS_PRE_ACCESS;
 	else
 		return 0;
 
@@ -176,10 +180,14 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 
 /*
  * fsnotify_file_perm - permission hook before file access
+ *
+ * Called from read()/write() with perm_mas MAY_READ/MAY_WRITE.
+ * Called from open() with MAY_OPEN in addition to fsnotify_open_perm(),
+ * but without sb_writers held and after the file was truncated.
  */
 static inline int fsnotify_file_perm(struct file *file, int perm_mask)
 {
-	return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
+	return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0);
 }
 
 /*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 04/10] fanotify: introduce FAN_PRE_ACCESS permission event
  2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
                   ` (2 preceding siblings ...)
  2024-07-25 18:19 ` [PATCH 03/10] fsnotify: generate pre-content permission event on open Josef Bacik
@ 2024-07-25 18:19 ` Josef Bacik
  2024-08-01 17:04   ` Jan Kara
  2024-07-25 18:19 ` [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

From: Amir Goldstein <amir73il@gmail.com>

Similar to FAN_ACCESS_PERM permission event, but it is only allowed with
class FAN_CLASS_PRE_CONTENT and only allowed on regular files are dirs.

Unlike FAN_ACCESS_PERM, it is safe to write to the file being accessed
in the context of the event handler.

This pre-content event is meant to be used by hierarchical storage
managers that want to fill the content of files on first read access.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      |  3 ++-
 fs/notify/fanotify/fanotify_user.c | 17 ++++++++++++++---
 include/linux/fanotify.h           | 14 ++++++++++----
 include/uapi/linux/fanotify.h      |  2 ++
 4 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 224bccaab4cc..7dac8e4486df 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -910,8 +910,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 	BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
 	BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
 	BUILD_BUG_ON(FAN_RENAME != FS_RENAME);
+	BUILD_BUG_ON(FAN_PRE_ACCESS != FS_PRE_ACCESS);
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 21);
+	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 22);
 
 	mask = fanotify_group_event_mask(group, iter_info, &match_mask,
 					 mask, data, data_type, dir);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 2e2fba8a9d20..c294849e474f 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1628,6 +1628,7 @@ static int fanotify_events_supported(struct fsnotify_group *group,
 				     unsigned int flags)
 {
 	unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
+	bool is_dir = d_is_dir(path->dentry);
 	/* Strict validation of events in non-dir inode mask with v5.17+ APIs */
 	bool strict_dir_events = FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID) ||
 				 (mask & FAN_RENAME) ||
@@ -1665,9 +1666,15 @@ static int fanotify_events_supported(struct fsnotify_group *group,
 	 * but because we always allowed it, error only when using new APIs.
 	 */
 	if (strict_dir_events && mark_type == FAN_MARK_INODE &&
-	    !d_is_dir(path->dentry) && (mask & FANOTIFY_DIRONLY_EVENT_BITS))
+	    !is_dir && (mask & FANOTIFY_DIRONLY_EVENT_BITS))
 		return -ENOTDIR;
 
+	/* Pre-content events are only supported on regular files and dirs */
+	if (mask & FANOTIFY_PRE_CONTENT_EVENTS) {
+		if (!is_dir && !d_is_reg(path->dentry))
+			return -EINVAL;
+	}
+
 	return 0;
 }
 
@@ -1769,11 +1776,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 		goto fput_and_out;
 
 	/*
-	 * Permission events require minimum priority FAN_CLASS_CONTENT.
+	 * Permission events are not allowed for FAN_CLASS_NOTIF.
+	 * Pre-content permission events are not allowed for FAN_CLASS_CONTENT.
 	 */
 	ret = -EINVAL;
 	if (mask & FANOTIFY_PERM_EVENTS &&
-	    group->priority < FSNOTIFY_PRIO_CONTENT)
+	    group->priority == FSNOTIFY_PRIO_NORMAL)
+		goto fput_and_out;
+	else if (mask & FANOTIFY_PRE_CONTENT_EVENTS &&
+		 group->priority == FSNOTIFY_PRIO_CONTENT)
 		goto fput_and_out;
 
 	if (mask & FAN_FS_ERROR &&
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 4f1c4f603118..5c811baf44d2 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -88,6 +88,16 @@
 #define FANOTIFY_DIRENT_EVENTS	(FAN_MOVE | FAN_CREATE | FAN_DELETE | \
 				 FAN_RENAME)
 
+/* Content events can be used to inspect file content */
+#define FANOTIFY_CONTENT_PERM_EVENTS (FAN_OPEN_PERM | FAN_OPEN_EXEC_PERM | \
+				      FAN_ACCESS_PERM)
+/* Pre-content events can be used to fill file content */
+#define FANOTIFY_PRE_CONTENT_EVENTS  (FAN_PRE_ACCESS)
+
+/* Events that require a permission response from user */
+#define FANOTIFY_PERM_EVENTS	(FANOTIFY_CONTENT_PERM_EVENTS | \
+				 FANOTIFY_PRE_CONTENT_EVENTS)
+
 /* Events that can be reported with event->fd */
 #define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS)
 
@@ -103,10 +113,6 @@
 				 FANOTIFY_INODE_EVENTS | \
 				 FANOTIFY_ERROR_EVENTS)
 
-/* Events that require a permission response from user */
-#define FANOTIFY_PERM_EVENTS	(FAN_OPEN_PERM | FAN_ACCESS_PERM | \
-				 FAN_OPEN_EXEC_PERM)
-
 /* Extra flags that may be reported with event or control handling of events */
 #define FANOTIFY_EVENT_FLAGS	(FAN_EVENT_ON_CHILD | FAN_ONDIR)
 
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index a37de58ca571..3ae43867d318 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -26,6 +26,8 @@
 #define FAN_ACCESS_PERM		0x00020000	/* File accessed in perm check */
 #define FAN_OPEN_EXEC_PERM	0x00040000	/* File open/exec in perm check */
 
+#define FAN_PRE_ACCESS		0x00100000	/* Pre-content access hook */
+
 #define FAN_EVENT_ON_CHILD	0x08000000	/* Interested in child events */
 
 #define FAN_RENAME		0x10000000	/* File was renamed */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY permission event
  2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
                   ` (3 preceding siblings ...)
  2024-07-25 18:19 ` [PATCH 04/10] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
@ 2024-07-25 18:19 ` Josef Bacik
  2024-08-01 17:09   ` Jan Kara
  2024-07-25 18:19 ` [PATCH 06/10] fanotify: pass optional file access range in pre-content event Josef Bacik
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

From: Amir Goldstein <amir73il@gmail.com>

Generate FAN_PRE_MODIFY permission event from fsnotify_file_perm()
pre-write hook to notify fanotify listeners on an intent to make
modification to a file.

Like FAN_PRE_ACCESS, it is only allowed with FAN_CLASS_PRE_CONTENT
and unlike FAN_MODIFY, it is only allowed on regular files.

Like FAN_PRE_ACCESS, it is generated without sb_start_write() held,
so it is safe for to perform filesystem modifications in the the
context of event handler.

This pre-content event is meant to be used by hierarchical storage
managers that want to fill the content of files on first write access.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      | 3 ++-
 fs/notify/fanotify/fanotify_user.c | 2 ++
 include/linux/fanotify.h           | 3 ++-
 include/uapi/linux/fanotify.h      | 1 +
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 7dac8e4486df..b163594843f5 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -911,8 +911,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 	BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
 	BUILD_BUG_ON(FAN_RENAME != FS_RENAME);
 	BUILD_BUG_ON(FAN_PRE_ACCESS != FS_PRE_ACCESS);
+	BUILD_BUG_ON(FAN_PRE_MODIFY != FS_PRE_MODIFY);
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 22);
+	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 23);
 
 	mask = fanotify_group_event_mask(group, iter_info, &match_mask,
 					 mask, data, data_type, dir);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index c294849e474f..3a7101544f30 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1673,6 +1673,8 @@ static int fanotify_events_supported(struct fsnotify_group *group,
 	if (mask & FANOTIFY_PRE_CONTENT_EVENTS) {
 		if (!is_dir && !d_is_reg(path->dentry))
 			return -EINVAL;
+		if (is_dir && mask & FAN_PRE_MODIFY)
+			return -EISDIR;
 	}
 
 	return 0;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 5c811baf44d2..ae6cb2688d52 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -92,7 +92,8 @@
 #define FANOTIFY_CONTENT_PERM_EVENTS (FAN_OPEN_PERM | FAN_OPEN_EXEC_PERM | \
 				      FAN_ACCESS_PERM)
 /* Pre-content events can be used to fill file content */
-#define FANOTIFY_PRE_CONTENT_EVENTS  (FAN_PRE_ACCESS)
+#define FANOTIFY_PRE_CONTENT_EVENTS  (FAN_PRE_ACCESS | FAN_PRE_MODIFY)
+#define FANOTIFY_PRE_MODIFY_EVENTS   (FAN_PRE_MODIFY)
 
 /* Events that require a permission response from user */
 #define FANOTIFY_PERM_EVENTS	(FANOTIFY_CONTENT_PERM_EVENTS | \
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 3ae43867d318..c8dacedf73b9 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -27,6 +27,7 @@
 #define FAN_OPEN_EXEC_PERM	0x00040000	/* File open/exec in perm check */
 
 #define FAN_PRE_ACCESS		0x00100000	/* Pre-content access hook */
+#define FAN_PRE_MODIFY		0x00200000	/* Pre-content modify hook */
 
 #define FAN_EVENT_ON_CHILD	0x08000000	/* Interested in child events */
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 06/10] fanotify: pass optional file access range in pre-content event
  2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
                   ` (4 preceding siblings ...)
  2024-07-25 18:19 ` [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
@ 2024-07-25 18:19 ` Josef Bacik
  2024-08-01 17:16   ` Jan Kara
  2024-07-25 18:19 ` [PATCH 07/10] fanotify: rename a misnamed constant Josef Bacik
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

From: Amir Goldstein <amir73il@gmail.com>

We would like to add file range information to pre-content events.

Pass a struct file_range with optional offset and length to event handler
along with pre-content permission event.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c    | 12 ++++++++++--
 fs/notify/fanotify/fanotify.h    |  2 ++
 include/linux/fsnotify.h         | 17 ++++++++++++++++-
 include/linux/fsnotify_backend.h | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index b163594843f5..8fa439bd47d6 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -549,9 +549,13 @@ static struct fanotify_event *fanotify_alloc_path_event(const struct path *path,
 	return &pevent->fae;
 }
 
-static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
+static struct fanotify_event *fanotify_alloc_perm_event(const void *data,
+							int data_type,
 							gfp_t gfp)
 {
+	const struct path *path = fsnotify_data_path(data, data_type);
+	const struct file_range *range =
+			    fsnotify_data_file_range(data, data_type);
 	struct fanotify_perm_event *pevent;
 
 	pevent = kmem_cache_alloc(fanotify_perm_event_cachep, gfp);
@@ -565,6 +569,10 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
 	pevent->hdr.len = 0;
 	pevent->state = FAN_EVENT_INIT;
 	pevent->path = *path;
+	if (range) {
+		pevent->ppos = range->ppos;
+		pevent->count = range->count;
+	}
 	path_get(path);
 
 	return &pevent->fae;
@@ -802,7 +810,7 @@ static struct fanotify_event *fanotify_alloc_event(
 	old_memcg = set_active_memcg(group->memcg);
 
 	if (fanotify_is_perm_event(mask)) {
-		event = fanotify_alloc_perm_event(path, gfp);
+		event = fanotify_alloc_perm_event(data, data_type, gfp);
 	} else if (fanotify_is_error_event(mask)) {
 		event = fanotify_alloc_error_event(group, fsid, data,
 						   data_type, &hash);
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index e5ab33cae6a7..93598b7d5952 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -425,6 +425,8 @@ FANOTIFY_PE(struct fanotify_event *event)
 struct fanotify_perm_event {
 	struct fanotify_event fae;
 	struct path path;
+	const loff_t *ppos;		/* optional file range info */
+	size_t count;
 	u32 response;			/* userspace answer to the event */
 	unsigned short state;		/* state of the event */
 	int fd;		/* fd we passed to userspace for this event */
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 4103dd797477..3c558c76bd5d 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -132,6 +132,21 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
 }
 
 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
+static inline int fsnotify_file_range(struct file *file, __u32 mask,
+				      const loff_t *ppos, size_t count)
+{
+	struct file_range range;
+
+	if (file->f_mode & FMODE_NONOTIFY)
+		return 0;
+
+	range.path = &file->f_path;
+	range.ppos = ppos;
+	range.count = count;
+	return fsnotify_parent(range.path->dentry, mask, &range,
+			       FSNOTIFY_EVENT_FILE_RANGE);
+}
+
 /*
  * fsnotify_file_area_perm - permission hook before access/modify of file range
  */
@@ -175,7 +190,7 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 	else
 		return 0;
 
-	return fsnotify_file(file, fsnotify_mask);
+	return fsnotify_file_range(file, fsnotify_mask, ppos, count);
 }
 
 /*
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 21e72b837ec5..36c3d18cc40a 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -298,6 +298,7 @@ static inline void fsnotify_group_assert_locked(struct fsnotify_group *group)
 /* When calling fsnotify tell it if the data is a path or inode */
 enum fsnotify_data_type {
 	FSNOTIFY_EVENT_NONE,
+	FSNOTIFY_EVENT_FILE_RANGE,
 	FSNOTIFY_EVENT_PATH,
 	FSNOTIFY_EVENT_INODE,
 	FSNOTIFY_EVENT_DENTRY,
@@ -310,6 +311,17 @@ struct fs_error_report {
 	struct super_block *sb;
 };
 
+struct file_range {
+	const struct path *path;
+	const loff_t *ppos;
+	size_t count;
+};
+
+static inline const struct path *file_range_path(const struct file_range *range)
+{
+	return range->path;
+}
+
 static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
 {
 	switch (data_type) {
@@ -319,6 +331,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
 		return d_inode(data);
 	case FSNOTIFY_EVENT_PATH:
 		return d_inode(((const struct path *)data)->dentry);
+	case FSNOTIFY_EVENT_FILE_RANGE:
+		return d_inode(file_range_path(data)->dentry);
 	case FSNOTIFY_EVENT_ERROR:
 		return ((struct fs_error_report *)data)->inode;
 	default:
@@ -334,6 +348,8 @@ static inline struct dentry *fsnotify_data_dentry(const void *data, int data_typ
 		return (struct dentry *)data;
 	case FSNOTIFY_EVENT_PATH:
 		return ((const struct path *)data)->dentry;
+	case FSNOTIFY_EVENT_FILE_RANGE:
+		return file_range_path(data)->dentry;
 	default:
 		return NULL;
 	}
@@ -345,6 +361,8 @@ static inline const struct path *fsnotify_data_path(const void *data,
 	switch (data_type) {
 	case FSNOTIFY_EVENT_PATH:
 		return data;
+	case FSNOTIFY_EVENT_FILE_RANGE:
+		return file_range_path(data);
 	default:
 		return NULL;
 	}
@@ -360,6 +378,8 @@ static inline struct super_block *fsnotify_data_sb(const void *data,
 		return ((struct dentry *)data)->d_sb;
 	case FSNOTIFY_EVENT_PATH:
 		return ((const struct path *)data)->dentry->d_sb;
+	case FSNOTIFY_EVENT_FILE_RANGE:
+		return file_range_path(data)->dentry->d_sb;
 	case FSNOTIFY_EVENT_ERROR:
 		return ((struct fs_error_report *) data)->sb;
 	default:
@@ -379,6 +399,18 @@ static inline struct fs_error_report *fsnotify_data_error_report(
 	}
 }
 
+static inline const struct file_range *fsnotify_data_file_range(
+							const void *data,
+							int data_type)
+{
+	switch (data_type) {
+	case FSNOTIFY_EVENT_FILE_RANGE:
+		return (struct file_range *)data;
+	default:
+		return NULL;
+	}
+}
+
 /*
  * Index to merged marks iterator array that correlates to a type of watch.
  * The type of watched object can be deduced from the iterator type, but not
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 07/10] fanotify: rename a misnamed constant
  2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
                   ` (5 preceding siblings ...)
  2024-07-25 18:19 ` [PATCH 06/10] fanotify: pass optional file access range in pre-content event Josef Bacik
@ 2024-07-25 18:19 ` Josef Bacik
  2024-08-01 17:19   ` Jan Kara
  2024-07-25 18:19 ` [PATCH 08/10] fanotify: report file range info with pre-content events Josef Bacik
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

From: Amir Goldstein <amir73il@gmail.com>

FANOTIFY_PIDFD_INFO_HDR_LEN is not the length of the header.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 3a7101544f30..5ece186d5c50 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -119,7 +119,7 @@ struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
 #define FANOTIFY_EVENT_ALIGN 4
 #define FANOTIFY_FID_INFO_HDR_LEN \
 	(sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
-#define FANOTIFY_PIDFD_INFO_HDR_LEN \
+#define FANOTIFY_PIDFD_INFO_LEN \
 	sizeof(struct fanotify_event_info_pidfd)
 #define FANOTIFY_ERROR_INFO_LEN \
 	(sizeof(struct fanotify_event_info_error))
@@ -174,14 +174,14 @@ static size_t fanotify_event_len(unsigned int info_mode,
 		dot_len = 1;
 	}
 
-	if (info_mode & FAN_REPORT_PIDFD)
-		event_len += FANOTIFY_PIDFD_INFO_HDR_LEN;
-
 	if (fanotify_event_has_object_fh(event)) {
 		fh_len = fanotify_event_object_fh_len(event);
 		event_len += fanotify_fid_info_len(fh_len, dot_len);
 	}
 
+	if (info_mode & FAN_REPORT_PIDFD)
+		event_len += FANOTIFY_PIDFD_INFO_LEN;
+
 	return event_len;
 }
 
@@ -511,7 +511,7 @@ static int copy_pidfd_info_to_user(int pidfd,
 				   size_t count)
 {
 	struct fanotify_event_info_pidfd info = { };
-	size_t info_len = FANOTIFY_PIDFD_INFO_HDR_LEN;
+	size_t info_len = FANOTIFY_PIDFD_INFO_LEN;
 
 	if (WARN_ON_ONCE(info_len > count))
 		return -EFAULT;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 08/10] fanotify: report file range info with pre-content events
  2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
                   ` (6 preceding siblings ...)
  2024-07-25 18:19 ` [PATCH 07/10] fanotify: rename a misnamed constant Josef Bacik
@ 2024-07-25 18:19 ` Josef Bacik
  2024-08-01 17:38   ` Jan Kara
  2024-07-25 18:19 ` [PATCH 09/10] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
  2024-07-25 18:19 ` [PATCH 10/10] fsnotify: generate pre-content permission event on page fault Josef Bacik
  9 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

From: Amir Goldstein <amir73il@gmail.com>

With group class FAN_CLASS_PRE_CONTENT, report offset and length info
along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events.

This information is meant to be used by hierarchical storage managers
that want to fill partial content of files on first access to range.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.h      |  8 +++++++
 fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++
 include/uapi/linux/fanotify.h      |  7 ++++++
 3 files changed, 53 insertions(+)

diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 93598b7d5952..7f06355afa1f 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask)
 		mask & FANOTIFY_PERM_EVENTS;
 }
 
+static inline bool fanotify_event_has_access_range(struct fanotify_event *event)
+{
+	if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS))
+		return false;
+
+	return FANOTIFY_PERM(event)->ppos;
+}
+
 static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
 {
 	return container_of(fse, struct fanotify_event, fse);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 5ece186d5c50..c3c8b2ea80b6 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -123,6 +123,8 @@ struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
 	sizeof(struct fanotify_event_info_pidfd)
 #define FANOTIFY_ERROR_INFO_LEN \
 	(sizeof(struct fanotify_event_info_error))
+#define FANOTIFY_RANGE_INFO_LEN \
+	(sizeof(struct fanotify_event_info_range))
 
 static int fanotify_fid_info_len(int fh_len, int name_len)
 {
@@ -182,6 +184,9 @@ static size_t fanotify_event_len(unsigned int info_mode,
 	if (info_mode & FAN_REPORT_PIDFD)
 		event_len += FANOTIFY_PIDFD_INFO_LEN;
 
+	if (fanotify_event_has_access_range(event))
+		event_len += FANOTIFY_RANGE_INFO_LEN;
+
 	return event_len;
 }
 
@@ -526,6 +531,30 @@ static int copy_pidfd_info_to_user(int pidfd,
 	return info_len;
 }
 
+static size_t copy_range_info_to_user(struct fanotify_event *event,
+				      char __user *buf, int count)
+{
+	struct fanotify_perm_event *pevent = FANOTIFY_PERM(event);
+	struct fanotify_event_info_range info = { };
+	size_t info_len = FANOTIFY_RANGE_INFO_LEN;
+
+	if (WARN_ON_ONCE(info_len > count))
+		return -EFAULT;
+
+	if (WARN_ON_ONCE(!pevent->ppos))
+		return 0;
+
+	info.hdr.info_type = FAN_EVENT_INFO_TYPE_RANGE;
+	info.hdr.len = info_len;
+	info.offset = *(pevent->ppos);
+	info.count = pevent->count;
+
+	if (copy_to_user(buf, &info, info_len))
+		return -EFAULT;
+
+	return info_len;
+}
+
 static int copy_info_records_to_user(struct fanotify_event *event,
 				     struct fanotify_info *info,
 				     unsigned int info_mode, int pidfd,
@@ -647,6 +676,15 @@ static int copy_info_records_to_user(struct fanotify_event *event,
 		total_bytes += ret;
 	}
 
+	if (fanotify_event_has_access_range(event)) {
+		ret = copy_range_info_to_user(event, buf, count);
+		if (ret < 0)
+			return ret;
+		buf += ret;
+		count -= ret;
+		total_bytes += ret;
+	}
+
 	return total_bytes;
 }
 
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index c8dacedf73b9..7c92d0f6bf71 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -145,6 +145,7 @@ struct fanotify_event_metadata {
 #define FAN_EVENT_INFO_TYPE_DFID	3
 #define FAN_EVENT_INFO_TYPE_PIDFD	4
 #define FAN_EVENT_INFO_TYPE_ERROR	5
+#define FAN_EVENT_INFO_TYPE_RANGE	6
 
 /* Special info types for FAN_RENAME */
 #define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME	10
@@ -191,6 +192,12 @@ struct fanotify_event_info_error {
 	__u32 error_count;
 };
 
+struct fanotify_event_info_range {
+	struct fanotify_event_info_header hdr;
+	__u32 count;
+	__u64 offset;
+};
+
 /*
  * User space may need to record additional information about its decision.
  * The extra information type records what kind of information is included.
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 09/10] fanotify: allow to set errno in FAN_DENY permission response
  2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
                   ` (7 preceding siblings ...)
  2024-07-25 18:19 ` [PATCH 08/10] fanotify: report file range info with pre-content events Josef Bacik
@ 2024-07-25 18:19 ` Josef Bacik
  2024-08-01 21:14   ` Jan Kara
  2024-07-25 18:19 ` [PATCH 10/10] fsnotify: generate pre-content permission event on page fault Josef Bacik
  9 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

From: Amir Goldstein <amir73il@gmail.com>

With FAN_DENY response, user trying to perform the filesystem operation
gets an error with errno set to EPERM.

It is useful for hierarchical storage management (HSM) service to be able
to deny access for reasons more diverse than EPERM, for example EAGAIN,
if HSM could retry the operation later.

Allow fanotify groups with priority FAN_CLASSS_PRE_CONTENT to responsd
to permission events with the response value FAN_DENY_ERRNO(errno),
instead of FAN_DENY to return a custom error.

Limit custom error to values to some errors expected on read(2)/write(2)
and open(2) of regular files. This list could be extended in the future.
Userspace can test for legitimate values of FAN_DENY_ERRNO(errno) by
writing a response to an fanotify group fd with a value of FAN_NOFD
in the fd field of the response.

The change in fanotify_response is backward compatible, because errno is
written in the high 8 bits of the 32bit response field and old kernels
reject respose value with high bits set.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      | 13 ++++++++++---
 fs/notify/fanotify/fanotify_user.c | 31 +++++++++++++++++++++++++++---
 include/linux/fanotify.h           |  9 ++++++++-
 include/uapi/linux/fanotify.h      |  7 +++++++
 4 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 8fa439bd47d6..48539acc32e0 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -224,7 +224,7 @@ static int fanotify_get_response(struct fsnotify_group *group,
 				 struct fanotify_perm_event *event,
 				 struct fsnotify_iter_info *iter_info)
 {
-	int ret;
+	int ret, errno;
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
@@ -258,18 +258,25 @@ static int fanotify_get_response(struct fsnotify_group *group,
 	}
 
 	/* userspace responded, convert to something usable */
-	switch (event->response & FANOTIFY_RESPONSE_ACCESS) {
+	switch (FAN_RESPONSE_ACCESS(event->response)) {
 	case FAN_ALLOW:
 		ret = 0;
 		break;
 	case FAN_DENY:
+		/* Check custom errno from pre-content events */
+		errno = FAN_RESPONSE_ERRNO(event->response);
+		if (errno) {
+			ret = -errno;
+			break;
+		}
+		fallthrough;
 	default:
 		ret = -EPERM;
 	}
 
 	/* Check if the response should be audited */
 	if (event->response & FAN_AUDIT)
-		audit_fanotify(event->response & ~FAN_AUDIT,
+		audit_fanotify(FAN_RESPONSE_ACCESS(event->response),
 			       &event->audit_rule);
 
 	pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index c3c8b2ea80b6..b4d810168521 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -337,11 +337,12 @@ static int process_access_response(struct fsnotify_group *group,
 	struct fanotify_perm_event *event;
 	int fd = response_struct->fd;
 	u32 response = response_struct->response;
+	int errno = FAN_RESPONSE_ERRNO(response);
 	int ret = info_len;
 	struct fanotify_response_info_audit_rule friar;
 
-	pr_debug("%s: group=%p fd=%d response=%u buf=%p size=%zu\n", __func__,
-		 group, fd, response, info, info_len);
+	pr_debug("%s: group=%p fd=%d response=%x errno=%d buf=%p size=%zu\n",
+		 __func__, group, fd, response, errno, info, info_len);
 	/*
 	 * make sure the response is valid, if invalid we do nothing and either
 	 * userspace can send a valid response or we will clean it up after the
@@ -350,9 +351,33 @@ static int process_access_response(struct fsnotify_group *group,
 	if (response & ~FANOTIFY_RESPONSE_VALID_MASK)
 		return -EINVAL;
 
-	switch (response & FANOTIFY_RESPONSE_ACCESS) {
+	switch (FAN_RESPONSE_ACCESS(response)) {
 	case FAN_ALLOW:
+		if (errno)
+			return -EINVAL;
+		break;
 	case FAN_DENY:
+		/* Custom errno is supported only for pre-content groups */
+		if (errno && group->priority != FSNOTIFY_PRIO_PRE_CONTENT)
+			return -EINVAL;
+
+		/*
+		 * Limit errno to values expected on open(2)/read(2)/write(2)
+		 * of regular files.
+		 */
+		switch (errno) {
+		case 0:
+		case EIO:
+		case EPERM:
+		case EBUSY:
+		case ETXTBSY:
+		case EAGAIN:
+		case ENOSPC:
+		case EDQUOT:
+			break;
+		default:
+			return -EINVAL;
+		}
 		break;
 	default:
 		return -EINVAL;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index ae6cb2688d52..76d818a7d654 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -132,7 +132,14 @@
 /* These masks check for invalid bits in permission responses. */
 #define FANOTIFY_RESPONSE_ACCESS (FAN_ALLOW | FAN_DENY)
 #define FANOTIFY_RESPONSE_FLAGS (FAN_AUDIT | FAN_INFO)
-#define FANOTIFY_RESPONSE_VALID_MASK (FANOTIFY_RESPONSE_ACCESS | FANOTIFY_RESPONSE_FLAGS)
+#define FANOTIFY_RESPONSE_ERRNO	(FAN_ERRNO_MASK << FAN_ERRNO_SHIFT)
+#define FANOTIFY_RESPONSE_VALID_MASK \
+	(FANOTIFY_RESPONSE_ACCESS | FANOTIFY_RESPONSE_FLAGS | \
+	 FANOTIFY_RESPONSE_ERRNO)
+
+/* errno other than EPERM can specified in upper byte of deny response */
+#define FAN_RESPONSE_ACCESS(res)	((res) & FANOTIFY_RESPONSE_ACCESS)
+#define FAN_RESPONSE_ERRNO(res)		((int)((res) >> FAN_ERRNO_SHIFT))
 
 /* Do not use these old uapi constants internally */
 #undef FAN_ALL_CLASS_BITS
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 7c92d0f6bf71..2206b3ec01c9 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -233,6 +233,13 @@ struct fanotify_response_info_audit_rule {
 /* Legit userspace responses to a _PERM event */
 #define FAN_ALLOW	0x01
 #define FAN_DENY	0x02
+/* errno other than EPERM can specified in upper byte of deny response */
+#define FAN_ERRNO_BITS	8
+#define FAN_ERRNO_SHIFT (32 - FAN_ERRNO_BITS)
+#define FAN_ERRNO_MASK	((1 << FAN_ERRNO_BITS) - 1)
+#define FAN_DENY_ERRNO(err) \
+	(FAN_DENY | ((((__u32)(err)) & FAN_ERRNO_MASK) << FAN_ERRNO_SHIFT))
+
 #define FAN_AUDIT	0x10	/* Bitmask to create audit record for result */
 #define FAN_INFO	0x20	/* Bitmask to indicate additional information */
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
                   ` (8 preceding siblings ...)
  2024-07-25 18:19 ` [PATCH 09/10] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
@ 2024-07-25 18:19 ` Josef Bacik
  2024-07-25 20:19   ` Amir Goldstein
  2024-08-01 21:40   ` Jan Kara
  9 siblings, 2 replies; 47+ messages in thread
From: Josef Bacik @ 2024-07-25 18:19 UTC (permalink / raw)
  To: kernel-team, linux-fsdevel, jack, amir73il, brauner

FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
on the faulting method.

This pre-content event is meant to be used by hierarchical storage
managers that want to fill in the file content on first read access.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/notify/fsnotify.c             | 13 +++++++++
 include/linux/fsnotify_backend.h | 14 +++++++++
 mm/filemap.c                     | 50 ++++++++++++++++++++++++++++----
 3 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 1ca4a8da7f29..435232d46b4f 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -28,6 +28,19 @@ void __fsnotify_vfsmount_delete(struct vfsmount *mnt)
 	fsnotify_clear_marks_by_mount(mnt);
 }
 
+bool fsnotify_file_has_content_watches(struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct mount *mnt = real_mount(file->f_path.mnt);
+	u32 mask = inode->i_fsnotify_mask;
+
+	mask |= mnt->mnt_fsnotify_mask;
+	mask |= sb->s_fsnotify_mask;
+
+	return !!(mask & FSNOTIFY_PRE_CONTENT_EVENTS);
+}
+
 /**
  * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
  * @sb: superblock being unmounted.
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 36c3d18cc40a..6983fbf096b8 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -900,6 +900,15 @@ static inline void fsnotify_init_event(struct fsnotify_event *event)
 	INIT_LIST_HEAD(&event->list);
 }
 
+#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
+bool fsnotify_file_has_content_watches(struct file *file);
+#else
+static inline bool fsnotify_file_has_content_watches(struct file *file)
+{
+	return false;
+}
+#endif /* CONFIG_FANOTIFY_ACCESS_PERMISSIONS */
+
 #else
 
 static inline int fsnotify(__u32 mask, const void *data, int data_type,
@@ -938,6 +947,11 @@ static inline u32 fsnotify_get_cookie(void)
 static inline void fsnotify_unmount_inodes(struct super_block *sb)
 {}
 
+static inline bool fsnotify_file_has_content_watches(struct file *file)
+{
+	return false;
+}
+
 #endif	/* CONFIG_FSNOTIFY */
 
 #endif	/* __KERNEL __ */
diff --git a/mm/filemap.c b/mm/filemap.c
index ca8c8d889eef..cc9d7885bbe3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -46,6 +46,7 @@
 #include <linux/pipe_fs_i.h>
 #include <linux/splice.h>
 #include <linux/rcupdate_wait.h>
+#include <linux/fsnotify.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -3112,13 +3113,13 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
  * that.  If we didn't pin a file then we return NULL.  The file that is
  * returned needs to be fput()'ed when we're done with it.
  */
-static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
+static struct file *do_sync_mmap_readahead(struct vm_fault *vmf,
+					   struct file *fpin)
 {
 	struct file *file = vmf->vma->vm_file;
 	struct file_ra_state *ra = &file->f_ra;
 	struct address_space *mapping = file->f_mapping;
 	DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
-	struct file *fpin = NULL;
 	unsigned long vm_flags = vmf->vma->vm_flags;
 	unsigned int mmap_miss;
 
@@ -3182,12 +3183,12 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
  * was pinned if we have to drop the mmap_lock in order to do IO.
  */
 static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
-					    struct folio *folio)
+					    struct folio *folio,
+					    struct file *fpin)
 {
 	struct file *file = vmf->vma->vm_file;
 	struct file_ra_state *ra = &file->f_ra;
 	DEFINE_READAHEAD(ractl, file, ra, file->f_mapping, vmf->pgoff);
-	struct file *fpin = NULL;
 	unsigned int mmap_miss;
 
 	/* If we don't want any read-ahead, don't bother */
@@ -3287,6 +3288,35 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	if (unlikely(index >= max_idx))
 		return VM_FAULT_SIGBUS;
 
+	/*
+	 * If we have pre-content watchers then we need to generate events on
+	 * page fault so that we can populate any data before the fault.
+	 *
+	 * We only do this on the first pass through, otherwise the populating
+	 * application could potentially deadlock on the mmap lock if it tries
+	 * to populate it with mmap.
+	 */
+	if (fault_flag_allow_retry_first(vmf->flags) &&
+	    fsnotify_file_has_content_watches(file)) {
+		int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_READ;
+		loff_t pos = vmf->pgoff << PAGE_SHIFT;
+
+		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
+
+		/*
+		 * We can only emit the event if we did actually release the
+		 * mmap lock.
+		 */
+		if (fpin) {
+			error = fsnotify_file_area_perm(fpin, mask, &pos,
+							PAGE_SIZE);
+			if (error) {
+				fput(fpin);
+				return VM_FAULT_ERROR;
+			}
+		}
+	}
+
 	/*
 	 * Do we have something in the page cache already?
 	 */
@@ -3297,7 +3327,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		 * the lock.
 		 */
 		if (!(vmf->flags & FAULT_FLAG_TRIED))
-			fpin = do_async_mmap_readahead(vmf, folio);
+			fpin = do_async_mmap_readahead(vmf, folio, fpin);
 		if (unlikely(!folio_test_uptodate(folio))) {
 			filemap_invalidate_lock_shared(mapping);
 			mapping_locked = true;
@@ -3311,7 +3341,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		count_vm_event(PGMAJFAULT);
 		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
 		ret = VM_FAULT_MAJOR;
-		fpin = do_sync_mmap_readahead(vmf);
+		fpin = do_sync_mmap_readahead(vmf, fpin);
 retry_find:
 		/*
 		 * See comment in filemap_create_folio() why we need
@@ -3604,6 +3634,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	struct vm_area_struct *vma = vmf->vma;
 	struct file *file = vma->vm_file;
 	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
 	pgoff_t last_pgoff = start_pgoff;
 	unsigned long addr;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
@@ -3612,6 +3643,13 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	unsigned long rss = 0;
 	unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved, folio_type;
 
+	/*
+	 * We are under RCU, we can't emit events here, we need to force a
+	 * normal fault to make sure the events get sent.
+	 */
+	if (fsnotify_file_has_content_watches(file))
+		return ret;
+
 	rcu_read_lock();
 	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
 	if (!folio)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-07-25 18:19 ` [PATCH 10/10] fsnotify: generate pre-content permission event on page fault Josef Bacik
@ 2024-07-25 20:19   ` Amir Goldstein
  2024-07-29 17:11     ` Josef Bacik
  2024-08-01 21:40   ` Jan Kara
  1 sibling, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2024-07-25 20:19 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, brauner

On Thu, Jul 25, 2024 at 9:20 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
> on the faulting method.
>
> This pre-content event is meant to be used by hierarchical storage
> managers that want to fill in the file content on first read access.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/notify/fsnotify.c             | 13 +++++++++
>  include/linux/fsnotify_backend.h | 14 +++++++++
>  mm/filemap.c                     | 50 ++++++++++++++++++++++++++++----
>  3 files changed, 71 insertions(+), 6 deletions(-)
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 1ca4a8da7f29..435232d46b4f 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -28,6 +28,19 @@ void __fsnotify_vfsmount_delete(struct vfsmount *mnt)
>         fsnotify_clear_marks_by_mount(mnt);
>  }
>
> +bool fsnotify_file_has_content_watches(struct file *file)

nit: has_pre_content_watches...

> +{
> +       struct inode *inode = file_inode(file);
> +       struct super_block *sb = inode->i_sb;
> +       struct mount *mnt = real_mount(file->f_path.mnt);
> +       u32 mask = inode->i_fsnotify_mask;
> +
> +       mask |= mnt->mnt_fsnotify_mask;
> +       mask |= sb->s_fsnotify_mask;
> +
> +       return !!(mask & FSNOTIFY_PRE_CONTENT_EVENTS);

This can use the fsnotify_object_watched() helper, and it will need
the READ_ONCE() that are just being added to avoid data races.

> +}
> +
>  /**
>   * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
>   * @sb: superblock being unmounted.
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 36c3d18cc40a..6983fbf096b8 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -900,6 +900,15 @@ static inline void fsnotify_init_event(struct fsnotify_event *event)
>         INIT_LIST_HEAD(&event->list);
>  }
>
> +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> +bool fsnotify_file_has_content_watches(struct file *file);
> +#else
> +static inline bool fsnotify_file_has_content_watches(struct file *file)
> +{
> +       return false;
> +}
> +#endif /* CONFIG_FANOTIFY_ACCESS_PERMISSIONS */
> +
>  #else
>
>  static inline int fsnotify(__u32 mask, const void *data, int data_type,
> @@ -938,6 +947,11 @@ static inline u32 fsnotify_get_cookie(void)
>  static inline void fsnotify_unmount_inodes(struct super_block *sb)
>  {}
>
> +static inline bool fsnotify_file_has_content_watches(struct file *file)
> +{
> +       return false;
> +}
> +
>  #endif /* CONFIG_FSNOTIFY */
>
>  #endif /* __KERNEL __ */
> diff --git a/mm/filemap.c b/mm/filemap.c
> index ca8c8d889eef..cc9d7885bbe3 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -46,6 +46,7 @@
>  #include <linux/pipe_fs_i.h>
>  #include <linux/splice.h>
>  #include <linux/rcupdate_wait.h>
> +#include <linux/fsnotify.h>
>  #include <asm/pgalloc.h>
>  #include <asm/tlbflush.h>
>  #include "internal.h"
> @@ -3112,13 +3113,13 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
>   * that.  If we didn't pin a file then we return NULL.  The file that is
>   * returned needs to be fput()'ed when we're done with it.
>   */
> -static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
> +static struct file *do_sync_mmap_readahead(struct vm_fault *vmf,
> +                                          struct file *fpin)
>  {
>         struct file *file = vmf->vma->vm_file;
>         struct file_ra_state *ra = &file->f_ra;
>         struct address_space *mapping = file->f_mapping;
>         DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
> -       struct file *fpin = NULL;
>         unsigned long vm_flags = vmf->vma->vm_flags;
>         unsigned int mmap_miss;
>
> @@ -3182,12 +3183,12 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>   * was pinned if we have to drop the mmap_lock in order to do IO.
>   */
>  static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
> -                                           struct folio *folio)
> +                                           struct folio *folio,
> +                                           struct file *fpin)
>  {

If I am reading correctly, iomap (i.e. xfs) write shared memory fault
does not reach this code?

Do we care about writable shared memory faults use case for HSM?
It does not sound very relevant to HSM, but we cannot just ignore it..

>         struct file *file = vmf->vma->vm_file;
>         struct file_ra_state *ra = &file->f_ra;
>         DEFINE_READAHEAD(ractl, file, ra, file->f_mapping, vmf->pgoff);
> -       struct file *fpin = NULL;
>         unsigned int mmap_miss;
>
>         /* If we don't want any read-ahead, don't bother */
> @@ -3287,6 +3288,35 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>         if (unlikely(index >= max_idx))
>                 return VM_FAULT_SIGBUS;
>
> +       /*
> +        * If we have pre-content watchers then we need to generate events on
> +        * page fault so that we can populate any data before the fault.
> +        *
> +        * We only do this on the first pass through, otherwise the populating
> +        * application could potentially deadlock on the mmap lock if it tries
> +        * to populate it with mmap.
> +        */
> +       if (fault_flag_allow_retry_first(vmf->flags) &&
> +           fsnotify_file_has_content_watches(file)) {
> +               int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_READ;
> +               loff_t pos = vmf->pgoff << PAGE_SHIFT;
> +
> +               fpin = maybe_unlock_mmap_for_io(vmf, fpin);
> +
> +               /*
> +                * We can only emit the event if we did actually release the
> +                * mmap lock.
> +                */
> +               if (fpin) {
> +                       error = fsnotify_file_area_perm(fpin, mask, &pos,
> +                                                       PAGE_SIZE);

This is going to also emit a FAN_ACCESS_PERM event.
Heritage of the fact that read() has to emit FAN_ACCESS_PERM
for backward compat and a design decision to emit both
FAN_ACCESS_PERM and FAN_PRE_ACCESS to avoid carrying the
API baggage of the legacy FAN_ACCESS_PERM to pre-content events.

Suggestion as workaround - use MAY_ACCESS instead of MAY_READ
here and emit FAN_PRE_ACCESS with MAY_READ | MAY_ACCESS.

Thank you for pushing my patches through!
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-07-25 20:19   ` Amir Goldstein
@ 2024-07-29 17:11     ` Josef Bacik
  2024-07-29 18:57       ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-29 17:11 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: kernel-team, linux-fsdevel, jack, brauner

On Thu, Jul 25, 2024 at 11:19:33PM +0300, Amir Goldstein wrote:
> On Thu, Jul 25, 2024 at 9:20 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
> > on the faulting method.
> >
> > This pre-content event is meant to be used by hierarchical storage
> > managers that want to fill in the file content on first read access.
> >
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/notify/fsnotify.c             | 13 +++++++++
> >  include/linux/fsnotify_backend.h | 14 +++++++++
> >  mm/filemap.c                     | 50 ++++++++++++++++++++++++++++----
> >  3 files changed, 71 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> > index 1ca4a8da7f29..435232d46b4f 100644
> > --- a/fs/notify/fsnotify.c
> > +++ b/fs/notify/fsnotify.c
> > @@ -28,6 +28,19 @@ void __fsnotify_vfsmount_delete(struct vfsmount *mnt)
> >         fsnotify_clear_marks_by_mount(mnt);
> >  }
> >
> > +bool fsnotify_file_has_content_watches(struct file *file)
> 
> nit: has_pre_content_watches...
> 
> > +{
> > +       struct inode *inode = file_inode(file);
> > +       struct super_block *sb = inode->i_sb;
> > +       struct mount *mnt = real_mount(file->f_path.mnt);
> > +       u32 mask = inode->i_fsnotify_mask;
> > +
> > +       mask |= mnt->mnt_fsnotify_mask;
> > +       mask |= sb->s_fsnotify_mask;
> > +
> > +       return !!(mask & FSNOTIFY_PRE_CONTENT_EVENTS);
> 
> This can use the fsnotify_object_watched() helper, and it will need
> the READ_ONCE() that are just being added to avoid data races.
> 
> > +}
> > +
> >  /**
> >   * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
> >   * @sb: superblock being unmounted.
> > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > index 36c3d18cc40a..6983fbf096b8 100644
> > --- a/include/linux/fsnotify_backend.h
> > +++ b/include/linux/fsnotify_backend.h
> > @@ -900,6 +900,15 @@ static inline void fsnotify_init_event(struct fsnotify_event *event)
> >         INIT_LIST_HEAD(&event->list);
> >  }
> >
> > +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> > +bool fsnotify_file_has_content_watches(struct file *file);
> > +#else
> > +static inline bool fsnotify_file_has_content_watches(struct file *file)
> > +{
> > +       return false;
> > +}
> > +#endif /* CONFIG_FANOTIFY_ACCESS_PERMISSIONS */
> > +
> >  #else
> >
> >  static inline int fsnotify(__u32 mask, const void *data, int data_type,
> > @@ -938,6 +947,11 @@ static inline u32 fsnotify_get_cookie(void)
> >  static inline void fsnotify_unmount_inodes(struct super_block *sb)
> >  {}
> >
> > +static inline bool fsnotify_file_has_content_watches(struct file *file)
> > +{
> > +       return false;
> > +}
> > +
> >  #endif /* CONFIG_FSNOTIFY */
> >
> >  #endif /* __KERNEL __ */
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index ca8c8d889eef..cc9d7885bbe3 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -46,6 +46,7 @@
> >  #include <linux/pipe_fs_i.h>
> >  #include <linux/splice.h>
> >  #include <linux/rcupdate_wait.h>
> > +#include <linux/fsnotify.h>
> >  #include <asm/pgalloc.h>
> >  #include <asm/tlbflush.h>
> >  #include "internal.h"
> > @@ -3112,13 +3113,13 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
> >   * that.  If we didn't pin a file then we return NULL.  The file that is
> >   * returned needs to be fput()'ed when we're done with it.
> >   */
> > -static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
> > +static struct file *do_sync_mmap_readahead(struct vm_fault *vmf,
> > +                                          struct file *fpin)
> >  {
> >         struct file *file = vmf->vma->vm_file;
> >         struct file_ra_state *ra = &file->f_ra;
> >         struct address_space *mapping = file->f_mapping;
> >         DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
> > -       struct file *fpin = NULL;
> >         unsigned long vm_flags = vmf->vma->vm_flags;
> >         unsigned int mmap_miss;
> >
> > @@ -3182,12 +3183,12 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
> >   * was pinned if we have to drop the mmap_lock in order to do IO.
> >   */
> >  static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
> > -                                           struct folio *folio)
> > +                                           struct folio *folio,
> > +                                           struct file *fpin)
> >  {
> 
> If I am reading correctly, iomap (i.e. xfs) write shared memory fault
> does not reach this code?
> 
> Do we care about writable shared memory faults use case for HSM?
> It does not sound very relevant to HSM, but we cannot just ignore it..
> 

Sorry I realized I went off to try and solve this problem and never responded to
you.  I'm addressing the other comments, but this one is a little tricky.

We're kind of stuck between a rock and a hard place with this.  I had originally
put this before the ->fault() callback, but purposefully moved it into
filemap_fault() because I want to be able to drop the mmap lock while we're
waiting for a response from the HSM.

The reason to do this is because there are things that take the mmap lock for
simple things outside of the process, like /proc/$PID/smaps and other related
things, and this can cause high priority tasks to block behind possibly low
priority IO, creating a priority inversion.

Now, I'm not sure how widespread of a problem this is anymore, I know there's
been work done to the kernel and tools to avoid this style of problem.  I'm ok
with a "try it and see" approach, but I don't love that.

However I think putting fsnotify hooks into XFS itself for this particular path
is a good choice either.  What do you think?  Just move it to before ->fault(),
leave the mmap lock in place, and be done with it?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-07-29 17:11     ` Josef Bacik
@ 2024-07-29 18:57       ` Amir Goldstein
  2024-07-30 12:18         ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2024-07-29 18:57 UTC (permalink / raw)
  To: Josef Bacik, jack; +Cc: kernel-team, linux-fsdevel, brauner

On Mon, Jul 29, 2024 at 8:11 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Thu, Jul 25, 2024 at 11:19:33PM +0300, Amir Goldstein wrote:
> > On Thu, Jul 25, 2024 at 9:20 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > >
> > > FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
> > > on the faulting method.
> > >
> > > This pre-content event is meant to be used by hierarchical storage
> > > managers that want to fill in the file content on first read access.
> > >
> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > ---
> > >  fs/notify/fsnotify.c             | 13 +++++++++
> > >  include/linux/fsnotify_backend.h | 14 +++++++++
> > >  mm/filemap.c                     | 50 ++++++++++++++++++++++++++++----
> > >  3 files changed, 71 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> > > index 1ca4a8da7f29..435232d46b4f 100644
> > > --- a/fs/notify/fsnotify.c
> > > +++ b/fs/notify/fsnotify.c
> > > @@ -28,6 +28,19 @@ void __fsnotify_vfsmount_delete(struct vfsmount *mnt)
> > >         fsnotify_clear_marks_by_mount(mnt);
> > >  }
> > >
> > > +bool fsnotify_file_has_content_watches(struct file *file)
> >
> > nit: has_pre_content_watches...
> >
> > > +{
> > > +       struct inode *inode = file_inode(file);
> > > +       struct super_block *sb = inode->i_sb;
> > > +       struct mount *mnt = real_mount(file->f_path.mnt);
> > > +       u32 mask = inode->i_fsnotify_mask;
> > > +
> > > +       mask |= mnt->mnt_fsnotify_mask;
> > > +       mask |= sb->s_fsnotify_mask;
> > > +
> > > +       return !!(mask & FSNOTIFY_PRE_CONTENT_EVENTS);
> >
> > This can use the fsnotify_object_watched() helper, and it will need
> > the READ_ONCE() that are just being added to avoid data races.
> >
> > > +}
> > > +
> > >  /**
> > >   * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
> > >   * @sb: superblock being unmounted.
> > > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > > index 36c3d18cc40a..6983fbf096b8 100644
> > > --- a/include/linux/fsnotify_backend.h
> > > +++ b/include/linux/fsnotify_backend.h
> > > @@ -900,6 +900,15 @@ static inline void fsnotify_init_event(struct fsnotify_event *event)
> > >         INIT_LIST_HEAD(&event->list);
> > >  }
> > >
> > > +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> > > +bool fsnotify_file_has_content_watches(struct file *file);
> > > +#else
> > > +static inline bool fsnotify_file_has_content_watches(struct file *file)
> > > +{
> > > +       return false;
> > > +}
> > > +#endif /* CONFIG_FANOTIFY_ACCESS_PERMISSIONS */
> > > +
> > >  #else
> > >
> > >  static inline int fsnotify(__u32 mask, const void *data, int data_type,
> > > @@ -938,6 +947,11 @@ static inline u32 fsnotify_get_cookie(void)
> > >  static inline void fsnotify_unmount_inodes(struct super_block *sb)
> > >  {}
> > >
> > > +static inline bool fsnotify_file_has_content_watches(struct file *file)
> > > +{
> > > +       return false;
> > > +}
> > > +
> > >  #endif /* CONFIG_FSNOTIFY */
> > >
> > >  #endif /* __KERNEL __ */
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index ca8c8d889eef..cc9d7885bbe3 100644
> > > --- a/mm/filemap.c
> > > +++ b/mm/filemap.c
> > > @@ -46,6 +46,7 @@
> > >  #include <linux/pipe_fs_i.h>
> > >  #include <linux/splice.h>
> > >  #include <linux/rcupdate_wait.h>
> > > +#include <linux/fsnotify.h>
> > >  #include <asm/pgalloc.h>
> > >  #include <asm/tlbflush.h>
> > >  #include "internal.h"
> > > @@ -3112,13 +3113,13 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
> > >   * that.  If we didn't pin a file then we return NULL.  The file that is
> > >   * returned needs to be fput()'ed when we're done with it.
> > >   */
> > > -static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
> > > +static struct file *do_sync_mmap_readahead(struct vm_fault *vmf,
> > > +                                          struct file *fpin)
> > >  {
> > >         struct file *file = vmf->vma->vm_file;
> > >         struct file_ra_state *ra = &file->f_ra;
> > >         struct address_space *mapping = file->f_mapping;
> > >         DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
> > > -       struct file *fpin = NULL;
> > >         unsigned long vm_flags = vmf->vma->vm_flags;
> > >         unsigned int mmap_miss;
> > >
> > > @@ -3182,12 +3183,12 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
> > >   * was pinned if we have to drop the mmap_lock in order to do IO.
> > >   */
> > >  static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
> > > -                                           struct folio *folio)
> > > +                                           struct folio *folio,
> > > +                                           struct file *fpin)
> > >  {
> >
> > If I am reading correctly, iomap (i.e. xfs) write shared memory fault
> > does not reach this code?
> >
> > Do we care about writable shared memory faults use case for HSM?
> > It does not sound very relevant to HSM, but we cannot just ignore it..
> >
>
> Sorry I realized I went off to try and solve this problem and never responded to
> you.  I'm addressing the other comments, but this one is a little tricky.
>
> We're kind of stuck between a rock and a hard place with this.  I had originally
> put this before the ->fault() callback, but purposefully moved it into
> filemap_fault() because I want to be able to drop the mmap lock while we're
> waiting for a response from the HSM.
>
> The reason to do this is because there are things that take the mmap lock for
> simple things outside of the process, like /proc/$PID/smaps and other related
> things, and this can cause high priority tasks to block behind possibly low
> priority IO, creating a priority inversion.
>
> Now, I'm not sure how widespread of a problem this is anymore, I know there's
> been work done to the kernel and tools to avoid this style of problem.  I'm ok
> with a "try it and see" approach, but I don't love that.
>

I defer this question to Jan.

> However I think putting fsnotify hooks into XFS itself for this particular path
> is a good choice either.

I think you meant "not a good choice" and I agree -
it is not only xfs, but could be any fs that will be converted to iomap
Other fs have ->fault != filemap_fault, even if they do end up calling
filemap_fault, IOW, there is no API guarantee that they will.

> What do you think?  Just move it to before ->fault(),
> leave the mmap lock in place, and be done with it?

If Jan blesses the hook called with mmap lock, then yeh,
putting the hook in the most generic "vfs" code would be
the best choice for maintenance.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-07-29 18:57       ` Amir Goldstein
@ 2024-07-30 12:18         ` Jan Kara
  2024-07-30 16:51           ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-07-30 12:18 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Josef Bacik, jack, kernel-team, linux-fsdevel, brauner

On Mon 29-07-24 21:57:34, Amir Goldstein wrote:
> On Mon, Jul 29, 2024 at 8:11 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > > If I am reading correctly, iomap (i.e. xfs) write shared memory fault
> > > does not reach this code?
> > >
> > > Do we care about writable shared memory faults use case for HSM?
> > > It does not sound very relevant to HSM, but we cannot just ignore it..
> > >
> >
> > Sorry I realized I went off to try and solve this problem and never responded to
> > you.  I'm addressing the other comments, but this one is a little tricky.
> >
> > We're kind of stuck between a rock and a hard place with this.  I had originally
> > put this before the ->fault() callback, but purposefully moved it into
> > filemap_fault() because I want to be able to drop the mmap lock while we're
> > waiting for a response from the HSM.
> >
> > The reason to do this is because there are things that take the mmap lock for
> > simple things outside of the process, like /proc/$PID/smaps and other related
> > things, and this can cause high priority tasks to block behind possibly low
> > priority IO, creating a priority inversion.
> >
> > Now, I'm not sure how widespread of a problem this is anymore, I know there's
> > been work done to the kernel and tools to avoid this style of problem.  I'm ok
> > with a "try it and see" approach, but I don't love that.
> >
> 
> I defer this question to Jan.
> 
> > However I think putting fsnotify hooks into XFS itself for this particular path
> > is a good choice either.
> 
> I think you meant "not a good choice" and I agree -
> it is not only xfs, but could be any fs that will be converted to iomap
> Other fs have ->fault != filemap_fault, even if they do end up calling
> filemap_fault, IOW, there is no API guarantee that they will.
> 
> > What do you think?  Just move it to before ->fault(),
> > leave the mmap lock in place, and be done with it?
> 
> If Jan blesses the hook called with mmap lock, then yeh,
> putting the hook in the most generic "vfs" code would be
> the best choice for maintenance.

Well, I agree with Josef's comment about a rock and a hard place. For once,
regardless whether the hook will happen from before ->fault or from inside
the ->fault handler there will be fault callers where we cannot drop
mmap_lock (not all archs support dropping mmap_lock inside a fault AFAIR -
but a quick grep seems to show that these days maybe they do, also some
callers - most notably through GUP - don't allow dropping of mmap_lock
inside fault). So we have to have a way to handle a fault without
FAULT_FLAG_ALLOW_RETRY flag.

Now of course waiting for userspace reply to fanotify event with mmap_lock
held is ... dangerous. For example consider application with two threads:

T1					T2
page fault on inode I			write to inode I
  lock mm->mmap_lock			  inode_lock(I)
    send fanotify event			  ...
					  fault_in_iov_iter_readable()
					    lock mm->mmap_lock -> blocks
					      behind T1

now the HSM handler needs to fill in contents of inode I requested by the
page fault:

  inode_lock(I) -> deadlock

So conceptually I think the flow could look like (in __do_fault):

	if (!(vmf->flags & FAULT_FLAG_TRIED) &&
	    fsnotify_may_send_pre_content_event()) {
		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
			return VM_FAULT_RETRY;
		fpin = maybe_unlock_mmap_for_io(vmf, NULL);
		if (!fpin)
			return ???VM_FAULT_SIGSEGV???;
		err = fsnotify_fault(...);
		if (err)
			return VM_FAULT_SIGBUS | VM_FAULT_RETRY;
		/*
		 * We are fine with proceeding with the fault. Retry the fault
		 * to let the filesystem handle it.
		 */
		return VM_FAULT_RETRY;
	}

The downside is that if we enter the page fault without ability to drop
mmap_lock on a file needing HSM handling, we get SIGSEGV. I'm not sure it
matters in practice because these are not that common paths e.g. stuff like
putting a breakpoint / uprobe on a file but maybe there are some surprises.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-07-30 12:18         ` Jan Kara
@ 2024-07-30 16:51           ` Josef Bacik
  2024-08-01 21:34             ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-07-30 16:51 UTC (permalink / raw)
  To: Jan Kara; +Cc: Amir Goldstein, kernel-team, linux-fsdevel, brauner

On Tue, Jul 30, 2024 at 02:18:37PM +0200, Jan Kara wrote:
> On Mon 29-07-24 21:57:34, Amir Goldstein wrote:
> > On Mon, Jul 29, 2024 at 8:11 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > > > If I am reading correctly, iomap (i.e. xfs) write shared memory fault
> > > > does not reach this code?
> > > >
> > > > Do we care about writable shared memory faults use case for HSM?
> > > > It does not sound very relevant to HSM, but we cannot just ignore it..
> > > >
> > >
> > > Sorry I realized I went off to try and solve this problem and never responded to
> > > you.  I'm addressing the other comments, but this one is a little tricky.
> > >
> > > We're kind of stuck between a rock and a hard place with this.  I had originally
> > > put this before the ->fault() callback, but purposefully moved it into
> > > filemap_fault() because I want to be able to drop the mmap lock while we're
> > > waiting for a response from the HSM.
> > >
> > > The reason to do this is because there are things that take the mmap lock for
> > > simple things outside of the process, like /proc/$PID/smaps and other related
> > > things, and this can cause high priority tasks to block behind possibly low
> > > priority IO, creating a priority inversion.
> > >
> > > Now, I'm not sure how widespread of a problem this is anymore, I know there's
> > > been work done to the kernel and tools to avoid this style of problem.  I'm ok
> > > with a "try it and see" approach, but I don't love that.
> > >
> > 
> > I defer this question to Jan.
> > 
> > > However I think putting fsnotify hooks into XFS itself for this particular path
> > > is a good choice either.
> > 
> > I think you meant "not a good choice" and I agree -
> > it is not only xfs, but could be any fs that will be converted to iomap
> > Other fs have ->fault != filemap_fault, even if they do end up calling
> > filemap_fault, IOW, there is no API guarantee that they will.
> > 
> > > What do you think?  Just move it to before ->fault(),
> > > leave the mmap lock in place, and be done with it?
> > 
> > If Jan blesses the hook called with mmap lock, then yeh,
> > putting the hook in the most generic "vfs" code would be
> > the best choice for maintenance.
> 
> Well, I agree with Josef's comment about a rock and a hard place. For once,
> regardless whether the hook will happen from before ->fault or from inside
> the ->fault handler there will be fault callers where we cannot drop
> mmap_lock (not all archs support dropping mmap_lock inside a fault AFAIR -
> but a quick grep seems to show that these days maybe they do, also some
> callers - most notably through GUP - don't allow dropping of mmap_lock
> inside fault). So we have to have a way to handle a fault without
> FAULT_FLAG_ALLOW_RETRY flag.
> 
> Now of course waiting for userspace reply to fanotify event with mmap_lock
> held is ... dangerous. For example consider application with two threads:
> 
> T1					T2
> page fault on inode I			write to inode I
>   lock mm->mmap_lock			  inode_lock(I)
>     send fanotify event			  ...
> 					  fault_in_iov_iter_readable()
> 					    lock mm->mmap_lock -> blocks
> 					      behind T1
> 
> now the HSM handler needs to fill in contents of inode I requested by the
> page fault:
> 
>   inode_lock(I) -> deadlock
> 
> So conceptually I think the flow could look like (in __do_fault):
> 
> 	if (!(vmf->flags & FAULT_FLAG_TRIED) &&
> 	    fsnotify_may_send_pre_content_event()) {
> 		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> 			return VM_FAULT_RETRY;
> 		fpin = maybe_unlock_mmap_for_io(vmf, NULL);
> 		if (!fpin)
> 			return ???VM_FAULT_SIGSEGV???;
> 		err = fsnotify_fault(...);
> 		if (err)
> 			return VM_FAULT_SIGBUS | VM_FAULT_RETRY;
> 		/*
> 		 * We are fine with proceeding with the fault. Retry the fault
> 		 * to let the filesystem handle it.
> 		 */
> 		return VM_FAULT_RETRY;
> 	}
> 
> The downside is that if we enter the page fault without ability to drop
> mmap_lock on a file needing HSM handling, we get SIGSEGV. I'm not sure it
> matters in practice because these are not that common paths e.g. stuff like
> putting a breakpoint / uprobe on a file but maybe there are some surprises.
> 

The only thing I don't like about this is that now the fault handler loses the
ability to drop the mmap sem.  I think in practice this doesn't really matter,
because we're trying to avoid doing IO while under the mmap sem, and presumably
this will have instantiated the pages to avoid the IO.

However if you use reflink this wouldn't be the case, and now we've
re-introduced the priority inversion possiblity.

I'm leaning more towards just putting the fsnotify hook in the xfs code for the
write case, and anybody else who implements their own ->fault without calling
the generic one will just have to do the same.

That being said I'm not going to die on any particular hill here, if you still
want to do the above before the ->fault() handler then I'll update my code to do
that and we can move on.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/10] fsnotify: introduce pre-content permission event
  2024-07-25 18:19 ` [PATCH 02/10] fsnotify: introduce pre-content permission event Josef Bacik
@ 2024-08-01 16:31   ` Jan Kara
  2024-08-03 16:52     ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-08-01 16:31 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 25-07-24 14:19:39, Josef Bacik wrote:
> From: Amir Goldstein <amir73il@gmail.com>
> 
> The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
> but it meant for a different use case of filling file content before
> access to a file range, so it has slightly different semantics.
> 
> Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
> we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.
> 
> FS_PRE_MODIFY is a new permission event, with similar semantics as
> FS_PRE_ACCESS, which is called before a file is modified.
> 
> FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
> pre-content events are only reported for regular files and dirs.
> 
> The pre-content events are meant to be used by hierarchical storage
> managers that want to fill the content of files on first access.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

The patch looks good. Just out of curiosity:

> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 8be029bc50b1..21e72b837ec5 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -56,6 +56,9 @@
>  #define FS_ACCESS_PERM		0x00020000	/* access event in a permissions hook */
>  #define FS_OPEN_EXEC_PERM	0x00040000	/* open/exec event in a permission hook */
>  
> +#define FS_PRE_ACCESS		0x00100000	/* Pre-content access hook */
> +#define FS_PRE_MODIFY		0x00200000	/* Pre-content modify hook */

Why is a hole left here in the flag space?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/10] fsnotify: generate pre-content permission event on open
  2024-07-25 18:19 ` [PATCH 03/10] fsnotify: generate pre-content permission event on open Josef Bacik
@ 2024-08-01 17:01   ` Jan Kara
  2024-08-03 16:53     ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-08-01 17:01 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 25-07-24 14:19:40, Josef Bacik wrote:
> From: Amir Goldstein <amir73il@gmail.com>
> 
> FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on open depending on
> file open mode.  The pre-content event will be generated in addition to
> FS_OPEN_PERM, but without sb_writers held and after file was truncated
> in case file was opened with O_CREAT and/or O_TRUNC.
> 
> The event will have a range info of (0..0) to provide an opportunity
> to fill entire file content on open.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

...

> @@ -176,10 +180,14 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
>  
>  /*
>   * fsnotify_file_perm - permission hook before file access
> + *
> + * Called from read()/write() with perm_mas MAY_READ/MAY_WRITE.
					^^^ perm_mask

> + * Called from open() with MAY_OPEN in addition to fsnotify_open_perm(),
> + * but without sb_writers held and after the file was truncated.

This sentence is a bit confusing to me (although I think I understand what
you want to say). How about just:

 * Called from open() with MAY_OPEN without sb_writers held and after the
 * file was truncated. Note that this is a different event from
 * fsnotify_open_perm().

>   */
>  static inline int fsnotify_file_perm(struct file *file, int perm_mask)
>  {
> -	return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
> +	return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0);
>  }

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/10] fanotify: introduce FAN_PRE_ACCESS permission event
  2024-07-25 18:19 ` [PATCH 04/10] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
@ 2024-08-01 17:04   ` Jan Kara
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2024-08-01 17:04 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 25-07-24 14:19:41, Josef Bacik wrote:
> From: Amir Goldstein <amir73il@gmail.com>
> 
> Similar to FAN_ACCESS_PERM permission event, but it is only allowed with
> class FAN_CLASS_PRE_CONTENT and only allowed on regular files are dirs.
								^^ and

> Unlike FAN_ACCESS_PERM, it is safe to write to the file being accessed
> in the context of the event handler.
> 
> This pre-content event is meant to be used by hierarchical storage
> managers that want to fill the content of files on first read access.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Otherwise the patch looks good.

								Honza

> ---
>  fs/notify/fanotify/fanotify.c      |  3 ++-
>  fs/notify/fanotify/fanotify_user.c | 17 ++++++++++++++---
>  include/linux/fanotify.h           | 14 ++++++++++----
>  include/uapi/linux/fanotify.h      |  2 ++
>  4 files changed, 28 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 224bccaab4cc..7dac8e4486df 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -910,8 +910,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
>  	BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
>  	BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
>  	BUILD_BUG_ON(FAN_RENAME != FS_RENAME);
> +	BUILD_BUG_ON(FAN_PRE_ACCESS != FS_PRE_ACCESS);
>  
> -	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 21);
> +	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 22);
>  
>  	mask = fanotify_group_event_mask(group, iter_info, &match_mask,
>  					 mask, data, data_type, dir);
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 2e2fba8a9d20..c294849e474f 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -1628,6 +1628,7 @@ static int fanotify_events_supported(struct fsnotify_group *group,
>  				     unsigned int flags)
>  {
>  	unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
> +	bool is_dir = d_is_dir(path->dentry);
>  	/* Strict validation of events in non-dir inode mask with v5.17+ APIs */
>  	bool strict_dir_events = FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID) ||
>  				 (mask & FAN_RENAME) ||
> @@ -1665,9 +1666,15 @@ static int fanotify_events_supported(struct fsnotify_group *group,
>  	 * but because we always allowed it, error only when using new APIs.
>  	 */
>  	if (strict_dir_events && mark_type == FAN_MARK_INODE &&
> -	    !d_is_dir(path->dentry) && (mask & FANOTIFY_DIRONLY_EVENT_BITS))
> +	    !is_dir && (mask & FANOTIFY_DIRONLY_EVENT_BITS))
>  		return -ENOTDIR;
>  
> +	/* Pre-content events are only supported on regular files and dirs */
> +	if (mask & FANOTIFY_PRE_CONTENT_EVENTS) {
> +		if (!is_dir && !d_is_reg(path->dentry))
> +			return -EINVAL;
> +	}
> +
>  	return 0;
>  }
>  
> @@ -1769,11 +1776,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>  		goto fput_and_out;
>  
>  	/*
> -	 * Permission events require minimum priority FAN_CLASS_CONTENT.
> +	 * Permission events are not allowed for FAN_CLASS_NOTIF.
> +	 * Pre-content permission events are not allowed for FAN_CLASS_CONTENT.
>  	 */
>  	ret = -EINVAL;
>  	if (mask & FANOTIFY_PERM_EVENTS &&
> -	    group->priority < FSNOTIFY_PRIO_CONTENT)
> +	    group->priority == FSNOTIFY_PRIO_NORMAL)
> +		goto fput_and_out;
> +	else if (mask & FANOTIFY_PRE_CONTENT_EVENTS &&
> +		 group->priority == FSNOTIFY_PRIO_CONTENT)
>  		goto fput_and_out;
>  
>  	if (mask & FAN_FS_ERROR &&
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 4f1c4f603118..5c811baf44d2 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -88,6 +88,16 @@
>  #define FANOTIFY_DIRENT_EVENTS	(FAN_MOVE | FAN_CREATE | FAN_DELETE | \
>  				 FAN_RENAME)
>  
> +/* Content events can be used to inspect file content */
> +#define FANOTIFY_CONTENT_PERM_EVENTS (FAN_OPEN_PERM | FAN_OPEN_EXEC_PERM | \
> +				      FAN_ACCESS_PERM)
> +/* Pre-content events can be used to fill file content */
> +#define FANOTIFY_PRE_CONTENT_EVENTS  (FAN_PRE_ACCESS)
> +
> +/* Events that require a permission response from user */
> +#define FANOTIFY_PERM_EVENTS	(FANOTIFY_CONTENT_PERM_EVENTS | \
> +				 FANOTIFY_PRE_CONTENT_EVENTS)
> +
>  /* Events that can be reported with event->fd */
>  #define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS)
>  
> @@ -103,10 +113,6 @@
>  				 FANOTIFY_INODE_EVENTS | \
>  				 FANOTIFY_ERROR_EVENTS)
>  
> -/* Events that require a permission response from user */
> -#define FANOTIFY_PERM_EVENTS	(FAN_OPEN_PERM | FAN_ACCESS_PERM | \
> -				 FAN_OPEN_EXEC_PERM)
> -
>  /* Extra flags that may be reported with event or control handling of events */
>  #define FANOTIFY_EVENT_FLAGS	(FAN_EVENT_ON_CHILD | FAN_ONDIR)
>  
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index a37de58ca571..3ae43867d318 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -26,6 +26,8 @@
>  #define FAN_ACCESS_PERM		0x00020000	/* File accessed in perm check */
>  #define FAN_OPEN_EXEC_PERM	0x00040000	/* File open/exec in perm check */
>  
> +#define FAN_PRE_ACCESS		0x00100000	/* Pre-content access hook */
> +
>  #define FAN_EVENT_ON_CHILD	0x08000000	/* Interested in child events */
>  
>  #define FAN_RENAME		0x10000000	/* File was renamed */
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY permission event
  2024-07-25 18:19 ` [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
@ 2024-08-01 17:09   ` Jan Kara
  2024-08-03 16:55     ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-08-01 17:09 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 25-07-24 14:19:42, Josef Bacik wrote:
> From: Amir Goldstein <amir73il@gmail.com>
> 
> Generate FAN_PRE_MODIFY permission event from fsnotify_file_perm()
> pre-write hook to notify fanotify listeners on an intent to make
> modification to a file.
> 
> Like FAN_PRE_ACCESS, it is only allowed with FAN_CLASS_PRE_CONTENT
> and unlike FAN_MODIFY, it is only allowed on regular files.
> 
> Like FAN_PRE_ACCESS, it is generated without sb_start_write() held,
> so it is safe for to perform filesystem modifications in the the
		^^^ seems superfluous			   ^^^ twice "the"

> context of event handler.
...
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 5c811baf44d2..ae6cb2688d52 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -92,7 +92,8 @@
>  #define FANOTIFY_CONTENT_PERM_EVENTS (FAN_OPEN_PERM | FAN_OPEN_EXEC_PERM | \
>  				      FAN_ACCESS_PERM)
>  /* Pre-content events can be used to fill file content */
> -#define FANOTIFY_PRE_CONTENT_EVENTS  (FAN_PRE_ACCESS)
> +#define FANOTIFY_PRE_CONTENT_EVENTS  (FAN_PRE_ACCESS | FAN_PRE_MODIFY)
> +#define FANOTIFY_PRE_MODIFY_EVENTS   (FAN_PRE_MODIFY)

I didn't find FANOTIFY_PRE_MODIFY_EVENTS used anywhere?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/10] fanotify: pass optional file access range in pre-content event
  2024-07-25 18:19 ` [PATCH 06/10] fanotify: pass optional file access range in pre-content event Josef Bacik
@ 2024-08-01 17:16   ` Jan Kara
  2024-08-03 17:00     ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-08-01 17:16 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 25-07-24 14:19:43, Josef Bacik wrote:
> From: Amir Goldstein <amir73il@gmail.com>
> 
> We would like to add file range information to pre-content events.
> 
> Pass a struct file_range with optional offset and length to event handler
> along with pre-content permission event.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

...

> @@ -565,6 +569,10 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
>  	pevent->hdr.len = 0;
>  	pevent->state = FAN_EVENT_INIT;
>  	pevent->path = *path;
> +	if (range) {
> +		pevent->ppos = range->ppos;
> +		pevent->count = range->count;
> +	}

Shouldn't we initialze ppos & count in case range info isn't passed?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/10] fanotify: rename a misnamed constant
  2024-07-25 18:19 ` [PATCH 07/10] fanotify: rename a misnamed constant Josef Bacik
@ 2024-08-01 17:19   ` Jan Kara
  2024-08-01 17:23     ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-08-01 17:19 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 25-07-24 14:19:44, Josef Bacik wrote:
> From: Amir Goldstein <amir73il@gmail.com>
> 
> FANOTIFY_PIDFD_INFO_HDR_LEN is not the length of the header.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/notify/fanotify/fanotify_user.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 3a7101544f30..5ece186d5c50 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -119,7 +119,7 @@ struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
>  #define FANOTIFY_EVENT_ALIGN 4
>  #define FANOTIFY_FID_INFO_HDR_LEN \
>  	(sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
> -#define FANOTIFY_PIDFD_INFO_HDR_LEN \
> +#define FANOTIFY_PIDFD_INFO_LEN \

OK, but then FANOTIFY_FID_INFO_HDR_LEN should be renamed as well? Or what's
the difference?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/10] fanotify: rename a misnamed constant
  2024-08-01 17:19   ` Jan Kara
@ 2024-08-01 17:23     ` Jan Kara
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2024-08-01 17:23 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 01-08-24 19:19:50, Jan Kara wrote:
> On Thu 25-07-24 14:19:44, Josef Bacik wrote:
> > From: Amir Goldstein <amir73il@gmail.com>
> > 
> > FANOTIFY_PIDFD_INFO_HDR_LEN is not the length of the header.
> > 
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/notify/fanotify/fanotify_user.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index 3a7101544f30..5ece186d5c50 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -119,7 +119,7 @@ struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
> >  #define FANOTIFY_EVENT_ALIGN 4
> >  #define FANOTIFY_FID_INFO_HDR_LEN \
> >  	(sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
> > -#define FANOTIFY_PIDFD_INFO_HDR_LEN \
> > +#define FANOTIFY_PIDFD_INFO_LEN \
> 
> OK, but then FANOTIFY_FID_INFO_HDR_LEN should be renamed as well? Or what's
> the difference?

Ah, never mind. I've realized that to FID we indeed need to add the file
handle length. So there is a difference and the cleanup makes sense.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/10] fanotify: report file range info with pre-content events
  2024-07-25 18:19 ` [PATCH 08/10] fanotify: report file range info with pre-content events Josef Bacik
@ 2024-08-01 17:38   ` Jan Kara
  2024-10-24 10:06     ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-08-01 17:38 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 25-07-24 14:19:45, Josef Bacik wrote:
> From: Amir Goldstein <amir73il@gmail.com>
> 
> With group class FAN_CLASS_PRE_CONTENT, report offset and length info
> along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events.
> 
> This information is meant to be used by hierarchical storage managers
> that want to fill partial content of files on first access to range.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/notify/fanotify/fanotify.h      |  8 +++++++
>  fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++
>  include/uapi/linux/fanotify.h      |  7 ++++++
>  3 files changed, 53 insertions(+)
> 
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index 93598b7d5952..7f06355afa1f 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask)
>  		mask & FANOTIFY_PERM_EVENTS;
>  }
>  
> +static inline bool fanotify_event_has_access_range(struct fanotify_event *event)
> +{
> +	if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS))
> +		return false;
> +
> +	return FANOTIFY_PERM(event)->ppos;
> +}

Now I'm a bit confused. Can we have legally NULL ppos for an event from
FANOTIFY_PRE_CONTENT_EVENTS?

> +static size_t copy_range_info_to_user(struct fanotify_event *event,
> +				      char __user *buf, int count)
> +{
> +	struct fanotify_perm_event *pevent = FANOTIFY_PERM(event);
> +	struct fanotify_event_info_range info = { };
> +	size_t info_len = FANOTIFY_RANGE_INFO_LEN;
> +
> +	if (WARN_ON_ONCE(info_len > count))
> +		return -EFAULT;
> +
> +	if (WARN_ON_ONCE(!pevent->ppos))
> +		return 0;

I think we should be returning some error here. Maybe EINVAL? Otherwise
fanotify_event_len() will return different length than we actually generate
and that could lead to strange failures later.

> +
> +	info.hdr.info_type = FAN_EVENT_INFO_TYPE_RANGE;
> +	info.hdr.len = info_len;
> +	info.offset = *(pevent->ppos);
> +	info.count = pevent->count;
> +
> +	if (copy_to_user(buf, &info, info_len))
> +		return -EFAULT;
> +
> +	return info_len;
> +}
> +
>  static int copy_info_records_to_user(struct fanotify_event *event,
>  				     struct fanotify_info *info,
>  				     unsigned int info_mode, int pidfd,
...
> @@ -191,6 +192,12 @@ struct fanotify_event_info_error {
>  	__u32 error_count;
>  };
>  
> +struct fanotify_event_info_range {
> +	struct fanotify_event_info_header hdr;
> +	__u32 count;
> +	__u64 offset;
> +};

Just for the sake of future-proofing, I'd make 'count' u64. I understand it
means growing fanotify_event_info_range by 8 bytes and we currently never
do reads or writes larger than 2 GB but I don't think saving bytes like
this is really a good tradeoff these days...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/10] fanotify: don't skip extra event info if no info_mode is set
  2024-07-25 18:19 ` [PATCH 01/10] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
@ 2024-08-01 17:59   ` Jan Kara
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2024-08-01 17:59 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 25-07-24 14:19:38, Josef Bacik wrote:
> Previously we would only include optional information if you requested
> it via an FAN_ flag at fanotify_init time (FAN_REPORT_FID for example).
> However this isn't necessary as the event length is encoded in the
> metadata, and if the user doesn't want to consume the information they
> don't have to.

So I somewhat disagree with this statement because historically there was
fanotify userspace that completely ignored event length reported in the
event and assumed particular hardcoded constant that was working for years.
That's why we are careful and add info to *existing* events only if
userspace explicitely asks for it (by which it obviously also acknowledges
that it is ready to parse/skip it). Now here we are adding range info only
to new events so preexisting userspace isn't a problem in this case. And I
agree it is kind of pointless to add a new flag, just to be able to tell
you want info without which these events are pointless.

Also as far as I can tell these changes will not result in any new info
records to be generated for existing events. So I would just change the
description to something like:

New pre-content events will be path events but they will also carry
additional range information. Remove the optimization to skip checking
whether info structures need to be generated for path events. This results
in no change in generated info structures for existing events.

								Honza

>  With the PRE_ACCESS events we will always generate range
> information, so drop this check in order to allow this extra
> information to be exported without needing to have another flag.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/notify/fanotify/fanotify_user.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9ec313e9f6e1..2e2fba8a9d20 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -160,9 +160,6 @@ static size_t fanotify_event_len(unsigned int info_mode,
>  	int fh_len;
>  	int dot_len = 0;
>  
> -	if (!info_mode)
> -		return event_len;
> -
>  	if (fanotify_is_error_event(event->mask))
>  		event_len += FANOTIFY_ERROR_INFO_LEN;
>  
> @@ -740,12 +737,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>  	if (fanotify_is_perm_event(event->mask))
>  		FANOTIFY_PERM(event)->fd = fd;
>  
> -	if (info_mode) {
> -		ret = copy_info_records_to_user(event, info, info_mode, pidfd,
> -						buf, count);
> -		if (ret < 0)
> -			goto out_close_fd;
> -	}
> +	ret = copy_info_records_to_user(event, info, info_mode, pidfd,
> +					buf, count);
> +	if (ret < 0)
> +		goto out_close_fd;
>  
>  	if (f)
>  		fd_install(fd, f);
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 09/10] fanotify: allow to set errno in FAN_DENY permission response
  2024-07-25 18:19 ` [PATCH 09/10] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
@ 2024-08-01 21:14   ` Jan Kara
  2024-08-03 17:06     ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-08-01 21:14 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 25-07-24 14:19:46, Josef Bacik wrote:
> From: Amir Goldstein <amir73il@gmail.com>
> 
> With FAN_DENY response, user trying to perform the filesystem operation
> gets an error with errno set to EPERM.
> 
> It is useful for hierarchical storage management (HSM) service to be able
> to deny access for reasons more diverse than EPERM, for example EAGAIN,
> if HSM could retry the operation later.
> 
> Allow fanotify groups with priority FAN_CLASSS_PRE_CONTENT to responsd
> to permission events with the response value FAN_DENY_ERRNO(errno),
> instead of FAN_DENY to return a custom error.
> 
> Limit custom error to values to some errors expected on read(2)/write(2)
	^^^ parse error. Perhaps: "Limit custom error values to errors
expected on read..."

> and open(2) of regular files. This list could be extended in the future.
> Userspace can test for legitimate values of FAN_DENY_ERRNO(errno) by
> writing a response to an fanotify group fd with a value of FAN_NOFD
> in the fd field of the response.
> 
> The change in fanotify_response is backward compatible, because errno is
> written in the high 8 bits of the 32bit response field and old kernels
> reject respose value with high bits set.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

...

> @@ -258,18 +258,25 @@ static int fanotify_get_response(struct fsnotify_group *group,
>  	}
>  
>  	/* userspace responded, convert to something usable */
> -	switch (event->response & FANOTIFY_RESPONSE_ACCESS) {
> +	switch (FAN_RESPONSE_ACCESS(event->response)) {
>  	case FAN_ALLOW:
>  		ret = 0;
>  		break;
>  	case FAN_DENY:
> +		/* Check custom errno from pre-content events */
> +		errno = FAN_RESPONSE_ERRNO(event->response);
> +		if (errno) {
> +			ret = -errno;
> +			break;
> +		}
> +		fallthrough;
>  	default:
>  		ret = -EPERM;
>  	}
>  
>  	/* Check if the response should be audited */
>  	if (event->response & FAN_AUDIT)
> -		audit_fanotify(event->response & ~FAN_AUDIT,
> +		audit_fanotify(FAN_RESPONSE_ACCESS(event->response),
>  			       &event->audit_rule);

I think you need to also keep FAN_INFO in the flags not to break some
userspace possibly parsing audit requests.

> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index c3c8b2ea80b6..b4d810168521 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -337,11 +337,12 @@ static int process_access_response(struct fsnotify_group *group,
>  	struct fanotify_perm_event *event;
>  	int fd = response_struct->fd;
>  	u32 response = response_struct->response;
> +	int errno = FAN_RESPONSE_ERRNO(response);
>  	int ret = info_len;
>  	struct fanotify_response_info_audit_rule friar;
>  
> -	pr_debug("%s: group=%p fd=%d response=%u buf=%p size=%zu\n", __func__,
> -		 group, fd, response, info, info_len);
> +	pr_debug("%s: group=%p fd=%d response=%x errno=%d buf=%p size=%zu\n",
> +		 __func__, group, fd, response, errno, info, info_len);
>  	/*
>  	 * make sure the response is valid, if invalid we do nothing and either
>  	 * userspace can send a valid response or we will clean it up after the
> @@ -350,9 +351,33 @@ static int process_access_response(struct fsnotify_group *group,
>  	if (response & ~FANOTIFY_RESPONSE_VALID_MASK)
>  		return -EINVAL;
>  
> -	switch (response & FANOTIFY_RESPONSE_ACCESS) {
> +	switch (FAN_RESPONSE_ACCESS(response)) {
>  	case FAN_ALLOW:
> +		if (errno)
> +			return -EINVAL;
> +		break;
>  	case FAN_DENY:
> +		/* Custom errno is supported only for pre-content groups */
> +		if (errno && group->priority != FSNOTIFY_PRIO_PRE_CONTENT)
> +			return -EINVAL;
> +
> +		/*
> +		 * Limit errno to values expected on open(2)/read(2)/write(2)
> +		 * of regular files.
> +		 */
> +		switch (errno) {
> +		case 0:
> +		case EIO:
> +		case EPERM:
> +		case EBUSY:
> +		case ETXTBSY:
> +		case EAGAIN:
> +		case ENOSPC:
> +		case EDQUOT:
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
>  		break;
>  	default:
>  		return -EINVAL;
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index ae6cb2688d52..76d818a7d654 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -132,7 +132,14 @@
>  /* These masks check for invalid bits in permission responses. */
>  #define FANOTIFY_RESPONSE_ACCESS (FAN_ALLOW | FAN_DENY)
>  #define FANOTIFY_RESPONSE_FLAGS (FAN_AUDIT | FAN_INFO)
> -#define FANOTIFY_RESPONSE_VALID_MASK (FANOTIFY_RESPONSE_ACCESS | FANOTIFY_RESPONSE_FLAGS)
> +#define FANOTIFY_RESPONSE_ERRNO	(FAN_ERRNO_MASK << FAN_ERRNO_SHIFT)
> +#define FANOTIFY_RESPONSE_VALID_MASK \
> +	(FANOTIFY_RESPONSE_ACCESS | FANOTIFY_RESPONSE_FLAGS | \
> +	 FANOTIFY_RESPONSE_ERRNO)
> +
> +/* errno other than EPERM can specified in upper byte of deny response */
> +#define FAN_RESPONSE_ACCESS(res)	((res) & FANOTIFY_RESPONSE_ACCESS)
> +#define FAN_RESPONSE_ERRNO(res)		((int)((res) >> FAN_ERRNO_SHIFT))

I have to say I find the names FANOTIFY_RESPONSE_ERRNO and
FAN_RESPONSE_ERRNO() (and similarly with FAN_RESPONSE_ACCESS) very similar
and thus confusing. I was staring at it for 5 minutes wondering how comes
it compiles before I realized one prefix is shorter than the other one so
the indentifiers are indeed different. Maybe we'd make these inline
functions instead of macros and name them like:

fanotify_get_response_decision()
fanotify_get_response_errno()

???

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-07-30 16:51           ` Josef Bacik
@ 2024-08-01 21:34             ` Jan Kara
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2024-08-01 21:34 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Jan Kara, Amir Goldstein, kernel-team, linux-fsdevel, brauner

On Tue 30-07-24 12:51:49, Josef Bacik wrote:
> On Tue, Jul 30, 2024 at 02:18:37PM +0200, Jan Kara wrote:
> > On Mon 29-07-24 21:57:34, Amir Goldstein wrote:
> > > On Mon, Jul 29, 2024 at 8:11 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > > > > If I am reading correctly, iomap (i.e. xfs) write shared memory fault
> > > > > does not reach this code?
> > > > >
> > > > > Do we care about writable shared memory faults use case for HSM?
> > > > > It does not sound very relevant to HSM, but we cannot just ignore it..
> > > > >
> > > >
> > > > Sorry I realized I went off to try and solve this problem and never responded to
> > > > you.  I'm addressing the other comments, but this one is a little tricky.
> > > >
> > > > We're kind of stuck between a rock and a hard place with this.  I had originally
> > > > put this before the ->fault() callback, but purposefully moved it into
> > > > filemap_fault() because I want to be able to drop the mmap lock while we're
> > > > waiting for a response from the HSM.
> > > >
> > > > The reason to do this is because there are things that take the mmap lock for
> > > > simple things outside of the process, like /proc/$PID/smaps and other related
> > > > things, and this can cause high priority tasks to block behind possibly low
> > > > priority IO, creating a priority inversion.
> > > >
> > > > Now, I'm not sure how widespread of a problem this is anymore, I know there's
> > > > been work done to the kernel and tools to avoid this style of problem.  I'm ok
> > > > with a "try it and see" approach, but I don't love that.
> > > >
> > > 
> > > I defer this question to Jan.
> > > 
> > > > However I think putting fsnotify hooks into XFS itself for this particular path
> > > > is a good choice either.
> > > 
> > > I think you meant "not a good choice" and I agree -
> > > it is not only xfs, but could be any fs that will be converted to iomap
> > > Other fs have ->fault != filemap_fault, even if they do end up calling
> > > filemap_fault, IOW, there is no API guarantee that they will.
> > > 
> > > > What do you think?  Just move it to before ->fault(),
> > > > leave the mmap lock in place, and be done with it?
> > > 
> > > If Jan blesses the hook called with mmap lock, then yeh,
> > > putting the hook in the most generic "vfs" code would be
> > > the best choice for maintenance.
> > 
> > Well, I agree with Josef's comment about a rock and a hard place. For once,
> > regardless whether the hook will happen from before ->fault or from inside
> > the ->fault handler there will be fault callers where we cannot drop
> > mmap_lock (not all archs support dropping mmap_lock inside a fault AFAIR -
> > but a quick grep seems to show that these days maybe they do, also some
> > callers - most notably through GUP - don't allow dropping of mmap_lock
> > inside fault). So we have to have a way to handle a fault without
> > FAULT_FLAG_ALLOW_RETRY flag.
> > 
> > Now of course waiting for userspace reply to fanotify event with mmap_lock
> > held is ... dangerous. For example consider application with two threads:
> > 
> > T1					T2
> > page fault on inode I			write to inode I
> >   lock mm->mmap_lock			  inode_lock(I)
> >     send fanotify event			  ...
> > 					  fault_in_iov_iter_readable()
> > 					    lock mm->mmap_lock -> blocks
> > 					      behind T1
> > 
> > now the HSM handler needs to fill in contents of inode I requested by the
> > page fault:
> > 
> >   inode_lock(I) -> deadlock
> > 
> > So conceptually I think the flow could look like (in __do_fault):
> > 
> > 	if (!(vmf->flags & FAULT_FLAG_TRIED) &&
> > 	    fsnotify_may_send_pre_content_event()) {
> > 		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> > 			return VM_FAULT_RETRY;
> > 		fpin = maybe_unlock_mmap_for_io(vmf, NULL);
> > 		if (!fpin)
> > 			return ???VM_FAULT_SIGSEGV???;
> > 		err = fsnotify_fault(...);
> > 		if (err)
> > 			return VM_FAULT_SIGBUS | VM_FAULT_RETRY;
> > 		/*
> > 		 * We are fine with proceeding with the fault. Retry the fault
> > 		 * to let the filesystem handle it.
> > 		 */
> > 		return VM_FAULT_RETRY;
> > 	}
> > 
> > The downside is that if we enter the page fault without ability to drop
> > mmap_lock on a file needing HSM handling, we get SIGSEGV. I'm not sure it
> > matters in practice because these are not that common paths e.g. stuff like
> > putting a breakpoint / uprobe on a file but maybe there are some surprises.
> > 
> 
> The only thing I don't like about this is that now the fault handler
> loses the ability to drop the mmap sem.  I think in practice this doesn't
> really matter, because we're trying to avoid doing IO while under the
> mmap sem, and presumably this will have instantiated the pages to avoid
> the IO.
> 
> However if you use reflink this wouldn't be the case, and now we've
> re-introduced the priority inversion possiblity.

Hum, you're right. I was focused on handling the case when HSM needs to
pull in the page but the common case when the page is local, just not in
cache, is also very important.

> I'm leaning more towards just putting the fsnotify hook in the xfs code
> for the write case, and anybody else who implements their own ->fault
> without calling the generic one will just have to do the same.
> 
> That being said I'm not going to die on any particular hill here, if you
> still want to do the above before the ->fault() handler then I'll update
> my code to do that and we can move on.  Thanks,

I think for now issuing the pre content event from ->fault is OK. I'm not
thrilled about it (since it ultimately means duplication among page fault
handlers) but I don't see a cleaner solution that would perform reasonably
well.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-07-25 18:19 ` [PATCH 10/10] fsnotify: generate pre-content permission event on page fault Josef Bacik
  2024-07-25 20:19   ` Amir Goldstein
@ 2024-08-01 21:40   ` Jan Kara
  2024-08-02 16:03     ` Josef Bacik
  1 sibling, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-08-01 21:40 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner

On Thu 25-07-24 14:19:47, Josef Bacik wrote:
> FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
> on the faulting method.
> 
> This pre-content event is meant to be used by hierarchical storage
> managers that want to fill in the file content on first read access.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
...
> @@ -3287,6 +3288,35 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>  	if (unlikely(index >= max_idx))
>  		return VM_FAULT_SIGBUS;
>  
> +	/*
> +	 * If we have pre-content watchers then we need to generate events on
> +	 * page fault so that we can populate any data before the fault.
> +	 *
> +	 * We only do this on the first pass through, otherwise the populating
> +	 * application could potentially deadlock on the mmap lock if it tries
> +	 * to populate it with mmap.
> +	 */
> +	if (fault_flag_allow_retry_first(vmf->flags) &&
> +	    fsnotify_file_has_content_watches(file)) {

I'm somewhat nervous that if ALLOW_RETRY isn't set, we'd silently jump into
readpage code without ever sending pre-content event and thus we'd possibly
expose invalid content to userspace? I think we should fail the fault if
fsnotify_file_has_content_watches(file) && !(vmf->flags &
FAULT_FLAG_ALLOW_RETRY).

> +		int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_READ;
> +		loff_t pos = vmf->pgoff << PAGE_SHIFT;
> +
> +		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
> +
> +		/*
> +		 * We can only emit the event if we did actually release the
> +		 * mmap lock.
> +		 */
> +		if (fpin) {
> +			error = fsnotify_file_area_perm(fpin, mask, &pos,
> +							PAGE_SIZE);
> +			if (error) {
> +				fput(fpin);
> +				return VM_FAULT_ERROR;
> +			}
> +		}
> +	}
> +
>  	/*
>  	 * Do we have something in the page cache already?
>  	 */
...
> @@ -3612,6 +3643,13 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>  	unsigned long rss = 0;
>  	unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved, folio_type;
>  
> +	/*
> +	 * We are under RCU, we can't emit events here, we need to force a
> +	 * normal fault to make sure the events get sent.
> +	 */
> +	if (fsnotify_file_has_content_watches(file))
> +		return ret;
> +

I don't think we need to do anything for filemap_map_pages(). The call just
inserts page cache content into page tables and whatever is in the page
cache and has folio_uptodate() set should be already valid file content,
shouldn't it?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-08-01 21:40   ` Jan Kara
@ 2024-08-02 16:03     ` Josef Bacik
  2024-08-05 12:13       ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Josef Bacik @ 2024-08-02 16:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: kernel-team, linux-fsdevel, amir73il, brauner

On Thu, Aug 01, 2024 at 11:40:25PM +0200, Jan Kara wrote:
> On Thu 25-07-24 14:19:47, Josef Bacik wrote:
> > FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
> > on the faulting method.
> > 
> > This pre-content event is meant to be used by hierarchical storage
> > managers that want to fill in the file content on first read access.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ...
> > @@ -3287,6 +3288,35 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  	if (unlikely(index >= max_idx))
> >  		return VM_FAULT_SIGBUS;
> >  
> > +	/*
> > +	 * If we have pre-content watchers then we need to generate events on
> > +	 * page fault so that we can populate any data before the fault.
> > +	 *
> > +	 * We only do this on the first pass through, otherwise the populating
> > +	 * application could potentially deadlock on the mmap lock if it tries
> > +	 * to populate it with mmap.
> > +	 */
> > +	if (fault_flag_allow_retry_first(vmf->flags) &&
> > +	    fsnotify_file_has_content_watches(file)) {
> 
> I'm somewhat nervous that if ALLOW_RETRY isn't set, we'd silently jump into
> readpage code without ever sending pre-content event and thus we'd possibly
> expose invalid content to userspace? I think we should fail the fault if
> fsnotify_file_has_content_watches(file) && !(vmf->flags &
> FAULT_FLAG_ALLOW_RETRY).

I was worried about this too but it seems to not be the case that we'll not ever
have ALLOW_RETRY.  That being said I'm fine turning this into a sigbus.

> 
> > +		int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_READ;
> > +		loff_t pos = vmf->pgoff << PAGE_SHIFT;
> > +
> > +		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
> > +
> > +		/*
> > +		 * We can only emit the event if we did actually release the
> > +		 * mmap lock.
> > +		 */
> > +		if (fpin) {
> > +			error = fsnotify_file_area_perm(fpin, mask, &pos,
> > +							PAGE_SIZE);
> > +			if (error) {
> > +				fput(fpin);
> > +				return VM_FAULT_ERROR;
> > +			}
> > +		}
> > +	}
> > +
> >  	/*
> >  	 * Do we have something in the page cache already?
> >  	 */
> ...
> > @@ -3612,6 +3643,13 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
> >  	unsigned long rss = 0;
> >  	unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved, folio_type;
> >  
> > +	/*
> > +	 * We are under RCU, we can't emit events here, we need to force a
> > +	 * normal fault to make sure the events get sent.
> > +	 */
> > +	if (fsnotify_file_has_content_watches(file))
> > +		return ret;
> > +
> 
> I don't think we need to do anything for filemap_map_pages(). The call just
> inserts page cache content into page tables and whatever is in the page
> cache and has folio_uptodate() set should be already valid file content,
> shouldn't it?

I'll make this comment more clear.  filemap_fault() will start readahead, but
we'll only emit the event for the page size that we're faulting.  I had looked
at putting this at the readahead place and figuring out the readahead size, but
literally anything could trigger readahead so it's better to just not allow
filemap_map_pages() to happen, otherwise we'll end up with empty pages (if the
content hasn't been populated yet) and never emit an event for those ranges.
Thanks,

Josef

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/10] fsnotify: introduce pre-content permission event
  2024-08-01 16:31   ` Jan Kara
@ 2024-08-03 16:52     ` Amir Goldstein
  2024-10-25  7:55       ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2024-08-03 16:52 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, kernel-team, linux-fsdevel, brauner

On Thu, Aug 1, 2024 at 6:31 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 25-07-24 14:19:39, Josef Bacik wrote:
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
> > but it meant for a different use case of filling file content before
> > access to a file range, so it has slightly different semantics.
> >
> > Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
> > we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.
> >
> > FS_PRE_MODIFY is a new permission event, with similar semantics as
> > FS_PRE_ACCESS, which is called before a file is modified.
> >
> > FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
> > pre-content events are only reported for regular files and dirs.
> >
> > The pre-content events are meant to be used by hierarchical storage
> > managers that want to fill the content of files on first access.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> The patch looks good. Just out of curiosity:
>
> > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > index 8be029bc50b1..21e72b837ec5 100644
> > --- a/include/linux/fsnotify_backend.h
> > +++ b/include/linux/fsnotify_backend.h
> > @@ -56,6 +56,9 @@
> >  #define FS_ACCESS_PERM               0x00020000      /* access event in a permissions hook */
> >  #define FS_OPEN_EXEC_PERM    0x00040000      /* open/exec event in a permission hook */
> >
> > +#define FS_PRE_ACCESS                0x00100000      /* Pre-content access hook */
> > +#define FS_PRE_MODIFY                0x00200000      /* Pre-content modify hook */
>
> Why is a hole left here in the flag space?

Can't remember.

Currently we have a draft design for two more events
FS_PATH_ACCESS, FS_PATH_MODIFY
https://github.com/amir73il/man-pages/commits/fan_pre_path

So might have been a desire to keep the pre-events group on the nibble.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/10] fsnotify: generate pre-content permission event on open
  2024-08-01 17:01   ` Jan Kara
@ 2024-08-03 16:53     ` Amir Goldstein
  0 siblings, 0 replies; 47+ messages in thread
From: Amir Goldstein @ 2024-08-03 16:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, kernel-team, linux-fsdevel, brauner

On Thu, Aug 1, 2024 at 7:01 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 25-07-24 14:19:40, Josef Bacik wrote:
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on open depending on
> > file open mode.  The pre-content event will be generated in addition to
> > FS_OPEN_PERM, but without sb_writers held and after file was truncated
> > in case file was opened with O_CREAT and/or O_TRUNC.
> >
> > The event will have a range info of (0..0) to provide an opportunity
> > to fill entire file content on open.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> ...
>
> > @@ -176,10 +180,14 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
> >
> >  /*
> >   * fsnotify_file_perm - permission hook before file access
> > + *
> > + * Called from read()/write() with perm_mas MAY_READ/MAY_WRITE.
>                                         ^^^ perm_mask
>
> > + * Called from open() with MAY_OPEN in addition to fsnotify_open_perm(),
> > + * but without sb_writers held and after the file was truncated.
>
> This sentence is a bit confusing to me (although I think I understand what
> you want to say). How about just:
>
>  * Called from open() with MAY_OPEN without sb_writers held and after the
>  * file was truncated. Note that this is a different event from
>  * fsnotify_open_perm().

sounds good.

Thanks,
Amir,

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY permission event
  2024-08-01 17:09   ` Jan Kara
@ 2024-08-03 16:55     ` Amir Goldstein
  2024-08-05 11:18       ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2024-08-03 16:55 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, kernel-team, linux-fsdevel, brauner

On Thu, Aug 1, 2024 at 7:09 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 25-07-24 14:19:42, Josef Bacik wrote:
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > Generate FAN_PRE_MODIFY permission event from fsnotify_file_perm()
> > pre-write hook to notify fanotify listeners on an intent to make
> > modification to a file.
> >
> > Like FAN_PRE_ACCESS, it is only allowed with FAN_CLASS_PRE_CONTENT
> > and unlike FAN_MODIFY, it is only allowed on regular files.
> >
> > Like FAN_PRE_ACCESS, it is generated without sb_start_write() held,
> > so it is safe for to perform filesystem modifications in the the
>                 ^^^ seems superfluous                      ^^^ twice "the"
>
> > context of event handler.
> ...
> > diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> > index 5c811baf44d2..ae6cb2688d52 100644
> > --- a/include/linux/fanotify.h
> > +++ b/include/linux/fanotify.h
> > @@ -92,7 +92,8 @@
> >  #define FANOTIFY_CONTENT_PERM_EVENTS (FAN_OPEN_PERM | FAN_OPEN_EXEC_PERM | \
> >                                     FAN_ACCESS_PERM)
> >  /* Pre-content events can be used to fill file content */
> > -#define FANOTIFY_PRE_CONTENT_EVENTS  (FAN_PRE_ACCESS)
> > +#define FANOTIFY_PRE_CONTENT_EVENTS  (FAN_PRE_ACCESS | FAN_PRE_MODIFY)
> > +#define FANOTIFY_PRE_MODIFY_EVENTS   (FAN_PRE_MODIFY)
>
> I didn't find FANOTIFY_PRE_MODIFY_EVENTS used anywhere?

Right. It is used later in the sb_write_barrier patches.
We can introduce it later if you prefer.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/10] fanotify: pass optional file access range in pre-content event
  2024-08-01 17:16   ` Jan Kara
@ 2024-08-03 17:00     ` Amir Goldstein
  2024-08-05 11:20       ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2024-08-03 17:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, kernel-team, linux-fsdevel, brauner

On Thu, Aug 1, 2024 at 7:16 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 25-07-24 14:19:43, Josef Bacik wrote:
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > We would like to add file range information to pre-content events.
> >
> > Pass a struct file_range with optional offset and length to event handler
> > along with pre-content permission event.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> ...
>
> > @@ -565,6 +569,10 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
> >       pevent->hdr.len = 0;
> >       pevent->state = FAN_EVENT_INIT;
> >       pevent->path = *path;
> > +     if (range) {
> > +             pevent->ppos = range->ppos;
> > +             pevent->count = range->count;
> > +     }
>
> Shouldn't we initialze ppos & count in case range info isn't passed?

Currently, range info is always passed in case of
fanotify_event_has_access_range(), but for robustness I guess we should.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 09/10] fanotify: allow to set errno in FAN_DENY permission response
  2024-08-01 21:14   ` Jan Kara
@ 2024-08-03 17:06     ` Amir Goldstein
  0 siblings, 0 replies; 47+ messages in thread
From: Amir Goldstein @ 2024-08-03 17:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, kernel-team, linux-fsdevel, brauner

On Thu, Aug 1, 2024 at 11:14 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 25-07-24 14:19:46, Josef Bacik wrote:
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > With FAN_DENY response, user trying to perform the filesystem operation
> > gets an error with errno set to EPERM.
> >
> > It is useful for hierarchical storage management (HSM) service to be able
> > to deny access for reasons more diverse than EPERM, for example EAGAIN,
> > if HSM could retry the operation later.
> >
> > Allow fanotify groups with priority FAN_CLASSS_PRE_CONTENT to responsd
> > to permission events with the response value FAN_DENY_ERRNO(errno),
> > instead of FAN_DENY to return a custom error.
> >
> > Limit custom error to values to some errors expected on read(2)/write(2)
>         ^^^ parse error. Perhaps: "Limit custom error values to errors
> expected on read..."
>
> > and open(2) of regular files. This list could be extended in the future.
> > Userspace can test for legitimate values of FAN_DENY_ERRNO(errno) by
> > writing a response to an fanotify group fd with a value of FAN_NOFD
> > in the fd field of the response.
> >
> > The change in fanotify_response is backward compatible, because errno is
> > written in the high 8 bits of the 32bit response field and old kernels
> > reject respose value with high bits set.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> ...
>
> > @@ -258,18 +258,25 @@ static int fanotify_get_response(struct fsnotify_group *group,
> >       }
> >
> >       /* userspace responded, convert to something usable */
> > -     switch (event->response & FANOTIFY_RESPONSE_ACCESS) {
> > +     switch (FAN_RESPONSE_ACCESS(event->response)) {
> >       case FAN_ALLOW:
> >               ret = 0;
> >               break;
> >       case FAN_DENY:
> > +             /* Check custom errno from pre-content events */
> > +             errno = FAN_RESPONSE_ERRNO(event->response);
> > +             if (errno) {
> > +                     ret = -errno;
> > +                     break;
> > +             }
> > +             fallthrough;
> >       default:
> >               ret = -EPERM;
> >       }
> >
> >       /* Check if the response should be audited */
> >       if (event->response & FAN_AUDIT)
> > -             audit_fanotify(event->response & ~FAN_AUDIT,
> > +             audit_fanotify(FAN_RESPONSE_ACCESS(event->response),
> >                              &event->audit_rule);
>
> I think you need to also keep FAN_INFO in the flags not to break some
> userspace possibly parsing audit requests.
>
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index c3c8b2ea80b6..b4d810168521 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -337,11 +337,12 @@ static int process_access_response(struct fsnotify_group *group,
> >       struct fanotify_perm_event *event;
> >       int fd = response_struct->fd;
> >       u32 response = response_struct->response;
> > +     int errno = FAN_RESPONSE_ERRNO(response);
> >       int ret = info_len;
> >       struct fanotify_response_info_audit_rule friar;
> >
> > -     pr_debug("%s: group=%p fd=%d response=%u buf=%p size=%zu\n", __func__,
> > -              group, fd, response, info, info_len);
> > +     pr_debug("%s: group=%p fd=%d response=%x errno=%d buf=%p size=%zu\n",
> > +              __func__, group, fd, response, errno, info, info_len);
> >       /*
> >        * make sure the response is valid, if invalid we do nothing and either
> >        * userspace can send a valid response or we will clean it up after the
> > @@ -350,9 +351,33 @@ static int process_access_response(struct fsnotify_group *group,
> >       if (response & ~FANOTIFY_RESPONSE_VALID_MASK)
> >               return -EINVAL;
> >
> > -     switch (response & FANOTIFY_RESPONSE_ACCESS) {
> > +     switch (FAN_RESPONSE_ACCESS(response)) {
> >       case FAN_ALLOW:
> > +             if (errno)
> > +                     return -EINVAL;
> > +             break;
> >       case FAN_DENY:
> > +             /* Custom errno is supported only for pre-content groups */
> > +             if (errno && group->priority != FSNOTIFY_PRIO_PRE_CONTENT)
> > +                     return -EINVAL;
> > +
> > +             /*
> > +              * Limit errno to values expected on open(2)/read(2)/write(2)
> > +              * of regular files.
> > +              */
> > +             switch (errno) {
> > +             case 0:
> > +             case EIO:
> > +             case EPERM:
> > +             case EBUSY:
> > +             case ETXTBSY:
> > +             case EAGAIN:
> > +             case ENOSPC:
> > +             case EDQUOT:
> > +                     break;
> > +             default:
> > +                     return -EINVAL;
> > +             }
> >               break;
> >       default:
> >               return -EINVAL;
> > diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> > index ae6cb2688d52..76d818a7d654 100644
> > --- a/include/linux/fanotify.h
> > +++ b/include/linux/fanotify.h
> > @@ -132,7 +132,14 @@
> >  /* These masks check for invalid bits in permission responses. */
> >  #define FANOTIFY_RESPONSE_ACCESS (FAN_ALLOW | FAN_DENY)
> >  #define FANOTIFY_RESPONSE_FLAGS (FAN_AUDIT | FAN_INFO)
> > -#define FANOTIFY_RESPONSE_VALID_MASK (FANOTIFY_RESPONSE_ACCESS | FANOTIFY_RESPONSE_FLAGS)
> > +#define FANOTIFY_RESPONSE_ERRNO      (FAN_ERRNO_MASK << FAN_ERRNO_SHIFT)
> > +#define FANOTIFY_RESPONSE_VALID_MASK \
> > +     (FANOTIFY_RESPONSE_ACCESS | FANOTIFY_RESPONSE_FLAGS | \
> > +      FANOTIFY_RESPONSE_ERRNO)
> > +
> > +/* errno other than EPERM can specified in upper byte of deny response */
> > +#define FAN_RESPONSE_ACCESS(res)     ((res) & FANOTIFY_RESPONSE_ACCESS)
> > +#define FAN_RESPONSE_ERRNO(res)              ((int)((res) >> FAN_ERRNO_SHIFT))
>
> I have to say I find the names FANOTIFY_RESPONSE_ERRNO and
> FAN_RESPONSE_ERRNO() (and similarly with FAN_RESPONSE_ACCESS) very similar
> and thus confusing. I was staring at it for 5 minutes wondering how comes
> it compiles before I realized one prefix is shorter than the other one so
> the indentifiers are indeed different. Maybe we'd make these inline
> functions instead of macros and name them like:
>
> fanotify_get_response_decision()
> fanotify_get_response_errno()

Sounds good to me.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY permission event
  2024-08-03 16:55     ` Amir Goldstein
@ 2024-08-05 11:18       ` Jan Kara
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2024-08-05 11:18 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Josef Bacik, kernel-team, linux-fsdevel, brauner

On Sat 03-08-24 18:55:42, Amir Goldstein wrote:
> On Thu, Aug 1, 2024 at 7:09 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 25-07-24 14:19:42, Josef Bacik wrote:
> > > From: Amir Goldstein <amir73il@gmail.com>
> > >
> > > Generate FAN_PRE_MODIFY permission event from fsnotify_file_perm()
> > > pre-write hook to notify fanotify listeners on an intent to make
> > > modification to a file.
> > >
> > > Like FAN_PRE_ACCESS, it is only allowed with FAN_CLASS_PRE_CONTENT
> > > and unlike FAN_MODIFY, it is only allowed on regular files.
> > >
> > > Like FAN_PRE_ACCESS, it is generated without sb_start_write() held,
> > > so it is safe for to perform filesystem modifications in the the
> >                 ^^^ seems superfluous                      ^^^ twice "the"
> >
> > > context of event handler.
> > ...
> > > diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> > > index 5c811baf44d2..ae6cb2688d52 100644
> > > --- a/include/linux/fanotify.h
> > > +++ b/include/linux/fanotify.h
> > > @@ -92,7 +92,8 @@
> > >  #define FANOTIFY_CONTENT_PERM_EVENTS (FAN_OPEN_PERM | FAN_OPEN_EXEC_PERM | \
> > >                                     FAN_ACCESS_PERM)
> > >  /* Pre-content events can be used to fill file content */
> > > -#define FANOTIFY_PRE_CONTENT_EVENTS  (FAN_PRE_ACCESS)
> > > +#define FANOTIFY_PRE_CONTENT_EVENTS  (FAN_PRE_ACCESS | FAN_PRE_MODIFY)
> > > +#define FANOTIFY_PRE_MODIFY_EVENTS   (FAN_PRE_MODIFY)
> >
> > I didn't find FANOTIFY_PRE_MODIFY_EVENTS used anywhere?
> 
> Right. It is used later in the sb_write_barrier patches.
> We can introduce it later if you prefer.

If you say it eventually gets used then I'm fine with this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/10] fanotify: pass optional file access range in pre-content event
  2024-08-03 17:00     ` Amir Goldstein
@ 2024-08-05 11:20       ` Jan Kara
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2024-08-05 11:20 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Josef Bacik, kernel-team, linux-fsdevel, brauner

On Sat 03-08-24 19:00:22, Amir Goldstein wrote:
> On Thu, Aug 1, 2024 at 7:16 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 25-07-24 14:19:43, Josef Bacik wrote:
> > > From: Amir Goldstein <amir73il@gmail.com>
> > >
> > > We would like to add file range information to pre-content events.
> > >
> > > Pass a struct file_range with optional offset and length to event handler
> > > along with pre-content permission event.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > ...
> >
> > > @@ -565,6 +569,10 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
> > >       pevent->hdr.len = 0;
> > >       pevent->state = FAN_EVENT_INIT;
> > >       pevent->path = *path;
> > > +     if (range) {
> > > +             pevent->ppos = range->ppos;
> > > +             pevent->count = range->count;
> > > +     }
> >
> > Shouldn't we initialze ppos & count in case range info isn't passed?
> 
> Currently, range info is always passed in case of
> fanotify_event_has_access_range(), but for robustness I guess we should.

Yeah, I agree there's no bug currently. This is mostly for future-proofing
and peace of mind :)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-08-02 16:03     ` Josef Bacik
@ 2024-08-05 12:13       ` Jan Kara
  2024-08-07 19:04         ` Josef Bacik
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-08-05 12:13 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Jan Kara, kernel-team, linux-fsdevel, amir73il, brauner

On Fri 02-08-24 12:03:57, Josef Bacik wrote:
> On Thu, Aug 01, 2024 at 11:40:25PM +0200, Jan Kara wrote:
> > On Thu 25-07-24 14:19:47, Josef Bacik wrote:
> > > FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
> > > on the faulting method.
> > > 
> > > This pre-content event is meant to be used by hierarchical storage
> > > managers that want to fill in the file content on first read access.
> > > 
> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ...
> > > @@ -3287,6 +3288,35 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> > >  	if (unlikely(index >= max_idx))
> > >  		return VM_FAULT_SIGBUS;
> > >  
> > > +	/*
> > > +	 * If we have pre-content watchers then we need to generate events on
> > > +	 * page fault so that we can populate any data before the fault.
> > > +	 *
> > > +	 * We only do this on the first pass through, otherwise the populating
> > > +	 * application could potentially deadlock on the mmap lock if it tries
> > > +	 * to populate it with mmap.
> > > +	 */
> > > +	if (fault_flag_allow_retry_first(vmf->flags) &&
> > > +	    fsnotify_file_has_content_watches(file)) {
> > 
> > I'm somewhat nervous that if ALLOW_RETRY isn't set, we'd silently jump into
> > readpage code without ever sending pre-content event and thus we'd possibly
> > expose invalid content to userspace? I think we should fail the fault if
> > fsnotify_file_has_content_watches(file) && !(vmf->flags &
> > FAULT_FLAG_ALLOW_RETRY).
> 
> I was worried about this too but it seems to not be the case that we'll not ever
> have ALLOW_RETRY.  That being said I'm fine turning this into a sigbus.

Do you mean that with your workloads we always have ALLOW_RETRY set? As I
wrote, currently you'd have to try really hard to hit such paths but they
are there - for example if you place uprobe on an address in a VMA that is
not present, the page fault is going to happen without ALLOW_RETRY set.

> > > +		int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_READ;
> > > +		loff_t pos = vmf->pgoff << PAGE_SHIFT;
> > > +
> > > +		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
> > > +
> > > +		/*
> > > +		 * We can only emit the event if we did actually release the
> > > +		 * mmap lock.
> > > +		 */
> > > +		if (fpin) {
> > > +			error = fsnotify_file_area_perm(fpin, mask, &pos,
> > > +							PAGE_SIZE);
> > > +			if (error) {
> > > +				fput(fpin);
> > > +				return VM_FAULT_ERROR;
> > > +			}
> > > +		}
> > > +	}
> > > +
> > >  	/*
> > >  	 * Do we have something in the page cache already?
> > >  	 */
> > ...
> > > @@ -3612,6 +3643,13 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
> > >  	unsigned long rss = 0;
> > >  	unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved, folio_type;
> > >  
> > > +	/*
> > > +	 * We are under RCU, we can't emit events here, we need to force a
> > > +	 * normal fault to make sure the events get sent.
> > > +	 */
> > > +	if (fsnotify_file_has_content_watches(file))
> > > +		return ret;
> > > +
> > 
> > I don't think we need to do anything for filemap_map_pages(). The call just
> > inserts page cache content into page tables and whatever is in the page
> > cache and has folio_uptodate() set should be already valid file content,
> > shouldn't it?
> 
> I'll make this comment more clear.  filemap_fault() will start readahead,
> but we'll only emit the event for the page size that we're faulting.  I
> had looked at putting this at the readahead place and figuring out the
> readahead size, but literally anything could trigger readahead so it's
> better to just not allow filemap_map_pages() to happen, otherwise we'll
> end up with empty pages (if the content hasn't been populated yet) and
> never emit an event for those ranges.

This seems like an interesting problem. Even ordinary read(2) will trigger
readahead and as you say, we would be instantiating folios with wrong
content (zeros) due to that. It seems as a fragile design to keep such
folios in the page cache and place checks in all the places that could
possibly make their content visible to the user. I'd rather make sure that
if we pull folios into page cache (and set folio_uptodate() bit), their
content is indeed valid.

What we could do is to turn off readahead on the inode if
fsnotify_file_has_content_watches() is true. Essentially the handler of the
precontent event can do a much better job of prefilling the page cache with
whatever content is needed in a range that makes sense. And then we could
leave filemap_map_pages() intact. What do you think guys?

								Honza


> Thanks,
> 
> Josef
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
  2024-08-05 12:13       ` Jan Kara
@ 2024-08-07 19:04         ` Josef Bacik
  0 siblings, 0 replies; 47+ messages in thread
From: Josef Bacik @ 2024-08-07 19:04 UTC (permalink / raw)
  To: Jan Kara; +Cc: kernel-team, linux-fsdevel, amir73il, brauner

On Mon, Aug 05, 2024 at 02:13:49PM +0200, Jan Kara wrote:
> On Fri 02-08-24 12:03:57, Josef Bacik wrote:
> > On Thu, Aug 01, 2024 at 11:40:25PM +0200, Jan Kara wrote:
> > > On Thu 25-07-24 14:19:47, Josef Bacik wrote:
> > > > FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
> > > > on the faulting method.
> > > > 
> > > > This pre-content event is meant to be used by hierarchical storage
> > > > managers that want to fill in the file content on first read access.
> > > > 
> > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > ...
> > > > @@ -3287,6 +3288,35 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> > > >  	if (unlikely(index >= max_idx))
> > > >  		return VM_FAULT_SIGBUS;
> > > >  
> > > > +	/*
> > > > +	 * If we have pre-content watchers then we need to generate events on
> > > > +	 * page fault so that we can populate any data before the fault.
> > > > +	 *
> > > > +	 * We only do this on the first pass through, otherwise the populating
> > > > +	 * application could potentially deadlock on the mmap lock if it tries
> > > > +	 * to populate it with mmap.
> > > > +	 */
> > > > +	if (fault_flag_allow_retry_first(vmf->flags) &&
> > > > +	    fsnotify_file_has_content_watches(file)) {
> > > 
> > > I'm somewhat nervous that if ALLOW_RETRY isn't set, we'd silently jump into
> > > readpage code without ever sending pre-content event and thus we'd possibly
> > > expose invalid content to userspace? I think we should fail the fault if
> > > fsnotify_file_has_content_watches(file) && !(vmf->flags &
> > > FAULT_FLAG_ALLOW_RETRY).
> > 
> > I was worried about this too but it seems to not be the case that we'll not ever
> > have ALLOW_RETRY.  That being said I'm fine turning this into a sigbus.
> 
> Do you mean that with your workloads we always have ALLOW_RETRY set? As I
> wrote, currently you'd have to try really hard to hit such paths but they
> are there - for example if you place uprobe on an address in a VMA that is
> not present, the page fault is going to happen without ALLOW_RETRY set.

From what I can tell we almost always have FOLL_UNLOCKABLE set, which is what
translates into ALLOW_RETRY.  There's definitely some paths that can get there,
but as far as what happens in a normal environment we're going to almost always
have ALLOW_RETRY set.

This does leave a hole in some corner cases.  I'm content to say "don't do that"
if you want to use these hooks.

Optionally we could add a FAN_PRE_MMAP hook in vm_mmap() for the range that is
being mmap'ed to make sure we never miss any events, and then applications can
decide if they want to risk it with the pagefault hooks or enable the mmap hooks
for absolute certainty.

> 
> > > > +		int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_READ;
> > > > +		loff_t pos = vmf->pgoff << PAGE_SHIFT;
> > > > +
> > > > +		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
> > > > +
> > > > +		/*
> > > > +		 * We can only emit the event if we did actually release the
> > > > +		 * mmap lock.
> > > > +		 */
> > > > +		if (fpin) {
> > > > +			error = fsnotify_file_area_perm(fpin, mask, &pos,
> > > > +							PAGE_SIZE);
> > > > +			if (error) {
> > > > +				fput(fpin);
> > > > +				return VM_FAULT_ERROR;
> > > > +			}
> > > > +		}
> > > > +	}
> > > > +
> > > >  	/*
> > > >  	 * Do we have something in the page cache already?
> > > >  	 */
> > > ...
> > > > @@ -3612,6 +3643,13 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
> > > >  	unsigned long rss = 0;
> > > >  	unsigned int nr_pages = 0, mmap_miss = 0, mmap_miss_saved, folio_type;
> > > >  
> > > > +	/*
> > > > +	 * We are under RCU, we can't emit events here, we need to force a
> > > > +	 * normal fault to make sure the events get sent.
> > > > +	 */
> > > > +	if (fsnotify_file_has_content_watches(file))
> > > > +		return ret;
> > > > +
> > > 
> > > I don't think we need to do anything for filemap_map_pages(). The call just
> > > inserts page cache content into page tables and whatever is in the page
> > > cache and has folio_uptodate() set should be already valid file content,
> > > shouldn't it?
> > 
> > I'll make this comment more clear.  filemap_fault() will start readahead,
> > but we'll only emit the event for the page size that we're faulting.  I
> > had looked at putting this at the readahead place and figuring out the
> > readahead size, but literally anything could trigger readahead so it's
> > better to just not allow filemap_map_pages() to happen, otherwise we'll
> > end up with empty pages (if the content hasn't been populated yet) and
> > never emit an event for those ranges.
> 
> This seems like an interesting problem. Even ordinary read(2) will trigger
> readahead and as you say, we would be instantiating folios with wrong
> content (zeros) due to that. It seems as a fragile design to keep such
> folios in the page cache and place checks in all the places that could
> possibly make their content visible to the user. I'd rather make sure that
> if we pull folios into page cache (and set folio_uptodate() bit), their
> content is indeed valid.

The hook exists before we go looking in pagecache, so we're fine with read(2),
the only problem is mmap (AFAICT, I am not very smart after all).

> 
> What we could do is to turn off readahead on the inode if
> fsnotify_file_has_content_watches() is true. Essentially the handler of the
> precontent event can do a much better job of prefilling the page cache with
> whatever content is needed in a range that makes sense. And then we could
> leave filemap_map_pages() intact. What do you think guys?

I had considered this but decided against it because it seemed like a big
hammer, but if you're cool with it then so am I.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/10] fanotify: report file range info with pre-content events
  2024-08-01 17:38   ` Jan Kara
@ 2024-10-24 10:06     ` Amir Goldstein
  2024-10-24 16:35       ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2024-10-24 10:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, kernel-team, linux-fsdevel, brauner

On Thu, Aug 1, 2024 at 7:38 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 25-07-24 14:19:45, Josef Bacik wrote:
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > With group class FAN_CLASS_PRE_CONTENT, report offset and length info
> > along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events.
> >
> > This information is meant to be used by hierarchical storage managers
> > that want to fill partial content of files on first access to range.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/notify/fanotify/fanotify.h      |  8 +++++++
> >  fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++
> >  include/uapi/linux/fanotify.h      |  7 ++++++
> >  3 files changed, 53 insertions(+)
> >
> > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> > index 93598b7d5952..7f06355afa1f 100644
> > --- a/fs/notify/fanotify/fanotify.h
> > +++ b/fs/notify/fanotify/fanotify.h
> > @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask)
> >               mask & FANOTIFY_PERM_EVENTS;
> >  }
> >
> > +static inline bool fanotify_event_has_access_range(struct fanotify_event *event)
> > +{
> > +     if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS))
> > +             return false;
> > +
> > +     return FANOTIFY_PERM(event)->ppos;
> > +}
>
> Now I'm a bit confused. Can we have legally NULL ppos for an event from
> FANOTIFY_PRE_CONTENT_EVENTS?
>

Sorry for the very late reply...

The short answer is that NULL FANOTIFY_PERM(event)->ppos
simply means that fanotify_alloc_perm_event() was called with NULL
range, which is the very common case of legacy permission events.

The long answer is a bit convoluted, so bare with me.
The long answer is to the question whether fsnotify_file_range() can
be called with a NULL ppos.

This shouldn't be possible AFAIK for regular files and directories,
unless some fs that is marked with FS_ALLOW_HSM opens a regular
file with FMODE_STREAM, which should not be happening IMO,
but then the assertion belongs inside fsnotify_file_range().

However, there was another way to get NULL ppos before I added the patch
"fsnotify: generate pre-content permission event on open"

Which made this "half intentional" change:
 static inline int fsnotify_file_perm(struct file *file, int perm_mask)
 {
-       return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
+       return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0);
 }

In order to implement:
"The event will have a range info of (0..0) to provide an opportunity
 to fill the entire file content on open."

The problem is that do_open() was not the only caller of fsnotify_file_perm().
There is another call from iterate_dir() and the change above causes
FS_PRE_ACCESS events on readdir to report the directory f_pos -
Do we want that? I think we do, but HSM should be able to tell the
difference between opendir() and readdir(), because my HSM only
wants to fill dir content on the latter.

I think that we need to decide if we want to allow pre-content events
with no range reported (e.g. for readdir()) or if pre-content events must
report a range, can report (0..-1) or something for "entire range".

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/10] fanotify: report file range info with pre-content events
  2024-10-24 10:06     ` Amir Goldstein
@ 2024-10-24 16:35       ` Jan Kara
  2024-10-24 16:49         ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-10-24 16:35 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Josef Bacik, kernel-team, linux-fsdevel, brauner

On Thu 24-10-24 12:06:35, Amir Goldstein wrote:
> On Thu, Aug 1, 2024 at 7:38 PM Jan Kara <jack@suse.cz> wrote:
> > On Thu 25-07-24 14:19:45, Josef Bacik wrote:
> > > From: Amir Goldstein <amir73il@gmail.com>
> > >
> > > With group class FAN_CLASS_PRE_CONTENT, report offset and length info
> > > along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events.
> > >
> > > This information is meant to be used by hierarchical storage managers
> > > that want to fill partial content of files on first access to range.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > >  fs/notify/fanotify/fanotify.h      |  8 +++++++
> > >  fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++
> > >  include/uapi/linux/fanotify.h      |  7 ++++++
> > >  3 files changed, 53 insertions(+)
> > >
> > > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> > > index 93598b7d5952..7f06355afa1f 100644
> > > --- a/fs/notify/fanotify/fanotify.h
> > > +++ b/fs/notify/fanotify/fanotify.h
> > > @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask)
> > >               mask & FANOTIFY_PERM_EVENTS;
> > >  }
> > >
> > > +static inline bool fanotify_event_has_access_range(struct fanotify_event *event)
> > > +{
> > > +     if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS))
> > > +             return false;
> > > +
> > > +     return FANOTIFY_PERM(event)->ppos;
> > > +}
> >
> > Now I'm a bit confused. Can we have legally NULL ppos for an event from
> > FANOTIFY_PRE_CONTENT_EVENTS?
> >
> 
> Sorry for the very late reply...
> 
> The short answer is that NULL FANOTIFY_PERM(event)->ppos
> simply means that fanotify_alloc_perm_event() was called with NULL
> range, which is the very common case of legacy permission events.
> 
> The long answer is a bit convoluted, so bare with me.
> The long answer is to the question whether fsnotify_file_range() can
> be called with a NULL ppos.
> 
> This shouldn't be possible AFAIK for regular files and directories,
> unless some fs that is marked with FS_ALLOW_HSM opens a regular
> file with FMODE_STREAM, which should not be happening IMO,
> but then the assertion belongs inside fsnotify_file_range().
> 
> However, there was another way to get NULL ppos before I added the patch
> "fsnotify: generate pre-content permission event on open"
> 
> Which made this "half intentional" change:
>  static inline int fsnotify_file_perm(struct file *file, int perm_mask)
>  {
> -       return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
> +       return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0);
>  }
> 
> In order to implement:
> "The event will have a range info of (0..0) to provide an opportunity
>  to fill the entire file content on open."
> 
> The problem is that do_open() was not the only caller of fsnotify_file_perm().
> There is another call from iterate_dir() and the change above causes
> FS_PRE_ACCESS events on readdir to report the directory f_pos -
> Do we want that? I think we do, but HSM should be able to tell the
> difference between opendir() and readdir(), because my HSM only
> wants to fill dir content on the latter.

Well, I'm not so sure we want to report fpos on opendir / readdir(). For
directories fpos is an opaque cookie that is filesystem dependent and you
are not even allowed to carry it from open to open. It is valid only within
that one open-close session if I remember right. So userspace HSM cannot do
much with it and in my opinion reporting it to userspace is a recipe for
abuse...

I'm undecided whether we want to allow pre-access events without range or
enforce 0-0 range. I don't think there's a big practical difference.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/10] fanotify: report file range info with pre-content events
  2024-10-24 16:35       ` Jan Kara
@ 2024-10-24 16:49         ` Amir Goldstein
  2024-10-24 16:56           ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2024-10-24 16:49 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, kernel-team, linux-fsdevel, brauner

On Thu, Oct 24, 2024 at 6:35 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 24-10-24 12:06:35, Amir Goldstein wrote:
> > On Thu, Aug 1, 2024 at 7:38 PM Jan Kara <jack@suse.cz> wrote:
> > > On Thu 25-07-24 14:19:45, Josef Bacik wrote:
> > > > From: Amir Goldstein <amir73il@gmail.com>
> > > >
> > > > With group class FAN_CLASS_PRE_CONTENT, report offset and length info
> > > > along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events.
> > > >
> > > > This information is meant to be used by hierarchical storage managers
> > > > that want to fill partial content of files on first access to range.
> > > >
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > ---
> > > >  fs/notify/fanotify/fanotify.h      |  8 +++++++
> > > >  fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++
> > > >  include/uapi/linux/fanotify.h      |  7 ++++++
> > > >  3 files changed, 53 insertions(+)
> > > >
> > > > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> > > > index 93598b7d5952..7f06355afa1f 100644
> > > > --- a/fs/notify/fanotify/fanotify.h
> > > > +++ b/fs/notify/fanotify/fanotify.h
> > > > @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask)
> > > >               mask & FANOTIFY_PERM_EVENTS;
> > > >  }
> > > >
> > > > +static inline bool fanotify_event_has_access_range(struct fanotify_event *event)
> > > > +{
> > > > +     if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS))
> > > > +             return false;
> > > > +
> > > > +     return FANOTIFY_PERM(event)->ppos;
> > > > +}
> > >
> > > Now I'm a bit confused. Can we have legally NULL ppos for an event from
> > > FANOTIFY_PRE_CONTENT_EVENTS?
> > >
> >
> > Sorry for the very late reply...
> >
> > The short answer is that NULL FANOTIFY_PERM(event)->ppos
> > simply means that fanotify_alloc_perm_event() was called with NULL
> > range, which is the very common case of legacy permission events.
> >
> > The long answer is a bit convoluted, so bare with me.
> > The long answer is to the question whether fsnotify_file_range() can
> > be called with a NULL ppos.
> >
> > This shouldn't be possible AFAIK for regular files and directories,
> > unless some fs that is marked with FS_ALLOW_HSM opens a regular
> > file with FMODE_STREAM, which should not be happening IMO,
> > but then the assertion belongs inside fsnotify_file_range().
> >
> > However, there was another way to get NULL ppos before I added the patch
> > "fsnotify: generate pre-content permission event on open"
> >
> > Which made this "half intentional" change:
> >  static inline int fsnotify_file_perm(struct file *file, int perm_mask)
> >  {
> > -       return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
> > +       return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0);
> >  }
> >
> > In order to implement:
> > "The event will have a range info of (0..0) to provide an opportunity
> >  to fill the entire file content on open."
> >
> > The problem is that do_open() was not the only caller of fsnotify_file_perm().
> > There is another call from iterate_dir() and the change above causes
> > FS_PRE_ACCESS events on readdir to report the directory f_pos -
> > Do we want that? I think we do, but HSM should be able to tell the
> > difference between opendir() and readdir(), because my HSM only
> > wants to fill dir content on the latter.
>
> Well, I'm not so sure we want to report fpos on opendir / readdir(). For
> directories fpos is an opaque cookie that is filesystem dependent and you
> are not even allowed to carry it from open to open. It is valid only within
> that one open-close session if I remember right. So userspace HSM cannot do
> much with it and in my opinion reporting it to userspace is a recipe for
> abuse...
>
> I'm undecided whether we want to allow pre-access events without range or
> enforce 0-0 range. I don't think there's a big practical difference.
>

So there is a practical difference.
My HSM wants to fill dir content on readdir() and not on opendir().
Other HSMs may want to fill dir content on opendir().
It could do that if opendir() (as does open()) reports range [0..0]
and readdir() reports no range.

I agree that it is somewhat ugly. I am willing to take other semantics
as long as HSM can tell the difference between opendir() and readdir().

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/10] fanotify: report file range info with pre-content events
  2024-10-24 16:49         ` Amir Goldstein
@ 2024-10-24 16:56           ` Jan Kara
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2024-10-24 16:56 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Josef Bacik, kernel-team, linux-fsdevel, brauner

On Thu 24-10-24 18:49:02, Amir Goldstein wrote:
> On Thu, Oct 24, 2024 at 6:35 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 24-10-24 12:06:35, Amir Goldstein wrote:
> > > On Thu, Aug 1, 2024 at 7:38 PM Jan Kara <jack@suse.cz> wrote:
> > > > On Thu 25-07-24 14:19:45, Josef Bacik wrote:
> > > > > From: Amir Goldstein <amir73il@gmail.com>
> > > > >
> > > > > With group class FAN_CLASS_PRE_CONTENT, report offset and length info
> > > > > along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events.
> > > > >
> > > > > This information is meant to be used by hierarchical storage managers
> > > > > that want to fill partial content of files on first access to range.
> > > > >
> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > ---
> > > > >  fs/notify/fanotify/fanotify.h      |  8 +++++++
> > > > >  fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++
> > > > >  include/uapi/linux/fanotify.h      |  7 ++++++
> > > > >  3 files changed, 53 insertions(+)
> > > > >
> > > > > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> > > > > index 93598b7d5952..7f06355afa1f 100644
> > > > > --- a/fs/notify/fanotify/fanotify.h
> > > > > +++ b/fs/notify/fanotify/fanotify.h
> > > > > @@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask)
> > > > >               mask & FANOTIFY_PERM_EVENTS;
> > > > >  }
> > > > >
> > > > > +static inline bool fanotify_event_has_access_range(struct fanotify_event *event)
> > > > > +{
> > > > > +     if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS))
> > > > > +             return false;
> > > > > +
> > > > > +     return FANOTIFY_PERM(event)->ppos;
> > > > > +}
> > > >
> > > > Now I'm a bit confused. Can we have legally NULL ppos for an event from
> > > > FANOTIFY_PRE_CONTENT_EVENTS?
> > > >
> > >
> > > Sorry for the very late reply...
> > >
> > > The short answer is that NULL FANOTIFY_PERM(event)->ppos
> > > simply means that fanotify_alloc_perm_event() was called with NULL
> > > range, which is the very common case of legacy permission events.
> > >
> > > The long answer is a bit convoluted, so bare with me.
> > > The long answer is to the question whether fsnotify_file_range() can
> > > be called with a NULL ppos.
> > >
> > > This shouldn't be possible AFAIK for regular files and directories,
> > > unless some fs that is marked with FS_ALLOW_HSM opens a regular
> > > file with FMODE_STREAM, which should not be happening IMO,
> > > but then the assertion belongs inside fsnotify_file_range().
> > >
> > > However, there was another way to get NULL ppos before I added the patch
> > > "fsnotify: generate pre-content permission event on open"
> > >
> > > Which made this "half intentional" change:
> > >  static inline int fsnotify_file_perm(struct file *file, int perm_mask)
> > >  {
> > > -       return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
> > > +       return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0);
> > >  }
> > >
> > > In order to implement:
> > > "The event will have a range info of (0..0) to provide an opportunity
> > >  to fill the entire file content on open."
> > >
> > > The problem is that do_open() was not the only caller of fsnotify_file_perm().
> > > There is another call from iterate_dir() and the change above causes
> > > FS_PRE_ACCESS events on readdir to report the directory f_pos -
> > > Do we want that? I think we do, but HSM should be able to tell the
> > > difference between opendir() and readdir(), because my HSM only
> > > wants to fill dir content on the latter.
> >
> > Well, I'm not so sure we want to report fpos on opendir / readdir(). For
> > directories fpos is an opaque cookie that is filesystem dependent and you
> > are not even allowed to carry it from open to open. It is valid only within
> > that one open-close session if I remember right. So userspace HSM cannot do
> > much with it and in my opinion reporting it to userspace is a recipe for
> > abuse...
> >
> > I'm undecided whether we want to allow pre-access events without range or
> > enforce 0-0 range. I don't think there's a big practical difference.
> >
> 
> So there is a practical difference.
> My HSM wants to fill dir content on readdir() and not on opendir().
> Other HSMs may want to fill dir content on opendir().
> It could do that if opendir() (as does open()) reports range [0..0]
> and readdir() reports no range.

OK, this looks reasonably consistent so ack from me.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/10] fsnotify: introduce pre-content permission event
  2024-08-03 16:52     ` Amir Goldstein
@ 2024-10-25  7:55       ` Amir Goldstein
  2024-10-25 13:09         ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2024-10-25  7:55 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, kernel-team, linux-fsdevel, brauner

On Sat, Aug 3, 2024 at 6:52 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Aug 1, 2024 at 6:31 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 25-07-24 14:19:39, Josef Bacik wrote:
> > > From: Amir Goldstein <amir73il@gmail.com>
> > >
> > > The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
> > > but it meant for a different use case of filling file content before
> > > access to a file range, so it has slightly different semantics.
> > >
> > > Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
> > > we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.
> > >
> > > FS_PRE_MODIFY is a new permission event, with similar semantics as
> > > FS_PRE_ACCESS, which is called before a file is modified.
> > >
> > > FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
> > > pre-content events are only reported for regular files and dirs.
> > >
> > > The pre-content events are meant to be used by hierarchical storage
> > > managers that want to fill the content of files on first access.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > The patch looks good. Just out of curiosity:
> >
> > > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > > index 8be029bc50b1..21e72b837ec5 100644
> > > --- a/include/linux/fsnotify_backend.h
> > > +++ b/include/linux/fsnotify_backend.h
> > > @@ -56,6 +56,9 @@
> > >  #define FS_ACCESS_PERM               0x00020000      /* access event in a permissions hook */
> > >  #define FS_OPEN_EXEC_PERM    0x00040000      /* open/exec event in a permission hook */
> > >
> > > +#define FS_PRE_ACCESS                0x00100000      /* Pre-content access hook */
> > > +#define FS_PRE_MODIFY                0x00200000      /* Pre-content modify hook */
> >
> > Why is a hole left here in the flag space?
>
> Can't remember.
>
> Currently we have a draft design for two more events
> FS_PATH_ACCESS, FS_PATH_MODIFY
> https://github.com/amir73il/man-pages/commits/fan_pre_path
>
> So might have been a desire to keep the pre-events group on the nibble.

Funny story.

I straced a program with latest FS_PRE_ACCESS (0x00080000) and
see what I got:

fanotify_mark(3, FAN_MARK_ADD|FAN_MARK_MOUNT,
FAN_CLOSE_WRITE|FAN_OPEN_PERM|FAN_ACCESS_PERM|FAN_DIR_MODIFY|FAN_ONDIR,
AT_FDCWD, "/vdd") = 0

"FAN_DIR_MODIFY"! a blast from the past [1]

It would have been nice if we reserved 0x00080000 for FAN_PATH_MODIFY [2]
to be a bit less confusing for users with old strace.

WDYT?

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20200612093343.5669-18-amir73il@gmail.com/
[2] https://github.com/amir73il/man-pages/commits/fan_pre_path

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/10] fsnotify: introduce pre-content permission event
  2024-10-25  7:55       ` Amir Goldstein
@ 2024-10-25 13:09         ` Jan Kara
  2024-10-25 13:39           ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Kara @ 2024-10-25 13:09 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Josef Bacik, kernel-team, linux-fsdevel, brauner

On Fri 25-10-24 09:55:21, Amir Goldstein wrote:
> On Sat, Aug 3, 2024 at 6:52 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > On Thu, Aug 1, 2024 at 6:31 PM Jan Kara <jack@suse.cz> wrote:
> > > On Thu 25-07-24 14:19:39, Josef Bacik wrote:
> > > > From: Amir Goldstein <amir73il@gmail.com>
> > > >
> > > > The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
> > > > but it meant for a different use case of filling file content before
> > > > access to a file range, so it has slightly different semantics.
> > > >
> > > > Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
> > > > we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.
> > > >
> > > > FS_PRE_MODIFY is a new permission event, with similar semantics as
> > > > FS_PRE_ACCESS, which is called before a file is modified.
> > > >
> > > > FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
> > > > pre-content events are only reported for regular files and dirs.
> > > >
> > > > The pre-content events are meant to be used by hierarchical storage
> > > > managers that want to fill the content of files on first access.
> > > >
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > >
> > > The patch looks good. Just out of curiosity:
> > >
> > > > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > > > index 8be029bc50b1..21e72b837ec5 100644
> > > > --- a/include/linux/fsnotify_backend.h
> > > > +++ b/include/linux/fsnotify_backend.h
> > > > @@ -56,6 +56,9 @@
> > > >  #define FS_ACCESS_PERM               0x00020000      /* access event in a permissions hook */
> > > >  #define FS_OPEN_EXEC_PERM    0x00040000      /* open/exec event in a permission hook */
> > > >
> > > > +#define FS_PRE_ACCESS                0x00100000      /* Pre-content access hook */
> > > > +#define FS_PRE_MODIFY                0x00200000      /* Pre-content modify hook */
> > >
> > > Why is a hole left here in the flag space?
> >
> > Can't remember.
> >
> > Currently we have a draft design for two more events
> > FS_PATH_ACCESS, FS_PATH_MODIFY
> > https://github.com/amir73il/man-pages/commits/fan_pre_path
> >
> > So might have been a desire to keep the pre-events group on the nibble.
> 
> Funny story.
> 
> I straced a program with latest FS_PRE_ACCESS (0x00080000) and
> see what I got:
> 
> fanotify_mark(3, FAN_MARK_ADD|FAN_MARK_MOUNT,
> FAN_CLOSE_WRITE|FAN_OPEN_PERM|FAN_ACCESS_PERM|FAN_DIR_MODIFY|FAN_ONDIR,
> AT_FDCWD, "/vdd") = 0
> 
> "FAN_DIR_MODIFY"! a blast from the past [1]
> 
> It would have been nice if we reserved 0x00080000 for FAN_PATH_MODIFY [2]
> to be a bit less confusing for users with old strace.
> 
> WDYT?

Yeah, reusing that bit for something semantically close would reduce some
confusion. But realistically I don't think FAN_DIR_MODIFY go wide use when
it was never supported in a released upstream kernel.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/10] fsnotify: introduce pre-content permission event
  2024-10-25 13:09         ` Jan Kara
@ 2024-10-25 13:39           ` Amir Goldstein
  2024-10-26  6:58             ` Amir Goldstein
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2024-10-25 13:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, kernel-team, linux-fsdevel, brauner

On Fri, Oct 25, 2024 at 3:09 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 25-10-24 09:55:21, Amir Goldstein wrote:
> > On Sat, Aug 3, 2024 at 6:52 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > On Thu, Aug 1, 2024 at 6:31 PM Jan Kara <jack@suse.cz> wrote:
> > > > On Thu 25-07-24 14:19:39, Josef Bacik wrote:
> > > > > From: Amir Goldstein <amir73il@gmail.com>
> > > > >
> > > > > The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
> > > > > but it meant for a different use case of filling file content before
> > > > > access to a file range, so it has slightly different semantics.
> > > > >
> > > > > Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
> > > > > we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.
> > > > >
> > > > > FS_PRE_MODIFY is a new permission event, with similar semantics as
> > > > > FS_PRE_ACCESS, which is called before a file is modified.
> > > > >
> > > > > FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
> > > > > pre-content events are only reported for regular files and dirs.
> > > > >
> > > > > The pre-content events are meant to be used by hierarchical storage
> > > > > managers that want to fill the content of files on first access.
> > > > >
> > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > >
> > > > The patch looks good. Just out of curiosity:
> > > >
> > > > > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > > > > index 8be029bc50b1..21e72b837ec5 100644
> > > > > --- a/include/linux/fsnotify_backend.h
> > > > > +++ b/include/linux/fsnotify_backend.h
> > > > > @@ -56,6 +56,9 @@
> > > > >  #define FS_ACCESS_PERM               0x00020000      /* access event in a permissions hook */
> > > > >  #define FS_OPEN_EXEC_PERM    0x00040000      /* open/exec event in a permission hook */
> > > > >
> > > > > +#define FS_PRE_ACCESS                0x00100000      /* Pre-content access hook */
> > > > > +#define FS_PRE_MODIFY                0x00200000      /* Pre-content modify hook */
> > > >
> > > > Why is a hole left here in the flag space?
> > >
> > > Can't remember.
> > >
> > > Currently we have a draft design for two more events
> > > FS_PATH_ACCESS, FS_PATH_MODIFY
> > > https://github.com/amir73il/man-pages/commits/fan_pre_path
> > >
> > > So might have been a desire to keep the pre-events group on the nibble.
> >
> > Funny story.
> >
> > I straced a program with latest FS_PRE_ACCESS (0x00080000) and
> > see what I got:
> >
> > fanotify_mark(3, FAN_MARK_ADD|FAN_MARK_MOUNT,
> > FAN_CLOSE_WRITE|FAN_OPEN_PERM|FAN_ACCESS_PERM|FAN_DIR_MODIFY|FAN_ONDIR,
> > AT_FDCWD, "/vdd") = 0
> >
> > "FAN_DIR_MODIFY"! a blast from the past [1]
> >
> > It would have been nice if we reserved 0x00080000 for FAN_PATH_MODIFY [2]
> > to be a bit less confusing for users with old strace.
> >
> > WDYT?
>
> Yeah, reusing that bit for something semantically close would reduce some
> confusion. But realistically I don't think FAN_DIR_MODIFY go wide use when
> it was never supported in a released upstream kernel.

No, but its legacy lives in strace forever...

Anyway I included a patch in my fan_pre_access branch to reserve this bit
because what have we got to loose..

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/10] fsnotify: introduce pre-content permission event
  2024-10-25 13:39           ` Amir Goldstein
@ 2024-10-26  6:58             ` Amir Goldstein
  2024-10-31 12:47               ` Jan Kara
  0 siblings, 1 reply; 47+ messages in thread
From: Amir Goldstein @ 2024-10-26  6:58 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, kernel-team, linux-fsdevel, brauner

On Fri, Oct 25, 2024 at 3:39 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Oct 25, 2024 at 3:09 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Fri 25-10-24 09:55:21, Amir Goldstein wrote:
> > > On Sat, Aug 3, 2024 at 6:52 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > On Thu, Aug 1, 2024 at 6:31 PM Jan Kara <jack@suse.cz> wrote:
> > > > > On Thu 25-07-24 14:19:39, Josef Bacik wrote:
> > > > > > From: Amir Goldstein <amir73il@gmail.com>
> > > > > >
> > > > > > The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
> > > > > > but it meant for a different use case of filling file content before
> > > > > > access to a file range, so it has slightly different semantics.
> > > > > >
> > > > > > Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
> > > > > > we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.
> > > > > >
> > > > > > FS_PRE_MODIFY is a new permission event, with similar semantics as
> > > > > > FS_PRE_ACCESS, which is called before a file is modified.
> > > > > >
> > > > > > FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
> > > > > > pre-content events are only reported for regular files and dirs.
> > > > > >
> > > > > > The pre-content events are meant to be used by hierarchical storage
> > > > > > managers that want to fill the content of files on first access.
> > > > > >
> > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > >
> > > > > The patch looks good. Just out of curiosity:
> > > > >
> > > > > > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > > > > > index 8be029bc50b1..21e72b837ec5 100644
> > > > > > --- a/include/linux/fsnotify_backend.h
> > > > > > +++ b/include/linux/fsnotify_backend.h
> > > > > > @@ -56,6 +56,9 @@
> > > > > >  #define FS_ACCESS_PERM               0x00020000      /* access event in a permissions hook */
> > > > > >  #define FS_OPEN_EXEC_PERM    0x00040000      /* open/exec event in a permission hook */
> > > > > >
> > > > > > +#define FS_PRE_ACCESS                0x00100000      /* Pre-content access hook */
> > > > > > +#define FS_PRE_MODIFY                0x00200000      /* Pre-content modify hook */
> > > > >
> > > > > Why is a hole left here in the flag space?
> > > >
> > > > Can't remember.
> > > >
> > > > Currently we have a draft design for two more events
> > > > FS_PATH_ACCESS, FS_PATH_MODIFY
> > > > https://github.com/amir73il/man-pages/commits/fan_pre_path
> > > >
> > > > So might have been a desire to keep the pre-events group on the nibble.
> > >
> > > Funny story.
> > >
> > > I straced a program with latest FS_PRE_ACCESS (0x00080000) and
> > > see what I got:
> > >
> > > fanotify_mark(3, FAN_MARK_ADD|FAN_MARK_MOUNT,
> > > FAN_CLOSE_WRITE|FAN_OPEN_PERM|FAN_ACCESS_PERM|FAN_DIR_MODIFY|FAN_ONDIR,
> > > AT_FDCWD, "/vdd") = 0
> > >
> > > "FAN_DIR_MODIFY"! a blast from the past [1]
> > >
> > > It would have been nice if we reserved 0x00080000 for FAN_PATH_MODIFY [2]
> > > to be a bit less confusing for users with old strace.
> > >
> > > WDYT?
> >
> > Yeah, reusing that bit for something semantically close would reduce some
> > confusion. But realistically I don't think FAN_DIR_MODIFY go wide use when
> > it was never supported in a released upstream kernel.
>
> No, but its legacy lives in strace forever...
>

Speaking of legacy events, you will notice that in the fan_pre_access
branch I swapped the order of FS_PRE_ACCESS to be generated
before FS_ACCESS_PERM.

It is a semantic difference that probably does not matter much in practice,
but I justified it as "need to fill the content before content can be inspected"
because FS_ACCESS_PERM is the legacy Anti-malware event.

This order is also aligned with the priority group associated with those
events (PRE_CONTENT before CONTENT).

But from a wider POV, my feeling is that FS_ACCESS_PERM is not
really used by anyone and it is baggage that we need to try to get rid of.
It is not worth the bloat of the inlined fsnotify_file_area_perm() hook.
It is not worth the wasted cycles in the __fsnotify_parent() call that will
not be optimized when there is any high priority group listener on the sb.

I am tempted to try and combine the PRE/PERM access events into
a single event and make sure that no fanotify group can subscribe to
both of them at the same time, so a combined event can never be seen,
but it is not very easy to rationalize this API.

For example, if we would have required FAN_REPORT_RANGE init flag
for subscribing to FAN_PRE_ACCESS, then we could have denied the legacy
FAN_ACCESS_PERM in this group, but I don't think that we want to do that (?).

WDYT? Am I overthinking again?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/10] fsnotify: introduce pre-content permission event
  2024-10-26  6:58             ` Amir Goldstein
@ 2024-10-31 12:47               ` Jan Kara
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2024-10-31 12:47 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Josef Bacik, kernel-team, linux-fsdevel, brauner

On Sat 26-10-24 08:58:47, Amir Goldstein wrote:
> On Fri, Oct 25, 2024 at 3:39 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2024 at 3:09 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Fri 25-10-24 09:55:21, Amir Goldstein wrote:
> > > > On Sat, Aug 3, 2024 at 6:52 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > On Thu, Aug 1, 2024 at 6:31 PM Jan Kara <jack@suse.cz> wrote:
> > > > > > On Thu 25-07-24 14:19:39, Josef Bacik wrote:
> > > > > > > From: Amir Goldstein <amir73il@gmail.com>
> > > > > > >
> > > > > > > The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
> > > > > > > but it meant for a different use case of filling file content before
> > > > > > > access to a file range, so it has slightly different semantics.
> > > > > > >
> > > > > > > Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
> > > > > > > we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.
> > > > > > >
> > > > > > > FS_PRE_MODIFY is a new permission event, with similar semantics as
> > > > > > > FS_PRE_ACCESS, which is called before a file is modified.
> > > > > > >
> > > > > > > FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
> > > > > > > pre-content events are only reported for regular files and dirs.
> > > > > > >
> > > > > > > The pre-content events are meant to be used by hierarchical storage
> > > > > > > managers that want to fill the content of files on first access.
> > > > > > >
> > > > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > > > >
> > > > > > The patch looks good. Just out of curiosity:
> > > > > >
> > > > > > > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > > > > > > index 8be029bc50b1..21e72b837ec5 100644
> > > > > > > --- a/include/linux/fsnotify_backend.h
> > > > > > > +++ b/include/linux/fsnotify_backend.h
> > > > > > > @@ -56,6 +56,9 @@
> > > > > > >  #define FS_ACCESS_PERM               0x00020000      /* access event in a permissions hook */
> > > > > > >  #define FS_OPEN_EXEC_PERM    0x00040000      /* open/exec event in a permission hook */
> > > > > > >
> > > > > > > +#define FS_PRE_ACCESS                0x00100000      /* Pre-content access hook */
> > > > > > > +#define FS_PRE_MODIFY                0x00200000      /* Pre-content modify hook */
> > > > > >
> > > > > > Why is a hole left here in the flag space?
> > > > >
> > > > > Can't remember.
> > > > >
> > > > > Currently we have a draft design for two more events
> > > > > FS_PATH_ACCESS, FS_PATH_MODIFY
> > > > > https://github.com/amir73il/man-pages/commits/fan_pre_path
> > > > >
> > > > > So might have been a desire to keep the pre-events group on the nibble.
> > > >
> > > > Funny story.
> > > >
> > > > I straced a program with latest FS_PRE_ACCESS (0x00080000) and
> > > > see what I got:
> > > >
> > > > fanotify_mark(3, FAN_MARK_ADD|FAN_MARK_MOUNT,
> > > > FAN_CLOSE_WRITE|FAN_OPEN_PERM|FAN_ACCESS_PERM|FAN_DIR_MODIFY|FAN_ONDIR,
> > > > AT_FDCWD, "/vdd") = 0
> > > >
> > > > "FAN_DIR_MODIFY"! a blast from the past [1]
> > > >
> > > > It would have been nice if we reserved 0x00080000 for FAN_PATH_MODIFY [2]
> > > > to be a bit less confusing for users with old strace.
> > > >
> > > > WDYT?
> > >
> > > Yeah, reusing that bit for something semantically close would reduce some
> > > confusion. But realistically I don't think FAN_DIR_MODIFY go wide use when
> > > it was never supported in a released upstream kernel.
> >
> > No, but its legacy lives in strace forever...
> >
> 
> Speaking of legacy events, you will notice that in the fan_pre_access
> branch I swapped the order of FS_PRE_ACCESS to be generated
> before FS_ACCESS_PERM.
> 
> It is a semantic difference that probably does not matter much in practice,
> but I justified it as "need to fill the content before content can be inspected"
> because FS_ACCESS_PERM is the legacy Anti-malware event.
> 
> This order is also aligned with the priority group associated with those
> events (PRE_CONTENT before CONTENT).

Yes, I've noticed this and it makes sense. Thanks for the expanded
rationale.

> But from a wider POV, my feeling is that FS_ACCESS_PERM is not
> really used by anyone and it is baggage that we need to try to get rid of.
> It is not worth the bloat of the inlined fsnotify_file_area_perm() hook.
> It is not worth the wasted cycles in the __fsnotify_parent() call that will
> not be optimized when there is any high priority group listener on the sb.
> 
> I am tempted to try and combine the PRE/PERM access events into
> a single event and make sure that no fanotify group can subscribe to
> both of them at the same time, so a combined event can never be seen,
> but it is not very easy to rationalize this API.
> 
> For example, if we would have required FAN_REPORT_RANGE init flag
> for subscribing to FAN_PRE_ACCESS, then we could have denied the legacy
> FAN_ACCESS_PERM in this group, but I don't think that we want to do that (?).

Yeah, this would look a bit weird in the API. If you really think that
FAN_ACCESS_PERM is dead (which it may well be but I would not bet on it),
then we could start a deprecation period for it and if nobody comes back to
us saying they still use it, we can then remove it from the kernel
altogether.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2024-10-31 12:48 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
2024-07-25 18:19 ` [PATCH 01/10] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-08-01 17:59   ` Jan Kara
2024-07-25 18:19 ` [PATCH 02/10] fsnotify: introduce pre-content permission event Josef Bacik
2024-08-01 16:31   ` Jan Kara
2024-08-03 16:52     ` Amir Goldstein
2024-10-25  7:55       ` Amir Goldstein
2024-10-25 13:09         ` Jan Kara
2024-10-25 13:39           ` Amir Goldstein
2024-10-26  6:58             ` Amir Goldstein
2024-10-31 12:47               ` Jan Kara
2024-07-25 18:19 ` [PATCH 03/10] fsnotify: generate pre-content permission event on open Josef Bacik
2024-08-01 17:01   ` Jan Kara
2024-08-03 16:53     ` Amir Goldstein
2024-07-25 18:19 ` [PATCH 04/10] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
2024-08-01 17:04   ` Jan Kara
2024-07-25 18:19 ` [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
2024-08-01 17:09   ` Jan Kara
2024-08-03 16:55     ` Amir Goldstein
2024-08-05 11:18       ` Jan Kara
2024-07-25 18:19 ` [PATCH 06/10] fanotify: pass optional file access range in pre-content event Josef Bacik
2024-08-01 17:16   ` Jan Kara
2024-08-03 17:00     ` Amir Goldstein
2024-08-05 11:20       ` Jan Kara
2024-07-25 18:19 ` [PATCH 07/10] fanotify: rename a misnamed constant Josef Bacik
2024-08-01 17:19   ` Jan Kara
2024-08-01 17:23     ` Jan Kara
2024-07-25 18:19 ` [PATCH 08/10] fanotify: report file range info with pre-content events Josef Bacik
2024-08-01 17:38   ` Jan Kara
2024-10-24 10:06     ` Amir Goldstein
2024-10-24 16:35       ` Jan Kara
2024-10-24 16:49         ` Amir Goldstein
2024-10-24 16:56           ` Jan Kara
2024-07-25 18:19 ` [PATCH 09/10] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
2024-08-01 21:14   ` Jan Kara
2024-08-03 17:06     ` Amir Goldstein
2024-07-25 18:19 ` [PATCH 10/10] fsnotify: generate pre-content permission event on page fault Josef Bacik
2024-07-25 20:19   ` Amir Goldstein
2024-07-29 17:11     ` Josef Bacik
2024-07-29 18:57       ` Amir Goldstein
2024-07-30 12:18         ` Jan Kara
2024-07-30 16:51           ` Josef Bacik
2024-08-01 21:34             ` Jan Kara
2024-08-01 21:40   ` Jan Kara
2024-08-02 16:03     ` Josef Bacik
2024-08-05 12:13       ` Jan Kara
2024-08-07 19:04         ` Josef Bacik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).