* [PATCH v5 00/18] fanotify: add pre-content hooks
@ 2024-09-04 20:27 Josef Bacik
2024-09-04 20:27 ` [PATCH v5 01/18] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
` (19 more replies)
0 siblings, 20 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:27 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
v4: https://lore.kernel.org/linux-fsdevel/cover.1723670362.git.josef@toxicpanda.com/
v3: https://lore.kernel.org/linux-fsdevel/cover.1723228772.git.josef@toxicpanda.com/
v2: https://lore.kernel.org/linux-fsdevel/cover.1723144881.git.josef@toxicpanda.com/
v1: https://lore.kernel.org/linux-fsdevel/cover.1721931241.git.josef@toxicpanda.com/
v4->v5:
- Cleaned up the various "I'll fix it on commit" notes that Jan made since I had
to respin the series anyway.
- Renamed the filemap pagefault helper for fsnotify per Christians suggestion.
- Added a FS_ALLOW_HSM flag per Jan's comments, based on Amir's rough sketch.
- Added a patch to disable btrfs defrag on pre-content watched files.
- Added a patch to turn on FS_ALLOW_HSM for all the file systems that I tested.
- Added two fstests (which will be posted separately) to validate everything,
re-validated the series with btrfs, xfs, ext4, and bcachefs to make sure I
didn't break anything.
v3->v4:
- Trying to send a final verson Friday at 5pm before you go on vacation is a
recipe for silly mistakes, fixed the xfs handling yet again, per Christoph's
review.
- Reworked the file system helper so it's handling of fpin was a little less
silly, per Chinner's suggestion.
- Updated the return values to not or in VM_FAULT_RETRY, as we have a comment
in filemap_fault that says if VM_FAULT_ERROR is set we won't have
VM_FAULT_RETRY set.
v2->v3:
- Fix the pagefault path to do MAY_ACCESS instead, updated the perm handler to
emit PRE_ACCESS in this case, so we can avoid the extraneous perm event as per
Amir's suggestion.
- Reworked the exported helper so the per-filesystem changes are much smaller,
per Amir's suggestion.
- Fixed the screwup for DAX writes per Chinner's suggestion.
- Added Christian's reviewed-by's where appropriate.
v1->v2:
- reworked the page fault logic based on Jan's suggestion and turned it into a
helper.
- Added 3 patches per-fs where we need to call the fsnotify helper from their
->fault handlers.
- Disabled readahead in the case that there's a pre-content watch in place.
- Disabled huge faults when there's a pre-content watch in place (entirely
because it's untested, theoretically it should be straightforward to do).
- Updated the command numbers.
- Addressed the random spelling/grammer mistakes that Jan pointed out.
- Addressed the other random nits from Jan.
--- Original email ---
Hello,
These are the patches for the bare bones pre-content fanotify support. The
majority of this work is Amir's, my contribution to this has solely been around
adding the page fault hooks, testing and validating everything. I'm sending it
because Amir is traveling a bunch, and I touched it last so I'm going to take
all the hate and he can take all the credit.
There is a PoC that I've been using to validate this work, you can find the git
repo here
https://github.com/josefbacik/remote-fetch
This consists of 3 different tools.
1. populate. This just creates all the stub files in the directory from the
source directory. Just run ./populate ~/linux ~/hsm-linux and it'll
recursively create all of the stub files and directories.
2. remote-fetch. This is the actual PoC, you just point it at the source and
destination directory and then you can do whatever. ./remote-fetch ~/linux
~/hsm-linux.
3. mmap-validate. This was to validate the pagefault thing, this is likely what
will be turned into the selftest with remote-fetch. It creates a file and
then you can validate the file matches the right pattern with both normal
reads and mmap. Normally I do something like
./mmap-validate create ~/src/foo
./populate ~/src ~/dst
./rmeote-fetch ~/src ~/dst
./mmap-validate validate ~/dst/foo
I did a bunch of testing, I also got some performance numbers. I copied a
kernel tree, and then did remote-fetch, and then make -j4
Normal
real 9m49.709s
user 28m11.372s
sys 4m57.304s
HSM
real 10m6.454s
user 29m10.517s
sys 5m2.617s
So ~17 seconds more to build with HSM. I then did a make mrproper on both trees
to see the size
[root@fedora ~]# du -hs /src/linux
1.6G /src/linux
[root@fedora ~]# du -hs dst
125M dst
This mirrors the sort of savings we've seen in production.
Meta has had these patches (minus the page fault patch) deployed in production
for almost a year with our own utility for doing on-demand package fetching.
The savings from this has been pretty significant.
The page-fault hooks are necessary for the last thing we need, which is
on-demand range fetching of executables. Some of our binaries are several gigs
large, having the ability to remote fetch them on demand is a huge win for us
not only with space savings, but with startup time of containers.
There will be tests for this going into LTP once we're satisfied with the
patches and they're on their way upstream. Thanks,
Josef
Amir Goldstein (8):
fsnotify: introduce pre-content permission event
fsnotify: generate pre-content permission event on open
fanotify: introduce FAN_PRE_ACCESS permission event
fanotify: introduce FAN_PRE_MODIFY permission event
fanotify: pass optional file access range in pre-content event
fanotify: rename a misnamed constant
fanotify: report file range info with pre-content events
fanotify: allow to set errno in FAN_DENY permission response
Josef Bacik (10):
fanotify: don't skip extra event info if no info_mode is set
fs: add a flag to indicate the fs supports pre-content events
fanotify: add a helper to check for pre content events
fanotify: disable readahead if we have pre-content watches
mm: don't allow huge faults for files with pre content watches
fsnotify: generate pre-content permission event on page fault
bcachefs: add pre-content fsnotify hook to fault
xfs: add pre-content fsnotify hook for write faults
btrfs: disable defrag on pre-content watched files
fs: enable pre-content events on supported file systems
fs/bcachefs/fs-io-pagecache.c | 4 +
fs/bcachefs/fs.c | 2 +-
fs/btrfs/ioctl.c | 9 ++
fs/btrfs/super.c | 3 +-
fs/ext4/super.c | 6 +-
fs/namei.c | 9 ++
fs/notify/fanotify/fanotify.c | 33 ++++++--
fs/notify/fanotify/fanotify.h | 15 ++++
fs/notify/fanotify/fanotify_user.c | 119 ++++++++++++++++++++++-----
fs/notify/fsnotify.c | 17 +++-
fs/xfs/xfs_file.c | 4 +
fs/xfs/xfs_super.c | 2 +-
include/linux/fanotify.h | 20 +++--
include/linux/fs.h | 1 +
include/linux/fsnotify.h | 58 +++++++++++--
include/linux/fsnotify_backend.h | 59 ++++++++++++-
include/linux/mm.h | 1 +
include/uapi/linux/fanotify.h | 18 ++++
mm/filemap.c | 128 +++++++++++++++++++++++++++--
mm/memory.c | 22 +++++
mm/readahead.c | 13 +++
security/selinux/hooks.c | 3 +-
22 files changed, 489 insertions(+), 57 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v5 01/18] fanotify: don't skip extra event info if no info_mode is set
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
@ 2024-09-04 20:27 ` Josef Bacik
2024-09-05 7:49 ` Amir Goldstein
2024-09-04 20:27 ` [PATCH v5 02/18] fsnotify: introduce pre-content permission event Josef Bacik
` (18 subsequent siblings)
19 siblings, 1 reply; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:27 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
New pre-content events will be path events but they will also carry
additional range information. Remove the optimization to skip checking
whether info structures need to be generated for path events. This
results in no change in generated info structures for existing events.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/notify/fanotify/fanotify_user.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9ec313e9f6e1..2e2fba8a9d20 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -160,9 +160,6 @@ static size_t fanotify_event_len(unsigned int info_mode,
int fh_len;
int dot_len = 0;
- if (!info_mode)
- return event_len;
-
if (fanotify_is_error_event(event->mask))
event_len += FANOTIFY_ERROR_INFO_LEN;
@@ -740,12 +737,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
if (fanotify_is_perm_event(event->mask))
FANOTIFY_PERM(event)->fd = fd;
- if (info_mode) {
- ret = copy_info_records_to_user(event, info, info_mode, pidfd,
- buf, count);
- if (ret < 0)
- goto out_close_fd;
- }
+ ret = copy_info_records_to_user(event, info, info_mode, pidfd,
+ buf, count);
+ if (ret < 0)
+ goto out_close_fd;
if (f)
fd_install(fd, f);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 02/18] fsnotify: introduce pre-content permission event
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
2024-09-04 20:27 ` [PATCH v5 01/18] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
@ 2024-09-04 20:27 ` Josef Bacik
2024-09-04 20:27 ` [PATCH v5 03/18] fsnotify: generate pre-content permission event on open Josef Bacik
` (17 subsequent siblings)
19 siblings, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:27 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
From: Amir Goldstein <amir73il@gmail.com>
The new FS_PRE_ACCESS permission event is similar to FS_ACCESS_PERM,
but it meant for a different use case of filling file content before
access to a file range, so it has slightly different semantics.
Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events, same as
we did for FS_OPEN_PERM/FS_OPEN_EXEC_PERM.
FS_PRE_MODIFY is a new permission event, with similar semantics as
FS_PRE_ACCESS, which is called before a file is modified.
FS_ACCESS_PERM is reported also on blockdev and pipes, but the new
pre-content events are only reported for regular files and dirs.
The pre-content events are meant to be used by hierarchical storage
managers that want to fill the content of files on first access.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/notify/fsnotify.c | 2 +-
include/linux/fsnotify.h | 27 ++++++++++++++++++++++++---
include/linux/fsnotify_backend.h | 13 +++++++++++--
security/selinux/hooks.c | 3 ++-
4 files changed, 38 insertions(+), 7 deletions(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 272c8a1dab3c..1ca4a8da7f29 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -621,7 +621,7 @@ static __init int fsnotify_init(void)
{
int ret;
- BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 23);
+ BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
ret = init_srcu_struct(&fsnotify_mark_srcu);
if (ret)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 278620e063ab..7600a0c045ba 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -133,12 +133,13 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
/*
- * fsnotify_file_area_perm - permission hook before access to file range
+ * fsnotify_file_area_perm - permission hook before access/modify of file range
*/
static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
const loff_t *ppos, size_t count)
{
- __u32 fsnotify_mask = FS_ACCESS_PERM;
+ struct inode *inode = file_inode(file);
+ __u32 fsnotify_mask;
/*
* filesystem may be modified in the context of permission events
@@ -147,7 +148,27 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
*/
lockdep_assert_once(file_write_not_started(file));
- if (!(perm_mask & MAY_READ))
+ /*
+ * Generate FS_PRE_ACCESS/FS_ACCESS_PERM as two seperate events.
+ */
+ if (perm_mask & MAY_READ) {
+ int ret = fsnotify_file(file, FS_ACCESS_PERM);
+
+ if (ret)
+ return ret;
+ }
+
+ /*
+ * Pre-content events are only reported for regular files and dirs.
+ */
+ if (!S_ISDIR(inode->i_mode) && !S_ISREG(inode->i_mode))
+ return 0;
+
+ if (perm_mask & MAY_WRITE)
+ fsnotify_mask = FS_PRE_MODIFY;
+ else if (perm_mask & (MAY_READ | MAY_ACCESS))
+ fsnotify_mask = FS_PRE_ACCESS;
+ else
return 0;
return fsnotify_file(file, fsnotify_mask);
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 8be029bc50b1..200a5e3b1cd4 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -56,6 +56,9 @@
#define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */
#define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */
+#define FS_PRE_ACCESS 0x00080000 /* Pre-content access hook */
+#define FS_PRE_MODIFY 0x00100000 /* Pre-content modify hook */
+
/*
* Set on inode mark that cares about things that happen to its children.
* Always set for dnotify and inotify.
@@ -77,8 +80,14 @@
*/
#define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME)
-#define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
- FS_OPEN_EXEC_PERM)
+/* Content events can be used to inspect file content */
+#define FSNOTIFY_CONTENT_PERM_EVENTS (FS_OPEN_PERM | FS_OPEN_EXEC_PERM | \
+ FS_ACCESS_PERM)
+/* Pre-content events can be used to fill file content */
+#define FSNOTIFY_PRE_CONTENT_EVENTS (FS_PRE_ACCESS | FS_PRE_MODIFY)
+
+#define ALL_FSNOTIFY_PERM_EVENTS (FSNOTIFY_CONTENT_PERM_EVENTS | \
+ FSNOTIFY_PRE_CONTENT_EVENTS)
/*
* This is a list of all events that may get sent to a parent that is watching
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 55c78c318ccd..2997edf3e7cd 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3406,7 +3406,8 @@ static int selinux_path_notify(const struct path *path, u64 mask,
perm |= FILE__WATCH_WITH_PERM;
/* watches on read-like events need the file:watch_reads permission */
- if (mask & (FS_ACCESS | FS_ACCESS_PERM | FS_CLOSE_NOWRITE))
+ if (mask & (FS_ACCESS | FS_ACCESS_PERM | FS_PRE_ACCESS |
+ FS_CLOSE_NOWRITE))
perm |= FILE__WATCH_READS;
return path_has_perm(current_cred(), path, perm);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 03/18] fsnotify: generate pre-content permission event on open
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
2024-09-04 20:27 ` [PATCH v5 01/18] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-09-04 20:27 ` [PATCH v5 02/18] fsnotify: introduce pre-content permission event Josef Bacik
@ 2024-09-04 20:27 ` Josef Bacik
2024-10-24 13:06 ` Amir Goldstein
2024-09-04 20:27 ` [PATCH v5 04/18] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
` (16 subsequent siblings)
19 siblings, 1 reply; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:27 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
From: Amir Goldstein <amir73il@gmail.com>
FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on open depending on
file open mode. The pre-content event will be generated in addition to
FS_OPEN_PERM, but without sb_writers held and after file was truncated
in case file was opened with O_CREAT and/or O_TRUNC.
The event will have a range info of (0..0) to provide an opportunity
to fill entire file content on open.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
---
fs/namei.c | 9 +++++++++
include/linux/fsnotify.h | 10 +++++++++-
2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/namei.c b/fs/namei.c
index 3a4c40e12f78..c16487e3742d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3735,6 +3735,15 @@ static int do_open(struct nameidata *nd,
}
if (do_truncate)
mnt_drop_write(nd->path.mnt);
+
+ /*
+ * This permission hook is different than fsnotify_open_perm() hook.
+ * This is a pre-content hook that is called without sb_writers held
+ * and after the file was truncated.
+ */
+ if (!error)
+ error = fsnotify_file_perm(file, MAY_OPEN);
+
return error;
}
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 7600a0c045ba..fb3837b8de4c 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -168,6 +168,10 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
fsnotify_mask = FS_PRE_MODIFY;
else if (perm_mask & (MAY_READ | MAY_ACCESS))
fsnotify_mask = FS_PRE_ACCESS;
+ else if (perm_mask & MAY_OPEN && file->f_mode & FMODE_WRITER)
+ fsnotify_mask = FS_PRE_MODIFY;
+ else if (perm_mask & MAY_OPEN)
+ fsnotify_mask = FS_PRE_ACCESS;
else
return 0;
@@ -176,10 +180,14 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
/*
* fsnotify_file_perm - permission hook before file access
+ *
+ * Called from read()/write() with perm_mask MAY_READ/MAY_WRITE.
+ * Called from open() with MAY_OPEN without sb_writers held and after the file
+ * was truncated. Note that this is a different event from fsnotify_open_perm().
*/
static inline int fsnotify_file_perm(struct file *file, int perm_mask)
{
- return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
+ return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0);
}
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 04/18] fanotify: introduce FAN_PRE_ACCESS permission event
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (2 preceding siblings ...)
2024-09-04 20:27 ` [PATCH v5 03/18] fsnotify: generate pre-content permission event on open Josef Bacik
@ 2024-09-04 20:27 ` Josef Bacik
2024-09-04 20:27 ` [PATCH v5 05/18] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
` (15 subsequent siblings)
19 siblings, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:27 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
From: Amir Goldstein <amir73il@gmail.com>
Similar to FAN_ACCESS_PERM permission event, but it is only allowed with
class FAN_CLASS_PRE_CONTENT and only allowed on regular files and dirs.
Unlike FAN_ACCESS_PERM, it is safe to write to the file being accessed
in the context of the event handler.
This pre-content event is meant to be used by hierarchical storage
managers that want to fill the content of files on first read access.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
---
fs/notify/fanotify/fanotify.c | 3 ++-
fs/notify/fanotify/fanotify_user.c | 17 ++++++++++++++---
include/linux/fanotify.h | 14 ++++++++++----
include/uapi/linux/fanotify.h | 2 ++
4 files changed, 28 insertions(+), 8 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 224bccaab4cc..7dac8e4486df 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -910,8 +910,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
BUILD_BUG_ON(FAN_RENAME != FS_RENAME);
+ BUILD_BUG_ON(FAN_PRE_ACCESS != FS_PRE_ACCESS);
- BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 21);
+ BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 22);
mask = fanotify_group_event_mask(group, iter_info, &match_mask,
mask, data, data_type, dir);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 2e2fba8a9d20..c294849e474f 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1628,6 +1628,7 @@ static int fanotify_events_supported(struct fsnotify_group *group,
unsigned int flags)
{
unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
+ bool is_dir = d_is_dir(path->dentry);
/* Strict validation of events in non-dir inode mask with v5.17+ APIs */
bool strict_dir_events = FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID) ||
(mask & FAN_RENAME) ||
@@ -1665,9 +1666,15 @@ static int fanotify_events_supported(struct fsnotify_group *group,
* but because we always allowed it, error only when using new APIs.
*/
if (strict_dir_events && mark_type == FAN_MARK_INODE &&
- !d_is_dir(path->dentry) && (mask & FANOTIFY_DIRONLY_EVENT_BITS))
+ !is_dir && (mask & FANOTIFY_DIRONLY_EVENT_BITS))
return -ENOTDIR;
+ /* Pre-content events are only supported on regular files and dirs */
+ if (mask & FANOTIFY_PRE_CONTENT_EVENTS) {
+ if (!is_dir && !d_is_reg(path->dentry))
+ return -EINVAL;
+ }
+
return 0;
}
@@ -1769,11 +1776,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
goto fput_and_out;
/*
- * Permission events require minimum priority FAN_CLASS_CONTENT.
+ * Permission events are not allowed for FAN_CLASS_NOTIF.
+ * Pre-content permission events are not allowed for FAN_CLASS_CONTENT.
*/
ret = -EINVAL;
if (mask & FANOTIFY_PERM_EVENTS &&
- group->priority < FSNOTIFY_PRIO_CONTENT)
+ group->priority == FSNOTIFY_PRIO_NORMAL)
+ goto fput_and_out;
+ else if (mask & FANOTIFY_PRE_CONTENT_EVENTS &&
+ group->priority == FSNOTIFY_PRIO_CONTENT)
goto fput_and_out;
if (mask & FAN_FS_ERROR &&
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 4f1c4f603118..5c811baf44d2 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -88,6 +88,16 @@
#define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE | \
FAN_RENAME)
+/* Content events can be used to inspect file content */
+#define FANOTIFY_CONTENT_PERM_EVENTS (FAN_OPEN_PERM | FAN_OPEN_EXEC_PERM | \
+ FAN_ACCESS_PERM)
+/* Pre-content events can be used to fill file content */
+#define FANOTIFY_PRE_CONTENT_EVENTS (FAN_PRE_ACCESS)
+
+/* Events that require a permission response from user */
+#define FANOTIFY_PERM_EVENTS (FANOTIFY_CONTENT_PERM_EVENTS | \
+ FANOTIFY_PRE_CONTENT_EVENTS)
+
/* Events that can be reported with event->fd */
#define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS)
@@ -103,10 +113,6 @@
FANOTIFY_INODE_EVENTS | \
FANOTIFY_ERROR_EVENTS)
-/* Events that require a permission response from user */
-#define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
- FAN_OPEN_EXEC_PERM)
-
/* Extra flags that may be reported with event or control handling of events */
#define FANOTIFY_EVENT_FLAGS (FAN_EVENT_ON_CHILD | FAN_ONDIR)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index a37de58ca571..bcada21a3a2e 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -26,6 +26,8 @@
#define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
#define FAN_OPEN_EXEC_PERM 0x00040000 /* File open/exec in perm check */
+#define FAN_PRE_ACCESS 0x00080000 /* Pre-content access hook */
+
#define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */
#define FAN_RENAME 0x10000000 /* File was renamed */
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 05/18] fanotify: introduce FAN_PRE_MODIFY permission event
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (3 preceding siblings ...)
2024-09-04 20:27 ` [PATCH v5 04/18] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
@ 2024-09-04 20:27 ` Josef Bacik
2024-09-04 20:27 ` [PATCH v5 06/18] fanotify: pass optional file access range in pre-content event Josef Bacik
` (14 subsequent siblings)
19 siblings, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:27 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
From: Amir Goldstein <amir73il@gmail.com>
Generate FAN_PRE_MODIFY permission event from fsnotify_file_perm()
pre-write hook to notify fanotify listeners on an intent to make
modification to a file.
Like FAN_PRE_ACCESS, it is only allowed with FAN_CLASS_PRE_CONTENT
and unlike FAN_MODIFY, it is only allowed on regular files.
Like FAN_PRE_ACCESS, it is generated without sb_start_write() held,
so it is safe to perform filesystem modifications in the context of
event handler.
This pre-content event is meant to be used by hierarchical storage
managers that want to fill the content of files on first write access.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
---
fs/notify/fanotify/fanotify.c | 3 ++-
fs/notify/fanotify/fanotify_user.c | 2 ++
include/linux/fanotify.h | 3 ++-
include/uapi/linux/fanotify.h | 1 +
4 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 7dac8e4486df..b163594843f5 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -911,8 +911,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
BUILD_BUG_ON(FAN_RENAME != FS_RENAME);
BUILD_BUG_ON(FAN_PRE_ACCESS != FS_PRE_ACCESS);
+ BUILD_BUG_ON(FAN_PRE_MODIFY != FS_PRE_MODIFY);
- BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 22);
+ BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 23);
mask = fanotify_group_event_mask(group, iter_info, &match_mask,
mask, data, data_type, dir);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index c294849e474f..3a7101544f30 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1673,6 +1673,8 @@ static int fanotify_events_supported(struct fsnotify_group *group,
if (mask & FANOTIFY_PRE_CONTENT_EVENTS) {
if (!is_dir && !d_is_reg(path->dentry))
return -EINVAL;
+ if (is_dir && mask & FAN_PRE_MODIFY)
+ return -EISDIR;
}
return 0;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 5c811baf44d2..ae6cb2688d52 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -92,7 +92,8 @@
#define FANOTIFY_CONTENT_PERM_EVENTS (FAN_OPEN_PERM | FAN_OPEN_EXEC_PERM | \
FAN_ACCESS_PERM)
/* Pre-content events can be used to fill file content */
-#define FANOTIFY_PRE_CONTENT_EVENTS (FAN_PRE_ACCESS)
+#define FANOTIFY_PRE_CONTENT_EVENTS (FAN_PRE_ACCESS | FAN_PRE_MODIFY)
+#define FANOTIFY_PRE_MODIFY_EVENTS (FAN_PRE_MODIFY)
/* Events that require a permission response from user */
#define FANOTIFY_PERM_EVENTS (FANOTIFY_CONTENT_PERM_EVENTS | \
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index bcada21a3a2e..ac00fad66416 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -27,6 +27,7 @@
#define FAN_OPEN_EXEC_PERM 0x00040000 /* File open/exec in perm check */
#define FAN_PRE_ACCESS 0x00080000 /* Pre-content access hook */
+#define FAN_PRE_MODIFY 0x00100000 /* Pre-content modify hook */
#define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 06/18] fanotify: pass optional file access range in pre-content event
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (4 preceding siblings ...)
2024-09-04 20:27 ` [PATCH v5 05/18] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
@ 2024-09-04 20:27 ` Josef Bacik
2024-09-04 20:27 ` [PATCH v5 07/18] fanotify: rename a misnamed constant Josef Bacik
` (13 subsequent siblings)
19 siblings, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:27 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
From: Amir Goldstein <amir73il@gmail.com>
We would like to add file range information to pre-content events.
Pass a struct file_range with optional offset and length to event handler
along with pre-content permission event.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/notify/fanotify/fanotify.c | 10 ++++++++--
fs/notify/fanotify/fanotify.h | 2 ++
include/linux/fsnotify.h | 17 ++++++++++++++++-
include/linux/fsnotify_backend.h | 32 ++++++++++++++++++++++++++++++++
4 files changed, 58 insertions(+), 3 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index b163594843f5..4e8dce39fa8f 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -549,9 +549,13 @@ static struct fanotify_event *fanotify_alloc_path_event(const struct path *path,
return &pevent->fae;
}
-static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
+static struct fanotify_event *fanotify_alloc_perm_event(const void *data,
+ int data_type,
gfp_t gfp)
{
+ const struct path *path = fsnotify_data_path(data, data_type);
+ const struct file_range *range =
+ fsnotify_data_file_range(data, data_type);
struct fanotify_perm_event *pevent;
pevent = kmem_cache_alloc(fanotify_perm_event_cachep, gfp);
@@ -565,6 +569,8 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path,
pevent->hdr.len = 0;
pevent->state = FAN_EVENT_INIT;
pevent->path = *path;
+ pevent->ppos = range ? range->ppos : NULL;
+ pevent->count = range ? range->count : 0;
path_get(path);
return &pevent->fae;
@@ -802,7 +808,7 @@ static struct fanotify_event *fanotify_alloc_event(
old_memcg = set_active_memcg(group->memcg);
if (fanotify_is_perm_event(mask)) {
- event = fanotify_alloc_perm_event(path, gfp);
+ event = fanotify_alloc_perm_event(data, data_type, gfp);
} else if (fanotify_is_error_event(mask)) {
event = fanotify_alloc_error_event(group, fsid, data,
data_type, &hash);
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index e5ab33cae6a7..93598b7d5952 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -425,6 +425,8 @@ FANOTIFY_PE(struct fanotify_event *event)
struct fanotify_perm_event {
struct fanotify_event fae;
struct path path;
+ const loff_t *ppos; /* optional file range info */
+ size_t count;
u32 response; /* userspace answer to the event */
unsigned short state; /* state of the event */
int fd; /* fd we passed to userspace for this event */
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index fb3837b8de4c..9d001d328619 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -132,6 +132,21 @@ static inline int fsnotify_file(struct file *file, __u32 mask)
}
#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
+static inline int fsnotify_file_range(struct file *file, __u32 mask,
+ const loff_t *ppos, size_t count)
+{
+ struct file_range range;
+
+ if (file->f_mode & FMODE_NONOTIFY)
+ return 0;
+
+ range.path = &file->f_path;
+ range.ppos = ppos;
+ range.count = count;
+ return fsnotify_parent(range.path->dentry, mask, &range,
+ FSNOTIFY_EVENT_FILE_RANGE);
+}
+
/*
* fsnotify_file_area_perm - permission hook before access/modify of file range
*/
@@ -175,7 +190,7 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
else
return 0;
- return fsnotify_file(file, fsnotify_mask);
+ return fsnotify_file_range(file, fsnotify_mask, ppos, count);
}
/*
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 200a5e3b1cd4..276320846bfd 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -298,6 +298,7 @@ static inline void fsnotify_group_assert_locked(struct fsnotify_group *group)
/* When calling fsnotify tell it if the data is a path or inode */
enum fsnotify_data_type {
FSNOTIFY_EVENT_NONE,
+ FSNOTIFY_EVENT_FILE_RANGE,
FSNOTIFY_EVENT_PATH,
FSNOTIFY_EVENT_INODE,
FSNOTIFY_EVENT_DENTRY,
@@ -310,6 +311,17 @@ struct fs_error_report {
struct super_block *sb;
};
+struct file_range {
+ const struct path *path;
+ const loff_t *ppos;
+ size_t count;
+};
+
+static inline const struct path *file_range_path(const struct file_range *range)
+{
+ return range->path;
+}
+
static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
{
switch (data_type) {
@@ -319,6 +331,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
return d_inode(data);
case FSNOTIFY_EVENT_PATH:
return d_inode(((const struct path *)data)->dentry);
+ case FSNOTIFY_EVENT_FILE_RANGE:
+ return d_inode(file_range_path(data)->dentry);
case FSNOTIFY_EVENT_ERROR:
return ((struct fs_error_report *)data)->inode;
default:
@@ -334,6 +348,8 @@ static inline struct dentry *fsnotify_data_dentry(const void *data, int data_typ
return (struct dentry *)data;
case FSNOTIFY_EVENT_PATH:
return ((const struct path *)data)->dentry;
+ case FSNOTIFY_EVENT_FILE_RANGE:
+ return file_range_path(data)->dentry;
default:
return NULL;
}
@@ -345,6 +361,8 @@ static inline const struct path *fsnotify_data_path(const void *data,
switch (data_type) {
case FSNOTIFY_EVENT_PATH:
return data;
+ case FSNOTIFY_EVENT_FILE_RANGE:
+ return file_range_path(data);
default:
return NULL;
}
@@ -360,6 +378,8 @@ static inline struct super_block *fsnotify_data_sb(const void *data,
return ((struct dentry *)data)->d_sb;
case FSNOTIFY_EVENT_PATH:
return ((const struct path *)data)->dentry->d_sb;
+ case FSNOTIFY_EVENT_FILE_RANGE:
+ return file_range_path(data)->dentry->d_sb;
case FSNOTIFY_EVENT_ERROR:
return ((struct fs_error_report *) data)->sb;
default:
@@ -379,6 +399,18 @@ static inline struct fs_error_report *fsnotify_data_error_report(
}
}
+static inline const struct file_range *fsnotify_data_file_range(
+ const void *data,
+ int data_type)
+{
+ switch (data_type) {
+ case FSNOTIFY_EVENT_FILE_RANGE:
+ return (struct file_range *)data;
+ default:
+ return NULL;
+ }
+}
+
/*
* Index to merged marks iterator array that correlates to a type of watch.
* The type of watched object can be deduced from the iterator type, but not
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 07/18] fanotify: rename a misnamed constant
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (5 preceding siblings ...)
2024-09-04 20:27 ` [PATCH v5 06/18] fanotify: pass optional file access range in pre-content event Josef Bacik
@ 2024-09-04 20:27 ` Josef Bacik
2024-09-04 20:27 ` [PATCH v5 08/18] fanotify: report file range info with pre-content events Josef Bacik
` (12 subsequent siblings)
19 siblings, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:27 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
From: Amir Goldstein <amir73il@gmail.com>
FANOTIFY_PIDFD_INFO_HDR_LEN is not the length of the header.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
---
fs/notify/fanotify/fanotify_user.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 3a7101544f30..5ece186d5c50 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -119,7 +119,7 @@ struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
#define FANOTIFY_EVENT_ALIGN 4
#define FANOTIFY_FID_INFO_HDR_LEN \
(sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle))
-#define FANOTIFY_PIDFD_INFO_HDR_LEN \
+#define FANOTIFY_PIDFD_INFO_LEN \
sizeof(struct fanotify_event_info_pidfd)
#define FANOTIFY_ERROR_INFO_LEN \
(sizeof(struct fanotify_event_info_error))
@@ -174,14 +174,14 @@ static size_t fanotify_event_len(unsigned int info_mode,
dot_len = 1;
}
- if (info_mode & FAN_REPORT_PIDFD)
- event_len += FANOTIFY_PIDFD_INFO_HDR_LEN;
-
if (fanotify_event_has_object_fh(event)) {
fh_len = fanotify_event_object_fh_len(event);
event_len += fanotify_fid_info_len(fh_len, dot_len);
}
+ if (info_mode & FAN_REPORT_PIDFD)
+ event_len += FANOTIFY_PIDFD_INFO_LEN;
+
return event_len;
}
@@ -511,7 +511,7 @@ static int copy_pidfd_info_to_user(int pidfd,
size_t count)
{
struct fanotify_event_info_pidfd info = { };
- size_t info_len = FANOTIFY_PIDFD_INFO_HDR_LEN;
+ size_t info_len = FANOTIFY_PIDFD_INFO_LEN;
if (WARN_ON_ONCE(info_len > count))
return -EFAULT;
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 08/18] fanotify: report file range info with pre-content events
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (6 preceding siblings ...)
2024-09-04 20:27 ` [PATCH v5 07/18] fanotify: rename a misnamed constant Josef Bacik
@ 2024-09-04 20:27 ` Josef Bacik
2024-09-04 20:27 ` [PATCH v5 09/18] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
` (11 subsequent siblings)
19 siblings, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:27 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
From: Amir Goldstein <amir73il@gmail.com>
With group class FAN_CLASS_PRE_CONTENT, report offset and length info
along with FAN_PRE_ACCESS and FAN_PRE_MODIFY permission events.
This information is meant to be used by hierarchical storage managers
that want to fill partial content of files on first access to range.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/notify/fanotify/fanotify.h | 8 +++++++
fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++++
include/uapi/linux/fanotify.h | 8 +++++++
3 files changed, 54 insertions(+)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 93598b7d5952..7f06355afa1f 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -448,6 +448,14 @@ static inline bool fanotify_is_perm_event(u32 mask)
mask & FANOTIFY_PERM_EVENTS;
}
+static inline bool fanotify_event_has_access_range(struct fanotify_event *event)
+{
+ if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS))
+ return false;
+
+ return FANOTIFY_PERM(event)->ppos;
+}
+
static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
{
return container_of(fse, struct fanotify_event, fse);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 5ece186d5c50..ed56fe6f5ec7 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -123,6 +123,8 @@ struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
sizeof(struct fanotify_event_info_pidfd)
#define FANOTIFY_ERROR_INFO_LEN \
(sizeof(struct fanotify_event_info_error))
+#define FANOTIFY_RANGE_INFO_LEN \
+ (sizeof(struct fanotify_event_info_range))
static int fanotify_fid_info_len(int fh_len, int name_len)
{
@@ -182,6 +184,9 @@ static size_t fanotify_event_len(unsigned int info_mode,
if (info_mode & FAN_REPORT_PIDFD)
event_len += FANOTIFY_PIDFD_INFO_LEN;
+ if (fanotify_event_has_access_range(event))
+ event_len += FANOTIFY_RANGE_INFO_LEN;
+
return event_len;
}
@@ -526,6 +531,30 @@ static int copy_pidfd_info_to_user(int pidfd,
return info_len;
}
+static size_t copy_range_info_to_user(struct fanotify_event *event,
+ char __user *buf, int count)
+{
+ struct fanotify_perm_event *pevent = FANOTIFY_PERM(event);
+ struct fanotify_event_info_range info = { };
+ size_t info_len = FANOTIFY_RANGE_INFO_LEN;
+
+ if (WARN_ON_ONCE(info_len > count))
+ return -EFAULT;
+
+ if (WARN_ON_ONCE(!pevent->ppos))
+ return -EINVAL;
+
+ info.hdr.info_type = FAN_EVENT_INFO_TYPE_RANGE;
+ info.hdr.len = info_len;
+ info.offset = *(pevent->ppos);
+ info.count = pevent->count;
+
+ if (copy_to_user(buf, &info, info_len))
+ return -EFAULT;
+
+ return info_len;
+}
+
static int copy_info_records_to_user(struct fanotify_event *event,
struct fanotify_info *info,
unsigned int info_mode, int pidfd,
@@ -647,6 +676,15 @@ static int copy_info_records_to_user(struct fanotify_event *event,
total_bytes += ret;
}
+ if (fanotify_event_has_access_range(event)) {
+ ret = copy_range_info_to_user(event, buf, count);
+ if (ret < 0)
+ return ret;
+ buf += ret;
+ count -= ret;
+ total_bytes += ret;
+ }
+
return total_bytes;
}
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index ac00fad66416..6136e8a9f9f3 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -145,6 +145,7 @@ struct fanotify_event_metadata {
#define FAN_EVENT_INFO_TYPE_DFID 3
#define FAN_EVENT_INFO_TYPE_PIDFD 4
#define FAN_EVENT_INFO_TYPE_ERROR 5
+#define FAN_EVENT_INFO_TYPE_RANGE 6
/* Special info types for FAN_RENAME */
#define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10
@@ -191,6 +192,13 @@ struct fanotify_event_info_error {
__u32 error_count;
};
+struct fanotify_event_info_range {
+ struct fanotify_event_info_header hdr;
+ __u32 pad;
+ __u64 offset;
+ __u64 count;
+};
+
/*
* User space may need to record additional information about its decision.
* The extra information type records what kind of information is included.
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 09/18] fanotify: allow to set errno in FAN_DENY permission response
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (7 preceding siblings ...)
2024-09-04 20:27 ` [PATCH v5 08/18] fanotify: report file range info with pre-content events Josef Bacik
@ 2024-09-04 20:27 ` Josef Bacik
2024-09-04 20:28 ` [PATCH v5 10/18] fs: add a flag to indicate the fs supports pre-content events Josef Bacik
` (10 subsequent siblings)
19 siblings, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:27 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
From: Amir Goldstein <amir73il@gmail.com>
With FAN_DENY response, user trying to perform the filesystem operation
gets an error with errno set to EPERM.
It is useful for hierarchical storage management (HSM) service to be able
to deny access for reasons more diverse than EPERM, for example EAGAIN,
if HSM could retry the operation later.
Allow fanotify groups with priority FAN_CLASSS_PRE_CONTENT to responsd
to permission events with the response value FAN_DENY_ERRNO(errno),
instead of FAN_DENY to return a custom error.
Limit custom error values to errors expected on read(2)/write(2) and
open(2) of regular files. This list could be extended in the future.
Userspace can test for legitimate values of FAN_DENY_ERRNO(errno) by
writing a response to an fanotify group fd with a value of FAN_NOFD in
the fd field of the response.
The change in fanotify_response is backward compatible, because errno is
written in the high 8 bits of the 32bit response field and old kernels
reject respose value with high bits set.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/notify/fanotify/fanotify.c | 19 +++++++++++----
fs/notify/fanotify/fanotify.h | 5 ++++
fs/notify/fanotify/fanotify_user.c | 37 ++++++++++++++++++++++++++----
include/linux/fanotify.h | 5 +++-
include/uapi/linux/fanotify.h | 7 ++++++
5 files changed, 62 insertions(+), 11 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 4e8dce39fa8f..2969f45a08dc 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -224,7 +224,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
struct fanotify_perm_event *event,
struct fsnotify_iter_info *iter_info)
{
- int ret;
+ int ret, errno;
+ u32 decision;
pr_debug("%s: group=%p event=%p\n", __func__, group, event);
@@ -257,20 +258,28 @@ static int fanotify_get_response(struct fsnotify_group *group,
goto out;
}
+ decision = event->response &
+ (FANOTIFY_RESPONSE_ACCESS | FANOTIFY_RESPONSE_FLAGS);
/* userspace responded, convert to something usable */
- switch (event->response & FANOTIFY_RESPONSE_ACCESS) {
+ switch (decision & FANOTIFY_RESPONSE_ACCESS) {
case FAN_ALLOW:
ret = 0;
break;
case FAN_DENY:
+ /* Check custom errno from pre-content events */
+ errno = fanotify_get_response_errno(event->response);
+ if (errno) {
+ ret = -errno;
+ break;
+ }
+ fallthrough;
default:
ret = -EPERM;
}
/* Check if the response should be audited */
- if (event->response & FAN_AUDIT)
- audit_fanotify(event->response & ~FAN_AUDIT,
- &event->audit_rule);
+ if (decision & FAN_AUDIT)
+ audit_fanotify(decision & ~FAN_AUDIT, &event->audit_rule);
pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
group, event, ret);
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 7f06355afa1f..9e93aba210c9 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -528,3 +528,8 @@ static inline unsigned int fanotify_mark_user_flags(struct fsnotify_mark *mark)
return mflags;
}
+
+static inline u32 fanotify_get_response_errno(int res)
+{
+ return res >> FAN_ERRNO_SHIFT;
+}
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index ed56fe6f5ec7..53eee8af34a0 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -337,11 +337,14 @@ static int process_access_response(struct fsnotify_group *group,
struct fanotify_perm_event *event;
int fd = response_struct->fd;
u32 response = response_struct->response;
+ u32 decision = response &
+ (FANOTIFY_RESPONSE_ACCESS | FANOTIFY_RESPONSE_FLAGS);
+ int errno = fanotify_get_response_errno(response);
int ret = info_len;
struct fanotify_response_info_audit_rule friar;
- pr_debug("%s: group=%p fd=%d response=%u buf=%p size=%zu\n", __func__,
- group, fd, response, info, info_len);
+ pr_debug("%s: group=%p fd=%d response=%x errno=%d buf=%p size=%zu\n",
+ __func__, group, fd, response, errno, info, info_len);
/*
* make sure the response is valid, if invalid we do nothing and either
* userspace can send a valid response or we will clean it up after the
@@ -350,18 +353,42 @@ static int process_access_response(struct fsnotify_group *group,
if (response & ~FANOTIFY_RESPONSE_VALID_MASK)
return -EINVAL;
- switch (response & FANOTIFY_RESPONSE_ACCESS) {
+ switch (decision & FANOTIFY_RESPONSE_ACCESS) {
case FAN_ALLOW:
+ if (errno)
+ return -EINVAL;
+ break;
case FAN_DENY:
+ /* Custom errno is supported only for pre-content groups */
+ if (errno && group->priority != FSNOTIFY_PRIO_PRE_CONTENT)
+ return -EINVAL;
+
+ /*
+ * Limit errno to values expected on open(2)/read(2)/write(2)
+ * of regular files.
+ */
+ switch (errno) {
+ case 0:
+ case EIO:
+ case EPERM:
+ case EBUSY:
+ case ETXTBSY:
+ case EAGAIN:
+ case ENOSPC:
+ case EDQUOT:
+ break;
+ default:
+ return -EINVAL;
+ }
break;
default:
return -EINVAL;
}
- if ((response & FAN_AUDIT) && !FAN_GROUP_FLAG(group, FAN_ENABLE_AUDIT))
+ if ((decision & FAN_AUDIT) && !FAN_GROUP_FLAG(group, FAN_ENABLE_AUDIT))
return -EINVAL;
- if (response & FAN_INFO) {
+ if (decision & FAN_INFO) {
ret = process_access_response_info(info, info_len, &friar);
if (ret < 0)
return ret;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index ae6cb2688d52..547514542669 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -132,7 +132,10 @@
/* These masks check for invalid bits in permission responses. */
#define FANOTIFY_RESPONSE_ACCESS (FAN_ALLOW | FAN_DENY)
#define FANOTIFY_RESPONSE_FLAGS (FAN_AUDIT | FAN_INFO)
-#define FANOTIFY_RESPONSE_VALID_MASK (FANOTIFY_RESPONSE_ACCESS | FANOTIFY_RESPONSE_FLAGS)
+#define FANOTIFY_RESPONSE_ERRNO (FAN_ERRNO_MASK << FAN_ERRNO_SHIFT)
+#define FANOTIFY_RESPONSE_VALID_MASK \
+ (FANOTIFY_RESPONSE_ACCESS | FANOTIFY_RESPONSE_FLAGS | \
+ FANOTIFY_RESPONSE_ERRNO)
/* Do not use these old uapi constants internally */
#undef FAN_ALL_CLASS_BITS
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 6136e8a9f9f3..34b5d634de1f 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -234,6 +234,13 @@ struct fanotify_response_info_audit_rule {
/* Legit userspace responses to a _PERM event */
#define FAN_ALLOW 0x01
#define FAN_DENY 0x02
+/* errno other than EPERM can specified in upper byte of deny response */
+#define FAN_ERRNO_BITS 8
+#define FAN_ERRNO_SHIFT (32 - FAN_ERRNO_BITS)
+#define FAN_ERRNO_MASK ((1 << FAN_ERRNO_BITS) - 1)
+#define FAN_DENY_ERRNO(err) \
+ (FAN_DENY | ((((__u32)(err)) & FAN_ERRNO_MASK) << FAN_ERRNO_SHIFT))
+
#define FAN_AUDIT 0x10 /* Bitmask to create audit record for result */
#define FAN_INFO 0x20 /* Bitmask to indicate additional information */
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 10/18] fs: add a flag to indicate the fs supports pre-content events
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (8 preceding siblings ...)
2024-09-04 20:27 ` [PATCH v5 09/18] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
@ 2024-09-04 20:28 ` Josef Bacik
2024-09-05 8:07 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 11/18] fanotify: add a helper to check for pre content events Josef Bacik
` (9 subsequent siblings)
19 siblings, 1 reply; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:28 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
The pre-content events require some extra thinking, especially around
page faults. In order to make sure we don't advertise a feature working
that doesn't actually work, add a flag to allow file systems to opt-in
to this behavior.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/notify/fanotify/fanotify_user.c | 2 ++
include/linux/fs.h | 1 +
include/linux/fsnotify.h | 4 ++++
3 files changed, 7 insertions(+)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 53eee8af34a0..936e9f9e0cbc 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1736,6 +1736,8 @@ static int fanotify_events_supported(struct fsnotify_group *group,
/* Pre-content events are only supported on regular files and dirs */
if (mask & FANOTIFY_PRE_CONTENT_EVENTS) {
+ if (!(path->mnt->mnt_sb->s_type->fs_flags & FS_ALLOW_HSM))
+ return -EINVAL;
if (!is_dir && !d_is_reg(path->dentry))
return -EINVAL;
if (is_dir && mask & FAN_PRE_MODIFY)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd34b5755c0b..5708e91d3625 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2494,6 +2494,7 @@ struct file_system_type {
#define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */
#define FS_DISALLOW_NOTIFY_PERM 16 /* Disable fanotify permission events */
#define FS_ALLOW_IDMAP 32 /* FS has been updated to handle vfs idmappings. */
+#define FS_ALLOW_HSM 64 /* FS can handle fanotify pre-content events. */
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
int (*init_fs_context)(struct fs_context *);
const struct fs_parameter_spec *parameters;
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 9d001d328619..27992b548f0c 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -179,6 +179,10 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
if (!S_ISDIR(inode->i_mode) && !S_ISREG(inode->i_mode))
return 0;
+ /* The fs doesn't support pre-content events. */
+ if (!(inode->i_sb->s_type->fs_flags & FS_ALLOW_HSM))
+ return 0;
+
if (perm_mask & MAY_WRITE)
fsnotify_mask = FS_PRE_MODIFY;
else if (perm_mask & (MAY_READ | MAY_ACCESS))
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 11/18] fanotify: add a helper to check for pre content events
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (9 preceding siblings ...)
2024-09-04 20:28 ` [PATCH v5 10/18] fs: add a flag to indicate the fs supports pre-content events Josef Bacik
@ 2024-09-04 20:28 ` Josef Bacik
2024-09-05 8:09 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 12/18] fanotify: disable readahead if we have pre-content watches Josef Bacik
` (8 subsequent siblings)
19 siblings, 1 reply; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:28 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
We want to emit events during page fault, and calling into fanotify
could be expensive, so add a helper to allow us to skip calling into
fanotify from page fault. This will also be used to disable readahead
for content watched files which will be handled in a subsequent patch.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/notify/fsnotify.c | 15 +++++++++++++++
include/linux/fsnotify_backend.h | 14 ++++++++++++++
2 files changed, 29 insertions(+)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 1ca4a8da7f29..30badf8ba36c 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -201,6 +201,21 @@ static inline bool fsnotify_object_watched(struct inode *inode, __u32 mnt_mask,
return mask & marks_mask & ALL_FSNOTIFY_EVENTS;
}
+#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
+bool fsnotify_file_has_pre_content_watches(struct file *file)
+{
+ struct inode *inode = file_inode(file);
+ __u32 mnt_mask = real_mount(file->f_path.mnt)->mnt_fsnotify_mask;
+
+ if (!(inode->i_sb->s_type->fs_flags & FS_ALLOW_HSM))
+ return false;
+
+ return fsnotify_object_watched(inode, mnt_mask,
+ FSNOTIFY_PRE_CONTENT_EVENTS);
+}
+#endif
+
+
/*
* Notify this dentry's parent about a child's events with child name info
* if parent is watching or if inode/sb/mount are interested in events with
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 276320846bfd..b495a0676dd3 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -900,6 +900,15 @@ static inline void fsnotify_init_event(struct fsnotify_event *event)
INIT_LIST_HEAD(&event->list);
}
+#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
+bool fsnotify_file_has_pre_content_watches(struct file *file);
+#else
+static inline bool fsnotify_file_has_pre_content_watches(struct file *file)
+{
+ return false;
+}
+#endif /* CONFIG_FANOTIFY_ACCESS_PERMISSIONS */
+
#else
static inline int fsnotify(__u32 mask, const void *data, int data_type,
@@ -938,6 +947,11 @@ static inline u32 fsnotify_get_cookie(void)
static inline void fsnotify_unmount_inodes(struct super_block *sb)
{}
+static inline bool fsnotify_file_has_pre_content_watches(struct file *file)
+{
+ return false;
+}
+
#endif /* CONFIG_FSNOTIFY */
#endif /* __KERNEL __ */
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 12/18] fanotify: disable readahead if we have pre-content watches
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (10 preceding siblings ...)
2024-09-04 20:28 ` [PATCH v5 11/18] fanotify: add a helper to check for pre content events Josef Bacik
@ 2024-09-04 20:28 ` Josef Bacik
2024-09-05 8:12 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 13/18] mm: don't allow huge faults for files with pre content watches Josef Bacik
` (7 subsequent siblings)
19 siblings, 1 reply; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:28 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
With page faults we can trigger readahead on the file, and then
subsequent faults can find these pages and insert them into the file
without emitting an fanotify event. To avoid this case, disable
readahead if we have pre-content watches on the file. This way we are
guaranteed to get an event for every range we attempt to access on a
pre-content watched file.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
mm/filemap.c | 12 ++++++++++++
mm/readahead.c | 13 +++++++++++++
2 files changed, 25 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index ca8c8d889eef..8b1684b62177 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3122,6 +3122,14 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
unsigned long vm_flags = vmf->vma->vm_flags;
unsigned int mmap_miss;
+ /*
+ * If we have pre-content watches we need to disable readahead to make
+ * sure that we don't populate our mapping with 0 filled pages that we
+ * never emitted an event for.
+ */
+ if (fsnotify_file_has_pre_content_watches(file))
+ return fpin;
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* Use the readahead code, even if readahead is disabled */
if ((vm_flags & VM_HUGEPAGE) && HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER) {
@@ -3190,6 +3198,10 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
struct file *fpin = NULL;
unsigned int mmap_miss;
+ /* See comment in do_sync_mmap_readahead. */
+ if (fsnotify_file_has_pre_content_watches(file))
+ return fpin;
+
/* If we don't want any read-ahead, don't bother */
if (vmf->vma->vm_flags & VM_RAND_READ || !ra->ra_pages)
return fpin;
diff --git a/mm/readahead.c b/mm/readahead.c
index 817b2a352d78..bc068d9218e3 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -128,6 +128,7 @@
#include <linux/blk-cgroup.h>
#include <linux/fadvise.h>
#include <linux/sched/mm.h>
+#include <linux/fsnotify.h>
#include "internal.h"
@@ -674,6 +675,14 @@ void page_cache_sync_ra(struct readahead_control *ractl,
{
bool do_forced_ra = ractl->file && (ractl->file->f_mode & FMODE_RANDOM);
+ /*
+ * If we have pre-content watches we need to disable readahead to make
+ * sure that we don't find 0 filled pages in cache that we never emitted
+ * events for.
+ */
+ if (ractl->file && fsnotify_file_has_pre_content_watches(ractl->file))
+ return;
+
/*
* Even if readahead is disabled, issue this request as readahead
* as we'll need it to satisfy the requested range. The forced
@@ -704,6 +713,10 @@ void page_cache_async_ra(struct readahead_control *ractl,
if (!ractl->ra->ra_pages)
return;
+ /* See the comment in page_cache_sync_ra. */
+ if (ractl->file && fsnotify_file_has_pre_content_watches(ractl->file))
+ return;
+
/*
* Same bit is used for PG_readahead and PG_reclaim.
*/
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 13/18] mm: don't allow huge faults for files with pre content watches
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (11 preceding siblings ...)
2024-09-04 20:28 ` [PATCH v5 12/18] fanotify: disable readahead if we have pre-content watches Josef Bacik
@ 2024-09-04 20:28 ` Josef Bacik
2024-09-05 8:14 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 14/18] fsnotify: generate pre-content permission event on page fault Josef Bacik
` (6 subsequent siblings)
19 siblings, 1 reply; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:28 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
There's nothing stopping us from supporting this, we could simply pass
the order into the helper and emit the proper length. However currently
there's no tests to validate this works properly, so disable it until
there's a desire to support this along with the appropriate tests.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
mm/memory.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index d10e616d7389..3010bcc5e4f9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -78,6 +78,7 @@
#include <linux/ptrace.h>
#include <linux/vmalloc.h>
#include <linux/sched/sysctl.h>
+#include <linux/fsnotify.h>
#include <trace/events/kmem.h>
@@ -5252,8 +5253,17 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
+ struct file *file = vma->vm_file;
if (vma_is_anonymous(vma))
return do_huge_pmd_anonymous_page(vmf);
+ /*
+ * Currently we just emit PAGE_SIZE for our fault events, so don't allow
+ * a huge fault if we have a pre content watch on this file. This would
+ * be trivial to support, but there would need to be tests to ensure
+ * this works properly and those don't exist currently.
+ */
+ if (file && fsnotify_file_has_pre_content_watches(file))
+ return VM_FAULT_FALLBACK;
if (vma->vm_ops->huge_fault)
return vma->vm_ops->huge_fault(vmf, PMD_ORDER);
return VM_FAULT_FALLBACK;
@@ -5263,6 +5273,7 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
+ struct file *file = vma->vm_file;
const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE;
vm_fault_t ret;
@@ -5277,6 +5288,9 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
}
if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) {
+ /* See comment in create_huge_pmd. */
+ if (file && fsnotify_file_has_pre_content_watches(file))
+ goto split;
if (vma->vm_ops->huge_fault) {
ret = vma->vm_ops->huge_fault(vmf, PMD_ORDER);
if (!(ret & VM_FAULT_FALLBACK))
@@ -5296,9 +5310,13 @@ static vm_fault_t create_huge_pud(struct vm_fault *vmf)
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
struct vm_area_struct *vma = vmf->vma;
+ struct file *file = vma->vm_file;
/* No support for anonymous transparent PUD pages yet */
if (vma_is_anonymous(vma))
return VM_FAULT_FALLBACK;
+ /* See comment in create_huge_pmd. */
+ if (file && fsnotify_file_has_pre_content_watches(file))
+ return VM_FAULT_FALLBACK;
if (vma->vm_ops->huge_fault)
return vma->vm_ops->huge_fault(vmf, PUD_ORDER);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
@@ -5310,12 +5328,16 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
struct vm_area_struct *vma = vmf->vma;
+ struct file *file = vma->vm_file;
vm_fault_t ret;
/* No support for anonymous transparent PUD pages yet */
if (vma_is_anonymous(vma))
goto split;
if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) {
+ /* See comment in create_huge_pmd. */
+ if (file && fsnotify_file_has_pre_content_watches(file))
+ goto split;
if (vma->vm_ops->huge_fault) {
ret = vma->vm_ops->huge_fault(vmf, PUD_ORDER);
if (!(ret & VM_FAULT_FALLBACK))
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 14/18] fsnotify: generate pre-content permission event on page fault
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (12 preceding siblings ...)
2024-09-04 20:28 ` [PATCH v5 13/18] mm: don't allow huge faults for files with pre content watches Josef Bacik
@ 2024-09-04 20:28 ` Josef Bacik
2024-09-04 20:28 ` [PATCH v5 15/18] bcachefs: add pre-content fsnotify hook to fault Josef Bacik
` (5 subsequent siblings)
19 siblings, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:28 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
on the faulting method.
This pre-content event is meant to be used by hierarchical storage
managers that want to fill in the file content on first read access.
Export a simple helper that file systems that have their own ->fault()
will use, and have a more complicated helper to be do fancy things with
in filemap_fault.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
include/linux/mm.h | 1 +
mm/filemap.c | 116 ++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 110 insertions(+), 7 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ab3d78116043..89665732b404 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3503,6 +3503,7 @@ extern vm_fault_t filemap_fault(struct vm_fault *vmf);
extern vm_fault_t filemap_map_pages(struct vm_fault *vmf,
pgoff_t start_pgoff, pgoff_t end_pgoff);
extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf);
+extern vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf);
extern unsigned long stack_guard_gap;
/* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
diff --git a/mm/filemap.c b/mm/filemap.c
index 8b1684b62177..b2d29947ce7f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -46,6 +46,7 @@
#include <linux/pipe_fs_i.h>
#include <linux/splice.h>
#include <linux/rcupdate_wait.h>
+#include <linux/fsnotify.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include "internal.h"
@@ -3112,13 +3113,13 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
* that. If we didn't pin a file then we return NULL. The file that is
* returned needs to be fput()'ed when we're done with it.
*/
-static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
+static struct file *do_sync_mmap_readahead(struct vm_fault *vmf,
+ struct file *fpin)
{
struct file *file = vmf->vma->vm_file;
struct file_ra_state *ra = &file->f_ra;
struct address_space *mapping = file->f_mapping;
DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
- struct file *fpin = NULL;
unsigned long vm_flags = vmf->vma->vm_flags;
unsigned int mmap_miss;
@@ -3190,12 +3191,12 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
* was pinned if we have to drop the mmap_lock in order to do IO.
*/
static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
- struct folio *folio)
+ struct folio *folio,
+ struct file *fpin)
{
struct file *file = vmf->vma->vm_file;
struct file_ra_state *ra = &file->f_ra;
DEFINE_READAHEAD(ractl, file, ra, file->f_mapping, vmf->pgoff);
- struct file *fpin = NULL;
unsigned int mmap_miss;
/* See comment in do_sync_mmap_readahead. */
@@ -3260,6 +3261,93 @@ static vm_fault_t filemap_fault_recheck_pte_none(struct vm_fault *vmf)
return ret;
}
+/*
+ * If we have pre-content watches on this file we will need to emit an event for
+ * this range. We will handle dropping the lock and emitting the event.
+ *
+ * If FAULT_FLAG_RETRY_NOWAIT is set then we'll return VM_FAULT_RETRY.
+ *
+ * If no event was emitted then *fpin will be NULL and we will return 0.
+ *
+ * If any error occurred we will return VM_FAULT_SIGBUS, *fpin could still be
+ * set and will need to have fput() called on it.
+ *
+ * If we emitted the event then we will return 0 and *fpin will be set, this
+ * must have fput() called on it, and the caller must call VM_FAULT_RETRY after
+ * any other operations it does in order to re-fault the page and make sure the
+ * appropriate locking is maintained.
+ *
+ * Return: the appropriate vm_fault_t return code, 0 on success.
+ */
+static vm_fault_t __filemap_fsnotify_fault(struct vm_fault *vmf,
+ struct file **fpin)
+{
+ struct file *file = vmf->vma->vm_file;
+ loff_t pos = vmf->pgoff << PAGE_SHIFT;
+ int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_ACCESS;
+ int ret;
+
+ /*
+ * We already did this and now we're retrying with everything locked,
+ * don't emit the event and continue.
+ */
+ if (vmf->flags & FAULT_FLAG_TRIED)
+ return 0;
+
+ /* No watches, return NULL. */
+ if (!fsnotify_file_has_pre_content_watches(file))
+ return 0;
+
+ /* We are NOWAIT, we can't wait, just return EAGAIN. */
+ if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
+ return VM_FAULT_RETRY;
+
+ /*
+ * If this fails then we're not allowed to drop the fault lock, return a
+ * SIGBUS so we don't errantly populate pagecache with bogus data for
+ * this file.
+ */
+ *fpin = maybe_unlock_mmap_for_io(vmf, *fpin);
+ if (*fpin == NULL)
+ return VM_FAULT_SIGBUS;
+
+ /*
+ * We can't fput(*fpin) at this point because we could have been passed
+ * in fpin from a previous call.
+ */
+ ret = fsnotify_file_area_perm(*fpin, mask, &pos, PAGE_SIZE);
+ if (ret)
+ return VM_FAULT_SIGBUS;
+
+ return 0;
+}
+
+/**
+ * filemap_fsnotify_fault - maybe emit a pre-content event.
+ * @vmf: struct vm_fault containing details of the fault.
+ *
+ * If we have a pre-content watch on this file we will emit an event for this
+ * range. If we return anything the fault caller should return immediately, we
+ * will return VM_FAULT_RETRY if we had to emit an event, which will trigger the
+ * fault again and then the fault handler will run the second time through.
+ *
+ * Return: a bitwise-OR of %VM_FAULT_ codes, 0 if nothing happened.
+ */
+vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
+{
+ struct file *fpin = NULL;
+ vm_fault_t ret;
+
+ ret = __filemap_fsnotify_fault(vmf, &fpin);
+ if (fpin) {
+ fput(fpin);
+ if (!ret)
+ ret = VM_FAULT_RETRY;
+ }
+ return ret;
+}
+EXPORT_SYMBOL_GPL(filemap_fsnotify_fault);
+
/**
* filemap_fault - read in file data for page fault handling
* @vmf: struct vm_fault containing details of the fault
@@ -3299,6 +3387,17 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
if (unlikely(index >= max_idx))
return VM_FAULT_SIGBUS;
+ /*
+ * If we have pre-content watchers then we need to generate events on
+ * page fault so that we can populate any data before the fault.
+ */
+ ret = __filemap_fsnotify_fault(vmf, &fpin);
+ if (unlikely(ret)) {
+ if (fpin)
+ fput(fpin);
+ return ret;
+ }
+
/*
* Do we have something in the page cache already?
*/
@@ -3309,21 +3408,24 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
* the lock.
*/
if (!(vmf->flags & FAULT_FLAG_TRIED))
- fpin = do_async_mmap_readahead(vmf, folio);
+ fpin = do_async_mmap_readahead(vmf, folio, fpin);
if (unlikely(!folio_test_uptodate(folio))) {
filemap_invalidate_lock_shared(mapping);
mapping_locked = true;
}
} else {
ret = filemap_fault_recheck_pte_none(vmf);
- if (unlikely(ret))
+ if (unlikely(ret)) {
+ if (fpin)
+ goto out_retry;
return ret;
+ }
/* No page in the page cache at all */
count_vm_event(PGMAJFAULT);
count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
ret = VM_FAULT_MAJOR;
- fpin = do_sync_mmap_readahead(vmf);
+ fpin = do_sync_mmap_readahead(vmf, fpin);
retry_find:
/*
* See comment in filemap_create_folio() why we need
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 15/18] bcachefs: add pre-content fsnotify hook to fault
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (13 preceding siblings ...)
2024-09-04 20:28 ` [PATCH v5 14/18] fsnotify: generate pre-content permission event on page fault Josef Bacik
@ 2024-09-04 20:28 ` Josef Bacik
2024-09-04 20:28 ` [PATCH v5 16/18] xfs: add pre-content fsnotify hook for write faults Josef Bacik
` (4 subsequent siblings)
19 siblings, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:28 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
Cc: Kent Overstreet
bcachefs has its own locking around filemap_fault, so we have to make
sure we do the fsnotify hook before the locking. Add the check to emit
the event before the locking and return VM_FAULT_RETRY to retrigger the
fault once the event has been emitted.
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/bcachefs/fs-io-pagecache.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/bcachefs/fs-io-pagecache.c b/fs/bcachefs/fs-io-pagecache.c
index a9cc5cad9cc9..ce7968e4fd2f 100644
--- a/fs/bcachefs/fs-io-pagecache.c
+++ b/fs/bcachefs/fs-io-pagecache.c
@@ -570,6 +570,10 @@ vm_fault_t bch2_page_fault(struct vm_fault *vmf)
if (fdm == mapping)
return VM_FAULT_SIGBUS;
+ ret = filemap_fsnotify_fault(vmf);
+ if (unlikely(ret))
+ return ret;
+
/* Lock ordering: */
if (fdm > mapping) {
struct bch_inode_info *fdm_host = to_bch_ei(fdm->host);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 16/18] xfs: add pre-content fsnotify hook for write faults
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (14 preceding siblings ...)
2024-09-04 20:28 ` [PATCH v5 15/18] bcachefs: add pre-content fsnotify hook to fault Josef Bacik
@ 2024-09-04 20:28 ` Josef Bacik
2024-09-05 8:29 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 17/18] btrfs: disable defrag on pre-content watched files Josef Bacik
` (3 subsequent siblings)
19 siblings, 1 reply; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:28 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
xfs has it's own handling for write faults, so we need to add the
pre-content fsnotify hook for this case. Reads go through filemap_fault
so they're handled properly there.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/xfs/xfs_file.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 4cdc54dc9686..3e385756017f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1283,6 +1283,10 @@ xfs_write_fault(
unsigned int lock_mode = XFS_MMAPLOCK_SHARED;
vm_fault_t ret;
+ ret = filemap_fsnotify_fault(vmf);
+ if (unlikely(ret))
+ return ret;
+
sb_start_pagefault(inode->i_sb);
file_update_time(vmf->vma->vm_file);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 17/18] btrfs: disable defrag on pre-content watched files
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (15 preceding siblings ...)
2024-09-04 20:28 ` [PATCH v5 16/18] xfs: add pre-content fsnotify hook for write faults Josef Bacik
@ 2024-09-04 20:28 ` Josef Bacik
2024-09-05 8:23 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 18/18] fs: enable pre-content events on supported file systems Josef Bacik
` (2 subsequent siblings)
19 siblings, 1 reply; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:28 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
We queue up inodes to be defrag'ed asynchronously, which means we do not
have their original file for readahead. This means that the code to
skip readahead on pre-content watched files will not run, and we could
potentially read in empty pages.
Handle this corner case by disabling defrag on files that are currently
being watched for pre-content events.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/ioctl.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e0a664b8a46a..529f7416814f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2640,6 +2640,15 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
goto out;
}
+ /*
+ * Don't allow defrag on pre-content watched files, as it could
+ * populate the page cache with 0's via readahead.
+ */
+ if (fsnotify_file_has_pre_content_watches(file)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
if (argp) {
if (copy_from_user(&range, argp, sizeof(range))) {
ret = -EFAULT;
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v5 18/18] fs: enable pre-content events on supported file systems
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (16 preceding siblings ...)
2024-09-04 20:28 ` [PATCH v5 17/18] btrfs: disable defrag on pre-content watched files Josef Bacik
@ 2024-09-04 20:28 ` Josef Bacik
2024-09-05 8:27 ` Amir Goldstein
2024-09-05 8:33 ` [PATCH v5 00/18] fanotify: add pre-content hooks Amir Goldstein
2024-09-05 12:08 ` Jan Kara
19 siblings, 1 reply; 34+ messages in thread
From: Josef Bacik @ 2024-09-04 20:28 UTC (permalink / raw)
To: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
Now that all the code has been added for pre-content events, and the
various file systems that need the page fault hooks for fsnotify have
been updated, add FS_ALLOW_HSM to the currently tested file systems.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/bcachefs/fs.c | 2 +-
fs/btrfs/super.c | 3 ++-
fs/ext4/super.c | 6 +++---
fs/xfs/xfs_super.c | 2 +-
4 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index 3a5f49affa0a..f889a105643b 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -2124,7 +2124,7 @@ static struct file_system_type bcache_fs_type = {
.name = "bcachefs",
.init_fs_context = bch2_init_fs_context,
.kill_sb = bch2_kill_sb,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_HSM,
};
MODULE_ALIAS_FS("bcachefs");
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0eda8c21d861..201ed90a6083 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2193,7 +2193,8 @@ static struct file_system_type btrfs_fs_type = {
.init_fs_context = btrfs_init_fs_context,
.parameters = btrfs_fs_parameters,
.kill_sb = btrfs_kill_super,
- .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_ALLOW_IDMAP,
+ .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA |
+ FS_ALLOW_IDMAP | FS_ALLOW_HSM,
};
MODULE_ALIAS_FS("btrfs");
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e72145c4ae5a..a042216fb370 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -137,7 +137,7 @@ static struct file_system_type ext2_fs_type = {
.init_fs_context = ext4_init_fs_context,
.parameters = ext4_param_specs,
.kill_sb = ext4_kill_sb,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_HSM,
};
MODULE_ALIAS_FS("ext2");
MODULE_ALIAS("ext2");
@@ -153,7 +153,7 @@ static struct file_system_type ext3_fs_type = {
.init_fs_context = ext4_init_fs_context,
.parameters = ext4_param_specs,
.kill_sb = ext4_kill_sb,
- .fs_flags = FS_REQUIRES_DEV,
+ .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_HSM,
};
MODULE_ALIAS_FS("ext3");
MODULE_ALIAS("ext3");
@@ -7298,7 +7298,7 @@ static struct file_system_type ext4_fs_type = {
.init_fs_context = ext4_init_fs_context,
.parameters = ext4_param_specs,
.kill_sb = ext4_kill_sb,
- .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+ .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_ALLOW_HSM,
};
MODULE_ALIAS_FS("ext4");
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 27e9f749c4c7..04a6ec7bc2ae 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -2052,7 +2052,7 @@ static struct file_system_type xfs_fs_type = {
.init_fs_context = xfs_init_fs_context,
.parameters = xfs_fs_parameters,
.kill_sb = xfs_kill_sb,
- .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+ .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_ALLOW_HSM,
};
MODULE_ALIAS_FS("xfs");
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v5 01/18] fanotify: don't skip extra event info if no info_mode is set
2024-09-04 20:27 ` [PATCH v5 01/18] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
@ 2024-09-05 7:49 ` Amir Goldstein
0 siblings, 0 replies; 34+ messages in thread
From: Amir Goldstein @ 2024-09-05 7:49 UTC (permalink / raw)
To: Josef Bacik
Cc: kernel-team, linux-fsdevel, jack, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> New pre-content events will be path events but they will also carry
> additional range information. Remove the optimization to skip checking
> whether info structures need to be generated for path events. This
> results in no change in generated info structures for existing events.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/notify/fanotify/fanotify_user.c | 13 ++++---------
> 1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9ec313e9f6e1..2e2fba8a9d20 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -160,9 +160,6 @@ static size_t fanotify_event_len(unsigned int info_mode,
> int fh_len;
> int dot_len = 0;
>
> - if (!info_mode)
> - return event_len;
> -
> if (fanotify_is_error_event(event->mask))
> event_len += FANOTIFY_ERROR_INFO_LEN;
>
> @@ -740,12 +737,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> if (fanotify_is_perm_event(event->mask))
> FANOTIFY_PERM(event)->fd = fd;
>
> - if (info_mode) {
> - ret = copy_info_records_to_user(event, info, info_mode, pidfd,
> - buf, count);
> - if (ret < 0)
> - goto out_close_fd;
> - }
> + ret = copy_info_records_to_user(event, info, info_mode, pidfd,
> + buf, count);
> + if (ret < 0)
> + goto out_close_fd;
>
> if (f)
> fd_install(fd, f);
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 10/18] fs: add a flag to indicate the fs supports pre-content events
2024-09-04 20:28 ` [PATCH v5 10/18] fs: add a flag to indicate the fs supports pre-content events Josef Bacik
@ 2024-09-05 8:07 ` Amir Goldstein
0 siblings, 0 replies; 34+ messages in thread
From: Amir Goldstein @ 2024-09-05 8:07 UTC (permalink / raw)
To: Josef Bacik
Cc: kernel-team, linux-fsdevel, jack, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> The pre-content events require some extra thinking, especially around
> page faults. In order to make sure we don't advertise a feature working
> that doesn't actually work, add a flag to allow file systems to opt-in
> to this behavior.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Better squash this patch to FAN_PRE_ACCESS patch, so it will
not be allowed to mark with FAN_PRE_ACCESS on unsupported fs
mid series.
OR - move this patch before the FAN_PRE_ACCESS patch and pre-define
FANOTIFY_PRE_CONTENT_EVENTS to 0.
Apart from that, you may add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
Amir.
> ---
> fs/notify/fanotify/fanotify_user.c | 2 ++
> include/linux/fs.h | 1 +
> include/linux/fsnotify.h | 4 ++++
> 3 files changed, 7 insertions(+)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 53eee8af34a0..936e9f9e0cbc 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -1736,6 +1736,8 @@ static int fanotify_events_supported(struct fsnotify_group *group,
>
> /* Pre-content events are only supported on regular files and dirs */
> if (mask & FANOTIFY_PRE_CONTENT_EVENTS) {
> + if (!(path->mnt->mnt_sb->s_type->fs_flags & FS_ALLOW_HSM))
> + return -EINVAL;
> if (!is_dir && !d_is_reg(path->dentry))
> return -EINVAL;
> if (is_dir && mask & FAN_PRE_MODIFY)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index fd34b5755c0b..5708e91d3625 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2494,6 +2494,7 @@ struct file_system_type {
> #define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */
> #define FS_DISALLOW_NOTIFY_PERM 16 /* Disable fanotify permission events */
> #define FS_ALLOW_IDMAP 32 /* FS has been updated to handle vfs idmappings. */
> +#define FS_ALLOW_HSM 64 /* FS can handle fanotify pre-content events. */
> #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
> int (*init_fs_context)(struct fs_context *);
> const struct fs_parameter_spec *parameters;
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 9d001d328619..27992b548f0c 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -179,6 +179,10 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
> if (!S_ISDIR(inode->i_mode) && !S_ISREG(inode->i_mode))
> return 0;
>
> + /* The fs doesn't support pre-content events. */
> + if (!(inode->i_sb->s_type->fs_flags & FS_ALLOW_HSM))
> + return 0;
> +
> if (perm_mask & MAY_WRITE)
> fsnotify_mask = FS_PRE_MODIFY;
> else if (perm_mask & (MAY_READ | MAY_ACCESS))
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 11/18] fanotify: add a helper to check for pre content events
2024-09-04 20:28 ` [PATCH v5 11/18] fanotify: add a helper to check for pre content events Josef Bacik
@ 2024-09-05 8:09 ` Amir Goldstein
0 siblings, 0 replies; 34+ messages in thread
From: Amir Goldstein @ 2024-09-05 8:09 UTC (permalink / raw)
To: Josef Bacik
Cc: kernel-team, linux-fsdevel, jack, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> We want to emit events during page fault, and calling into fanotify
> could be expensive, so add a helper to allow us to skip calling into
> fanotify from page fault. This will also be used to disable readahead
> for content watched files which will be handled in a subsequent patch.
>
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
Amir.
> ---
> fs/notify/fsnotify.c | 15 +++++++++++++++
> include/linux/fsnotify_backend.h | 14 ++++++++++++++
> 2 files changed, 29 insertions(+)
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 1ca4a8da7f29..30badf8ba36c 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -201,6 +201,21 @@ static inline bool fsnotify_object_watched(struct inode *inode, __u32 mnt_mask,
> return mask & marks_mask & ALL_FSNOTIFY_EVENTS;
> }
>
> +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> +bool fsnotify_file_has_pre_content_watches(struct file *file)
> +{
> + struct inode *inode = file_inode(file);
> + __u32 mnt_mask = real_mount(file->f_path.mnt)->mnt_fsnotify_mask;
> +
> + if (!(inode->i_sb->s_type->fs_flags & FS_ALLOW_HSM))
> + return false;
> +
> + return fsnotify_object_watched(inode, mnt_mask,
> + FSNOTIFY_PRE_CONTENT_EVENTS);
> +}
> +#endif
> +
> +
> /*
> * Notify this dentry's parent about a child's events with child name info
> * if parent is watching or if inode/sb/mount are interested in events with
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 276320846bfd..b495a0676dd3 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -900,6 +900,15 @@ static inline void fsnotify_init_event(struct fsnotify_event *event)
> INIT_LIST_HEAD(&event->list);
> }
>
> +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> +bool fsnotify_file_has_pre_content_watches(struct file *file);
> +#else
> +static inline bool fsnotify_file_has_pre_content_watches(struct file *file)
> +{
> + return false;
> +}
> +#endif /* CONFIG_FANOTIFY_ACCESS_PERMISSIONS */
> +
> #else
>
> static inline int fsnotify(__u32 mask, const void *data, int data_type,
> @@ -938,6 +947,11 @@ static inline u32 fsnotify_get_cookie(void)
> static inline void fsnotify_unmount_inodes(struct super_block *sb)
> {}
>
> +static inline bool fsnotify_file_has_pre_content_watches(struct file *file)
> +{
> + return false;
> +}
> +
> #endif /* CONFIG_FSNOTIFY */
>
> #endif /* __KERNEL __ */
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 12/18] fanotify: disable readahead if we have pre-content watches
2024-09-04 20:28 ` [PATCH v5 12/18] fanotify: disable readahead if we have pre-content watches Josef Bacik
@ 2024-09-05 8:12 ` Amir Goldstein
0 siblings, 0 replies; 34+ messages in thread
From: Amir Goldstein @ 2024-09-05 8:12 UTC (permalink / raw)
To: Josef Bacik
Cc: kernel-team, linux-fsdevel, jack, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> With page faults we can trigger readahead on the file, and then
> subsequent faults can find these pages and insert them into the file
> without emitting an fanotify event. To avoid this case, disable
> readahead if we have pre-content watches on the file. This way we are
> guaranteed to get an event for every range we attempt to access on a
> pre-content watched file.
>
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
Amir.
> ---
> mm/filemap.c | 12 ++++++++++++
> mm/readahead.c | 13 +++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index ca8c8d889eef..8b1684b62177 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3122,6 +3122,14 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
> unsigned long vm_flags = vmf->vma->vm_flags;
> unsigned int mmap_miss;
>
> + /*
> + * If we have pre-content watches we need to disable readahead to make
> + * sure that we don't populate our mapping with 0 filled pages that we
> + * never emitted an event for.
> + */
> + if (fsnotify_file_has_pre_content_watches(file))
> + return fpin;
> +
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> /* Use the readahead code, even if readahead is disabled */
> if ((vm_flags & VM_HUGEPAGE) && HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER) {
> @@ -3190,6 +3198,10 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
> struct file *fpin = NULL;
> unsigned int mmap_miss;
>
> + /* See comment in do_sync_mmap_readahead. */
> + if (fsnotify_file_has_pre_content_watches(file))
> + return fpin;
> +
> /* If we don't want any read-ahead, don't bother */
> if (vmf->vma->vm_flags & VM_RAND_READ || !ra->ra_pages)
> return fpin;
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 817b2a352d78..bc068d9218e3 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -128,6 +128,7 @@
> #include <linux/blk-cgroup.h>
> #include <linux/fadvise.h>
> #include <linux/sched/mm.h>
> +#include <linux/fsnotify.h>
>
> #include "internal.h"
>
> @@ -674,6 +675,14 @@ void page_cache_sync_ra(struct readahead_control *ractl,
> {
> bool do_forced_ra = ractl->file && (ractl->file->f_mode & FMODE_RANDOM);
>
> + /*
> + * If we have pre-content watches we need to disable readahead to make
> + * sure that we don't find 0 filled pages in cache that we never emitted
> + * events for.
> + */
> + if (ractl->file && fsnotify_file_has_pre_content_watches(ractl->file))
> + return;
> +
> /*
> * Even if readahead is disabled, issue this request as readahead
> * as we'll need it to satisfy the requested range. The forced
> @@ -704,6 +713,10 @@ void page_cache_async_ra(struct readahead_control *ractl,
> if (!ractl->ra->ra_pages)
> return;
>
> + /* See the comment in page_cache_sync_ra. */
> + if (ractl->file && fsnotify_file_has_pre_content_watches(ractl->file))
> + return;
> +
> /*
> * Same bit is used for PG_readahead and PG_reclaim.
> */
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 13/18] mm: don't allow huge faults for files with pre content watches
2024-09-04 20:28 ` [PATCH v5 13/18] mm: don't allow huge faults for files with pre content watches Josef Bacik
@ 2024-09-05 8:14 ` Amir Goldstein
0 siblings, 0 replies; 34+ messages in thread
From: Amir Goldstein @ 2024-09-05 8:14 UTC (permalink / raw)
To: Josef Bacik
Cc: kernel-team, linux-fsdevel, jack, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> There's nothing stopping us from supporting this, we could simply pass
> the order into the helper and emit the proper length. However currently
> there's no tests to validate this works properly, so disable it until
> there's a desire to support this along with the appropriate tests.
>
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
Amir.
> ---
> mm/memory.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index d10e616d7389..3010bcc5e4f9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -78,6 +78,7 @@
> #include <linux/ptrace.h>
> #include <linux/vmalloc.h>
> #include <linux/sched/sysctl.h>
> +#include <linux/fsnotify.h>
>
> #include <trace/events/kmem.h>
>
> @@ -5252,8 +5253,17 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
> static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
> + struct file *file = vma->vm_file;
> if (vma_is_anonymous(vma))
> return do_huge_pmd_anonymous_page(vmf);
> + /*
> + * Currently we just emit PAGE_SIZE for our fault events, so don't allow
> + * a huge fault if we have a pre content watch on this file. This would
> + * be trivial to support, but there would need to be tests to ensure
> + * this works properly and those don't exist currently.
> + */
> + if (file && fsnotify_file_has_pre_content_watches(file))
> + return VM_FAULT_FALLBACK;
> if (vma->vm_ops->huge_fault)
> return vma->vm_ops->huge_fault(vmf, PMD_ORDER);
> return VM_FAULT_FALLBACK;
> @@ -5263,6 +5273,7 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
> static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
> + struct file *file = vma->vm_file;
> const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE;
> vm_fault_t ret;
>
> @@ -5277,6 +5288,9 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
> }
>
> if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) {
> + /* See comment in create_huge_pmd. */
> + if (file && fsnotify_file_has_pre_content_watches(file))
> + goto split;
> if (vma->vm_ops->huge_fault) {
> ret = vma->vm_ops->huge_fault(vmf, PMD_ORDER);
> if (!(ret & VM_FAULT_FALLBACK))
> @@ -5296,9 +5310,13 @@ static vm_fault_t create_huge_pud(struct vm_fault *vmf)
> #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
> defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
> struct vm_area_struct *vma = vmf->vma;
> + struct file *file = vma->vm_file;
> /* No support for anonymous transparent PUD pages yet */
> if (vma_is_anonymous(vma))
> return VM_FAULT_FALLBACK;
> + /* See comment in create_huge_pmd. */
> + if (file && fsnotify_file_has_pre_content_watches(file))
> + return VM_FAULT_FALLBACK;
> if (vma->vm_ops->huge_fault)
> return vma->vm_ops->huge_fault(vmf, PUD_ORDER);
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> @@ -5310,12 +5328,16 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
> #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
> defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
> struct vm_area_struct *vma = vmf->vma;
> + struct file *file = vma->vm_file;
> vm_fault_t ret;
>
> /* No support for anonymous transparent PUD pages yet */
> if (vma_is_anonymous(vma))
> goto split;
> if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) {
> + /* See comment in create_huge_pmd. */
> + if (file && fsnotify_file_has_pre_content_watches(file))
> + goto split;
> if (vma->vm_ops->huge_fault) {
> ret = vma->vm_ops->huge_fault(vmf, PUD_ORDER);
> if (!(ret & VM_FAULT_FALLBACK))
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 17/18] btrfs: disable defrag on pre-content watched files
2024-09-04 20:28 ` [PATCH v5 17/18] btrfs: disable defrag on pre-content watched files Josef Bacik
@ 2024-09-05 8:23 ` Amir Goldstein
0 siblings, 0 replies; 34+ messages in thread
From: Amir Goldstein @ 2024-09-05 8:23 UTC (permalink / raw)
To: Josef Bacik
Cc: kernel-team, linux-fsdevel, jack, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> We queue up inodes to be defrag'ed asynchronously, which means we do not
> have their original file for readahead. This means that the code to
> skip readahead on pre-content watched files will not run, and we could
> potentially read in empty pages.
>
> Handle this corner case by disabling defrag on files that are currently
> being watched for pre-content events.
IIUC, you are disabling the *start* of async defrag.
What happens if defrag has already started and then a pre-content
watch is set up?
Do we care about this corner of a corner case?
Thanks,
Amir.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> fs/btrfs/ioctl.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index e0a664b8a46a..529f7416814f 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2640,6 +2640,15 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
> goto out;
> }
>
> + /*
> + * Don't allow defrag on pre-content watched files, as it could
> + * populate the page cache with 0's via readahead.
> + */
> + if (fsnotify_file_has_pre_content_watches(file)) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> if (argp) {
> if (copy_from_user(&range, argp, sizeof(range))) {
> ret = -EFAULT;
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 18/18] fs: enable pre-content events on supported file systems
2024-09-04 20:28 ` [PATCH v5 18/18] fs: enable pre-content events on supported file systems Josef Bacik
@ 2024-09-05 8:27 ` Amir Goldstein
2024-09-05 10:36 ` Jan Kara
0 siblings, 1 reply; 34+ messages in thread
From: Amir Goldstein @ 2024-09-05 8:27 UTC (permalink / raw)
To: Josef Bacik, jack
Cc: kernel-team, linux-fsdevel, brauner, linux-xfs, linux-bcachefs,
linux-btrfs, linux-mm
On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Now that all the code has been added for pre-content events, and the
> various file systems that need the page fault hooks for fsnotify have
> been updated, add FS_ALLOW_HSM to the currently tested file systems.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
I would not be devastated if this patch remains as is,
but I think it would be nicer to:
1. Move it before the fs specific patches
2. Set the HSM flag only on ext*
3. Add the HSM flag in other fs specific patches
Thanks,
Amir.
> ---
> fs/bcachefs/fs.c | 2 +-
> fs/btrfs/super.c | 3 ++-
> fs/ext4/super.c | 6 +++---
> fs/xfs/xfs_super.c | 2 +-
> 4 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
> index 3a5f49affa0a..f889a105643b 100644
> --- a/fs/bcachefs/fs.c
> +++ b/fs/bcachefs/fs.c
> @@ -2124,7 +2124,7 @@ static struct file_system_type bcache_fs_type = {
> .name = "bcachefs",
> .init_fs_context = bch2_init_fs_context,
> .kill_sb = bch2_kill_sb,
> - .fs_flags = FS_REQUIRES_DEV,
> + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_HSM,
> };
>
> MODULE_ALIAS_FS("bcachefs");
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 0eda8c21d861..201ed90a6083 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2193,7 +2193,8 @@ static struct file_system_type btrfs_fs_type = {
> .init_fs_context = btrfs_init_fs_context,
> .parameters = btrfs_fs_parameters,
> .kill_sb = btrfs_kill_super,
> - .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_ALLOW_IDMAP,
> + .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA |
> + FS_ALLOW_IDMAP | FS_ALLOW_HSM,
> };
>
> MODULE_ALIAS_FS("btrfs");
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index e72145c4ae5a..a042216fb370 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -137,7 +137,7 @@ static struct file_system_type ext2_fs_type = {
> .init_fs_context = ext4_init_fs_context,
> .parameters = ext4_param_specs,
> .kill_sb = ext4_kill_sb,
> - .fs_flags = FS_REQUIRES_DEV,
> + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_HSM,
> };
> MODULE_ALIAS_FS("ext2");
> MODULE_ALIAS("ext2");
> @@ -153,7 +153,7 @@ static struct file_system_type ext3_fs_type = {
> .init_fs_context = ext4_init_fs_context,
> .parameters = ext4_param_specs,
> .kill_sb = ext4_kill_sb,
> - .fs_flags = FS_REQUIRES_DEV,
> + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_HSM,
> };
> MODULE_ALIAS_FS("ext3");
> MODULE_ALIAS("ext3");
> @@ -7298,7 +7298,7 @@ static struct file_system_type ext4_fs_type = {
> .init_fs_context = ext4_init_fs_context,
> .parameters = ext4_param_specs,
> .kill_sb = ext4_kill_sb,
> - .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
> + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_ALLOW_HSM,
> };
> MODULE_ALIAS_FS("ext4");
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 27e9f749c4c7..04a6ec7bc2ae 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -2052,7 +2052,7 @@ static struct file_system_type xfs_fs_type = {
> .init_fs_context = xfs_init_fs_context,
> .parameters = xfs_fs_parameters,
> .kill_sb = xfs_kill_sb,
> - .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
> + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_ALLOW_HSM,
> };
> MODULE_ALIAS_FS("xfs");
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 16/18] xfs: add pre-content fsnotify hook for write faults
2024-09-04 20:28 ` [PATCH v5 16/18] xfs: add pre-content fsnotify hook for write faults Josef Bacik
@ 2024-09-05 8:29 ` Amir Goldstein
0 siblings, 0 replies; 34+ messages in thread
From: Amir Goldstein @ 2024-09-05 8:29 UTC (permalink / raw)
To: Josef Bacik
Cc: kernel-team, linux-fsdevel, jack, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> xfs has it's own handling for write faults, so we need to add the
> pre-content fsnotify hook for this case. Reads go through filemap_fault
> so they're handled properly there.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> fs/xfs/xfs_file.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 4cdc54dc9686..3e385756017f 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1283,6 +1283,10 @@ xfs_write_fault(
> unsigned int lock_mode = XFS_MMAPLOCK_SHARED;
> vm_fault_t ret;
>
> + ret = filemap_fsnotify_fault(vmf);
> + if (unlikely(ret))
> + return ret;
> +
> sb_start_pagefault(inode->i_sb);
> file_update_time(vmf->vma->vm_file);
>
> --
> 2.43.0
>
The end result is so much nicer than early sketches ;)
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
Amir.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 00/18] fanotify: add pre-content hooks
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (17 preceding siblings ...)
2024-09-04 20:28 ` [PATCH v5 18/18] fs: enable pre-content events on supported file systems Josef Bacik
@ 2024-09-05 8:33 ` Amir Goldstein
2024-09-05 10:32 ` Jan Kara
2024-09-05 19:30 ` Josef Bacik
2024-09-05 12:08 ` Jan Kara
19 siblings, 2 replies; 34+ messages in thread
From: Amir Goldstein @ 2024-09-05 8:33 UTC (permalink / raw)
To: Josef Bacik, jack
Cc: kernel-team, linux-fsdevel, brauner, linux-xfs, linux-bcachefs,
linux-btrfs, linux-mm
On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> v4: https://lore.kernel.org/linux-fsdevel/cover.1723670362.git.josef@toxicpanda.com/
> v3: https://lore.kernel.org/linux-fsdevel/cover.1723228772.git.josef@toxicpanda.com/
> v2: https://lore.kernel.org/linux-fsdevel/cover.1723144881.git.josef@toxicpanda.com/
> v1: https://lore.kernel.org/linux-fsdevel/cover.1721931241.git.josef@toxicpanda.com/
>
> v4->v5:
> - Cleaned up the various "I'll fix it on commit" notes that Jan made since I had
> to respin the series anyway.
> - Renamed the filemap pagefault helper for fsnotify per Christians suggestion.
> - Added a FS_ALLOW_HSM flag per Jan's comments, based on Amir's rough sketch.
> - Added a patch to disable btrfs defrag on pre-content watched files.
> - Added a patch to turn on FS_ALLOW_HSM for all the file systems that I tested.
My only nits are about different ordering of the FS_ALLOW_HSM patches
I guess as the merge window is closing in, Jan could do these trivial
reorders on commit, based on his preference (?).
> - Added two fstests (which will be posted separately) to validate everything,
> re-validated the series with btrfs, xfs, ext4, and bcachefs to make sure I
> didn't break anything.
Very cool!
Thanks again for the "productization" of my patches :)
Amir.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 00/18] fanotify: add pre-content hooks
2024-09-05 8:33 ` [PATCH v5 00/18] fanotify: add pre-content hooks Amir Goldstein
@ 2024-09-05 10:32 ` Jan Kara
2024-09-05 19:30 ` Josef Bacik
1 sibling, 0 replies; 34+ messages in thread
From: Jan Kara @ 2024-09-05 10:32 UTC (permalink / raw)
To: Amir Goldstein
Cc: Josef Bacik, jack, kernel-team, linux-fsdevel, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Thu 05-09-24 10:33:07, Amir Goldstein wrote:
> On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > v4: https://lore.kernel.org/linux-fsdevel/cover.1723670362.git.josef@toxicpanda.com/
> > v3: https://lore.kernel.org/linux-fsdevel/cover.1723228772.git.josef@toxicpanda.com/
> > v2: https://lore.kernel.org/linux-fsdevel/cover.1723144881.git.josef@toxicpanda.com/
> > v1: https://lore.kernel.org/linux-fsdevel/cover.1721931241.git.josef@toxicpanda.com/
> >
> > v4->v5:
> > - Cleaned up the various "I'll fix it on commit" notes that Jan made since I had
> > to respin the series anyway.
> > - Renamed the filemap pagefault helper for fsnotify per Christians suggestion.
> > - Added a FS_ALLOW_HSM flag per Jan's comments, based on Amir's rough sketch.
> > - Added a patch to disable btrfs defrag on pre-content watched files.
> > - Added a patch to turn on FS_ALLOW_HSM for all the file systems that I tested.
>
> My only nits are about different ordering of the FS_ALLOW_HSM patches
> I guess as the merge window is closing in, Jan could do these trivial
> reorders on commit, based on his preference (?).
Yes, I can do the reordering on commit.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 18/18] fs: enable pre-content events on supported file systems
2024-09-05 8:27 ` Amir Goldstein
@ 2024-09-05 10:36 ` Jan Kara
0 siblings, 0 replies; 34+ messages in thread
From: Jan Kara @ 2024-09-05 10:36 UTC (permalink / raw)
To: Amir Goldstein
Cc: Josef Bacik, jack, kernel-team, linux-fsdevel, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Thu 05-09-24 10:27:52, Amir Goldstein wrote:
> On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > Now that all the code has been added for pre-content events, and the
> > various file systems that need the page fault hooks for fsnotify have
> > been updated, add FS_ALLOW_HSM to the currently tested file systems.
> >
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>
> I would not be devastated if this patch remains as is,
> but I think it would be nicer to:
> 1. Move it before the fs specific patches
> 2. Set the HSM flag only on ext*
> 3. Add the HSM flag in other fs specific patches
I agree that doing this inside the patch where we tweak its functions to
properly support HSM would be more natural. And then only ext4 will remain
here. I guess I'll do this on commit.
Honza
>
> Thanks,
> Amir.
>
> > ---
> > fs/bcachefs/fs.c | 2 +-
> > fs/btrfs/super.c | 3 ++-
> > fs/ext4/super.c | 6 +++---
> > fs/xfs/xfs_super.c | 2 +-
> > 4 files changed, 7 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
> > index 3a5f49affa0a..f889a105643b 100644
> > --- a/fs/bcachefs/fs.c
> > +++ b/fs/bcachefs/fs.c
> > @@ -2124,7 +2124,7 @@ static struct file_system_type bcache_fs_type = {
> > .name = "bcachefs",
> > .init_fs_context = bch2_init_fs_context,
> > .kill_sb = bch2_kill_sb,
> > - .fs_flags = FS_REQUIRES_DEV,
> > + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_HSM,
> > };
> >
> > MODULE_ALIAS_FS("bcachefs");
> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > index 0eda8c21d861..201ed90a6083 100644
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -2193,7 +2193,8 @@ static struct file_system_type btrfs_fs_type = {
> > .init_fs_context = btrfs_init_fs_context,
> > .parameters = btrfs_fs_parameters,
> > .kill_sb = btrfs_kill_super,
> > - .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_ALLOW_IDMAP,
> > + .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA |
> > + FS_ALLOW_IDMAP | FS_ALLOW_HSM,
> > };
> >
> > MODULE_ALIAS_FS("btrfs");
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index e72145c4ae5a..a042216fb370 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -137,7 +137,7 @@ static struct file_system_type ext2_fs_type = {
> > .init_fs_context = ext4_init_fs_context,
> > .parameters = ext4_param_specs,
> > .kill_sb = ext4_kill_sb,
> > - .fs_flags = FS_REQUIRES_DEV,
> > + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_HSM,
> > };
> > MODULE_ALIAS_FS("ext2");
> > MODULE_ALIAS("ext2");
> > @@ -153,7 +153,7 @@ static struct file_system_type ext3_fs_type = {
> > .init_fs_context = ext4_init_fs_context,
> > .parameters = ext4_param_specs,
> > .kill_sb = ext4_kill_sb,
> > - .fs_flags = FS_REQUIRES_DEV,
> > + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_HSM,
> > };
> > MODULE_ALIAS_FS("ext3");
> > MODULE_ALIAS("ext3");
> > @@ -7298,7 +7298,7 @@ static struct file_system_type ext4_fs_type = {
> > .init_fs_context = ext4_init_fs_context,
> > .parameters = ext4_param_specs,
> > .kill_sb = ext4_kill_sb,
> > - .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
> > + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_ALLOW_HSM,
> > };
> > MODULE_ALIAS_FS("ext4");
> >
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 27e9f749c4c7..04a6ec7bc2ae 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -2052,7 +2052,7 @@ static struct file_system_type xfs_fs_type = {
> > .init_fs_context = xfs_init_fs_context,
> > .parameters = xfs_fs_parameters,
> > .kill_sb = xfs_kill_sb,
> > - .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
> > + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_ALLOW_HSM,
> > };
> > MODULE_ALIAS_FS("xfs");
> >
> > --
> > 2.43.0
> >
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 00/18] fanotify: add pre-content hooks
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
` (18 preceding siblings ...)
2024-09-05 8:33 ` [PATCH v5 00/18] fanotify: add pre-content hooks Amir Goldstein
@ 2024-09-05 12:08 ` Jan Kara
2024-09-05 19:29 ` Josef Bacik
19 siblings, 1 reply; 34+ messages in thread
From: Jan Kara @ 2024-09-05 12:08 UTC (permalink / raw)
To: Josef Bacik
Cc: kernel-team, linux-fsdevel, jack, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
Hello!
On Wed 04-09-24 16:27:50, Josef Bacik wrote:
> These are the patches for the bare bones pre-content fanotify support. The
> majority of this work is Amir's, my contribution to this has solely been around
> adding the page fault hooks, testing and validating everything. I'm sending it
> because Amir is traveling a bunch, and I touched it last so I'm going to take
> all the hate and he can take all the credit.
>
> There is a PoC that I've been using to validate this work, you can find the git
> repo here
>
> https://github.com/josefbacik/remote-fetch
The test tool seems to be a bit outdated wrt the current series. It took me
quite a while to debug why HSM isn't working with it (eventually I've
tracked it down to the changes in struct fanotify_event_info_range...).
Anyway all seems to be working (after fixing up some missing export), I've
pushed out the result I have to:
https://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
and will push it to linux-next as well so that it gets some soaking before
the merge window. That being said I'd still like to get explicit ack from
XFS folks (hint) so don't patches may still rebase due to that.
Honza
> This consists of 3 different tools.
>
> 1. populate. This just creates all the stub files in the directory from the
> source directory. Just run ./populate ~/linux ~/hsm-linux and it'll
> recursively create all of the stub files and directories.
> 2. remote-fetch. This is the actual PoC, you just point it at the source and
> destination directory and then you can do whatever. ./remote-fetch ~/linux
> ~/hsm-linux.
> 3. mmap-validate. This was to validate the pagefault thing, this is likely what
> will be turned into the selftest with remote-fetch. It creates a file and
> then you can validate the file matches the right pattern with both normal
> reads and mmap. Normally I do something like
>
> ./mmap-validate create ~/src/foo
> ./populate ~/src ~/dst
> ./rmeote-fetch ~/src ~/dst
> ./mmap-validate validate ~/dst/foo
>
> I did a bunch of testing, I also got some performance numbers. I copied a
> kernel tree, and then did remote-fetch, and then make -j4
>
> Normal
> real 9m49.709s
> user 28m11.372s
> sys 4m57.304s
>
> HSM
> real 10m6.454s
> user 29m10.517s
> sys 5m2.617s
>
> So ~17 seconds more to build with HSM. I then did a make mrproper on both trees
> to see the size
>
> [root@fedora ~]# du -hs /src/linux
> 1.6G /src/linux
> [root@fedora ~]# du -hs dst
> 125M dst
>
> This mirrors the sort of savings we've seen in production.
>
> Meta has had these patches (minus the page fault patch) deployed in production
> for almost a year with our own utility for doing on-demand package fetching.
> The savings from this has been pretty significant.
>
> The page-fault hooks are necessary for the last thing we need, which is
> on-demand range fetching of executables. Some of our binaries are several gigs
> large, having the ability to remote fetch them on demand is a huge win for us
> not only with space savings, but with startup time of containers.
>
> There will be tests for this going into LTP once we're satisfied with the
> patches and they're on their way upstream. Thanks,
>
> Josef
>
> Amir Goldstein (8):
> fsnotify: introduce pre-content permission event
> fsnotify: generate pre-content permission event on open
> fanotify: introduce FAN_PRE_ACCESS permission event
> fanotify: introduce FAN_PRE_MODIFY permission event
> fanotify: pass optional file access range in pre-content event
> fanotify: rename a misnamed constant
> fanotify: report file range info with pre-content events
> fanotify: allow to set errno in FAN_DENY permission response
>
> Josef Bacik (10):
> fanotify: don't skip extra event info if no info_mode is set
> fs: add a flag to indicate the fs supports pre-content events
> fanotify: add a helper to check for pre content events
> fanotify: disable readahead if we have pre-content watches
> mm: don't allow huge faults for files with pre content watches
> fsnotify: generate pre-content permission event on page fault
> bcachefs: add pre-content fsnotify hook to fault
> xfs: add pre-content fsnotify hook for write faults
> btrfs: disable defrag on pre-content watched files
> fs: enable pre-content events on supported file systems
>
> fs/bcachefs/fs-io-pagecache.c | 4 +
> fs/bcachefs/fs.c | 2 +-
> fs/btrfs/ioctl.c | 9 ++
> fs/btrfs/super.c | 3 +-
> fs/ext4/super.c | 6 +-
> fs/namei.c | 9 ++
> fs/notify/fanotify/fanotify.c | 33 ++++++--
> fs/notify/fanotify/fanotify.h | 15 ++++
> fs/notify/fanotify/fanotify_user.c | 119 ++++++++++++++++++++++-----
> fs/notify/fsnotify.c | 17 +++-
> fs/xfs/xfs_file.c | 4 +
> fs/xfs/xfs_super.c | 2 +-
> include/linux/fanotify.h | 20 +++--
> include/linux/fs.h | 1 +
> include/linux/fsnotify.h | 58 +++++++++++--
> include/linux/fsnotify_backend.h | 59 ++++++++++++-
> include/linux/mm.h | 1 +
> include/uapi/linux/fanotify.h | 18 ++++
> mm/filemap.c | 128 +++++++++++++++++++++++++++--
> mm/memory.c | 22 +++++
> mm/readahead.c | 13 +++
> security/selinux/hooks.c | 3 +-
> 22 files changed, 489 insertions(+), 57 deletions(-)
>
> --
> 2.43.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 00/18] fanotify: add pre-content hooks
2024-09-05 12:08 ` Jan Kara
@ 2024-09-05 19:29 ` Josef Bacik
0 siblings, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-05 19:29 UTC (permalink / raw)
To: Jan Kara
Cc: kernel-team, linux-fsdevel, amir73il, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Thu, Sep 05, 2024 at 02:08:08PM +0200, Jan Kara wrote:
> Hello!
>
> On Wed 04-09-24 16:27:50, Josef Bacik wrote:
> > These are the patches for the bare bones pre-content fanotify support. The
> > majority of this work is Amir's, my contribution to this has solely been around
> > adding the page fault hooks, testing and validating everything. I'm sending it
> > because Amir is traveling a bunch, and I touched it last so I'm going to take
> > all the hate and he can take all the credit.
> >
> > There is a PoC that I've been using to validate this work, you can find the git
> > repo here
> >
> > https://github.com/josefbacik/remote-fetch
>
> The test tool seems to be a bit outdated wrt the current series. It took me
> quite a while to debug why HSM isn't working with it (eventually I've
> tracked it down to the changes in struct fanotify_event_info_range...).
> Anyway all seems to be working (after fixing up some missing export), I've
> pushed out the result I have to:
Eesh sorry, I updated it for the fstests and used that as the source of truth
for this stuff, which is how I validated all of the fs'es that got the
FS_ALLOW_HSM flag.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
>
> and will push it to linux-next as well so that it gets some soaking before
> the merge window. That being said I'd still like to get explicit ack from
> XFS folks (hint) so don't patches may still rebase due to that.
>
Awesome, thanks!
Josef
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 00/18] fanotify: add pre-content hooks
2024-09-05 8:33 ` [PATCH v5 00/18] fanotify: add pre-content hooks Amir Goldstein
2024-09-05 10:32 ` Jan Kara
@ 2024-09-05 19:30 ` Josef Bacik
1 sibling, 0 replies; 34+ messages in thread
From: Josef Bacik @ 2024-09-05 19:30 UTC (permalink / raw)
To: Amir Goldstein
Cc: jack, kernel-team, linux-fsdevel, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Thu, Sep 05, 2024 at 10:33:07AM +0200, Amir Goldstein wrote:
> On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > v4: https://lore.kernel.org/linux-fsdevel/cover.1723670362.git.josef@toxicpanda.com/
> > v3: https://lore.kernel.org/linux-fsdevel/cover.1723228772.git.josef@toxicpanda.com/
> > v2: https://lore.kernel.org/linux-fsdevel/cover.1723144881.git.josef@toxicpanda.com/
> > v1: https://lore.kernel.org/linux-fsdevel/cover.1721931241.git.josef@toxicpanda.com/
> >
> > v4->v5:
> > - Cleaned up the various "I'll fix it on commit" notes that Jan made since I had
> > to respin the series anyway.
> > - Renamed the filemap pagefault helper for fsnotify per Christians suggestion.
> > - Added a FS_ALLOW_HSM flag per Jan's comments, based on Amir's rough sketch.
> > - Added a patch to disable btrfs defrag on pre-content watched files.
> > - Added a patch to turn on FS_ALLOW_HSM for all the file systems that I tested.
>
> My only nits are about different ordering of the FS_ALLOW_HSM patches
> I guess as the merge window is closing in, Jan could do these trivial
> reorders on commit, based on his preference (?).
>
> > - Added two fstests (which will be posted separately) to validate everything,
> > re-validated the series with btrfs, xfs, ext4, and bcachefs to make sure I
> > didn't break anything.
>
> Very cool!
>
> Thanks again for the "productization" of my patches :)
Thanks for doing all the heavy lifting in the first place! Glad we can move on
to other things from here,
Josef
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v5 03/18] fsnotify: generate pre-content permission event on open
2024-09-04 20:27 ` [PATCH v5 03/18] fsnotify: generate pre-content permission event on open Josef Bacik
@ 2024-10-24 13:06 ` Amir Goldstein
0 siblings, 0 replies; 34+ messages in thread
From: Amir Goldstein @ 2024-10-24 13:06 UTC (permalink / raw)
To: Josef Bacik
Cc: kernel-team, linux-fsdevel, jack, brauner, linux-xfs,
linux-bcachefs, linux-btrfs, linux-mm
On Wed, Sep 4, 2024 at 10:29 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> From: Amir Goldstein <amir73il@gmail.com>
>
> FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on open depending on
> file open mode. The pre-content event will be generated in addition to
> FS_OPEN_PERM, but without sb_writers held and after file was truncated
> in case file was opened with O_CREAT and/or O_TRUNC.
>
> The event will have a range info of (0..0) to provide an opportunity
> to fill entire file content on open.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> ---
> fs/namei.c | 9 +++++++++
> include/linux/fsnotify.h | 10 +++++++++-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 3a4c40e12f78..c16487e3742d 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3735,6 +3735,15 @@ static int do_open(struct nameidata *nd,
> }
> if (do_truncate)
> mnt_drop_write(nd->path.mnt);
> +
> + /*
> + * This permission hook is different than fsnotify_open_perm() hook.
> + * This is a pre-content hook that is called without sb_writers held
> + * and after the file was truncated.
> + */
> + if (!error)
> + error = fsnotify_file_perm(file, MAY_OPEN);
> +
Josef,
Please change that for v6 to:
error = fsnotify_file_area_perm(file, MAY_OPEN,
&file->f_pos, 0);
...
> return error;
> }
>
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 7600a0c045ba..fb3837b8de4c 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -168,6 +168,10 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
> fsnotify_mask = FS_PRE_MODIFY;
> else if (perm_mask & (MAY_READ | MAY_ACCESS))
> fsnotify_mask = FS_PRE_ACCESS;
> + else if (perm_mask & MAY_OPEN && file->f_mode & FMODE_WRITER)
> + fsnotify_mask = FS_PRE_MODIFY;
> + else if (perm_mask & MAY_OPEN)
> + fsnotify_mask = FS_PRE_ACCESS;
> else
> return 0;
>
> @@ -176,10 +180,14 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
>
> /*
> * fsnotify_file_perm - permission hook before file access
> + *
> + * Called from read()/write() with perm_mask MAY_READ/MAY_WRITE.
> + * Called from open() with MAY_OPEN without sb_writers held and after the file
> + * was truncated. Note that this is a different event from fsnotify_open_perm().
> */
> static inline int fsnotify_file_perm(struct file *file, int perm_mask)
> {
> - return fsnotify_file_area_perm(file, perm_mask, NULL, 0);
> + return fsnotify_file_area_perm(file, perm_mask, &file->f_pos, 0);
> }
... and drop this change, because this change will make readdir()
start reporting odd file ranges and HSM won't be able to tell the
difference between an opendir() and a readdir().
And I will send an extra patch for reporting an event with
range [size..size] for truncate(size).
Thanks,
Amir.
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2024-10-24 13:06 UTC | newest]
Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-04 20:27 [PATCH v5 00/18] fanotify: add pre-content hooks Josef Bacik
2024-09-04 20:27 ` [PATCH v5 01/18] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-09-05 7:49 ` Amir Goldstein
2024-09-04 20:27 ` [PATCH v5 02/18] fsnotify: introduce pre-content permission event Josef Bacik
2024-09-04 20:27 ` [PATCH v5 03/18] fsnotify: generate pre-content permission event on open Josef Bacik
2024-10-24 13:06 ` Amir Goldstein
2024-09-04 20:27 ` [PATCH v5 04/18] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
2024-09-04 20:27 ` [PATCH v5 05/18] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
2024-09-04 20:27 ` [PATCH v5 06/18] fanotify: pass optional file access range in pre-content event Josef Bacik
2024-09-04 20:27 ` [PATCH v5 07/18] fanotify: rename a misnamed constant Josef Bacik
2024-09-04 20:27 ` [PATCH v5 08/18] fanotify: report file range info with pre-content events Josef Bacik
2024-09-04 20:27 ` [PATCH v5 09/18] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
2024-09-04 20:28 ` [PATCH v5 10/18] fs: add a flag to indicate the fs supports pre-content events Josef Bacik
2024-09-05 8:07 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 11/18] fanotify: add a helper to check for pre content events Josef Bacik
2024-09-05 8:09 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 12/18] fanotify: disable readahead if we have pre-content watches Josef Bacik
2024-09-05 8:12 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 13/18] mm: don't allow huge faults for files with pre content watches Josef Bacik
2024-09-05 8:14 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 14/18] fsnotify: generate pre-content permission event on page fault Josef Bacik
2024-09-04 20:28 ` [PATCH v5 15/18] bcachefs: add pre-content fsnotify hook to fault Josef Bacik
2024-09-04 20:28 ` [PATCH v5 16/18] xfs: add pre-content fsnotify hook for write faults Josef Bacik
2024-09-05 8:29 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 17/18] btrfs: disable defrag on pre-content watched files Josef Bacik
2024-09-05 8:23 ` Amir Goldstein
2024-09-04 20:28 ` [PATCH v5 18/18] fs: enable pre-content events on supported file systems Josef Bacik
2024-09-05 8:27 ` Amir Goldstein
2024-09-05 10:36 ` Jan Kara
2024-09-05 8:33 ` [PATCH v5 00/18] fanotify: add pre-content hooks Amir Goldstein
2024-09-05 10:32 ` Jan Kara
2024-09-05 19:30 ` Josef Bacik
2024-09-05 12:08 ` Jan Kara
2024-09-05 19:29 ` Josef Bacik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).