From: Josef Bacik <josef@toxicpanda.com>
To: kernel-team@fb.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
amir73il@gmail.com, brauner@kernel.org,
linux-xfs@vger.kernel.org, linux-bcachefs@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v5 00/18] fanotify: add pre-content hooks
Date: Wed, 4 Sep 2024 16:27:50 -0400
Message-ID: <cover.1725481503.git.josef@toxicpanda.com>
v4: https://lore.kernel.org/linux-fsdevel/cover.1723670362.git.josef@toxicpanda.com/
v3: https://lore.kernel.org/linux-fsdevel/cover.1723228772.git.josef@toxicpanda.com/
v2: https://lore.kernel.org/linux-fsdevel/cover.1723144881.git.josef@toxicpanda.com/
v1: https://lore.kernel.org/linux-fsdevel/cover.1721931241.git.josef@toxicpanda.com/
v4->v5:
- Cleaned up the various "I'll fix it on commit" notes that Jan made since I had
to respin the series anyway.
- Renamed the filemap pagefault helper for fsnotify per Christian's suggestion.
- Added a FS_ALLOW_HSM flag per Jan's comments, based on Amir's rough sketch.
- Added a patch to disable btrfs defrag on pre-content watched files.
- Added a patch to turn on FS_ALLOW_HSM for all the file systems that I tested.
- Added two fstests (which will be posted separately) to validate everything,
re-validated the series with btrfs, xfs, ext4, and bcachefs to make sure I
didn't break anything.
v3->v4:
- Trying to send a final version Friday at 5pm before you go on vacation is a
recipe for silly mistakes, fixed the xfs handling yet again, per Christoph's
review.
- Reworked the file system helper so its handling of fpin was a little less
silly, per Chinner's suggestion.
- Updated the return values to not OR in VM_FAULT_RETRY, as we have a comment
in filemap_fault that says if VM_FAULT_ERROR is set we won't have
VM_FAULT_RETRY set.
v2->v3:
- Fixed the pagefault path to do MAY_ACCESS instead, and updated the perm
handler to emit PRE_ACCESS in this case so we can avoid the extraneous perm
event, as per Amir's suggestion.
- Reworked the exported helper so the per-filesystem changes are much smaller,
per Amir's suggestion.
- Fixed the screwup for DAX writes per Chinner's suggestion.
- Added Christian's reviewed-by's where appropriate.
v1->v2:
- reworked the page fault logic based on Jan's suggestion and turned it into a
helper.
- Added 3 patches per-fs where we need to call the fsnotify helper from their
->fault handlers.
- Disabled readahead in the case that there's a pre-content watch in place.
- Disabled huge faults when there's a pre-content watch in place (entirely
because it's untested, theoretically it should be straightforward to do).
- Updated the command numbers.
- Addressed the random spelling/grammar mistakes that Jan pointed out.
- Addressed the other random nits from Jan.
--- Original email ---
Hello,
These are the patches for the bare bones pre-content fanotify support. The
majority of this work is Amir's, my contribution to this has solely been around
adding the page fault hooks, testing and validating everything. I'm sending it
because Amir is traveling a bunch, and I touched it last so I'm going to take
all the hate and he can take all the credit.
There is a PoC that I've been using to validate this work, you can find the git
repo here
https://github.com/josefbacik/remote-fetch
This consists of 3 different tools.
1. populate. This just creates all the stub files in the directory from the
source directory. Just run ./populate ~/linux ~/hsm-linux and it'll
recursively create all of the stub files and directories.
2. remote-fetch. This is the actual PoC, you just point it at the source and
destination directory and then you can do whatever. ./remote-fetch ~/linux
~/hsm-linux.
3. mmap-validate. This was to validate the pagefault thing, this is likely what
will be turned into the selftest with remote-fetch. It creates a file and
then you can validate the file matches the right pattern with both normal
reads and mmap. Normally I do something like
./mmap-validate create ~/src/foo
./populate ~/src ~/dst
./remote-fetch ~/src ~/dst
./mmap-validate validate ~/dst/foo
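To make the mmap-validate step concrete, here is a minimal sketch of the same idea: write a known pattern, then check it through both plain reads and a read-only mapping. It is not the PoC's code. The block size, the pattern, and the names create, validate and expected_block are assumptions chosen for illustration, since the email does not describe the actual pattern.

```python
import mmap
import os

PAGE = 4096  # assumed block size for the pattern; not taken from the PoC


def expected_block(index: int) -> bytes:
    """Return the illustrative pattern for block `index`: its index, repeated."""
    return index.to_bytes(8, "little") * (PAGE // 8)


def create(path: str, blocks: int) -> None:
    """Write `blocks` pattern blocks to `path`."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(expected_block(i))


def validate(path: str) -> bool:
    """Check the pattern through read() and then through a read-only mmap."""
    size = os.path.getsize(path)
    if size == 0 or size % PAGE:
        return False
    blocks = size // PAGE
    with open(path, "rb") as f:
        # Normal read path.
        for i in range(blocks):
            if f.read(PAGE) != expected_block(i):
                return False
        # Page-fault path: accessing the mapping faults the pages in.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for i in range(blocks):
                if m[i * PAGE:(i + 1) * PAGE] != expected_block(i):
                    return False
    return True
```

Checking both paths matters because the read path and the page-fault path reach the pre-content hooks through different code, so a file that validates with read() alone says nothing about the fault hooks.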
I did a bunch of testing and also got some performance numbers. I copied a
kernel tree, ran remote-fetch, and then ran make -j4:
Normal
real 9m49.709s
user 28m11.372s
sys 4m57.304s
HSM
real 10m6.454s
user 29m10.517s
sys 5m2.617s
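The deltas between the two runs can be worked out directly from the timings above. This is a small sketch of that arithmetic; the helper name to_seconds is hypothetical and the numbers are copied from the two runs.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '9m49.709s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings copied from the Normal and HSM runs.
normal = {"real": "9m49.709s", "user": "28m11.372s", "sys": "4m57.304s"}
hsm = {"real": "10m6.454s", "user": "29m10.517s", "sys": "5m2.617s"}

for key in normal:
    base = to_seconds(normal[key])
    delta = to_seconds(hsm[key]) - base
    print(f"{key}: +{delta:.3f}s ({100 * delta / base:.1f}%)")
```

For the wall-clock time this comes to 606.454 - 589.709 = 16.745 seconds, a little under 3% of the normal build.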
So ~17 seconds more to build with HSM. I then did a make mrproper on both trees
to see the size
[root@fedora ~]# du -hs /src/linux
1.6G /src/linux
[root@fedora ~]# du -hs dst
125M dst
This mirrors the sort of savings we've seen in production.
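As a rough illustration of where the savings come from, here is a minimal sketch of stub creation in the spirit of the populate step. It is not the PoC's implementation: it assumes a stub is a file with the source's size but no data written, and it handles regular files only. The function name mirrors the tool but the body is hypothetical.

```python
import os


def populate(src: str, dst: str) -> int:
    """Mirror `src` into `dst` as empty stubs; return the number of files created."""
    count = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            if os.path.islink(source) or not os.path.isfile(source):
                continue  # regular files only in this sketch
            stub = os.path.join(target_dir, name)
            with open(stub, "wb") as f:
                # Assumption: a stub keeps the source's size but holds no data.
                f.truncate(os.path.getsize(source))
            count += 1
    return count
```

On file systems that support sparse files, a stub created this way reports the full size but has no data written to it until something fills it in.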
Meta has had these patches (minus the page fault patch) deployed in production
for almost a year with our own utility for doing on-demand package fetching.
The savings from this have been pretty significant.
The page-fault hooks are necessary for the last thing we need, which is
on-demand range fetching of executables. Some of our binaries are several gigs
large, having the ability to remote fetch them on demand is a huge win for us
not only with space savings, but with startup time of containers.
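To show what range fetching means at the file level, here is a minimal sketch that fills in just one byte range of a stub from its source. It deliberately leaves fanotify out: in the series the offset and length would arrive with the pre-content event, and the listener would respond to that event once the range is populated. The helper name fetch_range and its chunk size are hypothetical.

```python
import os


def fetch_range(src_fd: int, dst_fd: int, offset: int, count: int,
                chunk: int = 1 << 20) -> int:
    """Copy bytes [offset, offset + count) from src_fd to dst_fd.

    Returns the number of bytes copied, which is less than `count` if the
    source ends before the requested range does.
    """
    copied = 0
    while copied < count:
        want = min(chunk, count - copied)
        data = os.pread(src_fd, want, offset + copied)
        if not data:
            break  # source is shorter than the requested range
        written = 0
        while written < len(data):
            # pwrite may write fewer bytes than asked; loop until done.
            written += os.pwrite(dst_fd, data[written:],
                                 offset + copied + written)
        copied += len(data)
    return copied
```

Using pread and pwrite keeps the copy independent of the file position, so several ranges of the same file can be filled without seeking.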
There will be tests for this going into LTP once we're satisfied with the
patches and they're on their way upstream. Thanks,
Josef
Amir Goldstein (8):
fsnotify: introduce pre-content permission event
fsnotify: generate pre-content permission event on open
fanotify: introduce FAN_PRE_ACCESS permission event
fanotify: introduce FAN_PRE_MODIFY permission event
fanotify: pass optional file access range in pre-content event
fanotify: rename a misnamed constant
fanotify: report file range info with pre-content events
fanotify: allow to set errno in FAN_DENY permission response
Josef Bacik (10):
fanotify: don't skip extra event info if no info_mode is set
fs: add a flag to indicate the fs supports pre-content events
fanotify: add a helper to check for pre content events
fanotify: disable readahead if we have pre-content watches
mm: don't allow huge faults for files with pre content watches
fsnotify: generate pre-content permission event on page fault
bcachefs: add pre-content fsnotify hook to fault
xfs: add pre-content fsnotify hook for write faults
btrfs: disable defrag on pre-content watched files
fs: enable pre-content events on supported file systems
fs/bcachefs/fs-io-pagecache.c | 4 +
fs/bcachefs/fs.c | 2 +-
fs/btrfs/ioctl.c | 9 ++
fs/btrfs/super.c | 3 +-
fs/ext4/super.c | 6 +-
fs/namei.c | 9 ++
fs/notify/fanotify/fanotify.c | 33 ++++++--
fs/notify/fanotify/fanotify.h | 15 ++++
fs/notify/fanotify/fanotify_user.c | 119 ++++++++++++++++++++++-----
fs/notify/fsnotify.c | 17 +++-
fs/xfs/xfs_file.c | 4 +
fs/xfs/xfs_super.c | 2 +-
include/linux/fanotify.h | 20 +++--
include/linux/fs.h | 1 +
include/linux/fsnotify.h | 58 +++++++++++--
include/linux/fsnotify_backend.h | 59 ++++++++++++-
include/linux/mm.h | 1 +
include/uapi/linux/fanotify.h | 18 ++++
mm/filemap.c | 128 +++++++++++++++++++++++++++--
mm/memory.c | 22 +++++
mm/readahead.c | 13 +++
security/selinux/hooks.c | 3 +-
22 files changed, 489 insertions(+), 57 deletions(-)
--
2.43.0