All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: kernel-team@fb.com, linux-fsdevel@vger.kernel.org, jack@suse.cz,
	brauner@kernel.org
Subject: Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
Date: Mon, 29 Jul 2024 13:11:20 -0400	[thread overview]
Message-ID: <20240729171120.GB3596468@perftesting> (raw)
In-Reply-To: <CAOQ4uxgXEzT=Buwu8SOkQG+2qcObmdH4NgsGme8bECObiobfTQ@mail.gmail.com>

On Thu, Jul 25, 2024 at 11:19:33PM +0300, Amir Goldstein wrote:
> On Thu, Jul 25, 2024 at 9:20 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > FS_PRE_ACCESS or FS_PRE_MODIFY will be generated on page fault depending
> > on the faulting method.
> >
> > This pre-content event is meant to be used by hierarchical storage
> > managers that want to fill in the file content on first read access.
> >
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/notify/fsnotify.c             | 13 +++++++++
> >  include/linux/fsnotify_backend.h | 14 +++++++++
> >  mm/filemap.c                     | 50 ++++++++++++++++++++++++++++----
> >  3 files changed, 71 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> > index 1ca4a8da7f29..435232d46b4f 100644
> > --- a/fs/notify/fsnotify.c
> > +++ b/fs/notify/fsnotify.c
> > @@ -28,6 +28,19 @@ void __fsnotify_vfsmount_delete(struct vfsmount *mnt)
> >         fsnotify_clear_marks_by_mount(mnt);
> >  }
> >
> > +bool fsnotify_file_has_content_watches(struct file *file)
> 
> nit: has_pre_content_watches...
> 
> > +{
> > +       struct inode *inode = file_inode(file);
> > +       struct super_block *sb = inode->i_sb;
> > +       struct mount *mnt = real_mount(file->f_path.mnt);
> > +       u32 mask = inode->i_fsnotify_mask;
> > +
> > +       mask |= mnt->mnt_fsnotify_mask;
> > +       mask |= sb->s_fsnotify_mask;
> > +
> > +       return !!(mask & FSNOTIFY_PRE_CONTENT_EVENTS);
> 
> This can use the fsnotify_object_watched() helper, and it will need
> the READ_ONCE() that are just being added to avoid data races.
> 
> > +}
> > +
> >  /**
> >   * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
> >   * @sb: superblock being unmounted.
> > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > index 36c3d18cc40a..6983fbf096b8 100644
> > --- a/include/linux/fsnotify_backend.h
> > +++ b/include/linux/fsnotify_backend.h
> > @@ -900,6 +900,15 @@ static inline void fsnotify_init_event(struct fsnotify_event *event)
> >         INIT_LIST_HEAD(&event->list);
> >  }
> >
> > +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> > +bool fsnotify_file_has_content_watches(struct file *file);
> > +#else
> > +static inline bool fsnotify_file_has_content_watches(struct file *file)
> > +{
> > +       return false;
> > +}
> > +#endif /* CONFIG_FANOTIFY_ACCESS_PERMISSIONS */
> > +
> >  #else
> >
> >  static inline int fsnotify(__u32 mask, const void *data, int data_type,
> > @@ -938,6 +947,11 @@ static inline u32 fsnotify_get_cookie(void)
> >  static inline void fsnotify_unmount_inodes(struct super_block *sb)
> >  {}
> >
> > +static inline bool fsnotify_file_has_content_watches(struct file *file)
> > +{
> > +       return false;
> > +}
> > +
> >  #endif /* CONFIG_FSNOTIFY */
> >
> >  #endif /* __KERNEL __ */
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index ca8c8d889eef..cc9d7885bbe3 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -46,6 +46,7 @@
> >  #include <linux/pipe_fs_i.h>
> >  #include <linux/splice.h>
> >  #include <linux/rcupdate_wait.h>
> > +#include <linux/fsnotify.h>
> >  #include <asm/pgalloc.h>
> >  #include <asm/tlbflush.h>
> >  #include "internal.h"
> > @@ -3112,13 +3113,13 @@ static int lock_folio_maybe_drop_mmap(struct vm_fault *vmf, struct folio *folio,
> >   * that.  If we didn't pin a file then we return NULL.  The file that is
> >   * returned needs to be fput()'ed when we're done with it.
> >   */
> > -static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
> > +static struct file *do_sync_mmap_readahead(struct vm_fault *vmf,
> > +                                          struct file *fpin)
> >  {
> >         struct file *file = vmf->vma->vm_file;
> >         struct file_ra_state *ra = &file->f_ra;
> >         struct address_space *mapping = file->f_mapping;
> >         DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
> > -       struct file *fpin = NULL;
> >         unsigned long vm_flags = vmf->vma->vm_flags;
> >         unsigned int mmap_miss;
> >
> > @@ -3182,12 +3183,12 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
> >   * was pinned if we have to drop the mmap_lock in order to do IO.
> >   */
> >  static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
> > -                                           struct folio *folio)
> > +                                           struct folio *folio,
> > +                                           struct file *fpin)
> >  {
> 
> If I am reading correctly, iomap (i.e. xfs) write shared memory fault
> does not reach this code?
> 
> Do we care about writable shared memory faults use case for HSM?
> It does not sound very relevant to HSM, but we cannot just ignore it..
> 

Sorry I realized I went off to try and solve this problem and never responded to
you.  I'm addressing the other comments, but this one is a little tricky.

We're kind of stuck between a rock and a hard place with this.  I had originally
put this before the ->fault() callback, but purposefully moved it into
filemap_fault() because I want to be able to drop the mmap lock while we're
waiting for a response from the HSM.

The reason to do this is because there are things that take the mmap lock for
simple things outside of the process, like /proc/$PID/smaps and other related
things, and this can cause high priority tasks to block behind possibly low
priority IO, creating a priority inversion.

Now, I'm not sure how widespread of a problem this is anymore, I know there's
been work done to the kernel and tools to avoid this style of problem.  I'm ok
with a "try it and see" approach, but I don't love that.

However I think putting fsnotify hooks into XFS itself for this particular path
is a good choice either.  What do you think?  Just move it to before ->fault(),
leave the mmap lock in place, and be done with it?  Thanks,

Josef

  reply	other threads:[~2024-07-29 17:11 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
2024-07-25 18:19 ` [PATCH 01/10] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-08-01 17:59   ` Jan Kara
2024-07-25 18:19 ` [PATCH 02/10] fsnotify: introduce pre-content permission event Josef Bacik
2024-08-01 16:31   ` Jan Kara
2024-08-03 16:52     ` Amir Goldstein
2024-10-25  7:55       ` Amir Goldstein
2024-10-25 13:09         ` Jan Kara
2024-10-25 13:39           ` Amir Goldstein
2024-10-26  6:58             ` Amir Goldstein
2024-10-31 12:47               ` Jan Kara
2024-07-25 18:19 ` [PATCH 03/10] fsnotify: generate pre-content permission event on open Josef Bacik
2024-08-01 17:01   ` Jan Kara
2024-08-03 16:53     ` Amir Goldstein
2024-07-25 18:19 ` [PATCH 04/10] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
2024-08-01 17:04   ` Jan Kara
2024-07-25 18:19 ` [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
2024-08-01 17:09   ` Jan Kara
2024-08-03 16:55     ` Amir Goldstein
2024-08-05 11:18       ` Jan Kara
2024-07-25 18:19 ` [PATCH 06/10] fanotify: pass optional file access range in pre-content event Josef Bacik
2024-08-01 17:16   ` Jan Kara
2024-08-03 17:00     ` Amir Goldstein
2024-08-05 11:20       ` Jan Kara
2024-07-25 18:19 ` [PATCH 07/10] fanotify: rename a misnamed constant Josef Bacik
2024-08-01 17:19   ` Jan Kara
2024-08-01 17:23     ` Jan Kara
2024-07-25 18:19 ` [PATCH 08/10] fanotify: report file range info with pre-content events Josef Bacik
2024-08-01 17:38   ` Jan Kara
2024-10-24 10:06     ` Amir Goldstein
2024-10-24 16:35       ` Jan Kara
2024-10-24 16:49         ` Amir Goldstein
2024-10-24 16:56           ` Jan Kara
2024-07-25 18:19 ` [PATCH 09/10] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
2024-08-01 21:14   ` Jan Kara
2024-08-03 17:06     ` Amir Goldstein
2024-07-25 18:19 ` [PATCH 10/10] fsnotify: generate pre-content permission event on page fault Josef Bacik
2024-07-25 20:19   ` Amir Goldstein
2024-07-29 17:11     ` Josef Bacik [this message]
2024-07-29 18:57       ` Amir Goldstein
2024-07-30 12:18         ` Jan Kara
2024-07-30 16:51           ` Josef Bacik
2024-08-01 21:34             ` Jan Kara
2024-07-26  5:57   ` kernel test robot
2024-07-26  6:30   ` kernel test robot
2024-08-01 21:40   ` Jan Kara
2024-08-02 16:03     ` Josef Bacik
2024-08-05 12:13       ` Jan Kara
2024-08-07 19:04         ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240729171120.GB3596468@perftesting \
    --to=josef@toxicpanda.com \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=kernel-team@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.