From: Josef Bacik <josef@toxicpanda.com>
To: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>,
kernel-team@fb.com, linux-fsdevel@vger.kernel.org,
brauner@kernel.org
Subject: Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
Date: Tue, 30 Jul 2024 12:51:49 -0400 [thread overview]
Message-ID: <20240730165149.GA3828363@perftesting> (raw)
In-Reply-To: <20240730121837.fixxjcbbu7caxf2s@quack3>
On Tue, Jul 30, 2024 at 02:18:37PM +0200, Jan Kara wrote:
> On Mon 29-07-24 21:57:34, Amir Goldstein wrote:
> > On Mon, Jul 29, 2024 at 8:11 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > > > If I am reading correctly, iomap (i.e. xfs) write shared memory fault
> > > > does not reach this code?
> > > >
> > > > Do we care about writable shared memory faults use case for HSM?
> > > > It does not sound very relevant to HSM, but we cannot just ignore it..
> > > >
> > >
> > > Sorry I realized I went off to try and solve this problem and never responded to
> > > you. I'm addressing the other comments, but this one is a little tricky.
> > >
> > > We're kind of stuck between a rock and a hard place with this. I had originally
> > > put this before the ->fault() callback, but purposefully moved it into
> > > filemap_fault() because I want to be able to drop the mmap lock while we're
> > > waiting for a response from the HSM.
> > >
> > > The reason to do this is because there are things that take the mmap lock for
> > > simple things outside of the process, like /proc/$PID/smaps and other related
> > > things, and this can cause high priority tasks to block behind possibly low
> > > priority IO, creating a priority inversion.
> > >
> > > Now, I'm not sure how widespread of a problem this is anymore, I know there's
> > > been work done to the kernel and tools to avoid this style of problem. I'm ok
> > > with a "try it and see" approach, but I don't love that.
> > >
> >
> > I defer this question to Jan.
> >
> > > However I think putting fsnotify hooks into XFS itself for this particular path
> > > is a good choice either.
> >
> > I think you meant "not a good choice" and I agree -
> > it is not only xfs, but could be any fs that will be converted to iomap
> > Other fs have ->fault != filemap_fault, even if they do end up calling
> > filemap_fault, IOW, there is no API guarantee that they will.
> >
> > > What do you think? Just move it to before ->fault(),
> > > leave the mmap lock in place, and be done with it?
> >
> > If Jan blesses the hook called with mmap lock, then yeh,
> > putting the hook in the most generic "vfs" code would be
> > the best choice for maintenance.
>
> Well, I agree with Josef's comment about a rock and a hard place. For once,
> regardless whether the hook will happen from before ->fault or from inside
> the ->fault handler there will be fault callers where we cannot drop
> mmap_lock (not all archs support dropping mmap_lock inside a fault AFAIR -
> but a quick grep seems to show that these days maybe they do, also some
> callers - most notably through GUP - don't allow dropping of mmap_lock
> inside fault). So we have to have a way to handle a fault without
> FAULT_FLAG_ALLOW_RETRY flag.
>
> Now of course waiting for userspace reply to fanotify event with mmap_lock
> held is ... dangerous. For example consider application with two threads:
>
> T1 T2
> page fault on inode I write to inode I
> lock mm->mmap_lock inode_lock(I)
> send fanotify event ...
> fault_in_iov_iter_readable()
> lock mm->mmap_lock -> blocks
> behind T1
>
> now the HSM handler needs to fill in contents of inode I requested by the
> page fault:
>
> inode_lock(I) -> deadlock
>
> So conceptually I think the flow could look like (in __do_fault):
>
> if (!(vmf->flags & FAULT_FLAG_TRIED) &&
> fsnotify_may_send_pre_content_event()) {
> if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
> return VM_FAULT_RETRY;
> fpin = maybe_unlock_mmap_for_io(vmf, NULL);
> if (!fpin)
> return ???VM_FAULT_SIGSEGV???;
> err = fsnotify_fault(...);
> if (err)
> return VM_FAULT_SIGBUS | VM_FAULT_RETRY;
> /*
> * We are fine with proceeding with the fault. Retry the fault
> * to let the filesystem handle it.
> */
> return VM_FAULT_RETRY;
> }
>
> The downside is that if we enter the page fault without ability to drop
> mmap_lock on a file needing HSM handling, we get SIGSEGV. I'm not sure it
> matters in practice because these are not that common paths e.g. stuff like
> putting a breakpoint / uprobe on a file but maybe there are some surprises.
>
The only thing I don't like about this is that now the fault handler loses the
ability to drop the mmap sem. I think in practice this doesn't really matter,
because we're trying to avoid doing IO while under the mmap sem, and presumably
this will have instantiated the pages to avoid the IO.
However if you use reflink this wouldn't be the case, and now we've
re-introduced the priority inversion possiblity.
I'm leaning more towards just putting the fsnotify hook in the xfs code for the
write case, and anybody else who implements their own ->fault without calling
the generic one will just have to do the same.
That being said I'm not going to die on any particular hill here, if you still
want to do the above before the ->fault() handler then I'll update my code to do
that and we can move on. Thanks,
Josef
next prev parent reply other threads:[~2024-07-30 16:51 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
2024-07-25 18:19 ` [PATCH 01/10] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-08-01 17:59 ` Jan Kara
2024-07-25 18:19 ` [PATCH 02/10] fsnotify: introduce pre-content permission event Josef Bacik
2024-08-01 16:31 ` Jan Kara
2024-08-03 16:52 ` Amir Goldstein
2024-10-25 7:55 ` Amir Goldstein
2024-10-25 13:09 ` Jan Kara
2024-10-25 13:39 ` Amir Goldstein
2024-10-26 6:58 ` Amir Goldstein
2024-10-31 12:47 ` Jan Kara
2024-07-25 18:19 ` [PATCH 03/10] fsnotify: generate pre-content permission event on open Josef Bacik
2024-08-01 17:01 ` Jan Kara
2024-08-03 16:53 ` Amir Goldstein
2024-07-25 18:19 ` [PATCH 04/10] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
2024-08-01 17:04 ` Jan Kara
2024-07-25 18:19 ` [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
2024-08-01 17:09 ` Jan Kara
2024-08-03 16:55 ` Amir Goldstein
2024-08-05 11:18 ` Jan Kara
2024-07-25 18:19 ` [PATCH 06/10] fanotify: pass optional file access range in pre-content event Josef Bacik
2024-08-01 17:16 ` Jan Kara
2024-08-03 17:00 ` Amir Goldstein
2024-08-05 11:20 ` Jan Kara
2024-07-25 18:19 ` [PATCH 07/10] fanotify: rename a misnamed constant Josef Bacik
2024-08-01 17:19 ` Jan Kara
2024-08-01 17:23 ` Jan Kara
2024-07-25 18:19 ` [PATCH 08/10] fanotify: report file range info with pre-content events Josef Bacik
2024-08-01 17:38 ` Jan Kara
2024-10-24 10:06 ` Amir Goldstein
2024-10-24 16:35 ` Jan Kara
2024-10-24 16:49 ` Amir Goldstein
2024-10-24 16:56 ` Jan Kara
2024-07-25 18:19 ` [PATCH 09/10] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
2024-08-01 21:14 ` Jan Kara
2024-08-03 17:06 ` Amir Goldstein
2024-07-25 18:19 ` [PATCH 10/10] fsnotify: generate pre-content permission event on page fault Josef Bacik
2024-07-25 20:19 ` Amir Goldstein
2024-07-29 17:11 ` Josef Bacik
2024-07-29 18:57 ` Amir Goldstein
2024-07-30 12:18 ` Jan Kara
2024-07-30 16:51 ` Josef Bacik [this message]
2024-08-01 21:34 ` Jan Kara
2024-08-01 21:40 ` Jan Kara
2024-08-02 16:03 ` Josef Bacik
2024-08-05 12:13 ` Jan Kara
2024-08-07 19:04 ` Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240730165149.GA3828363@perftesting \
--to=josef@toxicpanda.com \
--cc=amir73il@gmail.com \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=kernel-team@fb.com \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).