From: Jan Kara <jack@suse.cz>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Josef Bacik <josef@toxicpanda.com>,
jack@suse.cz, kernel-team@fb.com, linux-fsdevel@vger.kernel.org,
brauner@kernel.org
Subject: Re: [PATCH 10/10] fsnotify: generate pre-content permission event on page fault
Date: Tue, 30 Jul 2024 14:18:37 +0200 [thread overview]
Message-ID: <20240730121837.fixxjcbbu7caxf2s@quack3> (raw)
In-Reply-To: <CAOQ4uxjjBiPkg9uxyW12Xd+GZ7t3aP1m9Ayzr8WzqryfqK1x3g@mail.gmail.com>
On Mon 29-07-24 21:57:34, Amir Goldstein wrote:
> On Mon, Jul 29, 2024 at 8:11 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > > If I am reading correctly, iomap (i.e. xfs) write shared memory fault
> > > does not reach this code?
> > >
> > > Do we care about writable shared memory faults use case for HSM?
> > > It does not sound very relevant to HSM, but we cannot just ignore it..
> > >
> >
> > Sorry I realized I went off to try and solve this problem and never responded to
> > you. I'm addressing the other comments, but this one is a little tricky.
> >
> > We're kind of stuck between a rock and a hard place with this. I had originally
> > put this before the ->fault() callback, but purposefully moved it into
> > filemap_fault() because I want to be able to drop the mmap lock while we're
> > waiting for a response from the HSM.
> >
> > The reason to do this is because there are things that take the mmap lock for
> > simple things outside of the process, like /proc/$PID/smaps and other related
> > things, and this can cause high priority tasks to block behind possibly low
> > priority IO, creating a priority inversion.
> >
> > Now, I'm not sure how widespread of a problem this is anymore, I know there's
> > been work done to the kernel and tools to avoid this style of problem. I'm ok
> > with a "try it and see" approach, but I don't love that.
> >
>
> I defer this question to Jan.
>
> > However I think putting fsnotify hooks into XFS itself for this particular path
> > is a good choice either.
>
> I think you meant "not a good choice" and I agree -
> it is not only xfs, but could be any fs that will be converted to iomap
> Other fs have ->fault != filemap_fault, even if they do end up calling
> filemap_fault, IOW, there is no API guarantee that they will.
>
> > What do you think? Just move it to before ->fault(),
> > leave the mmap lock in place, and be done with it?
>
> If Jan blesses the hook called with mmap lock, then yeh,
> putting the hook in the most generic "vfs" code would be
> the best choice for maintenance.
Well, I agree with Josef's comment about a rock and a hard place. For once,
regardless whether the hook will happen from before ->fault or from inside
the ->fault handler there will be fault callers where we cannot drop
mmap_lock (not all archs support dropping mmap_lock inside a fault AFAIR -
but a quick grep seems to show that these days maybe they do, also some
callers - most notably through GUP - don't allow dropping of mmap_lock
inside fault). So we have to have a way to handle a fault without
FAULT_FLAG_ALLOW_RETRY flag.
Now of course waiting for userspace reply to fanotify event with mmap_lock
held is ... dangerous. For example consider application with two threads:
T1 T2
page fault on inode I write to inode I
lock mm->mmap_lock inode_lock(I)
send fanotify event ...
fault_in_iov_iter_readable()
lock mm->mmap_lock -> blocks
behind T1
now the HSM handler needs to fill in contents of inode I requested by the
page fault:
inode_lock(I) -> deadlock
So conceptually I think the flow could look like (in __do_fault):
if (!(vmf->flags & FAULT_FLAG_TRIED) &&
fsnotify_may_send_pre_content_event()) {
if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
return VM_FAULT_RETRY;
fpin = maybe_unlock_mmap_for_io(vmf, NULL);
if (!fpin)
return ???VM_FAULT_SIGSEGV???;
err = fsnotify_fault(...);
if (err)
return VM_FAULT_SIGBUS | VM_FAULT_RETRY;
/*
* We are fine with proceeding with the fault. Retry the fault
* to let the filesystem handle it.
*/
return VM_FAULT_RETRY;
}
The downside is that if we enter the page fault without ability to drop
mmap_lock on a file needing HSM handling, we get SIGSEGV. I'm not sure it
matters in practice because these are not that common paths e.g. stuff like
putting a breakpoint / uprobe on a file but maybe there are some surprises.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2024-07-30 12:18 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-25 18:19 [PATCH 00/10] fanotify: add pre-content hooks Josef Bacik
2024-07-25 18:19 ` [PATCH 01/10] fanotify: don't skip extra event info if no info_mode is set Josef Bacik
2024-08-01 17:59 ` Jan Kara
2024-07-25 18:19 ` [PATCH 02/10] fsnotify: introduce pre-content permission event Josef Bacik
2024-08-01 16:31 ` Jan Kara
2024-08-03 16:52 ` Amir Goldstein
2024-10-25 7:55 ` Amir Goldstein
2024-10-25 13:09 ` Jan Kara
2024-10-25 13:39 ` Amir Goldstein
2024-10-26 6:58 ` Amir Goldstein
2024-10-31 12:47 ` Jan Kara
2024-07-25 18:19 ` [PATCH 03/10] fsnotify: generate pre-content permission event on open Josef Bacik
2024-08-01 17:01 ` Jan Kara
2024-08-03 16:53 ` Amir Goldstein
2024-07-25 18:19 ` [PATCH 04/10] fanotify: introduce FAN_PRE_ACCESS permission event Josef Bacik
2024-08-01 17:04 ` Jan Kara
2024-07-25 18:19 ` [PATCH 05/10] fanotify: introduce FAN_PRE_MODIFY " Josef Bacik
2024-08-01 17:09 ` Jan Kara
2024-08-03 16:55 ` Amir Goldstein
2024-08-05 11:18 ` Jan Kara
2024-07-25 18:19 ` [PATCH 06/10] fanotify: pass optional file access range in pre-content event Josef Bacik
2024-08-01 17:16 ` Jan Kara
2024-08-03 17:00 ` Amir Goldstein
2024-08-05 11:20 ` Jan Kara
2024-07-25 18:19 ` [PATCH 07/10] fanotify: rename a misnamed constant Josef Bacik
2024-08-01 17:19 ` Jan Kara
2024-08-01 17:23 ` Jan Kara
2024-07-25 18:19 ` [PATCH 08/10] fanotify: report file range info with pre-content events Josef Bacik
2024-08-01 17:38 ` Jan Kara
2024-10-24 10:06 ` Amir Goldstein
2024-10-24 16:35 ` Jan Kara
2024-10-24 16:49 ` Amir Goldstein
2024-10-24 16:56 ` Jan Kara
2024-07-25 18:19 ` [PATCH 09/10] fanotify: allow to set errno in FAN_DENY permission response Josef Bacik
2024-08-01 21:14 ` Jan Kara
2024-08-03 17:06 ` Amir Goldstein
2024-07-25 18:19 ` [PATCH 10/10] fsnotify: generate pre-content permission event on page fault Josef Bacik
2024-07-25 20:19 ` Amir Goldstein
2024-07-29 17:11 ` Josef Bacik
2024-07-29 18:57 ` Amir Goldstein
2024-07-30 12:18 ` Jan Kara [this message]
2024-07-30 16:51 ` Josef Bacik
2024-08-01 21:34 ` Jan Kara
2024-08-01 21:40 ` Jan Kara
2024-08-02 16:03 ` Josef Bacik
2024-08-05 12:13 ` Jan Kara
2024-08-07 19:04 ` Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240730121837.fixxjcbbu7caxf2s@quack3 \
--to=jack@suse.cz \
--cc=amir73il@gmail.com \
--cc=brauner@kernel.org \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).