linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix for potential deadlock in pre-content event
@ 2025-03-09 11:52 Amir Goldstein
  0 siblings, 0 replies; 8+ messages in thread
From: Amir Goldstein @ 2025-03-09 11:52 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, Christian Brauner, linux-fsdevel

Jan,

Please consider this fix for the lockdep issue reported by syzbot.

Thanks,
Amir.

Amir Goldstein (2):
  fsnotify: remove check if file is actually being watched for
    pre-content events on open
  fsnotify: avoid pre-content events when faulting in user pages

 fs/notify/fsnotify.c | 29 ++++-------------------------
 include/linux/fs.h   | 18 +++++++++++++++++-
 2 files changed, 21 insertions(+), 26 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 0/2] Fix for potential deadlock in pre-content event
@ 2025-03-11 11:41 Amir Goldstein
  2025-03-11 11:41 ` [PATCH 1/2] fsnotify: add pre-content hooks on mmap() Amir Goldstein
  2025-03-11 11:41 ` [PATCH 2/2] fsnotify: avoid pre-content events when faulting in user pages Amir Goldstein
  0 siblings, 2 replies; 8+ messages in thread
From: Amir Goldstein @ 2025-03-11 11:41 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, Christian Brauner, linux-fsdevel

Jan,

This is the alternative solution proposed by Josef to solve the
deadlock.

I've added test coverage to mmap() pre-content events and verified
no pre-content events on page fault [1].

I went back and forward about allowing pre-content events on page fault
for __FMODE_EXEC and I have actually tested this variant, but because I
do not have test coverage for Josef's large executables use case and
because it is late in the rc cycle, I decided to disable pre-content
hooks in page faults temporarily, but leave the code in to allow Josef
the opportunity to re-enable the hooks for __FMODE_EXEC with a separate
patch after more testing.

Thanks,
Amir.

[1] https://github.com/amir73il/ltp/commits/fan_hsm/


Amir Goldstein (2):
  fsnotify: add pre-content hooks on mmap()
  fsnotify: avoid pre-content events when faulting in user pages

 include/linux/fsnotify.h | 11 ++++++++---
 mm/filemap.c             |  3 +--
 mm/mmap.c                | 12 ++++++++++++
 mm/util.c                |  7 +++++++
 4 files changed, 28 insertions(+), 5 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] fsnotify: add pre-content hooks on mmap()
  2025-03-11 11:41 [PATCH 0/2] Fix for potential deadlock in pre-content event Amir Goldstein
@ 2025-03-11 11:41 ` Amir Goldstein
  2025-03-11 12:33   ` Lorenzo Stoakes
  2025-03-11 11:41 ` [PATCH 2/2] fsnotify: avoid pre-content events when faulting in user pages Amir Goldstein
  1 sibling, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2025-03-11 11:41 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, Christian Brauner, linux-fsdevel

Pre-content hooks in page faults introduces potential deadlock of HSM
handler in userspace with filesystem freezing.

The requirement with pre-content event is that for every accessed file
range an event covering at least this range will be generated at least
once before the file data is accesses.

In preparation to disabling pre-content event hooks on page faults,
change those hooks to always use the mask MAY_ACCESS and add pre-content
hooks at mmap() variants for the entire mmaped range, so HSM can fill
content when user requests to map a portion of the file.

Note that exec() variant also calls vm_mmap_pgoff() internally to map
code sections, so pre-content hooks are also generated in this case.

Link: https://lore.kernel.org/linux-fsdevel/7ehxrhbvehlrjwvrduoxsao5k3x4aw275patsb3krkwuq573yv@o2hskrfawbnc/
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 mm/filemap.c |  3 +--
 mm/mmap.c    | 12 ++++++++++++
 mm/util.c    |  7 +++++++
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 2974691fdfad2..f85d288209b44 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3350,7 +3350,6 @@ static vm_fault_t filemap_fault_recheck_pte_none(struct vm_fault *vmf)
 vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
 {
 	struct file *fpin = NULL;
-	int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_ACCESS;
 	loff_t pos = vmf->pgoff >> PAGE_SHIFT;
 	size_t count = PAGE_SIZE;
 	int err;
@@ -3370,7 +3369,7 @@ vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
 	if (!fpin)
 		return VM_FAULT_SIGBUS;
 
-	err = fsnotify_file_area_perm(fpin, mask, &pos, count);
+	err = fsnotify_file_area_perm(fpin, MAY_ACCESS, &pos, count);
 	fput(fpin);
 	if (err)
 		return VM_FAULT_SIGBUS;
diff --git a/mm/mmap.c b/mm/mmap.c
index cda01071c7b1f..70318936fd588 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -48,6 +48,7 @@
 #include <linux/sched/mm.h>
 #include <linux/ksm.h>
 #include <linux/memfd.h>
+#include <linux/fsnotify.h>
 
 #include <linux/uaccess.h>
 #include <asm/cacheflush.h>
@@ -1151,6 +1152,17 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 		return ret;
 	}
 
+	if (file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {
+		int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ;
+		loff_t pos = pgoff >> PAGE_SHIFT;
+
+		ret = fsnotify_file_area_perm(file, mask, &pos, size);
+		if (ret) {
+			fput(file);
+			return ret;
+		}
+	}
+
 	ret = -EINVAL;
 
 	/* OK security check passed, take write lock + let it rip. */
diff --git a/mm/util.c b/mm/util.c
index b6b9684a14388..2dddeabac6098 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -23,6 +23,7 @@
 #include <linux/processor.h>
 #include <linux/sizes.h>
 #include <linux/compat.h>
+#include <linux/fsnotify.h>
 
 #include <linux/uaccess.h>
 
@@ -569,6 +570,12 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	LIST_HEAD(uf);
 
 	ret = security_mmap_file(file, prot, flag);
+	if (!ret && file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {
+		int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ;
+		loff_t pos = pgoff >> PAGE_SHIFT;
+
+		ret = fsnotify_file_area_perm(file, mask, &pos, len);
+	}
 	if (!ret) {
 		if (mmap_write_lock_killable(mm))
 			return -EINTR;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] fsnotify: avoid pre-content events when faulting in user pages
  2025-03-11 11:41 [PATCH 0/2] Fix for potential deadlock in pre-content event Amir Goldstein
  2025-03-11 11:41 ` [PATCH 1/2] fsnotify: add pre-content hooks on mmap() Amir Goldstein
@ 2025-03-11 11:41 ` Amir Goldstein
  2025-03-11 12:37   ` Lorenzo Stoakes
  1 sibling, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2025-03-11 11:41 UTC (permalink / raw)
  To: Jan Kara; +Cc: Josef Bacik, Christian Brauner, linux-fsdevel

In the use case of buffered write whose input buffer is mmapped file on a
filesystem with a pre-content mark, the prefaulting of the buffer can
happen under the filesystem freeze protection (obtained in vfs_write())
which breaks assumptions of pre-content hook and introduces potential
deadlock of HSM handler in userspace with filesystem freezing.

Now that we have pre-content hooks at file mmap() time, disable the
pre-content event hooks on page fault to avoid the potential deadlock.

Leave the code of pre-content hooks in page fault because we may want
to re-enable them on executables or user mapped files under certain
conditions after resolving the potential deadlocks.

Reported-by: syzbot+7229071b47908b19d5b7@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-fsdevel/7ehxrhbvehlrjwvrduoxsao5k3x4aw275patsb3krkwuq573yv@o2hskrfawbnc/
Fixes: 8392bc2ff8c8b ("fsnotify: generate pre-content permission event on page fault")
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: syzbot+7229071b47908b19d5b7@syzkaller.appspotmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 include/linux/fsnotify.h | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 6a33288bd6a1f..796dacceec488 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -137,6 +137,14 @@ void file_set_fsnotify_mode_from_watchers(struct file *file);
 static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 					  const loff_t *ppos, size_t count)
 {
+	/*
+	 * Temporarily disable pre-content hooks from page faults (MAY_ACCESS).
+	 * We may bring them back later either only to executables or to user
+	 * mapped files under some conditions.
+	 */
+	if (!(perm_mask & (MAY_READ | MAY_WRITE)))
+		return 0;
+
 	/*
 	 * filesystem may be modified in the context of permission events
 	 * (e.g. by HSM filling a file on access), so sb freeze protection
@@ -144,9 +152,6 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
 	 */
 	lockdep_assert_once(file_write_not_started(file));
 
-	if (!(perm_mask & (MAY_READ | MAY_WRITE | MAY_ACCESS)))
-		return 0;
-
 	if (likely(!FMODE_FSNOTIFY_PERM(file->f_mode)))
 		return 0;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] fsnotify: add pre-content hooks on mmap()
  2025-03-11 11:41 ` [PATCH 1/2] fsnotify: add pre-content hooks on mmap() Amir Goldstein
@ 2025-03-11 12:33   ` Lorenzo Stoakes
  2025-03-11 14:37     ` Amir Goldstein
  0 siblings, 1 reply; 8+ messages in thread
From: Lorenzo Stoakes @ 2025-03-11 12:33 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Josef Bacik, Christian Brauner, linux-fsdevel

On Tue, Mar 11, 2025 at 12:41:52PM +0100, Amir Goldstein wrote:
> Pre-content hooks in page faults introduces potential deadlock of HSM
> handler in userspace with filesystem freezing.
>
> The requirement with pre-content event is that for every accessed file
> range an event covering at least this range will be generated at least
> once before the file data is accesses.
>
> In preparation to disabling pre-content event hooks on page faults,
> change those hooks to always use the mask MAY_ACCESS and add pre-content
> hooks at mmap() variants for the entire mmaped range, so HSM can fill
> content when user requests to map a portion of the file.
>
> Note that exec() variant also calls vm_mmap_pgoff() internally to map
> code sections, so pre-content hooks are also generated in this case.
>
> Link: https://lore.kernel.org/linux-fsdevel/7ehxrhbvehlrjwvrduoxsao5k3x4aw275patsb3krkwuq573yv@o2hskrfawbnc/
> Suggested-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  mm/filemap.c |  3 +--
>  mm/mmap.c    | 12 ++++++++++++
>  mm/util.c    |  7 +++++++
>  3 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 2974691fdfad2..f85d288209b44 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3350,7 +3350,6 @@ static vm_fault_t filemap_fault_recheck_pte_none(struct vm_fault *vmf)
>  vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
>  {
>  	struct file *fpin = NULL;
> -	int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_ACCESS;
>  	loff_t pos = vmf->pgoff >> PAGE_SHIFT;
>  	size_t count = PAGE_SIZE;
>  	int err;
> @@ -3370,7 +3369,7 @@ vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
>  	if (!fpin)
>  		return VM_FAULT_SIGBUS;
>
> -	err = fsnotify_file_area_perm(fpin, mask, &pos, count);
> +	err = fsnotify_file_area_perm(fpin, MAY_ACCESS, &pos, count);
>  	fput(fpin);
>  	if (err)
>  		return VM_FAULT_SIGBUS;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index cda01071c7b1f..70318936fd588 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -48,6 +48,7 @@
>  #include <linux/sched/mm.h>
>  #include <linux/ksm.h>
>  #include <linux/memfd.h>
> +#include <linux/fsnotify.h>
>
>  #include <linux/uaccess.h>
>  #include <asm/cacheflush.h>
> @@ -1151,6 +1152,17 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,

I kind of hate that we keep on extending this deprecate syscall. Is it
truly necessary here?

>  		return ret;
>  	}
>
> +	if (file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {

Is there a circumstance where file == NULL here? I mean get_file()
literally dereferences a field then returns the pointer, so that'd be a
null pointer deref?

Also I'm pretty sure it's impossible possible for a VMA to be VM_SHARED and
!vma->vm_file, since we need to access the address_space etc.

> +		int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ;
> +		loff_t pos = pgoff >> PAGE_SHIFT;
> +
> +		ret = fsnotify_file_area_perm(file, mask, &pos, size);

All other invocations of this in fs code, this further amplifies my belief
that this belongs in fs code.

> +		if (ret) {
> +			fput(file);
> +			return ret;
> +		}
> +	}
> +
>  	ret = -EINVAL;
>
>  	/* OK security check passed, take write lock + let it rip. */
> diff --git a/mm/util.c b/mm/util.c
> index b6b9684a14388..2dddeabac6098 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -23,6 +23,7 @@
>  #include <linux/processor.h>
>  #include <linux/sizes.h>
>  #include <linux/compat.h>
> +#include <linux/fsnotify.h>
>
>  #include <linux/uaccess.h>
>
> @@ -569,6 +570,12 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
>  	LIST_HEAD(uf);
>
>  	ret = security_mmap_file(file, prot, flag);
> +	if (!ret && file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {
> +		int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ;
> +		loff_t pos = pgoff >> PAGE_SHIFT;
> +
> +		ret = fsnotify_file_area_perm(file, mask, &pos, len);
> +	}
>  	if (!ret) {
>  		if (mmap_write_lock_killable(mm))
>  			return -EINTR;

You've duplicated this code in 2 places, can we please just have it in one
as a helper function?

Also I'm not a fan of having super-specific file system code relating to
HSM in general mapping code like this. Can't we have something like a hook
or something more generic?

I mean are we going to keep on expanding this for other super-specific
cases?

I would say refactor this whole thing into a check that's done in fs code
that we can call into.

If we need a new hook, then let's add one. If we can use existing hooks,
let's use them.

Also, is it valid to be accessing this file without doing a get_file()
here? It seems super inconsistent you increment ref count in one place but
not the other?

> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] fsnotify: avoid pre-content events when faulting in user pages
  2025-03-11 11:41 ` [PATCH 2/2] fsnotify: avoid pre-content events when faulting in user pages Amir Goldstein
@ 2025-03-11 12:37   ` Lorenzo Stoakes
  2025-03-11 14:10     ` Amir Goldstein
  0 siblings, 1 reply; 8+ messages in thread
From: Lorenzo Stoakes @ 2025-03-11 12:37 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Josef Bacik, Christian Brauner, linux-fsdevel

On Tue, Mar 11, 2025 at 12:41:53PM +0100, Amir Goldstein wrote:
> In the use case of buffered write whose input buffer is mmapped file on a
> filesystem with a pre-content mark, the prefaulting of the buffer can
> happen under the filesystem freeze protection (obtained in vfs_write())
> which breaks assumptions of pre-content hook and introduces potential
> deadlock of HSM handler in userspace with filesystem freezing.
>
> Now that we have pre-content hooks at file mmap() time, disable the
> pre-content event hooks on page fault to avoid the potential deadlock.
>
> Leave the code of pre-content hooks in page fault because we may want
> to re-enable them on executables or user mapped files under certain
> conditions after resolving the potential deadlocks.
>

Will leave the fs bits to fs people but not hugely comfortable with the
concept of 'leaving code in place just in case'.

Often things end up not being the case :)

> Reported-by: syzbot+7229071b47908b19d5b7@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/linux-fsdevel/7ehxrhbvehlrjwvrduoxsao5k3x4aw275patsb3krkwuq573yv@o2hskrfawbnc/
> Fixes: 8392bc2ff8c8b ("fsnotify: generate pre-content permission event on page fault")
> Suggested-by: Josef Bacik <josef@toxicpanda.com>
> Tested-by: syzbot+7229071b47908b19d5b7@syzkaller.appspotmail.com
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  include/linux/fsnotify.h | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 6a33288bd6a1f..796dacceec488 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -137,6 +137,14 @@ void file_set_fsnotify_mode_from_watchers(struct file *file);
>  static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
>  					  const loff_t *ppos, size_t count)
>  {
> +	/*
> +	 * Temporarily disable pre-content hooks from page faults (MAY_ACCESS).
> +	 * We may bring them back later either only to executables or to user
> +	 * mapped files under some conditions.
> +	 */
> +	if (!(perm_mask & (MAY_READ | MAY_WRITE)))
> +		return 0;
> +
>  	/*
>  	 * filesystem may be modified in the context of permission events
>  	 * (e.g. by HSM filling a file on access), so sb freeze protection
> @@ -144,9 +152,6 @@ static inline int fsnotify_file_area_perm(struct file *file, int perm_mask,
>  	 */
>  	lockdep_assert_once(file_write_not_started(file));
>
> -	if (!(perm_mask & (MAY_READ | MAY_WRITE | MAY_ACCESS)))
> -		return 0;
> -
>  	if (likely(!FMODE_FSNOTIFY_PERM(file->f_mode)))
>  		return 0;
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] fsnotify: avoid pre-content events when faulting in user pages
  2025-03-11 12:37   ` Lorenzo Stoakes
@ 2025-03-11 14:10     ` Amir Goldstein
  0 siblings, 0 replies; 8+ messages in thread
From: Amir Goldstein @ 2025-03-11 14:10 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: Jan Kara, Josef Bacik, Christian Brauner, linux-fsdevel

On Tue, Mar 11, 2025 at 1:37 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Tue, Mar 11, 2025 at 12:41:53PM +0100, Amir Goldstein wrote:
> > In the use case of buffered write whose input buffer is mmapped file on a
> > filesystem with a pre-content mark, the prefaulting of the buffer can
> > happen under the filesystem freeze protection (obtained in vfs_write())
> > which breaks assumptions of pre-content hook and introduces potential
> > deadlock of HSM handler in userspace with filesystem freezing.
> >
> > Now that we have pre-content hooks at file mmap() time, disable the
> > pre-content event hooks on page fault to avoid the potential deadlock.
> >
> > Leave the code of pre-content hooks in page fault because we may want
> > to re-enable them on executables or user mapped files under certain
> > conditions after resolving the potential deadlocks.
> >
>
> Will leave the fs bits to fs people but not hugely comfortable with the
> concept of 'leaving code in place just in case'.
>
> Often things end up not being the case :)

Fair point.

It's not exactly a "just in case" situation - the existing user of
this code (Meta)
do use the page fault hooks, so I expect they will work on re-enabling
the hooks upstream after this release, but I am fine with reverting the page
fault hooks instead of the temporary disable if that is what everyone wants.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] fsnotify: add pre-content hooks on mmap()
  2025-03-11 12:33   ` Lorenzo Stoakes
@ 2025-03-11 14:37     ` Amir Goldstein
  0 siblings, 0 replies; 8+ messages in thread
From: Amir Goldstein @ 2025-03-11 14:37 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: Jan Kara, Josef Bacik, Christian Brauner, linux-fsdevel

On Tue, Mar 11, 2025 at 1:34 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Tue, Mar 11, 2025 at 12:41:52PM +0100, Amir Goldstein wrote:
> > Pre-content hooks in page faults introduces potential deadlock of HSM
> > handler in userspace with filesystem freezing.
> >
> > The requirement with pre-content event is that for every accessed file
> > range an event covering at least this range will be generated at least
> > once before the file data is accesses.
> >
> > In preparation to disabling pre-content event hooks on page faults,
> > change those hooks to always use the mask MAY_ACCESS and add pre-content
> > hooks at mmap() variants for the entire mmaped range, so HSM can fill
> > content when user requests to map a portion of the file.
> >
> > Note that exec() variant also calls vm_mmap_pgoff() internally to map
> > code sections, so pre-content hooks are also generated in this case.
> >
> > Link: https://lore.kernel.org/linux-fsdevel/7ehxrhbvehlrjwvrduoxsao5k3x4aw275patsb3krkwuq573yv@o2hskrfawbnc/
> > Suggested-by: Josef Bacik <josef@toxicpanda.com>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  mm/filemap.c |  3 +--
> >  mm/mmap.c    | 12 ++++++++++++
> >  mm/util.c    |  7 +++++++
> >  3 files changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 2974691fdfad2..f85d288209b44 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -3350,7 +3350,6 @@ static vm_fault_t filemap_fault_recheck_pte_none(struct vm_fault *vmf)
> >  vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
> >  {
> >       struct file *fpin = NULL;
> > -     int mask = (vmf->flags & FAULT_FLAG_WRITE) ? MAY_WRITE : MAY_ACCESS;
> >       loff_t pos = vmf->pgoff >> PAGE_SHIFT;
> >       size_t count = PAGE_SIZE;
> >       int err;
> > @@ -3370,7 +3369,7 @@ vm_fault_t filemap_fsnotify_fault(struct vm_fault *vmf)
> >       if (!fpin)
> >               return VM_FAULT_SIGBUS;
> >
> > -     err = fsnotify_file_area_perm(fpin, mask, &pos, count);
> > +     err = fsnotify_file_area_perm(fpin, MAY_ACCESS, &pos, count);
> >       fput(fpin);
> >       if (err)
> >               return VM_FAULT_SIGBUS;
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index cda01071c7b1f..70318936fd588 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -48,6 +48,7 @@
> >  #include <linux/sched/mm.h>
> >  #include <linux/ksm.h>
> >  #include <linux/memfd.h>
> > +#include <linux/fsnotify.h>
> >
> >  #include <linux/uaccess.h>
> >  #include <asm/cacheflush.h>
> > @@ -1151,6 +1152,17 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>
> I kind of hate that we keep on extending this deprecate syscall. Is it
> truly necessary here?
>

If my understanding of remap_file_pages() is correct then no new regions of the
file are being mapped - so no, my bad - the pre-content hook is not
needed in this case.


> >               return ret;
> >       }
> >
> > +     if (file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {
>
> Is there a circumstance where file == NULL here? I mean get_file()
> literally dereferences a field then returns the pointer, so that'd be a
> null pointer deref?
>
> Also I'm pretty sure it's impossible possible for a VMA to be VM_SHARED and
> !vma->vm_file, since we need to access the address_space etc.
>
> > +             int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ;
> > +             loff_t pos = pgoff >> PAGE_SHIFT;
> > +
> > +             ret = fsnotify_file_area_perm(file, mask, &pos, size);
>
> All other invocations of this in fs code, this further amplifies my belief
> that this belongs in fs code.
>
> > +             if (ret) {
> > +                     fput(file);
> > +                     return ret;
> > +             }
> > +     }
> > +
> >       ret = -EINVAL;
> >
> >       /* OK security check passed, take write lock + let it rip. */
> > diff --git a/mm/util.c b/mm/util.c
> > index b6b9684a14388..2dddeabac6098 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -23,6 +23,7 @@
> >  #include <linux/processor.h>
> >  #include <linux/sizes.h>
> >  #include <linux/compat.h>
> > +#include <linux/fsnotify.h>
> >
> >  #include <linux/uaccess.h>
> >
> > @@ -569,6 +570,12 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
> >       LIST_HEAD(uf);
> >
> >       ret = security_mmap_file(file, prot, flag);
> > +     if (!ret && file && unlikely(FMODE_FSNOTIFY_HSM(file->f_mode))) {
> > +             int mask = (prot & PROT_WRITE) ? MAY_WRITE : MAY_READ;
> > +             loff_t pos = pgoff >> PAGE_SHIFT;
> > +
> > +             ret = fsnotify_file_area_perm(file, mask, &pos, len);
> > +     }
> >       if (!ret) {
> >               if (mmap_write_lock_killable(mm))
> >                       return -EINTR;
>
> You've duplicated this code in 2 places, can we please just have it in one
> as a helper function?
>
> Also I'm not a fan of having super-specific file system code relating to
> HSM in general mapping code like this. Can't we have something like a hook
> or something more generic?
>
> I mean are we going to keep on expanding this for other super-specific
> cases?
>
> I would say refactor this whole thing into a check that's done in fs code
> that we can call into.
>
> If we need a new hook, then let's add one. If we can use existing hooks,
> let's use them.
>

fsnotify hooks and logic have traditionally been contained in fsnotify.h
wrappers much like the security_ hooks.

For the page fault hooks, Linus asked for the FMODE_FSNOTIFY_HSM
condition to be very visible and explicit, but I suppose there is no reason
for not having a fsnotify_mmap() wrapper here.

> Also, is it valid to be accessing this file without doing a get_file()
> here?

Caller of vm_mmap_pgoff() should have a reference on file.

> It seems super inconsistent you increment ref count in one place but
> not the other?

By other place do you mean in remap_file_pages()?
There, the file reference is coming from vma->vm_file and hook is called
outside mmap lock.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-03-11 14:37 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-11 11:41 [PATCH 0/2] Fix for potential deadlock in pre-content event Amir Goldstein
2025-03-11 11:41 ` [PATCH 1/2] fsnotify: add pre-content hooks on mmap() Amir Goldstein
2025-03-11 12:33   ` Lorenzo Stoakes
2025-03-11 14:37     ` Amir Goldstein
2025-03-11 11:41 ` [PATCH 2/2] fsnotify: avoid pre-content events when faulting in user pages Amir Goldstein
2025-03-11 12:37   ` Lorenzo Stoakes
2025-03-11 14:10     ` Amir Goldstein
  -- strict thread matches above, loose matches on Subject: below --
2025-03-09 11:52 [PATCH 0/2] Fix for potential deadlock in pre-content event Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).