* [PATCH] mm: do file ownership checks with the proper mount idmap
@ 2026-06-25 15:38 Pedro Falcato
2026-06-25 18:29 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Pedro Falcato @ 2026-06-25 15:38 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Matthew Wilcox (Oracle),
Andrew Morton, Liam R. Howlett, David Hildenbrand
Cc: Jan Kara, Vlastimil Babka, Jann Horn, linux-fsdevel, linux-mm,
linux-kernel, Pedro Falcato, stable
Ever since idmapped mounts were introduced, inode ownership checks
(for side-channel protection) in mincore() and madvise(MADV_PAGEOUT) were
done against the nop_mnt_idmap, which completely ignores the file's mount's
idmap. This results in odd edgecases like:
1) mount/bind-mount with an idmap userA:userB:1
2) userB runs an owner_or_capable() check on file that is owned by userA
on-disk/in-memory, but owned by userB after idmap translation
3) owner_or_capable() mysteriously fails as the correct idmap wasn't supplied
In the case of mincore/madvise MADV_PAGEOUT, this is usually benign, because
file_permission(file, MAY_WRITE) will probably succeed, as it uses the proper
idmap internally, but it does not need to be the case on e.g a 0444 file
where even the owner itself doesn't have permissions to write to it.
Since this is clearly not trivial to get right, introduce a
file_owner_or_capable() that can carry the correct semantics, and switch
the various users in mm to it.
The issue was found by manual code inspection & an off-list discussion with
Jan Kara.
Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP")
Cc: stable@vger.kernel.org
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
---
I noticed there are a couple of call sites in fs/ that could perhaps be
cleaned up with the added helper, but I'm skipping that for now for brevity's
sake.
include/linux/fs.h | 5 +++++
mm/filemap.c | 2 +-
mm/madvise.c | 3 +--
mm/mincore.c | 3 +--
4 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d10897b3a1e3..50ce731a2b78 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2444,6 +2444,11 @@ static inline struct mnt_idmap *file_mnt_idmap(const struct file *file)
return mnt_idmap(file->f_path.mnt);
}
+static inline bool file_owner_or_capable(const struct file *file)
+{
+ return inode_owner_or_capable(file_mnt_idmap(file), file_inode(file));
+}
+
/**
* is_idmapped_mnt - check whether a mount is mapped
* @mnt: the mount to check
diff --git a/mm/filemap.c b/mm/filemap.c
index 5af62e6abca5..58eb9d240643 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4704,7 +4704,7 @@ static inline bool can_do_cachestat(struct file *f)
{
if (f->f_mode & FMODE_WRITE)
return true;
- if (inode_owner_or_capable(file_mnt_idmap(f), file_inode(f)))
+ if (file_owner_or_capable(f))
return true;
return file_permission(f, MAY_WRITE) == 0;
}
diff --git a/mm/madvise.c b/mm/madvise.c
index cd9bb077072c..77552b03d318 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -336,8 +336,7 @@ static inline bool can_do_file_pageout(struct vm_area_struct *vma)
* otherwise we'd be including shared non-exclusive mappings, which
* opens a side channel.
*/
- return inode_owner_or_capable(&nop_mnt_idmap,
- file_inode(vma->vm_file)) ||
+ return file_owner_or_capable(vma->vm_file) ||
file_permission(vma->vm_file, MAY_WRITE) == 0;
}
diff --git a/mm/mincore.c b/mm/mincore.c
index 296f2e3922b5..c8757c5085bf 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -227,8 +227,7 @@ static inline bool can_do_mincore(struct vm_area_struct *vma)
* for writing; otherwise we'd be including shared non-exclusive
* mappings, which opens a side channel.
*/
- return inode_owner_or_capable(&nop_mnt_idmap,
- file_inode(vma->vm_file)) ||
+ return file_owner_or_capable(vma->vm_file) ||
file_permission(vma->vm_file, MAY_WRITE) == 0;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] mm: do file ownership checks with the proper mount idmap
2026-06-25 15:38 [PATCH] mm: do file ownership checks with the proper mount idmap Pedro Falcato
@ 2026-06-25 18:29 ` Andrew Morton
2026-06-26 9:27 ` Pedro Falcato
2026-06-26 14:19 ` Jan Kara
2026-06-26 16:02 ` David Hildenbrand (Arm)
2 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2026-06-25 18:29 UTC (permalink / raw)
To: Pedro Falcato
Cc: Alexander Viro, Christian Brauner, Matthew Wilcox (Oracle),
Liam R. Howlett, David Hildenbrand, Jan Kara, Vlastimil Babka,
Jann Horn, linux-fsdevel, linux-mm, linux-kernel, stable
On Thu, 25 Jun 2026 16:38:53 +0100 Pedro Falcato <pfalcato@suse.de> wrote:
> Ever since idmapped mounts were introduced, inode ownership checks
> (for side-channel protection) in mincore() and madvise(MADV_PAGEOUT) were
> done against the nop_mnt_idmap, which completely ignores the file's mount's
> idmap. This results in odd edgecases like:
>
> 1) mount/bind-mount with an idmap userA:userB:1
> 2) userB runs an owner_or_capable() check on file that is owned by userA
> on-disk/in-memory, but owned by userB after idmap translation
> 3) owner_or_capable() mysteriously fails as the correct idmap wasn't supplied
>
> In the case of mincore/madvise MADV_PAGEOUT, this is usually benign, because
> file_permission(file, MAY_WRITE) will probably succeed, as it uses the proper
> idmap internally, but it does not need to be the case on e.g a 0444 file
> where even the owner itself doesn't have permissions to write to it.
>
> Since this is clearly not trivial to get right, introduce a
> file_owner_or_capable() that can carry the correct semantics, and switch
> the various users in mm to it.
>
> The issue was found by manual code inspection & an off-list discussion with
> Jan Kara.
Do our idmap selftests tickle these issues? If not, is it hard to add?
> I noticed there are a couple of call sites in fs/ that could perhaps be
> cleaned up with the added helper, but I'm skipping that for now for brevity's
> sake.
You could do this as a 2-patch series, because:
> include/linux/fs.h | 5 +++++
> mm/filemap.c | 2 +-
> mm/madvise.c | 3 +--
> mm/mincore.c | 3 +--
> 4 files changed, 8 insertions(+), 5 deletions(-)
it touches mm/ but ->Christian, please.
(or I can queue it with Christian's ack, of course)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mm: do file ownership checks with the proper mount idmap
2026-06-25 18:29 ` Andrew Morton
@ 2026-06-26 9:27 ` Pedro Falcato
0 siblings, 0 replies; 7+ messages in thread
From: Pedro Falcato @ 2026-06-26 9:27 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Christian Brauner, Matthew Wilcox (Oracle),
Liam R. Howlett, David Hildenbrand, Jan Kara, Vlastimil Babka,
Jann Horn, linux-fsdevel, linux-mm, linux-kernel, stable
On Thu, Jun 25, 2026 at 11:29:03AM -0700, Andrew Morton wrote:
> On Thu, 25 Jun 2026 16:38:53 +0100 Pedro Falcato <pfalcato@suse.de> wrote:
>
> > Ever since idmapped mounts were introduced, inode ownership checks
> > (for side-channel protection) in mincore() and madvise(MADV_PAGEOUT) were
> > done against the nop_mnt_idmap, which completely ignores the file's mount's
> > idmap. This results in odd edgecases like:
> >
> > 1) mount/bind-mount with an idmap userA:userB:1
> > 2) userB runs an owner_or_capable() check on file that is owned by userA
> > on-disk/in-memory, but owned by userB after idmap translation
> > 3) owner_or_capable() mysteriously fails as the correct idmap wasn't supplied
> >
> > In the case of mincore/madvise MADV_PAGEOUT, this is usually benign, because
> > file_permission(file, MAY_WRITE) will probably succeed, as it uses the proper
> > idmap internally, but it does not need to be the case on e.g a 0444 file
> > where even the owner itself doesn't have permissions to write to it.
> >
> > Since this is clearly not trivial to get right, introduce a
> > file_owner_or_capable() that can carry the correct semantics, and switch
> > the various users in mm to it.
> >
> > The issue was found by manual code inspection & an off-list discussion with
> > Jan Kara.
>
> Do our idmap selftests tickle these issues? If not, is it hard to add?
In theory we could add this to tools/testing/selftests/mount_setattr/mount_setattr_test.c, but
that seems like the wrong place for an mm regression test. And if we add it
somewhere else, we'll have to deal with the bureaucracy of setting up an idmapped
mount (including setting up a filesystem image!). I'm taking suggestions :)
>
> > I noticed there are a couple of call sites in fs/ that could perhaps be
> > cleaned up with the added helper, but I'm skipping that for now for brevity's
> > sake.
>
> You could do this as a 2-patch series, because:
>
> > include/linux/fs.h | 5 +++++
> > mm/filemap.c | 2 +-
> > mm/madvise.c | 3 +--
> > mm/mincore.c | 3 +--
> > 4 files changed, 8 insertions(+), 5 deletions(-)
>
> it touches mm/ but ->Christian, please.
>
> (or I can queue it with Christian's ack, of course)
Understood.
--
Pedro
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mm: do file ownership checks with the proper mount idmap
2026-06-25 15:38 [PATCH] mm: do file ownership checks with the proper mount idmap Pedro Falcato
2026-06-25 18:29 ` Andrew Morton
@ 2026-06-26 14:19 ` Jan Kara
2026-06-29 12:15 ` Christian Brauner
2026-06-26 16:02 ` David Hildenbrand (Arm)
2 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2026-06-26 14:19 UTC (permalink / raw)
To: Pedro Falcato
Cc: Alexander Viro, Christian Brauner, Matthew Wilcox (Oracle),
Andrew Morton, Liam R. Howlett, David Hildenbrand, Jan Kara,
Vlastimil Babka, Jann Horn, linux-fsdevel, linux-mm, linux-kernel,
stable
On Thu 25-06-26 16:38:53, Pedro Falcato wrote:
> Ever since idmapped mounts were introduced, inode ownership checks
> (for side-channel protection) in mincore() and madvise(MADV_PAGEOUT) were
> done against the nop_mnt_idmap, which completely ignores the file's mount's
> idmap. This results in odd edgecases like:
>
> 1) mount/bind-mount with an idmap userA:userB:1
> 2) userB runs an owner_or_capable() check on file that is owned by userA
> on-disk/in-memory, but owned by userB after idmap translation
> 3) owner_or_capable() mysteriously fails as the correct idmap wasn't supplied
>
> In the case of mincore/madvise MADV_PAGEOUT, this is usually benign, because
> file_permission(file, MAY_WRITE) will probably succeed, as it uses the proper
> idmap internally, but it does not need to be the case on e.g a 0444 file
> where even the owner itself doesn't have permissions to write to it.
>
> Since this is clearly not trivial to get right, introduce a
> file_owner_or_capable() that can carry the correct semantics, and switch
> the various users in mm to it.
>
> The issue was found by manual code inspection & an off-list discussion with
> Jan Kara.
>
> Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP")
> Cc: stable@vger.kernel.org
> Signed-off-by: Pedro Falcato <pfalcato@suse.de>
This looks good to me. I'm a bit curious why Christian initially (in 2021)
used init_user_ns here instead of the file namespace... Anyway feel free to
add:
Reviewed-by: Jan Kara <jack@suse.cz>
I'll also note that there are quite some places in fs/ that could be
simplified with this helper but I'd leave them for later.
Honza
> ---
>
> I noticed there are a couple of call sites in fs/ that could perhaps be
> cleaned up with the added helper, but I'm skipping that for now for brevity's
> sake.
>
> include/linux/fs.h | 5 +++++
> mm/filemap.c | 2 +-
> mm/madvise.c | 3 +--
> mm/mincore.c | 3 +--
> 4 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index d10897b3a1e3..50ce731a2b78 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2444,6 +2444,11 @@ static inline struct mnt_idmap *file_mnt_idmap(const struct file *file)
> return mnt_idmap(file->f_path.mnt);
> }
>
> +static inline bool file_owner_or_capable(const struct file *file)
> +{
> + return inode_owner_or_capable(file_mnt_idmap(file), file_inode(file));
> +}
> +
> /**
> * is_idmapped_mnt - check whether a mount is mapped
> * @mnt: the mount to check
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5af62e6abca5..58eb9d240643 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4704,7 +4704,7 @@ static inline bool can_do_cachestat(struct file *f)
> {
> if (f->f_mode & FMODE_WRITE)
> return true;
> - if (inode_owner_or_capable(file_mnt_idmap(f), file_inode(f)))
> + if (file_owner_or_capable(f))
> return true;
> return file_permission(f, MAY_WRITE) == 0;
> }
> diff --git a/mm/madvise.c b/mm/madvise.c
> index cd9bb077072c..77552b03d318 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -336,8 +336,7 @@ static inline bool can_do_file_pageout(struct vm_area_struct *vma)
> * otherwise we'd be including shared non-exclusive mappings, which
> * opens a side channel.
> */
> - return inode_owner_or_capable(&nop_mnt_idmap,
> - file_inode(vma->vm_file)) ||
> + return file_owner_or_capable(vma->vm_file) ||
> file_permission(vma->vm_file, MAY_WRITE) == 0;
> }
>
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 296f2e3922b5..c8757c5085bf 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -227,8 +227,7 @@ static inline bool can_do_mincore(struct vm_area_struct *vma)
> * for writing; otherwise we'd be including shared non-exclusive
> * mappings, which opens a side channel.
> */
> - return inode_owner_or_capable(&nop_mnt_idmap,
> - file_inode(vma->vm_file)) ||
> + return file_owner_or_capable(vma->vm_file) ||
> file_permission(vma->vm_file, MAY_WRITE) == 0;
> }
>
> --
> 2.54.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] mm: do file ownership checks with the proper mount idmap
2026-06-26 14:19 ` Jan Kara
@ 2026-06-29 12:15 ` Christian Brauner
2026-06-29 18:30 ` Pedro Falcato
0 siblings, 1 reply; 7+ messages in thread
From: Christian Brauner @ 2026-06-29 12:15 UTC (permalink / raw)
To: Jan Kara
Cc: Pedro Falcato, Alexander Viro, Christian Brauner,
Matthew Wilcox (Oracle), Andrew Morton, Liam R. Howlett,
David Hildenbrand, Vlastimil Babka, Jann Horn, linux-fsdevel,
linux-mm, linux-kernel, stable
On 2026-06-26 16:19:18+02:00, Jan Kara wrote:
> On Thu 25-06-26 16:38:53, Pedro Falcato wrote:
>
> > Ever since idmapped mounts were introduced, inode ownership checks
> > (for side-channel protection) in mincore() and madvise(MADV_PAGEOUT) were
> > done against the nop_mnt_idmap, which completely ignores the file's mount's
> > idmap. This results in odd edgecases like:
> >
> > 1) mount/bind-mount with an idmap userA:userB:1
> > 2) userB runs an owner_or_capable() check on file that is owned by userA
> > on-disk/in-memory, but owned by userB after idmap translation
> > 3) owner_or_capable() mysteriously fails as the correct idmap wasn't supplied
> >
> > In the case of mincore/madvise MADV_PAGEOUT, this is usually benign, because
> > file_permission(file, MAY_WRITE) will probably succeed, as it uses the proper
> > idmap internally, but it does not need to be the case on e.g a 0444 file
> > where even the owner itself doesn't have permissions to write to it.
> >
> > Since this is clearly not trivial to get right, introduce a
> > file_owner_or_capable() that can carry the correct semantics, and switch
> > the various users in mm to it.
> >
> > The issue was found by manual code inspection & an off-list discussion with
> > Jan Kara.
> >
> > Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Pedro Falcato <pfalcato@suse.de>
>
> This looks good to me. I'm a bit curious why Christian initially (in 2021)
> used init_user_ns here instead of the file namespace... Anyway feel free to
> add:
Back when this was added only the do_mincore() codepath existed and that
was intentionally left unconverted because it exposes the cache
residency status. So it was effectively a massive side-channel.
Both fd3b1bc3c86e ("mm/madvise: fix madvise_pageout for private file mappings")
and specifically cachestat() came way after all that.
I'm otherwise fine with the change.
Reviewed-by: Christian Brauner (Amutable) <brauner@kernel.org>
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] mm: do file ownership checks with the proper mount idmap
2026-06-29 12:15 ` Christian Brauner
@ 2026-06-29 18:30 ` Pedro Falcato
0 siblings, 0 replies; 7+ messages in thread
From: Pedro Falcato @ 2026-06-29 18:30 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Alexander Viro, Matthew Wilcox (Oracle), Andrew Morton,
Liam R. Howlett, David Hildenbrand, Vlastimil Babka, Jann Horn,
linux-fsdevel, linux-mm, linux-kernel, stable
On Mon, Jun 29, 2026 at 02:15:19PM +0200, Christian Brauner wrote:
> On 2026-06-26 16:19:18+02:00, Jan Kara wrote:
> > On Thu 25-06-26 16:38:53, Pedro Falcato wrote:
> >
> > > Ever since idmapped mounts were introduced, inode ownership checks
> > > (for side-channel protection) in mincore() and madvise(MADV_PAGEOUT) were
> > > done against the nop_mnt_idmap, which completely ignores the file's mount's
> > > idmap. This results in odd edgecases like:
> > >
> > > 1) mount/bind-mount with an idmap userA:userB:1
> > > 2) userB runs an owner_or_capable() check on file that is owned by userA
> > > on-disk/in-memory, but owned by userB after idmap translation
> > > 3) owner_or_capable() mysteriously fails as the correct idmap wasn't supplied
> > >
> > > In the case of mincore/madvise MADV_PAGEOUT, this is usually benign, because
> > > file_permission(file, MAY_WRITE) will probably succeed, as it uses the proper
> > > idmap internally, but it does not need to be the case on e.g a 0444 file
> > > where even the owner itself doesn't have permissions to write to it.
> > >
> > > Since this is clearly not trivial to get right, introduce a
> > > file_owner_or_capable() that can carry the correct semantics, and switch
> > > the various users in mm to it.
> > >
> > > The issue was found by manual code inspection & an off-list discussion with
> > > Jan Kara.
> > >
> > > Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Pedro Falcato <pfalcato@suse.de>
> >
> > This looks good to me. I'm a bit curious why Christian initially (in 2021)
> > used init_user_ns here instead of the file namespace... Anyway feel free to
> > add:
>
> Back when this was added only the do_mincore() codepath existed and that
> was intentionally left unconverted because it exposes the cache
> residency status. So it was effectively a massive side-channel.
Hmm. I'm not sure what you mean by this. Wouldn't it be more correct to respect
the mount idmap (given that a mount-ns-capable user mounted it with an idmap for
someone else, or itself) for mincore? Am I missing something? Or maybe I'm
misunderstanding that paragraph.
>
> Both fd3b1bc3c86e ("mm/madvise: fix madvise_pageout for private file mappings")
> and specifically cachestat() came way after all that.
>
> I'm otherwise fine with the change.
>
> Reviewed-by: Christian Brauner (Amutable) <brauner@kernel.org>
Thanks!
--
Pedro
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mm: do file ownership checks with the proper mount idmap
2026-06-25 15:38 [PATCH] mm: do file ownership checks with the proper mount idmap Pedro Falcato
2026-06-25 18:29 ` Andrew Morton
2026-06-26 14:19 ` Jan Kara
@ 2026-06-26 16:02 ` David Hildenbrand (Arm)
2 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-26 16:02 UTC (permalink / raw)
To: Pedro Falcato, Alexander Viro, Christian Brauner,
Matthew Wilcox (Oracle), Andrew Morton, Liam R. Howlett
Cc: Jan Kara, Vlastimil Babka, Jann Horn, linux-fsdevel, linux-mm,
linux-kernel, stable
On 6/25/26 17:38, Pedro Falcato wrote:
> Ever since idmapped mounts were introduced, inode ownership checks
> (for side-channel protection) in mincore() and madvise(MADV_PAGEOUT) were
> done against the nop_mnt_idmap, which completely ignores the file's mount's
> idmap. This results in odd edgecases like:
>
> 1) mount/bind-mount with an idmap userA:userB:1
> 2) userB runs an owner_or_capable() check on file that is owned by userA
> on-disk/in-memory, but owned by userB after idmap translation
> 3) owner_or_capable() mysteriously fails as the correct idmap wasn't supplied
>
> In the case of mincore/madvise MADV_PAGEOUT, this is usually benign, because
> file_permission(file, MAY_WRITE) will probably succeed, as it uses the proper
> idmap internally, but it does not need to be the case on e.g a 0444 file
> where even the owner itself doesn't have permissions to write to it.
>
> Since this is clearly not trivial to get right, introduce a
> file_owner_or_capable() that can carry the correct semantics, and switch
> the various users in mm to it.
>
> The issue was found by manual code inspection & an off-list discussion with
> Jan Kara.
>
> Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP")
> Cc: stable@vger.kernel.org
> Signed-off-by: Pedro Falcato <pfalcato@suse.de>
MM side LGTM
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-06-29 18:30 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-25 15:38 [PATCH] mm: do file ownership checks with the proper mount idmap Pedro Falcato
2026-06-25 18:29 ` Andrew Morton
2026-06-26 9:27 ` Pedro Falcato
2026-06-26 14:19 ` Jan Kara
2026-06-29 12:15 ` Christian Brauner
2026-06-29 18:30 ` Pedro Falcato
2026-06-26 16:02 ` David Hildenbrand (Arm)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox