From: Josef Bacik <josef@toxicpanda.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
jack@suse.cz, david@fromorbit.com, amir73il@gmail.com,
hch@lst.de
Subject: Re: [PATCH][RFC] fs: add levels to inode write access
Date: Thu, 30 May 2024 10:58:10 -0400 [thread overview]
Message-ID: <20240530145810.GA2205585@perftesting> (raw)
In-Reply-To: <20240530-atheismus-festland-c11c1d3b7671@brauner>
On Thu, May 30, 2024 at 12:32:06PM +0200, Christian Brauner wrote:
> On Wed, May 29, 2024 at 04:41:32PM -0400, Josef Bacik wrote:
> > NOTE:
> > This is compile tested only. It's also based on an out of tree feature branch
> > from Amir that I'm extending to add page fault content events to allow us to
> > have on-demand binary loading at exec time. If you need more context please let
> > me know, I'll push my current branch somewhere and wire up how I plan to use
> > this patch so you can see it in better context, but hopefully I've described
> > what I'm trying to accomplish enough that this leads to useful discussion.
> >
> >
> > Currently we have ->i_writecount to control write access to an inode.
> > Callers may deny write access by calling deny_write_access(), which will
> > cause ->i_writecount to go negative, and allow_write_access() to push it
> > back up to 0.
> >
> > This is used in a few ways, the biggest user being exec. Historically
> > we've blocked write access to files that are opened for executing.
> > fsverity is the other notable user.
> >
> > With the upcoming fanotify feature that allows for on-demand population
> > of files, this blanket policy of denying writes to files opened for
> > executing creates a problem. We have to issue events right before
> > denying access, and the entire file must be populated before we can
> > continue with the exec.
> >
> > This creates a problem for users who have large binaries they want to
> > populate on demand. Inside of Meta we often have multi-gigabyte
> > binaries, even on the order of tens of gigabytes. Pre-loading these
> > large files is costly, especially when large portions of the binary may
> > never be read (think debuginfo).
> >
> > In order to facilitate on-demand loading of binaries we need to have a
> > way to punch a hole in this exec related write denial.
>
> Hm. I suggest we try to tackle this in a completely different way. Maybe
> I mentioned it during LSFMM but instead of doing this dance we should
> try and remove the deny_write_access() mechanisms for exec completely.
>
> Back in 2021 we removed the remaining VM_DENYWRITE bits but left in
> deny_write_access() for exec. Back then I had thought that this was a
> bit risky for not too much gain. But looking at this code here I think
> we now have an even stronger reason to try and get rid of this
> restriction. And I've since changed my mind. See my notes on the
> _completely untested_ RFC patch I appended.
>
> Ofc depends on whether Linus still agrees that removing this might be
> something we could try.
>
Muahaha you took the bait.
I obviously much prefer this solution, and from what I can tell we're all pretty
well in agreement about this. Turn it into a real patch and I'll happily add my
reviewed-by. Thanks,
Josef
next prev parent reply other threads:[~2024-05-30 14:58 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-29 20:41 [PATCH][RFC] fs: add levels to inode write access Josef Bacik
2024-05-29 22:00 ` Jeff Layton
2024-05-30 1:14 ` Matthew Wilcox
2024-05-30 10:32 ` Christian Brauner
2024-05-30 12:57 ` Amir Goldstein
2024-05-30 14:58 ` Josef Bacik [this message]
2024-05-30 15:23 ` Christian Brauner
2024-05-30 15:49 ` Linus Torvalds
2024-05-31 10:02 ` Christian Brauner
2024-05-31 12:32 ` Christian Brauner
2024-05-31 13:01 ` [PATCH] fs: don't block i_writecount during exec Christian Brauner
2024-05-31 15:40 ` Linus Torvalds
2024-05-31 18:08 ` Jeff Layton
2024-05-31 22:08 ` Josef Bacik
2024-06-03 13:52 ` (subset) " Christian Brauner
2024-06-06 12:45 ` Aishwarya TCV
2024-06-06 15:37 ` Mark Brown
2024-06-06 16:53 ` Josef Bacik
2024-06-06 17:33 ` Mark Brown
2024-06-06 17:49 ` Mark Brown
2024-08-07 9:59 ` Tudor Ambarus
2024-09-04 17:04 ` Jann Horn
2024-09-05 7:38 ` Roberto Sassu
2024-05-31 13:09 ` [PATCH][RFC] fs: add levels to inode write access Amir Goldstein
2024-05-31 14:50 ` Christian Brauner
2024-05-31 15:47 ` Matthew Wilcox
2024-05-31 22:14 ` Matthew Wilcox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240530145810.GA2205585@perftesting \
--to=josef@toxicpanda.com \
--cc=amir73il@gmail.com \
--cc=brauner@kernel.org \
--cc=david@fromorbit.com \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).