Re: [RFC PATCH v1 1/2] fs: Add O_DENY_WRITE


From: "Mickaël Salaün" <mic@digikod.net>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Jann Horn <jannh@google.com>, Al Viro <viro@zeniv.linux.org.uk>,
	 Christian Brauner <brauner@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	 Paul Moore <paul@paul-moore.com>,
	Serge Hallyn <serge@hallyn.com>,
	 Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
	 Christian Heimes <christian@python.org>,
	Dmitry Vyukov <dvyukov@google.com>,
	 Elliott Hughes <enh@google.com>,
	Fan Wu <wufan@linux.microsoft.com>,
	 Florian Weimer <fweimer@redhat.com>, Jeff Xu <jeffxu@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	 Jordan R Abrahams <ajordanr@google.com>,
	Lakshmi Ramasubramanian <nramas@linux.microsoft.com>,
	 Luca Boccassi <bluca@debian.org>,
	Matt Bobrowski <mattbobrowski@google.com>,
	 Miklos Szeredi <mszeredi@redhat.com>,
	Mimi Zohar <zohar@linux.ibm.com>,
	 Nicolas Bouchinet <nicolas.bouchinet@oss.cyber.gouv.fr>,
	Robert Waite <rowait@microsoft.com>,
	 Roberto Sassu <roberto.sassu@huawei.com>,
	Scott Shell <scottsh@microsoft.com>,
	 Steve Dower <steve.dower@python.org>,
	Steve Grubb <sgrubb@redhat.com>,
	 kernel-hardening@lists.openwall.com, linux-api@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,  linux-integrity@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	 linux-security-module@vger.kernel.org,
	Jeff Xu <jeffxu@chromium.org>
Subject: Re: [RFC PATCH v1 1/2] fs: Add O_DENY_WRITE
Date: Mon, 25 Aug 2025 11:31:42 +0200
Message-ID: <20250825.mahNeel0dohz@digikod.net>
In-Reply-To: <CALCETrWwd90qQ3U2nZg9Fhye6CMQ6ZF20oQ4ME6BoyrFd0t88Q@mail.gmail.com>

On Sun, Aug 24, 2025 at 11:04:03AM -0700, Andy Lutomirski wrote:
> On Sun, Aug 24, 2025 at 4:03 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Fri, Aug 22, 2025 at 09:45:32PM +0200, Jann Horn wrote:
> > > On Fri, Aug 22, 2025 at 7:08 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > Add a new O_DENY_WRITE flag usable at open time and on an opened file (e.g.
> > > > passed file descriptors).  This changes the state of the opened file by
> > > > making it read-only until it is closed.  The main use case is for script
> > > > interpreters to get the guarantee that a script's content cannot be altered
> > > > while being read and interpreted.  This is useful for generic distros
> > > > that may not have a write-xor-execute policy.  See commit a5874fde3c08
> > > > ("exec: Add a new AT_EXECVE_CHECK flag to execveat(2)")
> > > >
> > > > Both execve(2) and the IOCTL to enable fsverity can already set this
> > > > property on files with deny_write_access().  This new O_DENY_WRITE flag makes
> > >
> > > The kernel actually tried to get rid of this behavior on execve() in
> > > commit 2a010c41285345da60cece35575b4e0af7e7bf44, but sadly that had
> > > to be reverted in commit 3b832035387ff508fdcf0fba66701afc78f79e3d
> > > because it broke userspace assumptions.
> >
> > Oh, good to know.
> >
> > >
> > > > it widely available.  This is similar to what other OSs may provide
> > > > e.g., opening a file with only FILE_SHARE_READ on Windows.
> > >
> > > We used to have the analogous mmap() flag MAP_DENYWRITE, and that was
> > > removed for security reasons; as
> > > https://man7.org/linux/man-pages/man2/mmap.2.html says:
> > >
> > > |        MAP_DENYWRITE
> > > |               This flag is ignored.  (Long ago—Linux 2.0 and earlier—it
> > > |               signaled that attempts to write to the underlying file
> > > |               should fail with ETXTBSY.  But this was a source of denial-
> > > |               of-service attacks.)
> > >
> > > It seems to me that the same issue applies to your patch - it would
> > > allow unprivileged processes to essentially lock files such that other
> > > processes can't write to them anymore. This might allow unprivileged
> > > users to prevent root from updating config files or stuff like that if
> > > they're updated in-place.
> >
> > Yes, I agree, but since it is the case for executed files I thought it
> > was worth starting a discussion on this topic.  This new flag could be
> > restricted to executable files, but we should avoid system-wide locks
> > like this.  I'm not sure how Windows handles these issues though.
> >
> > Anyway, we should rely on the access control policy to control write and
> > execute access in a consistent way (e.g. write-xor-execute).  Thanks for
> > the references and the background!
> 
> I'm confused.  I understand that there are many contexts in which one
> would want to prevent execution of unapproved content, which might
> include preventing a given process from modifying some code and then
> executing it.
> 
> I don't understand what these deny-write features have to do with it.
> These features merely prevent someone from modifying code *that is
> currently in use*, which is not at all the same thing as preventing
> modifying code that might get executed -- one can often modify
> contents *before* executing those contents.
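
For reference, the existing execve(2) deny-write behaviour is easy to
observe from userspace: while a binary is running, opening it for writing
fails with ETXTBSY.  A minimal sketch, where open_for_write_errno() is a
hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open @path for writing and return 0 on
 * success or the errno value on failure.  For a file that is currently
 * being executed (e.g. "/proc/self/exe"), the kernel's deny-write logic
 * makes the open fail with ETXTBSY, provided the caller otherwise has
 * write permission (permission checks such as EACCES or EROFS come first).
 */
static int open_for_write_errno(const char *path)
{
    int fd = open(path, O_WRONLY | O_CLOEXEC);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

e.g. open_for_write_errno("/proc/self/exe") is expected to return ETXTBSY
when the caller owns the running binary.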

The order of checks would be:
1. open script with O_DENY_WRITE
2. check executability with AT_EXECVE_CHECK
3. read the content and interpret it

The deny-write feature is meant to guarantee that there is no race
condition between steps 2 and 3.  All these checks are supposed to be
done by a trusted interpreter (which is itself allowed to be executed).
The AT_EXECVE_CHECK call enables the caller to know whether the kernel
(and associated security policies) allow the *current* content of the
file to be executed.  Whatever happens before or after that (wrt.
O_DENY_WRITE) should be covered by the security policy.
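
The three steps above could look roughly like the following in a trusted
interpreter.  This is only a sketch under a few assumptions:
open_check_read() is a hypothetical helper, O_DENY_WRITE is the flag
proposed by this RFC (defined to 0 below so the code builds on current
kernels, which means no actual deny-write protection), and
AT_EXECVE_CHECK needs Linux 6.14 or later.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* AT_EXECVE_CHECK (Linux 6.14+); fallback for older UAPI headers. */
#ifndef AT_EXECVE_CHECK
#define AT_EXECVE_CHECK 0x10000
#endif

/*
 * O_DENY_WRITE is only proposed by this RFC and does not exist in
 * mainline.  It is defined to 0 here (no deny-write protection at all)
 * purely so that the sketch builds.
 */
#ifndef O_DENY_WRITE
#define O_DENY_WRITE 0
#endif

/*
 * Hypothetical trusted-interpreter helper: open @path, ask the kernel
 * whether its current content may be executed, then read up to @len
 * bytes into @buf.  Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t open_check_read(const char *path, char *buf, size_t len)
{
    char *const argv[] = { (char *)path, NULL };
    char *const envp[] = { NULL };
    size_t total = 0;
    int saved;

    /* Step 1: open the script (with the proposed deny-write flag). */
    int fd = open(path, O_RDONLY | O_CLOEXEC | O_DENY_WRITE);
    if (fd < 0)
        return -1;

    /*
     * Step 2: with AT_EXECVE_CHECK, execveat(2) only runs the execution
     * checks (permissions, noexec mounts, LSM policies) and returns 0 if
     * execution would be allowed.  Kernels without this flag reject it
     * with EINVAL instead of executing anything.
     */
    if (syscall(SYS_execveat, fd, "", argv, envp,
                AT_EMPTY_PATH | AT_EXECVE_CHECK) != 0)
        goto err;

    /* Step 3: read the content that was just checked. */
    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto err;
        }
        if (n == 0)
            break;
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;

err:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

Without a real O_DENY_WRITE, nothing prevents the file from changing
between step 2 and step 3, which is exactly the gap this patch targets.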

> 
> In any case, IMO it's rather sad that the elimination of ETXTBSY had
> to be reverted -- it's really quite a nasty feature.  But it occurs to
> me that Linux can more or less do what is IMO the actually desired
> thing: snapshot the contents of a file and execute the snapshot.  The
> hack at the end of the email works!  (Well, it works if the chosen
> filesystem supports it.)
> 
> $ ./silly_tmp /tmp/test /tmp vim /proc/self/fd/3
> 
> emacs is apparently far, far too clever and can't save if you do:
> 
> $ ./silly_tmp /tmp/test /tmp emacs /proc/self/fd/3
> 
> 
> I'm not seriously suggesting that anyone should execute binaries or
> scripts on Linux exactly like this, for a whole bunch of reasons:
> 
> - It needs filesystem support (but maybe this isn't so bad)
> 
> - It needs write access to a directory on the correct filesystem (a
> showstopper for serious use)
> 
> - It is wildly incompatible with write-xor-execute, so this would be a
> case of one step forward, ten steps back.
> 
> - It would defeat a lot of tools that inspect /proc, which would be
> quite annoying to say the least.
> 
> 
> But maybe a less kludgy version could be used for real.  What if there
> was a syscall that would take an fd and make a snapshot of the file?

Yes, that would be a clean solution.  I don't think this is achievable
in an efficient way without involving filesystem implementations though.
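
Without filesystem support, the closest userspace approximation is a full
copy into a sealed memfd, which is precisely the non-efficient part: it
copies every byte and drops any integrity metadata attached to the
source.  A sketch, with sealed_snapshot() as a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy everything readable from @src_fd (starting
 * at its current offset) into a new memfd, then seal the memfd against
 * any further modification.  Returns the sealed memfd, rewound to offset
 * 0, or -1 with errno set.
 */
static int sealed_snapshot(int src_fd)
{
    char buf[4096];
    ssize_t n;
    int saved;
    int memfd = memfd_create("snapshot", MFD_CLOEXEC | MFD_ALLOW_SEALING);

    if (memfd < 0)
        return -1;

    while ((n = read(src_fd, buf, sizeof(buf))) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto err;
        }
        /* Handle short writes, even if unlikely for a memfd. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(memfd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                goto err;
            }
            off += w;
        }
    }

    /* Forbid writes, size changes, and removal of these seals. */
    if (fcntl(memfd, F_ADD_SEALS,
              F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL) < 0)
        goto err;

    /* Rewind so the caller can read the snapshot from the start. */
    if (lseek(memfd, 0, SEEK_SET) < 0)
        goto err;
    return memfd;

err:
    saved = errno;
    close(memfd);
    errno = saved;
    return -1;
}
```

Once sealed, write(2) on the memfd fails with EPERM, but the cost is
proportional to the file size, unlike a filesystem-assisted snapshot.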

> It would, at least by default, produce a *read-only* snapshot (fully
> sealed a la F_SEAL_*), inherit any integrity data that came with the
> source (e.g. LSMs could understand it), would not require a writable
> directory on the filesystem, and would maybe even come with an extra
> seal-like thing that prevents it from being linkat-ed.  (I'm not sure
> that linkat would actually be a problem, but I'm also not immediately
> sure that LSMs would be as comfortable with it if linkat were
> allowed.)  And there could probably be an extremely efficient
> implementation that might even reuse the existing deny-write mechanism
> to optimize the common case where the file is never written.
> 
> For that matter, the actual common case would be to execute stuff in
> /usr or similar, and those files really ought never to be modified.
> So there could be a file attribute or something that means "this file
> CANNOT be modified, but it can still be unlinked or replaced as
> usual", and snapshotting such a file would be a no-op.  Distributions
> and container tools could set that attribute.  Overlayfs could also
> provide an efficient implementation if the file currently comes from
> an immutable source.
> 
> Hmm, maybe it's not strictly necessary that it be immutable -- maybe
> it's sometimes okay if reads start to fail if the contents change.
> Let's call this a "weak snapshot" -- reads of a weak snapshot either
> return the original contents or fail.  fsverity would give weak
> snapshots at no additional cost.
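
Whether a given file already has one of these properties can be queried
with statx(2): STATX_ATTR_VERITY marks fs-verity files, and
STATX_ATTR_IMMUTABLE the existing immutable attribute (which, unlike the
attribute suggested above, also prevents unlinking and renaming the
file).  A sketch, with content_attrs() as a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Fallbacks for older headers (values from the Linux UAPI). */
#ifndef STATX_ATTR_IMMUTABLE
#define STATX_ATTR_IMMUTABLE 0x00000010
#endif
#ifndef STATX_ATTR_VERITY
#define STATX_ATTR_VERITY 0x00100000
#endif

/*
 * Hypothetical helper: return the subset of {STATX_ATTR_IMMUTABLE,
 * STATX_ATTR_VERITY} reported as set on @fd, or -1 on error.  Bits absent
 * from stx_attributes_mask are not supported by the underlying filesystem
 * and are treated as unset.
 */
static long content_attrs(int fd)
{
    const unsigned long long wanted = STATX_ATTR_IMMUTABLE | STATX_ATTR_VERITY;
    struct statx stx;

    if (statx(fd, "", AT_EMPTY_PATH, STATX_BASIC_STATS, &stx) != 0)
        return -1;
    return (long)(stx.stx_attributes & stx.stx_attributes_mask & wanted);
}
```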
> 
> 
> It's worth noting that the common case doesn't actually need an fd.
> We have mmap(..., MAP_PRIVATE, ...).  What we would actually want for
> mmap use cases is mmap(..., MAP_SNAPSHOT, ...), with the semantics
> that the kernel promises that future writes to the source would either
> not be reflected in the mapping or would cause SIGBUS.  One might
> reasonably debate what forced-writes would do (I think forced-writes
> should be allowed just like they currently are, since anyone who can
> force-write to process memory is already assumed to be permitted to
> bypass write-xor-execute).
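
To illustrate why MAP_PRIVATE alone is not a snapshot: POSIX leaves it
unspecified whether later changes to the file are visible in a private
mapping, and on Linux they are, for pages not yet copied on write.  A
small demo, with private_map_sees_update() as a hypothetical name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: map a temporary file with MAP_PRIVATE, then modify
 * the file (not the mapping) with pwrite(2).  Returns the first byte seen
 * through the mapping after the update, or -1 on error.
 */
static int private_map_sees_update(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    char *map;
    int ret = -1;

    if (!f)
        return -1;
    int fd = fileno(f);

    /* Initial content: 'A' followed by zeros up to one page. */
    if (pwrite(fd, "A", 1, 0) != 1 || ftruncate(fd, pagesz) != 0)
        goto out;

    map = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* Fault the page in; it reads 'A' at this point. */
    if (*(volatile char *)map != 'A')
        goto unmap;

    /* Modify the underlying file through the fd. */
    if (pwrite(fd, "B", 1, 0) != 1)
        goto unmap;

    /* The read-only private page still shares the page cache: 'B'. */
    ret = *(volatile char *)map;

unmap:
    munmap(map, (size_t)pagesz);
out:
    fclose(f);
    return ret;
}
```

A MAP_SNAPSHOT mapping, as suggested above, would instead have to keep
returning 'A' here or raise SIGBUS.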
> 
> 
> ---
> 
> /* Written by Claude Sonnet 4 with a surprisingly small amount of help
> from Andy */
> 
> #define _GNU_SOURCE
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/ioctl.h>
> #include <linux/fs.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <errno.h>
> #include <string.h>
> 
> int main(int argc, char *argv[]) {
>     if (argc < 4) {
>         fprintf(stderr, "Usage: %s <source_file> <temp_dir> <command> [args...]\n", argv[0]);
>         exit(1);
>     }
> 
>     const char *source_file = argv[1];
>     const char *temp_dir = argv[2];
> 
>     // Open source file
>     int source_fd = open(source_file, O_RDONLY);
>     if (source_fd == -1) {
>         perror("Failed to open source file");
>         exit(1);
>     }
> 
>     // Create temporary file
>     int temp_fd = open(temp_dir, O_TMPFILE | O_RDWR, 0600);
>     if (temp_fd == -1) {
>         perror("Failed to create temporary file");
>         close(source_fd);
>         exit(1);
>     }
> 
>     // Clone the file contents using FICLONE
>     if (ioctl(temp_fd, FICLONE, source_fd) == -1) {
>         perror("Failed to clone file");
>         close(source_fd);
>         close(temp_fd);
>         exit(1);
>     }
> 
>     // Close source file
>     close(source_fd);
> 
>     // Make sure temp file is on fd 3
>     if (temp_fd != 3) {
>         if (dup2(temp_fd, 3) == -1) {
>             perror("Failed to move temp file to fd 3");
>             close(temp_fd);
>             exit(1);
>         }
>         close(temp_fd);
>     }
> 
>     // Execute the remaining arguments
>     if (argc >= 3) {
>         execvp(argv[3], &argv[3]);
>         perror("Failed to execute command");
>         exit(1);
>     }
> 
>     return 0;
> }

As you said, this doesn't work if temp_dir is not allowed for execution,
and it doesn't allow the kernel to check/track the content of the
script, which is the purpose of AT_EXECVE_CHECK.
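
A common case is a /tmp mounted noexec.  The mount flag itself can be
checked with statvfs(3), but that says nothing about LSM policies, which
is why a kernel-side check such as AT_EXECVE_CHECK is still needed.  A
sketch, with dir_mount_allows_exec() as a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: return 1 if @dir is on a mount that allows
 * execution, 0 if it is mounted noexec, or -1 on error.  This only looks
 * at the mount flag; LSM policies may still deny execution of files
 * created there.
 */
static int dir_mount_allows_exec(const char *dir)
{
    struct statvfs sv;

    if (statvfs(dir, &sv) != 0)
        return -1;
    return (sv.f_flag & ST_NOEXEC) ? 0 : 1;
}
```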


Thread overview: 40+ messages
2025-08-22 17:07 [RFC PATCH v1 0/2] Add O_DENY_WRITE (complement AT_EXECVE_CHECK) Mickaël Salaün
2025-08-22 17:07 ` [RFC PATCH v1 1/2] fs: Add O_DENY_WRITE Mickaël Salaün
2025-08-22 19:45   ` Jann Horn
2025-08-24 11:03     ` Mickaël Salaün
2025-08-24 18:04       ` Andy Lutomirski
2025-08-25  9:31         ` Mickaël Salaün [this message]
2025-08-25  9:39           ` Florian Weimer
2025-08-26 12:35             ` Mickaël Salaün
2025-08-25 16:43           ` Andy Lutomirski
2025-08-25 18:10             ` Jeff Xu
2025-08-25 17:57           ` Jeff Xu
2025-08-26 12:39             ` Mickaël Salaün
2025-08-26 20:29               ` Jeff Xu
2025-08-27  8:19                 ` Mickaël Salaün
2025-08-28 20:17                   ` Jeff Xu
2025-08-27 10:18     ` Aleksa Sarai
2025-08-27 10:29   ` Aleksa Sarai
2025-08-22 17:08 ` [RFC PATCH v1 2/2] selftests/exec: Add O_DENY_WRITE tests Mickaël Salaün
2025-08-26  9:07 ` [RFC PATCH v1 0/2] Add O_DENY_WRITE (complement AT_EXECVE_CHECK) Christian Brauner
2025-08-26 11:23   ` Mickaël Salaün
2025-08-26 12:30     ` Theodore Ts'o
2025-08-26 17:47       ` Mickaël Salaün
2025-08-26 20:50         ` Theodore Ts'o
2025-08-27  8:19           ` Mickaël Salaün
2025-08-27 17:35         ` Andy Lutomirski
2025-08-27 19:07           ` Mickaël Salaün
2025-08-27 20:35             ` Andy Lutomirski
2025-08-28  0:14     ` Aleksa Sarai
2025-08-28  0:32       ` Andy Lutomirski
2025-08-28  0:52         ` Aleksa Sarai
2025-08-28 21:01         ` Serge E. Hallyn
2025-09-01 11:05           ` Jann Horn
2025-09-01 13:18             ` Serge E. Hallyn
2025-09-01 16:01             ` Andy Lutomirski
2025-09-01  9:24       ` Roberto Sassu
2025-09-01 16:25         ` Andy Lutomirski
2025-09-01 17:01           ` Roberto Sassu
2025-09-02  8:57             ` Roberto Sassu
  -- strict thread matches above, loose matches on Subject: below --
2025-08-25 21:56 [RFC PATCH v1 1/2] fs: Add O_DENY_WRITE Andy Lutomirski
2025-08-25 23:06 ` Jeff Xu
