* [PATCH] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR @ 2023-04-20 12:04 Vladimir Sementsov-Ogievskiy 2023-05-10 15:15 ` Vladimir Sementsov-Ogievskiy 0 siblings, 1 reply; 8+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2023-04-20 12:04 UTC (permalink / raw) To: Alexander Viro, Christian Brauner; +Cc: vsementsov, linux-fsdevel, linux-kernel This makes it possible to make stricter apparmor profile and don't allow the program to read any coredump in the system. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> --- fs/coredump.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/coredump.c b/fs/coredump.c index 5df1e6e1eb2b..8f263a389175 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -646,7 +646,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) } else { struct mnt_idmap *idmap; struct inode *inode; - int open_flags = O_CREAT | O_RDWR | O_NOFOLLOW | + int open_flags = O_CREAT | O_WRONLY | O_NOFOLLOW | O_LARGEFILE | O_EXCL; if (cprm.limit < binfmt->min_coredump) -- 2.34.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR 2023-04-20 12:04 [PATCH] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR Vladimir Sementsov-Ogievskiy @ 2023-05-10 15:15 ` Vladimir Sementsov-Ogievskiy 2023-05-15 17:55 ` Christian Brauner 0 siblings, 1 reply; 8+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2023-05-10 15:15 UTC (permalink / raw) To: Alexander Viro, Christian Brauner Cc: linux-fsdevel, linux-kernel, ptikhomirov, Andrey Ryabinin Gently ping. Is there any interest? On 20.04.23 15:04, Vladimir Sementsov-Ogievskiy wrote: > This makes it possible to make stricter apparmor profile and don't > allow the program to read any coredump in the system. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > --- > fs/coredump.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/coredump.c b/fs/coredump.c > index 5df1e6e1eb2b..8f263a389175 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -646,7 +646,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) > } else { > struct mnt_idmap *idmap; > struct inode *inode; > - int open_flags = O_CREAT | O_RDWR | O_NOFOLLOW | > + int open_flags = O_CREAT | O_WRONLY | O_NOFOLLOW | > O_LARGEFILE | O_EXCL; > > if (cprm.limit < binfmt->min_coredump) -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR 2023-05-10 15:15 ` Vladimir Sementsov-Ogievskiy @ 2023-05-15 17:55 ` Christian Brauner 2023-05-15 18:50 ` Linus Torvalds 0 siblings, 1 reply; 8+ messages in thread From: Christian Brauner @ 2023-05-15 17:55 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy Cc: Alexander Viro, linux-fsdevel, linux-kernel, ptikhomirov, Andrey Ryabinin, Linus Torvalds On Wed, May 10, 2023 at 06:15:41PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Gently ping. > > Is there any interest? The question that I would've loved to have an answer to was why was it made O_RDWR and not just O_WRONLY in the first place. Was there a time when this was meaningful? Because honestly this looks innocent and straightforward and then it always makes me go and think "Oh, there's probably a good reason and something super obvious I'm missing.". Funny enough, this code was originally: if (open_namei("core",O_CREAT | O_WRONLY | O_TRUNC,0600,&inode,NULL) and then became: if (open_namei("core",O_CREAT | 2 | O_TRUNC,0600,&inode,NULL)) in commit 9cb9f18b5d26 ("[PATCH] Linux-0.99.10 (June 7, 1993)") Author/applier of said patch Cced (more for the fun of referencing Linux-0.99.10 than anything else). So after this commit the flag combination just got copied over and over. First when coredump handling was moved out of fs/exec.c into the individual binfmt handlers and then again when it was moved back into fs/exec.c and then again when it was moved to fs/coredump.c. So that open-coded 2 added in commit 9cb9f18b5d26 ("[PATCH] Linux-0.99.10 (June 7, 1993)") survived for 23 years until it was replaced by Jan in 378c6520e7d2 ("fs/coredump: prevent fsuid=0 dumps into user-controlled directories"). So no one could be bothered for 23 years to use O_RDWR instead of that lonely 2 which is kinda funny. :) In any case, I don't see anything that suggests this would cause issues. So I'm going to pick this up unless I'm being told I'm being obviously stupid and this absolutely needs to be O_RDWR... > > On 20.04.23 15:04, Vladimir Sementsov-Ogievskiy wrote: > > This makes it possible to make stricter apparmor profile and don't > > allow the program to read any coredump in the system. > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > > --- > > fs/coredump.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/coredump.c b/fs/coredump.c > > index 5df1e6e1eb2b..8f263a389175 100644 > > --- a/fs/coredump.c > > +++ b/fs/coredump.c > > @@ -646,7 +646,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) > > } else { > > struct mnt_idmap *idmap; > > struct inode *inode; > > - int open_flags = O_CREAT | O_RDWR | O_NOFOLLOW | > > + int open_flags = O_CREAT | O_WRONLY | O_NOFOLLOW | > > O_LARGEFILE | O_EXCL; > > if (cprm.limit < binfmt->min_coredump) > > -- > Best regards, > Vladimir > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR 2023-05-15 17:55 ` Christian Brauner @ 2023-05-15 18:50 ` Linus Torvalds 2023-05-15 19:13 ` Linus Torvalds 0 siblings, 1 reply; 8+ messages in thread From: Linus Torvalds @ 2023-05-15 18:50 UTC (permalink / raw) To: Christian Brauner Cc: Vladimir Sementsov-Ogievskiy, Alexander Viro, linux-fsdevel, linux-kernel, ptikhomirov, Andrey Ryabinin On Mon, May 15, 2023 at 10:55 AM Christian Brauner <brauner@kernel.org> wrote: > > So that open-coded 2 added in commit 9cb9f18b5d26 ("[PATCH] > Linux-0.99.10 (June 7, 1993)") survived for 23 years until it was > replaced by Jan in 378c6520e7d2 ("fs/coredump: prevent fsuid=0 dumps > into user-controlled directories"). Hmm. I can *not* for the life of me remember anything that far back, and our mail archives don't go that far back either. It's strange, because the "O_WRONLY" -> "2" change that changes to a magic raw number is right next to changing "(unsigned short) 0x10" to "KERNEL_DS", so we're getting *rid* of a magic raw number there. Which makes me think it was intentional, but I don't know why it wouldn't have used O_RDWR instead of "2". Back then we did *not* have O_EXCL in the core file creation flags, so I'm wondering if it was some half-arsed thing as in "do not allow core-files to overwrite non-readable files in-place". They'd still have to be *writable*, though, so that still seems more than a bit odd. I have this *dim* memory of us having had filesystems that required readability for over-writing existing file data (because we'd do a read-modify-write for the page cache, kind of like how you can't have write-only pages on many architectures). But while we didn't have O_EXCL, we *did* have O_TRUNC, so that should be a non-issue. I don't see a problem with making it O_WRONLY. Like it was 30 years ago. But that unexplained "O_WRONLY" -> "2" annoys me. It does feel like there was some reason for it. Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR 2023-05-15 18:50 ` Linus Torvalds @ 2023-05-15 19:13 ` Linus Torvalds 2023-05-16 8:05 ` Vladimir Sementsov-Ogievskiy 2023-05-16 13:46 ` Christian Brauner 0 siblings, 2 replies; 8+ messages in thread From: Linus Torvalds @ 2023-05-15 19:13 UTC (permalink / raw) To: Christian Brauner Cc: Vladimir Sementsov-Ogievskiy, Alexander Viro, linux-fsdevel, linux-kernel, ptikhomirov, Andrey Ryabinin On Mon, May 15, 2023 at 11:50 AM Linus Torvalds <torvalds@linuxfoundation.org> wrote: > > It's strange, because the "O_WRONLY" -> "2" change that changes to a > magic raw number is right next to changing "(unsigned short) 0x10" to > "KERNEL_DS", so we're getting *rid* of a magic raw number there. Oh, no, never mind. I see what is going on. Back then, "open_namei()" didn't actually take O_RDWR style flags AT ALL. The O_RDONLY flags are broken, because you cannot say "open with no permissions", which we used internally. You have 0 - read-only 1 - write-only 2 - read-write but the internal code actually wants to match that up with the read-write permission bits (FMODE_READ etc). And then we've long had a special value for "open for special accesses" (format etc), which (naturally) was 3. So then the open code would do f->f_flags = flag = flags; f->f_mode = (flag+1) & O_ACCMODE; if (f->f_mode) flag++; which means that "f_mode" now becomes that FMODE_READ | FMODE_WRITE mask, and "flag" ends up being a translation from that O_RDWR space (0/1/2/3) into the FMODE_READ/WRITE space (1/2/3/3, where "special" required read-write permissions, and 0 was only used for symlinks). We still have that, although the code looks different. So back then, "open_namei()" took that FMODE_READ/WRITE flag as an argument, and the "O_WRONLY" -> "2" change is actually a bugfix and makes sense. The O_WRONLY thing was wrong, because it was 1, which actuall ymeant FMODE_READ. And back then, we didn't *have* FMODE_READ and FMODE_WRITE. So just writing it as "2" made sense, even if it was horrible. We added FMODE_WRITE later, but never fixed up those core file writers. So that 0.99pl10 commit from 1993 is actually correct, and the bug happened *later*. I think the real bug may have been in 2.2.4pre4 (February 16, 1999), when this happened: - dentry = open_namei(corefile,O_CREAT | 2 | O_TRUNC | O_NOFOLLOW, 0600); ... + file = filp_open(corefile,O_CREAT | 2 | O_TRUNC | O_NOFOLLOW, 0600); without realizing that the "2" in open_namei() should have become a O_WRONLY for filp_open(). So I think this explains it all. Very understandable mistake after all. Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR 2023-05-15 19:13 ` Linus Torvalds @ 2023-05-16 8:05 ` Vladimir Sementsov-Ogievskiy 2023-05-16 13:46 ` Christian Brauner 1 sibling, 0 replies; 8+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2023-05-16 8:05 UTC (permalink / raw) To: Linus Torvalds, Christian Brauner Cc: Alexander Viro, linux-fsdevel, linux-kernel, ptikhomirov, Andrey Ryabinin On 15.05.23 22:13, Linus Torvalds wrote: > On Mon, May 15, 2023 at 11:50 AM Linus Torvalds > <torvalds@linuxfoundation.org> wrote: >> >> It's strange, because the "O_WRONLY" -> "2" change that changes to a >> magic raw number is right next to changing "(unsigned short) 0x10" to >> "KERNEL_DS", so we're getting *rid* of a magic raw number there. > > Oh, no, never mind. I see what is going on. > > Back then, "open_namei()" didn't actually take O_RDWR style flags AT ALL. > > The O_RDONLY flags are broken, because you cannot say "open with no > permissions", which we used internally. You have > > 0 - read-only > 1 - write-only > 2 - read-write > > but the internal code actually wants to match that up with the > read-write permission bits (FMODE_READ etc). > > And then we've long had a special value for "open for special > accesses" (format etc), which (naturally) was 3. > > So then the open code would do > > f->f_flags = flag = flags; > f->f_mode = (flag+1) & O_ACCMODE; > if (f->f_mode) > flag++; > > which means that "f_mode" now becomes that FMODE_READ | FMODE_WRITE > mask, and "flag" ends up being a translation from that O_RDWR space > (0/1/2/3) into the FMODE_READ/WRITE space (1/2/3/3, where "special" > required read-write permissions, and 0 was only used for symlinks). > > We still have that, although the code looks different. > > So back then, "open_namei()" took that FMODE_READ/WRITE flag as an > argument, and the "O_WRONLY" -> "2" change is actually a bugfix and > makes sense. The O_WRONLY thing was wrong, because it was 1, which > actuall ymeant FMODE_READ. > > And back then, we didn't *have* FMODE_READ and FMODE_WRITE. > > So just writing it as "2" made sense, even if it was horrible. We > added FMODE_WRITE later, but never fixed up those core file writers. > > So that 0.99pl10 commit from 1993 is actually correct, and the bug > happened *later*. > > I think the real bug may have been in 2.2.4pre4 (February 16, 1999), > when this happened: > > - dentry = open_namei(corefile,O_CREAT | 2 | O_TRUNC | O_NOFOLLOW, 0600); > ... > + file = filp_open(corefile,O_CREAT | 2 | O_TRUNC | O_NOFOLLOW, 0600); > > without realizing that the "2" in open_namei() should have become a > O_WRONLY for filp_open(). > > So I think this explains it all. > > Very understandable mistake after all. > > Linus Wow that's became a detective story, great thanks! [took note to check history myself next time] -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR 2023-05-15 19:13 ` Linus Torvalds 2023-05-16 8:05 ` Vladimir Sementsov-Ogievskiy @ 2023-05-16 13:46 ` Christian Brauner 2023-05-16 13:47 ` Christian Brauner 1 sibling, 1 reply; 8+ messages in thread From: Christian Brauner @ 2023-05-16 13:46 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy, Linus Torvalds Cc: Christian Brauner, Alexander Viro, linux-fsdevel, linux-kernel, ptikhomirov, Andrey Ryabinin On Thu, 20 Apr 2023 15:04:09 +0300, Vladimir Sementsov-Ogievskiy wrote: > This makes it possible to make stricter apparmor profile and don't > allow the program to read any coredump in the system. > > Applied to the vfs.misc branch of the vfs/vfs.git tree. Patches in the vfs.misc branch should appear in linux-next soon. Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it. tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs.misc [1/1] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR https://git.kernel.org/vfs/vfs/c/f84566e710af ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR 2023-05-16 13:46 ` Christian Brauner @ 2023-05-16 13:47 ` Christian Brauner 0 siblings, 0 replies; 8+ messages in thread From: Christian Brauner @ 2023-05-16 13:47 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy, Linus Torvalds Cc: Alexander Viro, linux-fsdevel, linux-kernel, ptikhomirov, Andrey Ryabinin On Tue, May 16, 2023 at 03:46:11PM +0200, Christian Brauner wrote: > On Thu, 20 Apr 2023 15:04:09 +0300, Vladimir Sementsov-Ogievskiy wrote: > > This makes it possible to make stricter apparmor profile and don't > > allow the program to read any coredump in the system. > > > > > > Applied to the vfs.misc branch of the vfs/vfs.git tree. > Patches in the vfs.misc branch should appear in linux-next soon. > > Please report any outstanding bugs that were missed during review in a > new review to the original patch series allowing us to drop it. > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git > branch: vfs.misc > > [1/1] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR > https://git.kernel.org/vfs/vfs/c/f84566e710af I updated the patch to include all the details we unearthed about this. Linus, I added your SOB. It just made sense imho given that you provided quite some details on this. Let me know if that bothers you. The patch now is: From f84566e710af39895e54d8e812cd47e5e53db671 Mon Sep 17 00:00:00 2001 From: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Date: Thu, 20 Apr 2023 15:04:09 +0300 Subject: coredump: require O_WRONLY instead of O_RDWR The motivation for this patch has been to enable using a stricter apparmor profile to prevent programs from reading any coredump in the system. However, this became something else. The following details are based on Christian's and Linus' archeology into the history of the number "2" in the coredump handling code. To make sure we're not accidently introducing some subtle behavioral change into the coredump code we set out on a voyage into the depths of history.git to figure out why this was O_RDWR in the first place. Coredump handling was introduced over 30 years ago in commit ddc733f452e0 ("[PATCH] Linux-0.97 (August 1, 1992)"). The original code used O_WRONLY: open_namei("core",O_CREAT | O_WRONLY | O_TRUNC,0600,&inode,NULL) However, this changed in 1993 and starting with commit 9cb9f18b5d26 ("[PATCH] Linux-0.99.10 (June 7, 1993)") the coredump code suddenly used the constant "2": open_namei("core",O_CREAT | 2 | O_TRUNC,0600,&inode,NULL) This was curious as in the same commit the kernel switched from constants to proper defines in other places such as KERNEL_DS and USER_DS and O_RDWR did already exist. So why was "2" used? It turns out that open_namei() - an early version of what later turned into filp_open() - didn't accept O_RDWR. A semantic quirk of the open() uapi is the definition of the O_RDONLY flag. It would seem natural to define: #define O_RDWR (O_RDONLY | O_WRONLY) but that isn't possible because: #define O_RDONLY 0 This makes O_RDONLY effectively meaningless when passed to the kernel. In other words, there has never been a way - until O_PATH at least - to open a file without any permission; O_RDONLY was always implied on the uapi side while the kernel does in fact allow opening files without permissions. The trouble comes when trying to map the uapi flags onto the corresponding file mode flags FMODE_{READ,WRITE}. This mapping still happens today and is causing issues to this day (We ran into this during additions for openat2() for example.). So the special value "3" was used to indicate that the file was opened for special access: f->f_flags = flag = flags; f->f_mode = (flag+1) & O_ACCMODE; if (f->f_mode) flag++; This allowed the file mode to be set to FMODE_READ | FMODE_WRITE mapping the O_{RDONLY,WRONLY,RDWR} flags into the FMODE_{READ,WRITE} flags. The special access then required read-write permissions and 0 was used to access symlinks. But back when ddc733f452e0 ("[PATCH] Linux-0.97 (August 1, 1992)") added coredump handling open_namei() took the FMODE_{READ,WRITE} flags as an argument. So the coredump handling introduced in ddc733f452e0 ("[PATCH] Linux-0.97 (August 1, 1992)") was buggy because O_WRONLY shouldn't have been passed. Since O_WRONLY is 1 but open_namei() took FMODE_{READ,WRITE} it was passed FMODE_READ on accident. So 9cb9f18b5d26 ("[PATCH] Linux-0.99.10 (June 7, 1993)") was a bugfix for this and the 2 didn't really mean O_RDWR, it meant FMODE_WRITE which was correct. The clue is that FMODE_{READ,WRITE} didn't exist yet and thus a raw "2" value was passed. Fast forward 5 years when around 2.2.4pre4 (February 16, 1999) this code was changed to: - dentry = open_namei(corefile,O_CREAT | 2 | O_TRUNC | O_NOFOLLOW, 0600); ... + file = filp_open(corefile,O_CREAT | 2 | O_TRUNC | O_NOFOLLOW, 0600); At this point the raw "2" should have become O_WRONLY again as filp_open() didn't take FMODE_{READ,WRITE} but O_{RDONLY,WRONLY,RDWR}. Another 17 years later, the code was changed again cementing the mistake and making it almost impossible to detect when commit 378c6520e7d2 ("fs/coredump: prevent fsuid=0 dumps into user-controlled directories") replaced the raw "2" with O_RDWR. And now, here we are with this patch that sent us on a quest to answer the big questions in life such as "Why are coredump files opened with O_RDWR?" and "Is it safe to just use O_WRONLY?". So with this commit we're reintroducing O_WRONLY again and bringing this code back to its original state when it was first introduced in commit ddc733f452e0 ("[PATCH] Linux-0.97 (August 1, 1992)") over 30 years ago. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20230420120409.602576-1-vsementsov@yandex-team.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org> --- fs/coredump.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/coredump.c b/fs/coredump.c index ece7badf701bc..ead3b05fb8f48 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -646,7 +646,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) } else { struct mnt_idmap *idmap; struct inode *inode; - int open_flags = O_CREAT | O_RDWR | O_NOFOLLOW | + int open_flags = O_CREAT | O_WRONLY | O_NOFOLLOW | O_LARGEFILE | O_EXCL; if (cprm.limit < binfmt->min_coredump) -- cgit ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-05-16 13:48 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-04-20 12:04 [PATCH] fs/coredump: open coredump file in O_WRONLY instead of O_RDWR Vladimir Sementsov-Ogievskiy 2023-05-10 15:15 ` Vladimir Sementsov-Ogievskiy 2023-05-15 17:55 ` Christian Brauner 2023-05-15 18:50 ` Linus Torvalds 2023-05-15 19:13 ` Linus Torvalds 2023-05-16 8:05 ` Vladimir Sementsov-Ogievskiy 2023-05-16 13:46 ` Christian Brauner 2023-05-16 13:47 ` Christian Brauner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).