From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>, linux-fsdevel@vger.kernel.org
Subject: [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
Date: Sat, 3 Aug 2024 23:50:54 +0100 [thread overview]
Message-ID: <20240803225054.GY5334@ZenIV> (raw)
[in vfs.git#fixes]
copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes. What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear. Otherwise we are risking garbage from the last word
we'd copied.
For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.
The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds. In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.
Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
* descriptor table being currently shared
* 'to' being above the current capacity of descriptor table
* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor #128, despite #64 being observably not open.
The best way to fix that is in dup_fd() - there we both can easily check
whether we might need to fix anything up and see which word needs the
upper bits cleared.
Reproducer follows:
#define __GNU_SOURCE
#include <linux/close_range.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sched.h>
#include <stdio.h>
#include <stdbool.h>
#include <sys/mman.h>
void is_open(int fd)
{
printf("#%d is %s\n", fd,
fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened");
}
int child(void *unused)
{
while(1) {
}
return 0;
}
int main(void)
{
char *stack;
pid_t pid;
stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
if (stack == MAP_FAILED) {
perror("mmap");
return -1;
}
pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL);
if (pid == -1) {
perror("clone");
return -1;
}
for (int i = 2; i < 128; i++)
dup2(0, i);
close_range(64, ~0U, CLOSE_RANGE_UNSHARE);
is_open(64);
printf("dup(0) => %d, expected 64\n", dup(0));
kill(pid, 9);
return 0;
}
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/file.c b/fs/file.c
index a11e59b5d602..7f0ab8636d7c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -380,6 +380,20 @@ struct files_struct *dup_fd(struct files_struct *oldf, unsigned int max_fds, int
}
copy_fd_bitmaps(new_fdt, old_fdt, open_files);
+ if (unlikely(max_fds != NR_OPEN_MAX)) {
+ /*
+ * if there had been opened descriptors past open_files,
+ * the last copied word in full_fds_bits might have picked
+ * stray bits; nothing of that sort happens to open_fds and
+ * close_on_exec, since there the part that needs to be copied
+ * will always fall on a word boundary.
+ */
+ unsigned int n = open_files / BITS_PER_LONG;
+ if (n % BITS_PER_LONG) {
+ unsigned long mask = BITMAP_LAST_WORD_MASK(n);
+ new_fdt->full_fds_bits[n / BITS_PER_LONG] &= mask;
+ }
+ }
old_fds = old_fdt->fd;
new_fds = new_fdt->fd;
next reply other threads:[~2024-08-03 22:50 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-03 22:50 Al Viro [this message]
2024-08-03 23:06 ` [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE Al Viro
2024-08-03 23:51 ` Linus Torvalds
2024-08-04 0:05 ` Linus Torvalds
2024-08-04 0:34 ` Al Viro
2024-08-04 3:42 ` Linus Torvalds
2024-08-04 3:47 ` Al Viro
2024-08-04 4:17 ` Al Viro
2024-08-04 15:18 ` Linus Torvalds
2024-08-04 21:13 ` Al Viro
2024-08-05 23:44 ` Al Viro
2024-08-06 0:04 ` Linus Torvalds
2024-08-06 1:02 ` Al Viro
2024-08-06 8:41 ` Christian Brauner
2024-08-06 16:32 ` Al Viro
2024-08-06 17:01 ` Linus Torvalds
2024-08-05 7:22 ` Christian Brauner
2024-08-05 18:54 ` Al Viro
2024-08-06 9:11 ` Christian Brauner
2024-08-05 9:48 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240803225054.GY5334@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=brauner@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.