From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>, linux-fsdevel@vger.kernel.org
Subject: [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
Date: Sat, 3 Aug 2024 23:50:54 +0100 [thread overview]
Message-ID: <20240803225054.GY5334@ZenIV> (raw)
[in vfs.git#fixes]
copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes. What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear. Otherwise we are risking garbage from the last word
we'd copied.
For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.
The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds. In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.
Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
* descriptor table being currently shared
* 'to' being above the current capacity of descriptor table
* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor #128, despite #64 being observably not open.
The best way to fix that is in dup_fd() - there we both can easily check
whether we might need to fix anything up and see which word needs the
upper bits cleared.
Reproducer follows:
#define __GNU_SOURCE
#include <linux/close_range.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sched.h>
#include <stdio.h>
#include <stdbool.h>
#include <sys/mman.h>
void is_open(int fd)
{
printf("#%d is %s\n", fd,
fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened");
}
int child(void *unused)
{
while(1) {
}
return 0;
}
int main(void)
{
char *stack;
pid_t pid;
stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
if (stack == MAP_FAILED) {
perror("mmap");
return -1;
}
pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL);
if (pid == -1) {
perror("clone");
return -1;
}
for (int i = 2; i < 128; i++)
dup2(0, i);
close_range(64, ~0U, CLOSE_RANGE_UNSHARE);
is_open(64);
printf("dup(0) => %d, expected 64\n", dup(0));
kill(pid, 9);
return 0;
}
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/file.c b/fs/file.c
index a11e59b5d602..7f0ab8636d7c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -380,6 +380,20 @@ struct files_struct *dup_fd(struct files_struct *oldf, unsigned int max_fds, int
}
copy_fd_bitmaps(new_fdt, old_fdt, open_files);
+ if (unlikely(max_fds != NR_OPEN_MAX)) {
+ /*
+ * if there had been opened descriptors past open_files,
+ * the last copied word in full_fds_bits might have picked
+ * stray bits; nothing of that sort happens to open_fds and
+ * close_on_exec, since there the part that needs to be copied
+ * will always fall on a word boundary.
+ */
+ unsigned int n = open_files / BITS_PER_LONG;
+ if (n % BITS_PER_LONG) {
+ unsigned long mask = BITMAP_LAST_WORD_MASK(n);
+ new_fdt->full_fds_bits[n / BITS_PER_LONG] &= mask;
+ }
+ }
old_fds = old_fdt->fd;
new_fds = new_fdt->fd;
next reply other threads:[~2024-08-03 22:50 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-03 22:50 Al Viro [this message]
2024-08-03 23:06 ` [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE Al Viro
2024-08-03 23:51 ` Linus Torvalds
2024-08-04 0:05 ` Linus Torvalds
2024-08-04 0:34 ` Al Viro
2024-08-04 3:42 ` Linus Torvalds
2024-08-04 3:47 ` Al Viro
2024-08-04 4:17 ` Al Viro
2024-08-04 15:18 ` Linus Torvalds
2024-08-04 21:13 ` Al Viro
2024-08-05 23:44 ` Al Viro
2024-08-06 0:04 ` Linus Torvalds
2024-08-06 1:02 ` Al Viro
2024-08-06 8:41 ` Christian Brauner
2024-08-06 16:32 ` Al Viro
2024-08-06 17:01 ` Linus Torvalds
2024-08-05 7:22 ` Christian Brauner
2024-08-05 18:54 ` Al Viro
2024-08-06 9:11 ` Christian Brauner
2024-08-05 9:48 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240803225054.GY5334@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=brauner@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox