public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
@ 2024-08-03 22:50 Al Viro
  2024-08-03 23:06 ` Al Viro
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Al Viro @ 2024-08-03 22:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christian Brauner, linux-fsdevel

[in vfs.git#fixes]

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor #128, despite #64 being observably not open.

The best way to fix that is in dup_fd() - there we both can easily check
whether we might need to fix anything up and see which word needs the
upper bits cleared.
    
Reproducer follows:

#define __GNU_SOURCE
#include <linux/close_range.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sched.h>
#include <stdio.h>
#include <stdbool.h>
#include <sys/mman.h>
    
void is_open(int fd)
{
	printf("#%d is %s\n", fd,
		fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened");
}
    
int child(void *unused)
{
	while(1) {
	}
	return 0;
}
    
int main(void)
{
	char *stack;
	pid_t pid;

	stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL);
	if (pid == -1) {
		perror("clone");
		return -1;
	}
	for (int i = 2; i < 128; i++)
	    dup2(0, i);
	close_range(64, ~0U, CLOSE_RANGE_UNSHARE);

	is_open(64);
	printf("dup(0) => %d, expected 64\n", dup(0));

	kill(pid, 9);
	return 0;
}

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/file.c b/fs/file.c
index a11e59b5d602..7f0ab8636d7c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -380,6 +380,20 @@ struct files_struct *dup_fd(struct files_struct *oldf, unsigned int max_fds, int
 	}
 
 	copy_fd_bitmaps(new_fdt, old_fdt, open_files);
+	if (unlikely(max_fds != NR_OPEN_MAX)) {
+		/*
+		 * if there had been opened descriptors past open_files,
+		 * the last copied word in full_fds_bits might have picked
+		 * stray bits; nothing of that sort happens to open_fds and
+		 * close_on_exec, since there the part that needs to be copied
+		 * will always fall on a word boundary.
+		 */
+		unsigned int n = open_files / BITS_PER_LONG;
+		if (n % BITS_PER_LONG) {
+			unsigned long mask = BITMAP_LAST_WORD_MASK(n);
+			new_fdt->full_fds_bits[n / BITS_PER_LONG] &= mask;
+		}
+	}
 
 	old_fds = old_fdt->fd;
 	new_fds = new_fdt->fd;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-08-06 17:01 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-03 22:50 [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE Al Viro
2024-08-03 23:06 ` Al Viro
2024-08-03 23:51 ` Linus Torvalds
2024-08-04  0:05   ` Linus Torvalds
2024-08-04  0:34   ` Al Viro
2024-08-04  3:42     ` Linus Torvalds
2024-08-04  3:47     ` Al Viro
2024-08-04  4:17       ` Al Viro
2024-08-04 15:18       ` Linus Torvalds
2024-08-04 21:13         ` Al Viro
2024-08-05 23:44           ` Al Viro
2024-08-06  0:04             ` Linus Torvalds
2024-08-06  1:02               ` Al Viro
2024-08-06  8:41                 ` Christian Brauner
2024-08-06 16:32                   ` Al Viro
2024-08-06 17:01                     ` Linus Torvalds
2024-08-05  7:22 ` Christian Brauner
2024-08-05 18:54   ` Al Viro
2024-08-06  9:11     ` Christian Brauner
2024-08-05  9:48 ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox