public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>, linux-fsdevel@vger.kernel.org
Subject: [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
Date: Sat, 3 Aug 2024 23:50:54 +0100	[thread overview]
Message-ID: <20240803225054.GY5334@ZenIV> (raw)

[in vfs.git#fixes]

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor #128, despite #64 being observably not open.

The best way to fix that is in dup_fd() - there we both can easily check
whether we might need to fix anything up and see which word needs the
upper bits cleared.
    
Reproducer follows:

#define __GNU_SOURCE
#include <linux/close_range.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sched.h>
#include <stdio.h>
#include <stdbool.h>
#include <sys/mman.h>
    
void is_open(int fd)
{
	printf("#%d is %s\n", fd,
		fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened");
}
    
int child(void *unused)
{
	while(1) {
	}
	return 0;
}
    
int main(void)
{
	char *stack;
	pid_t pid;

	stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL);
	if (pid == -1) {
		perror("clone");
		return -1;
	}
	for (int i = 2; i < 128; i++)
	    dup2(0, i);
	close_range(64, ~0U, CLOSE_RANGE_UNSHARE);

	is_open(64);
	printf("dup(0) => %d, expected 64\n", dup(0));

	kill(pid, 9);
	return 0;
}

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/file.c b/fs/file.c
index a11e59b5d602..7f0ab8636d7c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -380,6 +380,20 @@ struct files_struct *dup_fd(struct files_struct *oldf, unsigned int max_fds, int
 	}
 
 	copy_fd_bitmaps(new_fdt, old_fdt, open_files);
+	if (unlikely(max_fds != NR_OPEN_MAX)) {
+		/*
+		 * if there had been opened descriptors past open_files,
+		 * the last copied word in full_fds_bits might have picked
+		 * stray bits; nothing of that sort happens to open_fds and
+		 * close_on_exec, since there the part that needs to be copied
+		 * will always fall on a word boundary.
+		 */
+		unsigned int n = open_files / BITS_PER_LONG;
+		if (n % BITS_PER_LONG) {
+			unsigned long mask = BITMAP_LAST_WORD_MASK(n);
+			new_fdt->full_fds_bits[n / BITS_PER_LONG] &= mask;
+		}
+	}
 
 	old_fds = old_fdt->fd;
 	new_fds = new_fdt->fd;

             reply	other threads:[~2024-08-03 22:50 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-03 22:50 Al Viro [this message]
2024-08-03 23:06 ` [PATCH] fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE Al Viro
2024-08-03 23:51 ` Linus Torvalds
2024-08-04  0:05   ` Linus Torvalds
2024-08-04  0:34   ` Al Viro
2024-08-04  3:42     ` Linus Torvalds
2024-08-04  3:47     ` Al Viro
2024-08-04  4:17       ` Al Viro
2024-08-04 15:18       ` Linus Torvalds
2024-08-04 21:13         ` Al Viro
2024-08-05 23:44           ` Al Viro
2024-08-06  0:04             ` Linus Torvalds
2024-08-06  1:02               ` Al Viro
2024-08-06  8:41                 ` Christian Brauner
2024-08-06 16:32                   ` Al Viro
2024-08-06 17:01                     ` Linus Torvalds
2024-08-05  7:22 ` Christian Brauner
2024-08-05 18:54   ` Al Viro
2024-08-06  9:11     ` Christian Brauner
2024-08-05  9:48 ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240803225054.GY5334@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=brauner@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox