linux-fsdevel.vger.kernel.org archive mirror
* ETXTBSY window in __fput
From: Alexander Monakov @ 2025-08-26 21:05 UTC
  To: linux-fsdevel; +Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1987 bytes --]

Dear fs hackers,

I suspect there's an unfortunate race window in __fput: file locks are
dropped (locks_remove_file) before the writer refcount is decremented
(put_file_access). If I'm not mistaken, this window is observable, and it
breaks a solution to the ETXTBSY problem on exec'ing a just-written file,
explained in more detail below.

The program demonstrating the problem is attached (a slightly modified version
of the demo given by Russ Cox on the Go issue tracker; see the URL on its first
line). It runs 20 threads, each executing an infinite loop that does the
following:

1) open an fd for writing with O_CLOEXEC
2) write executable code into it
3) close it
4) fork
5) in the child, attempt to execve the just-written file

If you compile it with -DNOWAIT, you'll see that execve often fails with
ETXTBSY. This happens when another thread forks while we are holding an open fd
(between steps 1 and 3), so our fd "leaks" into that child, and we then reach
our step 5 before that child calls execve (at which point the leaked fd would be
closed thanks to O_CLOEXEC).

I suggested on the Go bug report that the problem can be solved without any
inter-thread cooperation by utilizing BSD locks. Replace step 3 with:

3a) place an exclusive lock on the file identified by fd (flock(fd, LOCK_EX))
3b) close the fd
3c) open an fd on the same path again
3d) place a lock on it again
3e) close it again

Since BSD locks are associated with the open file description, the lock placed
at step 3a is not released until all descriptors duplicated via forks are
closed. Hence, at step 3d we wait until all forked children have proceeded to
execve.

Recently another person tried this solution and observed that they still saw
the errors, albeit at a much lower rate, about three per 30 minutes (I've not
been able to reproduce that). I suspect the race window described in the first
paragraph makes that possible.

If so, would it be possible to close that window? Would be nice to have this
algorithm work reliably.

Thanks.
Alexander

[-- Attachment #2: Type: text/plain, Size: 1450 bytes --]

/* ETXTBSY race example from https://github.com/golang/go/issues/22315 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <sys/file.h>

static void *runner(void *);

int
main(void)
{
	pthread_t thr[20];

	for (int i=1; i<20; i++)
		pthread_create(&thr[i], 0, runner, (void*)(uintptr_t)i);
	runner(0);
}

static const char *script = "#!/bin/sh\nexit 0\n";

static void *
runner(void *v)
{
	int i, fd, pid, status;
	char buf[100], *argv[2];

	i = (int)(uintptr_t)v;
	snprintf(buf, sizeof buf, "txtbusy-%d", i);
	argv[0] = buf;
	argv[1] = 0;
	for(;;) {
		fd = open(buf, O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0777);
		if(fd < 0) {
			perror("open");
			exit(2);
		}
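		/* Step 2: write the script; a short write would leave a broken file. */
		if(write(fd, script, strlen(script)) != (ssize_t)strlen(script)) {
			perror("write");
			exit(2);
		}
#ifndef NOWAIT
		/* Steps 3a-3b: lock the open file description, then drop our fd. */
		if(flock(fd, LOCK_EX) < 0) {
			perror("flock");
			exit(2);
		}
		close(fd);
		/* Steps 3c-3d: reopen and wait until every leaked copy is closed. */
		fd = open(buf, O_RDONLY|O_CLOEXEC);
		if(fd < 0) {
			perror("reopen");
			exit(2);
		}
		if(flock(fd, LOCK_SH) < 0) {
			perror("flock");
			exit(2);
		}
#endif
		close(fd);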
		pid = fork();
		if(pid < 0) {
			perror("fork");
			exit(2);
		}
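		if(pid == 0) {
			execve(buf, argv, 0);
			/* _exit, not exit: skip atexit handlers and stdio flushing
			   in the forked child of a multithreaded process. */
			_exit(errno);
		}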
		if(waitpid(pid, &status, 0) < 0) {
			perror("waitpid");
			exit(2);
		}
		if(!WIFEXITED(status)) {
			/* errno is not meaningful here, so don't use perror. */
			fprintf(stderr, "waitpid: child did not exit normally\n");
			exit(2);
		}
		status = WEXITSTATUS(status);
		if(status != 0)
			fprintf(stderr, "exec: %d %s\n", status, strerror(status));
	}
	return 0;
}

Thread overview: 23+ messages
2025-08-26 21:05 ETXTBSY window in __fput Alexander Monakov
2025-08-26 22:00 ` Al Viro
2025-08-27  7:22   ` Alexander Monakov
2025-08-27 11:52     ` Theodore Ts'o
2025-08-27 13:05       ` Alexander Monakov
2025-08-31 19:22         ` David Laight
2025-09-01  8:44           ` Jan Kara
2025-08-27 13:16     ` Aleksa Sarai
2025-08-27 14:29       ` Alexander Monakov
2025-08-29  7:21 ` Alexander Monakov
2025-08-29  9:47   ` Christian Brauner
2025-08-29 10:17     ` Alexander Monakov
2025-08-29 11:07       ` Christian Brauner
2025-08-29 11:45         ` Alexander Monakov
2025-08-29 14:02           ` Jan Kara
2025-09-01 17:53             ` Alexander Monakov
2025-09-02 10:36               ` Jan Kara
2025-08-29 18:32 ` Colin Walters
2025-09-01 18:39 ` Mateusz Guzik
2025-09-01 19:57   ` Colin Walters
2025-09-01 20:22     ` Mateusz Guzik
2025-09-02  8:33   ` Christian Brauner
2025-09-02  8:44     ` Mateusz Guzik
