linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Reproducer] Corruption, possible race between splice and FALLOC_FL_PUNCH_HOLE
@ 2023-06-27  1:12 Matt Whitlock
  2023-06-27  5:47 ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Matt Whitlock @ 2023-06-27  1:12 UTC (permalink / raw)
  To: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2403 bytes --]

Hello, all. I am experiencing a data corruption issue on Linux 6.1.24 when 
calling fallocate with FALLOC_FL_PUNCH_HOLE to punch out pages that have 
just been spliced into a pipe. It appears that the fallocate call can zero 
out the pages that are sitting in the pipe buffer, before those pages are 
read from the pipe.

Simplified code excerpt (eliding error checking):

int fd = /* open file descriptor referring to some disk file */;
for (off_t consumed = 0;;) {
   ssize_t n = splice(fd, NULL, STDOUT_FILENO, NULL, SIZE_MAX, 0);
   if (n <= 0) break;
   consumed += n;
   fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, consumed);
}

Expected behavior:
Punching holes in a file after splicing pages out of that file into a pipe 
should not corrupt the spliced-out pages in the pipe buffer.

Observed behavior:
Some of the pages that have been spliced into the pipe get zeroed out by 
the subsequent fallocate call before they can be consumed from the read 
side of the pipe.


Steps to reproduce:

1. Save the attached ones.c, dontneed.c, and consume.c.

2. gcc -o ones ones.c
   gcc -o dontneed dontneed.c
   gcc -o consume consume.c

3. Fill a file with 32 MiB of 0xFF:
   ./ones | head -c$((1<<25)) >testfile

4. Evict the pages of the file from the page cache:
   sync testfile && ./dontneed testfile

5. Splice the file into a pipe, punching out batches of pages after 
splicing them:
   ./consume testfile | hexdump -C

The expected output from hexdump should show 32 MiB of 0xFF. Indeed, on my 
system, if I omit the POSIX_FADV_DONTNEED advice, then I do get the 
expected output. However, if the pages of the file are not already present 
in the page cache (i.e., if the splice call faults them in from disk), then 
the hexdump output shows some pages full of 0xFF and some pages full of 
0x00.

Note #1: I am running a fairly antiquated x86_64 system. You may need to 
use a file larger than 32 MiB to reproduce the misbehavior on a more modern 
system. In particular, even I cannot reproduce the problem when I use 16 
MiB. Conversely, when I use a very large file (too large to fit entirely in 
the page cache), then I don't need the "dontneed" call to reproduce the 
problem.

Note #2: I am siting my test file on an XFS file system running on a 
hardware RAID volume at /dev/sda1. I'm not sure if that's relevant.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ones.c --]
[-- Type: text/x-csrc, Size: 295 bytes --]

#include <string.h>

#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <sysexits.h>
#include <unistd.h>

int main() {
	char buf[4096];
	memset(buf, 0xFF, sizeof buf);
	for (;;)
		if (write(STDOUT_FILENO, buf, sizeof buf) < 0)
			error(EX_IOERR, errno, "write");
	return EX_OK;
}

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: dontneed.c --]
[-- Type: text/x-csrc, Size: 654 bytes --]

#include <stdio.h>

#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <sysexits.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
	if (argc < 2) {
		printf("usage: %s <file> [...]\n", argc > 0 ? *argv : "dontneed");
		return EX_USAGE;
	}
	int ret = EX_OK;
	while (++argv, --argc > 0) {
		int fd = open(*argv, O_RDONLY);
		if (fd < 0) {
			error(0, errno, "%s", *argv);
			ret = ret ?: EX_NOINPUT;
			continue;
		}
		if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) < 0) {
			error(0, errno, "%s: posix_fadvise", *argv);
			ret = EX_OSERR;
		}
		if (close(fd) < 0)
			error(EX_OSERR, errno, "%s: close", *argv);
	}
	return ret;
}

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: consume.c --]
[-- Type: text/x-csrc, Size: 888 bytes --]

#define _GNU_SOURCE

#include <stdint.h>

#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <sysexits.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
	++argv, --argc;
	do {
		const char *filename = "<stdin>";
		if (argc > 0) {
			close(STDIN_FILENO);
			if (open(filename = *argv, O_RDWR) != STDIN_FILENO)
				error(EX_NOINPUT, errno, "%s", filename);
		}
		for (off_t consumed = 0;;) {
			ssize_t n = splice(STDIN_FILENO, NULL, STDOUT_FILENO, NULL, SIZE_MAX, 0);
			if (n <= 0) {
				if (n < 0)
					error(EX_IOERR, errno, "%s: splice", filename);
				break;
			}
			if (fallocate(STDIN_FILENO, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, consumed += n) < 0)
				error(EX_OSERR, errno, "%s: fallocate", filename);
		}
		if (argc > 0 && unlink(filename) < 0)
			error(EX_OSERR, errno, "%s: unlink", filename);
	} while (++argv, --argc > 0);
	return EX_OK;
}

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-06-28 22:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-27  1:12 [Reproducer] Corruption, possible race between splice and FALLOC_FL_PUNCH_HOLE Matt Whitlock
2023-06-27  5:47 ` Dave Chinner
2023-06-27  6:20   ` Dave Chinner
2023-06-27 10:31   ` Matthew Wilcox
2023-06-27 18:38     ` Matt Whitlock
2023-06-27 21:49     ` Dave Chinner
2023-06-27 18:14   ` Matt Whitlock
2023-06-27 21:51     ` Dave Chinner
2023-06-28  6:30   ` David Howells
2023-06-28  8:15     ` Dave Chinner
2023-06-28  9:33     ` David Howells
2023-06-28 12:51       ` Matthew Wilcox
2023-06-28 14:19       ` David Howells
2023-06-28 22:41       ` Dave Chinner
2023-06-28 17:35     ` Matt Whitlock
2023-06-28 18:27     ` David Howells
2023-06-28 22:17       ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).