From: Hanna Czenczek <hreitz@redhat.com>
To: linux-btrfs@vger.kernel.org
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>,
Filipe Manana <fdmanana@suse.com>,
Stefano Garzarella <sgarzare@redhat.com>
Subject: Appending from unpopulated page creates hole
Date: Thu, 25 Jul 2024 11:15:57 +0200 [thread overview]
Message-ID: <0b841d46-12fe-4e64-9abb-871d8d0de271@redhat.com> (raw)
[-- Attachment #1.1: Type: text/plain, Size: 2200 bytes --]
Hi,
I’ve noticed that appending to a file on btrfs will create a hole before
the appended data under certain circumstances:
- Appending means O_APPEND or RWF_APPEND,
- Writing is done in direct mode, i.e. O_DIRECT, and
- The source buffer is not present in the page tables (what mmap() calls
“unpopulated”).
The hole seems to generally have the size of the data being written
(i.e. a 64k write will create a 64k hole, followed by the 64k of data we
actually wanted to write), but I assume this is true only up to a
specific size (depending on how the request is split in the kernel).
I’ve appended a reproducer.
We encounter this problem with virtio-fs (sharing of directories between
a virtual machine host and guest), where writing is done by virtiofsd, a
process external to the VMM (e.g. qemu), but the buffer comes from the
VM guest. Memory is shared between virtiofsd and the VMM, so virtiofsd
often writes data from shared memory without having accessed it itself,
so it isn’t present in virtiofsd’s page tables.
Stefano Garzarella (CC-ed) has tested |some Fedora kernels, and has
confirmed that this bug does not appear in ||6.0.7-301.fc37.x86_64, but
does appear in ||6.0.9-300.fc37.x86_64.
While I don’t know anything about btrfs code, I looked into it, and I
believe the changes made by commit
8184620ae21213d51eaf2e0bd4186baacb928172 (btrfs: fix lost file sync on
direct IO write with nowait and dsync iocb) may have caused this.
Specifically, it changed the `goto again` on EFAULT to `goto relock`, a
much earlier label, which causes btrfs_direct_write() to call
generic_write_checks() again after the first btrfs_dio_write() attempt.
btrfs_dio_write() will have already allocated extents for the data and
updated the file length before trying to actually write the data (which
generates the EFAULT), so this second generic_write_checks() call will
update the I/O position to this new file length, exactly at the end of
the area to where the data was supposed to be written.
To test this hypothesis, I’ve tried skipping the generic_write_checks()
call after reaching this specific goto, and that does make the bug
disappear.
Hanna
|
[-- Attachment #1.2: Type: text/html, Size: 2906 bytes --]
[-- Attachment #2: repro.c --]
[-- Type: text/x-csrc, Size: 879 bytes --]
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
if (argc < 2) {
fprintf(stderr, "Usage: %s <test file>\n", argv[0]);
return 1;
}
int fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_APPEND, 0644);
if (fd < 0) {
perror("creating test file");
return 1;
}
char *buf = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
ssize_t ret = write(fd, buf, 4096);
if (ret < 0) {
perror("pwritev2");
return 1;
}
struct stat stbuf;
ret = fstat(fd, &stbuf);
if (ret < 0) {
perror("stat");
return 1;
}
printf("size: %llu\n", (unsigned long long)stbuf.st_size);
return stbuf.st_size == 4096 ? 0 : 1;
}
next reply other threads:[~2024-07-25 9:16 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-25 9:15 Hanna Czenczek [this message]
2024-07-25 10:19 ` Appending from unpopulated page creates hole Filipe Manana
2024-07-25 12:23 ` Hanna Czenczek
2024-07-26 16:13 ` Filipe Manana
2024-07-26 16:32 ` Filipe Manana
2024-07-29 11:54 ` Hanna Czenczek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0b841d46-12fe-4e64-9abb-871d8d0de271@redhat.com \
--to=hreitz@redhat.com \
--cc=clm@fb.com \
--cc=dsterba@suse.com \
--cc=fdmanana@suse.com \
--cc=josef@toxicpanda.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=sgarzare@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox