From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] Fwd: [PATCH 0/2] Fix aio completion vs unwritten extents
Date: Tue, 22 Jun 2010 10:21:44 -0700 [thread overview]
Message-ID: <4C20F128.8020705@oracle.com> (raw)
Tristan,
The reflink test is missing aio bits. Please could you add it.
In short, a pread issued after an aio write, must return that
data. This should be tested for both direct and buffered ios.
Similarly, fill_holes also needs to be enhanced.
Thanks
Sunil
-------- Original Message --------
Subject: [PATCH 0/2] Fix aio completion vs unwritten extents
Date: Tue, 22 Jun 2010 08:21:44 -0400
From: Christoph Hellwig <hch@infradead.org>
To: linux-fsdevel at vger.kernel.org, xfs at oss.sgi.com,
linux-ext4 at vger.kernel.org
Some filesystems (XFS and ext4) have support for a concept called
unwritten extents, where we can write data into holes / preallocated
space and only mark them as allocated when the data I/O has finished.
Because the transaction to convert the extent can't be submitted from
I/O completion, which normally happens from IRQ context it needs to
be defered to a workqueue. This is not a problem for buffered I/O
where we keep the data in cache at least until the I/O operation has
finished, but it is an issue for direct I/O. XFS avoids that problem
for synchronous direct I/O by waiting for all unwritten extent conversions
to finish if we did one during direct I/O, but so far has ignored the
problem for asynchronous I/O. Unfortunately the race is very easy
to hit by using QEMU with native AIO support on a sparse image, and
the result is filesystem corruption in the guest.
This contains core direct I/O changes to allow the filesystem to delay
AIO completion, as well as a patch to fix XFS. ext4 also has the same
issue, and from a quick look also doesn't properly complete unwritten
extent conversions for synchronous direct I/O, but I'll leave that
for someone more familar to figure out.
Below is a minimal reproducer for the issue. Given that we're dealing
with a race condition it doesn't always fail, but in 2 core laptop
it triggers 100% reproducibly in 20 runs in a loop.
---
#define _GNU_SOURCE
#include<sys/stat.h>
#include<sys/types.h>
#include<errno.h>
#include<fcntl.h>
#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#include<libaio.h>
#define BUF_SIZE 4096
#define IO_PATTERN 0xab
int main(int argc, char *argv[])
{
struct io_context *ctx = NULL;
struct io_event ev;
struct iocb iocb, *iocbs[] = {&iocb };
void *buf;
char cmp_buf[BUF_SIZE];
int fd, err = 0;
fd = open(argv[1], O_DIRECT | O_CREAT | O_TRUNC | O_RDWR, 0600);
if (fd == -1) {
perror("open");
return 1;
}
err = posix_memalign(&buf, BUF_SIZE, BUF_SIZE);
if (err) {
fprintf(stderr, "error %s during %s\n",
strerror(-err),
"posix_memalign");
return 1;
}
memset(buf, IO_PATTERN, BUF_SIZE);
memset(cmp_buf, IO_PATTERN, BUF_SIZE);
/*
* Truncate to some random large file size. Just make sure
* it's not smaller than our I/O size.
*/
if (ftruncate(fd, 1024 * 1024 * 1024)< 0) {
perror("ftruncate");
return 1;
}
/*
* Do a simple 4k write into a hole using aio.
*/
err = io_setup(1,&ctx);
if (err) {
fprintf(stderr, "error %s during %s\n",
strerror(-err),
"io_setup");
return 1;
}
io_prep_pwrite(&iocb, fd, buf, BUF_SIZE, 0);
err = io_submit(ctx, 1, iocbs);
if (err != 1) {
fprintf(stderr, "error %s during %s\n",
strerror(-err),
"io_submit");
return 1;
}
err = io_getevents(ctx, 1, 1,&ev, NULL);
if (err != 1) {
fprintf(stderr, "error %s during %s\n",
strerror(-err),
"io_getevents");
return 1;
}
/*
* And then read it back.
*
* Using pread to keep it simple, but AIO has the same effect.
*/
if (pread(fd, buf, BUF_SIZE, 0) != BUF_SIZE) {
perror("pread");
return 1;
}
/*
* And depending on the machine we'll just get zeroes back quite
* often here. That's because the unwritten extent conversion
* hasn't finished.
*/
if (memcmp(buf, cmp_buf, BUF_SIZE)) {
unsigned long long *ubuf = (unsigned long long *)buf;
int i;
for (i = 0; i< BUF_SIZE / sizeof(unsigned long long); i++)
printf("%d: 0x%llx\n", i, ubuf[i]);
return 1;
}
return 0;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info@ http://vger.kernel.org/majordomo-info.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20100622/21bb77b6/attachment-0001.html
next reply other threads:[~2010-06-22 17:21 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-06-22 17:21 Sunil Mushran [this message]
2010-06-23 1:50 ` [Ocfs2-devel] Fwd: [PATCH 0/2] Fix aio completion vs unwritten extents tristan
2010-06-23 2:33 ` [Ocfs2-devel] " Sunil Mushran
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C20F128.8020705@oracle.com \
--to=sunil.mushran@oracle.com \
--cc=ocfs2-devel@oss.oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.