All of lore.kernel.org
 help / color / mirror / Atom feed
From: tristan <tristan.ye@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] Fwd: [PATCH 0/2] Fix aio completion vs unwritten extents
Date: Wed, 23 Jun 2010 09:50:54 +0800	[thread overview]
Message-ID: <4C21687E.5090800@oracle.com> (raw)
In-Reply-To: <4C20F128.8020705@oracle.com>

Sunil Mushran wrote:
> Tristan,
>
> The reflink test is missing aio bits. Please could you add it.
> In short, a pread issued after an aio write, must return that
> data. This should be tested for both direct and buffered ios.
>
> Similarly, fill_holes also needs to be enhanced.

Gotta, verifications for odirect and buffered ios have already been 
included in current tests I guess.

What's more, maybe we can also verify the data from aio_read() after 
pwrite() completed.

Tristan.

>
> Thanks
> Sunil
>
> -------- Original Message --------
> Subject: 	[PATCH 0/2] Fix aio completion vs unwritten extents
> Date: 	Tue, 22 Jun 2010 08:21:44 -0400
> From: 	Christoph Hellwig <hch@infradead.org>
> To: 	linux-fsdevel at vger.kernel.org, xfs at oss.sgi.com, 
> linux-ext4 at vger.kernel.org
>
>
>
> Some filesystems (XFS and ext4) have support for a concept called
> unwritten extents, where we can write data into holes / preallocated
> space and only mark them as allocated when the data I/O has finished.
>
> Because the transaction to convert the extent can't be submitted from
> I/O completion, which normally happens from IRQ context it needs to
> be defered to a workqueue.  This is not a problem for buffered I/O
> where we keep the data in cache at least until the I/O operation has
> finished, but it is an issue for direct I/O.  XFS avoids that problem
> for synchronous direct I/O by waiting for all unwritten extent conversions
> to finish if we did one during direct I/O, but so far has ignored the
> problem for asynchronous I/O.  Unfortunately the race is very easy
> to hit by using QEMU with native AIO support on a sparse image, and
> the result is filesystem corruption in the guest.
>
> This contains core direct I/O changes to allow the filesystem to delay
> AIO completion, as well as a patch to fix XFS.  ext4 also has the same
> issue, and from a quick look also doesn't properly complete unwritten
> extent conversions for synchronous direct I/O, but I'll leave that
> for someone more familar to figure out.
>
> Below is a minimal reproducer for the issue.  Given that we're dealing
> with a race condition it doesn't always fail, but in 2 core laptop
> it triggers 100% reproducibly in 20 runs in a loop.
>
> ---
>
> #define _GNU_SOURCE
>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> #include <libaio.h>
>
> #define BUF_SIZE	4096
> #define IO_PATTERN	0xab
>
> int main(int argc, char *argv[])
> {
> 	struct io_context *ctx = NULL;
> 	struct io_event ev;
> 	struct iocb iocb, *iocbs[] = { &iocb };
> 	void *buf;
> 	char cmp_buf[BUF_SIZE];
> 	int fd, err = 0;
>
> 	fd = open(argv[1], O_DIRECT | O_CREAT | O_TRUNC | O_RDWR, 0600);
> 	if (fd == -1) {
> 		perror("open");
> 		return 1;
> 	}
>
> 	err = posix_memalign(&buf, BUF_SIZE, BUF_SIZE);
> 	if (err) {
> 		fprintf(stderr, "error %s during %s\n",
> 			strerror(-err),
> 			"posix_memalign");
> 		return 1;
> 	}
> 	memset(buf, IO_PATTERN, BUF_SIZE);
> 	memset(cmp_buf, IO_PATTERN, BUF_SIZE);
>
> 	/*
> 	 * Truncate to some random large file size.  Just make sure
> 	 * it's not smaller than our I/O size.
> 	 */
> 	if (ftruncate(fd, 1024 * 1024 * 1024) < 0) {
> 		perror("ftruncate");
> 		return 1;
> 	}
>
>
> 	/*
> 	 * Do a simple 4k write into a hole using aio.
> 	 */
> 	err = io_setup(1, &ctx);
> 	if (err) {
> 		fprintf(stderr, "error %s during %s\n",
> 			strerror(-err),
> 			"io_setup");
> 		return 1;
> 	}
>
> 	io_prep_pwrite(&iocb, fd, buf, BUF_SIZE, 0);
>
> 	err = io_submit(ctx, 1, iocbs);
> 	if (err != 1) {
> 		fprintf(stderr, "error %s during %s\n",
> 			strerror(-err),
> 			"io_submit");
> 		return 1;
> 	}
>
> 	err = io_getevents(ctx, 1, 1, &ev, NULL);
> 	if (err != 1) {
> 		fprintf(stderr, "error %s during %s\n",
> 			strerror(-err),
> 			"io_getevents");
> 		return 1;
> 	}
>
> 	/*
> 	 * And then read it back.
> 	 *
> 	 * Using pread to keep it simple, but AIO has the same effect.
> 	 */
> 	if (pread(fd, buf, BUF_SIZE, 0) != BUF_SIZE) {
> 		perror("pread");
> 		return 1;
> 	}
>
> 	/*
> 	 * And depending on the machine we'll just get zeroes back quite
> 	 * often here.  That's because the unwritten extent conversion
> 	 * hasn't finished.
> 	 */
> 	if (memcmp(buf, cmp_buf, BUF_SIZE)) {
> 		unsigned long long *ubuf = (unsigned long long *)buf;
> 		int i;
>
> 		for (i = 0; i < BUF_SIZE / sizeof(unsigned long long); i++)
> 			printf("%d: 0x%llx\n", i, ubuf[i]);
> 			
> 		return 1;
> 	}
>
> 	return 0;
> }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   

  reply	other threads:[~2010-06-23  1:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-22 17:21 [Ocfs2-devel] Fwd: [PATCH 0/2] Fix aio completion vs unwritten extents Sunil Mushran
2010-06-23  1:50 ` tristan [this message]
2010-06-23  2:33   ` [Ocfs2-devel] " Sunil Mushran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C21687E.5090800@oracle.com \
    --to=tristan.ye@oracle.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.