public inbox for linux-xfs@vger.kernel.org
* Re: [BUG] ext2/3/4: dio reads stale data when we do some append dio writes
       [not found]         ` <20131119111826.GA20485@infradead.org>
@ 2013-11-19 11:51           ` Zheng Liu
  2013-11-19 12:09             ` Dave Chinner
  2013-11-19 12:01           ` Dave Chinner
  1 sibling, 1 reply; 5+ messages in thread
From: Zheng Liu @ 2013-11-19 11:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-ext4, xfs

On Tue, Nov 19, 2013 at 03:18:26AM -0800, Christoph Hellwig wrote:
> On Tue, Nov 19, 2013 at 07:19:47PM +0800, Zheng Liu wrote:
> > Yes, I know that XFS has a shared/exclusive lock.  I guess that is why
> > it can pass the test.  But another question is why XFS fails when we do
> > append dio writes concurrently with buffered reads.
> 
> Can you provide a test case for that issue?

Simple.  The reader just needs to open the file without the O_DIRECT
flag.  I have pasted the full code below.  Please note this line:
	readfd = open(argv[1], /*O_DIRECT|*/O_RDONLY, S_IRWXU);

On my own sandbox the program prints:
        encounter an error: offset 0

                                                - Zheng

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <memory.h>

#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>

#include <pthread.h>

#define BUF_ALIGN	1024

struct writer_data {
	int fd;
	size_t blksize;
	char *buf;
};

static void *writer(void *arg)
{
	struct writer_data *data = (struct writer_data *)arg;
	int ret;

	ret = write(data->fd, data->buf, data->blksize);
	if (ret < 0)
		fprintf(stderr, "write file failed: %s\n", strerror(errno));

	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t tid;
	struct writer_data wdata;
	size_t max_blocks = 10 * 1024;
	size_t blksize = 1 * 1024 * 1024;
	char *rbuf, *wbuf;
	int readfd, writefd;
	int i, j;

	if (argc < 2) {
		fprintf(stderr, "usage: %s [filename]\n", argv[0]);
		exit(1);
	}

	writefd = open(argv[1], O_CREAT|O_DIRECT|O_WRONLY|O_APPEND|O_TRUNC, S_IRWXU);
	if (writefd < 0) {
		fprintf(stderr, "failed to open wfile: %s\n", strerror(errno));
		exit(1);
	}
	readfd = open(argv[1], /*O_DIRECT|*/O_RDONLY, S_IRWXU);
	if (readfd < 0) {
		fprintf(stderr, "failed to open rfile: %s\n", strerror(errno));
		exit(1);
	}

	if (posix_memalign((void **)&wbuf, BUF_ALIGN, blksize)) {
		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
		exit(1);
	}

	if (posix_memalign((void **)&rbuf, 4096, blksize)) {
		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
		exit(1);
	}

	memset(wbuf, 'a', blksize);

	wdata.fd = writefd;
	wdata.blksize = blksize;
	wdata.buf = wbuf;

	for (i = 0; i < max_blocks; i++) {
		void *retval;
		int ret;

		ret = pthread_create(&tid, NULL, writer, &wdata);
		if (ret) {
			fprintf(stderr, "create thread failed: %s\n", strerror(errno));
			exit(1);
		}

		memset(rbuf, 'b', blksize);
		do {
			ret = pread(readfd, rbuf, blksize, i * blksize);
		} while (ret <= 0);

		if (ret < 0) {
			fprintf(stderr, "read file failed: %s\n", strerror(errno));
			exit(1);
		}

		if (pthread_join(tid, &retval)) {
			fprintf(stderr, "pthread join failed: %s\n", strerror(errno));
			exit(1);
		}

		if (ret >= 0) {
			for (j = 0; j < ret; j++) {
				if (rbuf[i] != 'a') {
					fprintf(stderr, "encounter an error: offset %ld\n",
						i);
					goto err;
				}
			}
		}
	}

err:
	free(wbuf);
	free(rbuf);

	return 0;
}

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [BUG] ext2/3/4: dio reads stale data when we do some append dio writes
       [not found]         ` <20131119111826.GA20485@infradead.org>
  2013-11-19 11:51           ` [BUG] ext2/3/4: dio reads stale data when we do some append dio writes Zheng Liu
@ 2013-11-19 12:01           ` Dave Chinner
  2013-11-19 12:20             ` Zheng Liu
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2013-11-19 12:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-ext4, xfs

On Tue, Nov 19, 2013 at 03:18:26AM -0800, Christoph Hellwig wrote:
> On Tue, Nov 19, 2013 at 07:19:47PM +0800, Zheng Liu wrote:
> > Yes, I know that XFS has a shared/exclusive lock.  I guess that is why
> > it can pass the test.  But another question is why XFS fails when we do
> > append dio writes concurrently with buffered reads.
> 
> Can you provide a test case for that issue?

For XFS, appending direct IO writes only hold the IOLOCK exclusive
for as long as it takes to guarantee that the region between the
old EOF and the new EOF is full of zeros before it is demoted.  i.e.
once the region is guaranteed not to expose stale data, the
exclusive IO lock is demoted to a shared lock and a buffered read
is then allowed to proceed concurrently with the DIO write.

Hence even appending writes occur concurrently with buffered reads,
and if the read overlaps the block at the old EOF then the page
brought into the page cache will have zeros in it.

FWIW, there's a wonderful comment in generic_file_direct_write()
that pretty much covers this case:

        /*
         * Finally, try again to invalidate clean pages which might have been
         * cached by non-direct readahead, or faulted in by get_user_pages()
         * if the source of the write was an mmap'ed region of the file
         * we're writing.  Either one is a pretty crazy thing to do,
         * so we don't support it 100%.  If this invalidation
         * fails, tough, the write still worked...
         */

The kernel code simply does not have the exclusion mechanisms to
make concurrent buffered and direct IO robust. This is one of the
problems (amongst many) that we've been looking to solve with a VFS
level IO range lock of some kind....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
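
Since the kernel does not provide this exclusion, one option for a test
or application that must mix an O_DIRECT appender with a buffered reader
is to serialize the two itself.  A minimal sketch, assuming both sides
cooperate through flock(2) advisory locks on their own file descriptors;
the helper names locked_append() and locked_pread() are hypothetical and
do not come from this thread.  Because the writer and the reader use
separate open() calls, their flock() locks conflict with each other even
inside one process.

```c
#include <errno.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append to the file while holding an exclusive
 * advisory lock, so no cooperating reader runs during the write.
 */
static ssize_t locked_append(int writefd, const void *buf, size_t len)
{
	ssize_t ret;
	int saved;

	if (flock(writefd, LOCK_EX) < 0)
		return -1;
	ret = write(writefd, buf, len);
	saved = errno;			/* keep write()'s errno across unlock */
	flock(writefd, LOCK_UN);
	errno = saved;
	return ret;
}

/*
 * Hypothetical helper: read under a shared advisory lock, so readers may
 * overlap each other but never a cooperating appender.
 */
static ssize_t locked_pread(int readfd, void *buf, size_t len, off_t off)
{
	ssize_t ret;
	int saved;

	if (flock(readfd, LOCK_SH) < 0)
		return -1;
	ret = pread(readfd, buf, len, off);
	saved = errno;			/* keep pread()'s errno across unlock */
	flock(readfd, LOCK_UN);
	errno = saved;
	return ret;
}
```

With the test program above, the writer thread would call
locked_append(data->fd, data->buf, data->blksize) and the reader
locked_pread(readfd, rbuf, blksize, i * blksize).  This only removes the
overlap in time between the two I/O paths; it is not a fix for the kernel
behaviour discussed here.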


* Re: [BUG] ext2/3/4: dio reads stale data when we do some append dio writes
  2013-11-19 11:51           ` [BUG] ext2/3/4: dio reads stale data when we do some append dio writes Zheng Liu
@ 2013-11-19 12:09             ` Dave Chinner
  2013-11-19 12:18               ` Zheng Liu
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2013-11-19 12:09 UTC (permalink / raw)
  To: Christoph Hellwig, linux-ext4, linux-fsdevel, xfs

On Tue, Nov 19, 2013 at 07:51:22PM +0800, Zheng Liu wrote:
> On Tue, Nov 19, 2013 at 03:18:26AM -0800, Christoph Hellwig wrote:
> > On Tue, Nov 19, 2013 at 07:19:47PM +0800, Zheng Liu wrote:
> > > Yes, I know that XFS has a shared/exclusive lock.  I guess that is why
> > > it can pass the test.  But another question is why XFS fails when we do
> > > append dio writes concurrently with buffered reads.
> > 
> > Can you provide a test case for that issue?
> 
> Simple.  The reader just needs to open the file without the O_DIRECT
> flag.  I have pasted the full code below.  Please note this line:
> 	readfd = open(argv[1], /*O_DIRECT|*/O_RDONLY, S_IRWXU);
> 
> On my own sandbox the program prints:
>         encounter an error: offset 0
....
> 		if (ret >= 0) {
> 			for (j = 0; j < ret; j++) {
> 				if (rbuf[i] != 'a') {
> 					fprintf(stderr, "encounter an error: offset %ld\n",
> 						i);
> 					goto err;

Should be checking rbuf[j], perhaps?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [BUG] ext2/3/4: dio reads stale data when we do some append dio writes
  2013-11-19 12:09             ` Dave Chinner
@ 2013-11-19 12:18               ` Zheng Liu
  0 siblings, 0 replies; 5+ messages in thread
From: Zheng Liu @ 2013-11-19 12:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-fsdevel, linux-ext4, xfs

On Tue, Nov 19, 2013 at 11:09:29PM +1100, Dave Chinner wrote:
> On Tue, Nov 19, 2013 at 07:51:22PM +0800, Zheng Liu wrote:
> > On Tue, Nov 19, 2013 at 03:18:26AM -0800, Christoph Hellwig wrote:
> > > On Tue, Nov 19, 2013 at 07:19:47PM +0800, Zheng Liu wrote:
> > > > Yes, I know that XFS has a shared/exclusive lock.  I guess that is why
> > > > it can pass the test.  But another question is why XFS fails when we do
> > > > append dio writes concurrently with buffered reads.
> > > 
> > > Can you provide a test case for that issue?
> > 
> > Simple.  The reader just needs to open the file without the O_DIRECT
> > flag.  I have pasted the full code below.  Please note this line:
> > 	readfd = open(argv[1], /*O_DIRECT|*/O_RDONLY, S_IRWXU);
> > 
> > On my own sandbox the program prints:
> >         encounter an error: offset 0
> ....
> > 		if (ret >= 0) {
> > 			for (j = 0; j < ret; j++) {
> > 				if (rbuf[i] != 'a') {
> > 					fprintf(stderr, "encounter an error: offset %ld\n",
> > 						i);
> > 					goto err;
> 
> Should be checking rbuf[j], perhaps?

Oops, it's my fault.  Yes, it should check rbuf[j].

Thanks,
                                                - Zheng
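
The corrected comparison can be sketched as a small helper.  This is only
an illustration: first_mismatch() is a hypothetical name, not part of the
posted program.  It indexes the buffer with the byte cursor j and returns
the position of the first unexpected byte, so the caller can report an
exact file offset.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not in the posted test program): return the index
 * of the first byte in buf[0..len) that differs from `expected`, or -1
 * if every byte matches.
 */
static ssize_t first_mismatch(const char *buf, size_t len, char expected)
{
	size_t j;

	for (j = 0; j < len; j++) {
		if (buf[j] != expected)	/* index with j, not the block counter */
			return (ssize_t)j;
	}
	return -1;
}
```

In the test loop this would replace the inner for loop: call
first_mismatch(rbuf, ret, 'a') after pthread_join() and, on a result
j >= 0, report the offset as i * blksize + j.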


* Re: [BUG] ext2/3/4: dio reads stale data when we do some append dio writes
  2013-11-19 12:01           ` Dave Chinner
@ 2013-11-19 12:20             ` Zheng Liu
  0 siblings, 0 replies; 5+ messages in thread
From: Zheng Liu @ 2013-11-19 12:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-fsdevel, linux-ext4, xfs

On Tue, Nov 19, 2013 at 11:01:12PM +1100, Dave Chinner wrote:
> On Tue, Nov 19, 2013 at 03:18:26AM -0800, Christoph Hellwig wrote:
> > On Tue, Nov 19, 2013 at 07:19:47PM +0800, Zheng Liu wrote:
> > > Yes, I know that XFS has a shared/exclusive lock.  I guess that is why
> > > it can pass the test.  But another question is why XFS fails when we do
> > > append dio writes concurrently with buffered reads.
> > 
> > Can you provide a test case for that issue?
> 
> For XFS, appending direct IO writes only hold the IOLOCK exclusive
> for as long as it takes to guarantee that the region between the
> old EOF and the new EOF is full of zeros before it is demoted.  i.e.
> once the region is guaranteed not to expose stale data, the
> exclusive IO lock is demoted to a shared lock and a buffered read
> is then allowed to proceed concurrently with the DIO write.
> 
> Hence even appending writes occur concurrently with buffered reads,
> and if the read overlaps the block at the old EOF then the page
> brought into the page cache will have zeros in it.
> 
> FWIW, there's a wonderful comment in generic_file_direct_write()
> that pretty much covers this case:
> 
>         /*
>          * Finally, try again to invalidate clean pages which might have been
>          * cached by non-direct readahead, or faulted in by get_user_pages()
>          * if the source of the write was an mmap'ed region of the file
>          * we're writing.  Either one is a pretty crazy thing to do,
>          * so we don't support it 100%.  If this invalidation
>          * fails, tough, the write still worked...
>          */
> 
> The kernel code simply does not have the exclusion mechanisms to
> make concurrent buffered and direct IO robust. This is one of the
> problems (amongst many) that we've been looking to solve with a VFS
> level IO range lock of some kind....

Thanks for pointing it out.

                                                - Zheng

