linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Whitehouse <swhiteho@redhat.com>
To: Zheng Liu <gnehzuil.liu@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Dmitry Monakhov <dmonakhov@openvz.org>,
	Dave Chinner <david@fromorbit.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Zheng Liu <wenqing.lz@taobao.com>
Subject: Re: [PATCH v3] vfs: fix a bug when we do some dio reads with append dio writes
Date: Thu, 19 Dec 2013 12:27:53 +0000	[thread overview]
Message-ID: <1387456073.2763.20.camel@menhir> (raw)
In-Reply-To: <20131217164159.GA7331@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2177 bytes --]

Hi,

Sorry for the delay... this has turned out to be a somewhat more
complicated investigation than I'd first expected. In fact there are
still a few things I don't yet understand, however I thought I'd let you
know how I'm getting on in the mean time.

So I started looking at GFS2, since thats what I'm most familiar with.
I've found a couple of bugs which I'm about to post patches for,
although even with those patches GFS2 doesn't pass the test all the
time, although it does get through the test some of the time, and it
does last longer than ext4.

Since I wondered whether I was just lucky running the test against XFS,
I've run it again several times, and I still have not seen a single
failure on XFS.

In order to gain a bit more information about the problem, I added a few
more things to the printf, and in particular I note that under GFS2 I
see ret (the amount of data read) at various different sizes. On ext4,
ret is always the full 1M buffer size. So I assume that is the
difference which your patch was intended to cure.

However, I also printed out j, the offset where the first error occurs,
and in both the ext4 and gfs2 cases, that offset is 0, and after exactly
4096 bytes, there is an 'a'. I'd have expected to see a number of pages
of 'a' followed by zero pages, but instead, I'm seeing a single zero
page followed by at least one 'a'. I've not extended my instrumentation
to print out a list of which pages are zero and which 'a', but that is
an unexpected result.

Some tracing shows that with the additional GFS2 patches, the data does
get written to disk correctly, ahead of the read which is issued. Also
since the test does allocating writes, GFS2 will fall back to buffered
I/O for that, and only the read is direct I/O, so since we see similar
results in the GFS2 and ext4 cases, this missing first page which is
common to both looks like it might be related to the read side of
things.

I'm attaching my updated version of your test program, which I'd like to
add to our test suite in due course, if you have no objections.

So far I'm not sure what causes the missing first page issue though...
I'll keep looking in the mean time,

Steve.


[-- Attachment #2: dio-append.c --]
[-- Type: text/x-csrc, Size: 2558 bytes --]

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <memory.h>

#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>

#include <pthread.h>

#define BUF_ALIGN	1024

struct writer_data {
	int fd;
	size_t blksize;
	char *buf;
};

static void *writer(void *arg)
{
	struct writer_data *data = (struct writer_data *)arg;
	int ret;

	ret = write(data->fd, data->buf, data->blksize);
	if (ret < 0)
		fprintf(stderr, "write file failed: %s\n", strerror(errno));

	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t tid;
	struct writer_data wdata;
	size_t max_blocks = 10 * 1024;
	size_t blksize = 1 * 1024 * 1024;
	char *rbuf, *wbuf;
	int readfd, writefd;
	int i, j;
	int k;

	if (argc < 2) {
		fprintf(stderr, "usage: %s [filename]\n", argv[0]);
		exit(1);
	}

	writefd = open(argv[1], O_CREAT|O_DIRECT|O_WRONLY|O_APPEND|O_TRUNC, S_IRWXU);
	if (writefd < 0) {
		fprintf(stderr, "failed to open wfile: %s\n", strerror(errno));
		exit(1);
	}
	readfd = open(argv[1], O_DIRECT|O_RDONLY, S_IRWXU);
	if (readfd < 0) {
		fprintf(stderr, "failed to open rfile: %s\n", strerror(errno));
		exit(1);
	}

	if (posix_memalign((void **)&wbuf, BUF_ALIGN, blksize)) {
		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
		exit(1);
	}

	if (posix_memalign((void **)&rbuf, 4096, blksize)) {
		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
		exit(1);
	}

	memset(wbuf, 'a', blksize);

	wdata.fd = writefd;
	wdata.blksize = blksize;
	wdata.buf = wbuf;

	for (i = 0; i < max_blocks; i++) {
		void *retval;
		int ret;

		ret = pthread_create(&tid, NULL, writer, &wdata);
		if (ret) {
			fprintf(stderr, "create thread failed: %s\n", strerror(errno));
			exit(1);
		}

		memset(rbuf, 'b', blksize);
		do {
			ret = pread(readfd, rbuf, blksize, i * blksize);
		} while (ret <= 0);

		if (ret < 0) {
			fprintf(stderr, "read file failed: %s\n", strerror(errno));
			exit(1);
		}

		if (pthread_join(tid, &retval)) {
			fprintf(stderr, "pthread join failed: %s\n", strerror(errno));
			exit(1);
		}

		if (ret >= 0) {
			for (j = 0; j < ret; j++) {
				if (rbuf[j] != 'a') {
					char c = rbuf[j];
					fprintf(stderr, "encounter an error: offset %ld, j=%ld, ret=%ld rbuf[j]=0x%02x\n",
						i, j, ret, rbuf[j]);
					for(k = j; k < ret; k++) {
						if (rbuf[k] != c)
							break;
					}
					fprintf(stderr, "0x%02x continues for %d bytes, next=0x%02x\n", c, k - j, rbuf[k]);
					goto err;
				}
			}
		}
	}

err:
	free(wbuf);
	free(rbuf);

	return 0;
}

  reply	other threads:[~2013-12-19 12:28 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-07 10:55 [PATCH v3] vfs: fix a bug when we do some dio reads with append dio writes Zheng Liu
2013-12-16  9:37 ` Steven Whitehouse
2013-12-16 10:01   ` Jan Kara
2013-12-17  9:43     ` Steven Whitehouse
2013-12-17 11:16       ` Zheng Liu
2013-12-17 12:17         ` Steven Whitehouse
2013-12-17 16:41           ` Zheng Liu
2013-12-19 12:27             ` Steven Whitehouse [this message]
2013-12-19 22:44               ` Dave Chinner
2013-12-20  9:28                 ` Steven Whitehouse
2013-12-23  3:00                   ` Dave Chinner
2014-01-14 15:22                     ` Steven Whitehouse
2014-01-14 19:19                       ` Jan Kara
2014-01-15  7:19                         ` Zheng Liu
2014-01-16 15:35                           ` [RFC] [PATCH] Fix race when checking i_size on direct i/o read Steven Whitehouse
2014-01-17 10:20                             ` Miklos Szeredi
2014-01-24 14:42                               ` Steven Whitehouse
2014-01-27 10:13                                 ` Jan Kara
2013-12-17 14:01       ` [PATCH v3] vfs: fix a bug when we do some dio reads with append dio writes Jan Kara
2013-12-17 15:32         ` Steven Whitehouse
2013-12-17 16:39           ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1387456073.2763.20.camel@menhir \
    --to=swhiteho@redhat.com \
    --cc=david@fromorbit.com \
    --cc=dmonakhov@openvz.org \
    --cc=gnehzuil.liu@gmail.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wenqing.lz@taobao.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).