linux-ext4.vger.kernel.org archive mirror
* Possible ext4 data corruption with large files and async I/O
@ 2010-01-29 14:54 Giel de Nijs
  2010-01-29 15:30 ` Nick Dokos
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Giel de Nijs @ 2010-01-29 14:54 UTC
  To: linux-ext4

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

Dear ext4 devs,

Today I hit a situation where blocks seemingly did not get written to
disk. I've narrowed it down to the following test case.

Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an 
i7 920 and a Core2 Q6600, I executed the following steps:

- create a file
- with kernel async I/O, write a 512KB block (I haven't tried other sizes)
to an offset >4GB, effectively creating a large sparse file
- again with async I/O, write a 512KB block to an offset smaller than
the previous write, but still >4GB
- wait for the kernel async I/O to report that the writes have succeeded

Now, looking at the file, the second write never seems to have happened. 
When doing this on the same machines on ext3, the behavior is as expected.

As far as I can tell (from the bigger program that triggered this), no
async I/O write to a sparse file at an offset >4GB but below EOF is
actually executed. When the large file is created first (e.g., with dd),
everything works as expected.
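
Since the behavior described above depends on whether the file is sparse or was fully written first, it can help to confirm which case a given test file is in. The following is only a minimal sketch, not part of the original report: make_sparse_file() and file_is_sparse() are hypothetical helper names, and the check assumes the Linux convention that st_blocks counts 512-byte units.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) a file and extend it to
 * `size` bytes without writing any data, leaving it as one large hole.
 * Returns an open file descriptor, or -1 on error. */
static int make_sparse_file(const char *path, off_t size)
{
	int fd;

	assert(path != NULL && size >= 0);
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: returns 1 if fewer bytes are allocated on disk
 * than the file size claims, 0 if not, -1 if fstat() fails. */
static int file_is_sparse(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	/* Assumption: st_blocks is in 512-byte units, as on Linux. */
	return ((off_t)st.st_blocks * 512 < st.st_size) ? 1 : 0;
}
```

A file produced by the test program should report 1 here, while a file filled with dd beforehand should normally report 0.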

Attached is some C code that triggers this bug for me.

If you need more information or want me to test some more things, please 
do ask.

Thanks,
Giel de Nijs
VectorWise

[-- Attachment #2: ext4_bug_2.c --]
[-- Type: text/x-csrc, Size: 3680 bytes --]


/*
 Author: Giel de Nijs, VectorWise B.V. <giel@vectorwise.com>

 Running Fedora Core 12 kernel
 2.6.31.9-174.fc12.x86_64 #1 SMP Mon Dec 21 05:33:33 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux

 When writing with kernel asynchronous I/O to an ext4 partition, to a sparse
 file at offsets >4GB which is not the end of the file, writes don't happen.

 Compile with -laio

 run ext4_bug_2 on a filesystem with >6GB free space
 it writes a 512KB block at 6GB, then one at 5GB

 dd if=ext4_bug.testfile bs=512k count=1 skip=12K|hexdump
 dd if=ext4_bug.testfile bs=512k count=1 skip=10K|hexdump

 should both give:
 0000000 ffff ffff ffff ffff ffff ffff ffff ffff
 *
 0080000

 on ext4, second one gives:
 0000000 0000 0000 0000 0000 0000 0000 0000 0000
 *
 0080000

 i.e.,: no data written
*/

#define _GNU_SOURCE
#define _LARGEFILE64_SOURCE
#define _FILE_OFFSET_BITS 64
#include <features.h>
#include <libaio.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <error.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>	/* memset(), strerror() */

int main(void)
{
	char *filename = "ext4_bug.testfile";

	size_t blocksize = (size_t)512 * 1024;
	off_t offset1 = (off_t)6 * 1024 * 1024 * 1024;
	off_t offset2 = (off_t)5 * 1024 * 1024 * 1024;

	int queue_depth = 8;

	int err;
	char *buf;
	io_context_t io_ctx;
	struct iocb iocb;
	struct iocb *iocblist[1];
	struct io_event events[1];
	int fd;

	/* allocate aligned memory (for direct i/o) */
	err = posix_memalign((void **)&buf, getpagesize(), blocksize);
	if (err) {
		printf("error allocating memory: %s\n", strerror(err));
		return(err);
	}
	memset(buf, 255, blocksize);


	/* initialize async i/o */
	err = io_queue_init(queue_depth, &io_ctx);
	if (err < 0) {
		printf("error initializing I/O queue: %s\n", strerror(-err));
		return(-err);
	}

	/* create file */
	printf("opening file %s\n", filename);
	fd = open(filename, O_DIRECT|O_RDWR|O_EXCL|O_LARGEFILE|O_CREAT, 0666);
	if (fd < 0) {
		perror("error opening file");
		return(errno);
	}

	/* write at offset 6GB, i.e., create a sparse file >4GB */
	io_prep_pwrite(&iocb, fd, buf, blocksize, offset1);
	iocblist[0] = &iocb;
	printf("submitting write of %zu bytes at offset %lld\n", blocksize, (long long)offset1);
	err = io_submit(io_ctx, 1, iocblist);
	if (err < 0) {
		printf("error submitting I/O requests: %s\n", strerror(-err));
		return(-err);
	}

	printf("waiting for write to be finished\n");
	err = io_getevents(io_ctx, 1, 1, events, NULL);
	if (err < 0) {
		printf("error getting I/O events: %s\n", strerror(-err));
		return(-err);
	}
	printf("got %d events\n", err);
	err = events[0].res;
	if (err < 0) {
		printf("error writing buffer: %s\n", strerror(-err));
		return(-err);
	}
	printf("written %ld bytes\n", events[0].res);

	/* write at offset 5GB, i.e., in sparse file >4GB but not at EOF */
	io_prep_pwrite(&iocb, fd, buf, blocksize, offset2);
	iocblist[0] = &iocb;
	printf("submitting write of %zu bytes at offset %lld\n", blocksize, (long long)offset2);
	err = io_submit(io_ctx, 1, iocblist);
	if (err < 0) {
		printf("error submitting I/O requests: %s\n", strerror(-err));
		return(-err);
	}

	printf("waiting for write to be finished\n");
	err = io_getevents(io_ctx, 1, 1, events, NULL);
	if (err < 0) {
		printf("error getting I/O events: %s\n", strerror(-err));
		return(-err);
	}
	printf("got %d events\n", err);
	err = events[0].res;
	if (err < 0) {
		printf("error writing buffer: %s\n", strerror(-err));
		return(-err);
	}
	printf("written %ld bytes\n", events[0].res);

	close(fd);
	io_destroy(io_ctx);

	/* You _should_ have a 6GB sparse file now with two 512KB blocks of 0xFF
	 * at 5GB and at 6GB.
	 */
	free(buf);
	return 0;
}
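
The attached program repeats the same submit-and-wait sequence for each write. As a rough sketch of that one step, the helper below does it through the raw Linux AIO syscalls, so it builds without -laio. The name aio_pwrite_wait() is hypothetical and not from the attachment. The helper does not open the file itself, so the caller decides whether the descriptor uses O_DIRECT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: submit a single write through the kernel AIO
 * syscalls and block until its completion event arrives.
 * Returns the number of bytes written, or a negative errno value. */
static long aio_pwrite_wait(int fd, const void *buf, size_t len, long long offset)
{
	aio_context_t ctx = 0;
	struct iocb cb;
	struct iocb *list[1] = { &cb };
	struct io_event ev;
	long ret;

	assert(buf != NULL && len > 0 && offset >= 0);

	if (syscall(SYS_io_setup, 1, &ctx) < 0)
		return -errno;

	memset(&cb, 0, sizeof(cb));
	cb.aio_lio_opcode = IOCB_CMD_PWRITE;
	cb.aio_fildes = fd;
	cb.aio_buf = (uint64_t)(uintptr_t)buf;
	cb.aio_nbytes = len;
	cb.aio_offset = offset;	/* 64-bit field, so offsets >4GB fit */

	if (syscall(SYS_io_submit, ctx, 1, list) != 1)
		ret = -errno;
	else if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) != 1)
		ret = -errno;
	else
		ret = (long)ev.res;	/* bytes written, or negative errno */

	syscall(SYS_io_destroy, ctx);
	return ret;
}
```

With this, the two writes in the test case would reduce to two calls, one at the 6GB offset and one at the 5GB offset. Each call should return the block size. As the report shows, a successful return does not by itself prove the data reached the file.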


* Re: Possible ext4 data corruption with large files and async I/O
  2010-01-29 14:54 Possible ext4 data corruption with large files and async I/O Giel de Nijs
@ 2010-01-29 15:30 ` Nick Dokos
  2010-01-29 15:49   ` Nick Dokos
  2010-01-29 15:30 ` Eric Sandeen
  2010-01-29 18:18 ` Eric Sandeen
  2 siblings, 1 reply; 5+ messages in thread
From: Nick Dokos @ 2010-01-29 15:30 UTC
  To: Giel de Nijs; +Cc: linux-ext4, nicholas.dokos

> 
> Dear ext4 devs,
> 
> Today I hit a situation where seemingly blocks did not get written to 
> disk. I've narrowed it down to the following test case.
> 
> Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an 
> i7 920 and a Core2 Q6600, I executed the following steps:
> 
> - create a file
> - with kernel async i/o, write a 512kb (haven't tried other sizes) block 
> to an offset >4GB, effectively creating a large sparse file
> - again with async i/o, write a 512kb block to an offset smaller than 
> the previous write, but >4GB
> - wait for the kernel async i/o to tell you the writes have succeeded
> 
> Now, looking at the file, the second write never seems to have happened. 
> When doing this on the same machines on ext3, the behavior is as expected.
> 
> As far as I can tell (from the bigger program that triggered this), all 
> writes >4GB but < EOF to a sparse file with async i/o aren't executed. 
> When creating a large file first (i.e., with dd), everything does work 
> as expected.
> 
> Attached is some C code that triggers this bug for me.
> 
> If you need more information or want me to test some more things, please 
> do ask.
> 

I ran your program on FC-11 with a 2.6.33-rc4 upstream kernel: it worked fine.
Both dd's gave the expected output.

Thanks,
Nick

Transcript:

root@shifter:~/src/ext4/giel-de-nijs# ./a.out
opening file ext4_bug.testfile
submitting write of 524288 bytes at offset 6442450944
waiting for write to be finished
got 1 events
written 524288 bytes
submitting write of 524288 bytes at offset 5368709120
waiting for write to be finished
got 1 events
written 524288 bytes
root@shifter:~/src/ext4/giel-de-nijs#  dd if=ext4_bug.testfile bs=512k count=1 skip=10K|hexdump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
1+0 records in
1+0 records out
524288 bytes (524 kB) copied, 0.0045471 s, 115 MB/s
0080000
root@shifter:~/src/ext4/giel-de-nijs#  dd if=ext4_bug.testfile bs=512k count=1 skip=12K|hexdump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
1+0 records in
1+0 records out
524288 bytes (524 kB) copied, 0.00474075 s, 111 MB/s
0080000
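
The dd | hexdump check shown in the transcript can also be done programmatically. This is a minimal sketch only: block_is_filled() is a hypothetical helper, not something from the thread, and it simply reads the block back with pread() and compares every byte.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the `len` bytes at `offset` all
 * equal `expected`, 0 if any byte differs or the file ends early,
 * and -1 on a read or allocation error. */
static int block_is_filled(int fd, off_t offset, size_t len, unsigned char expected)
{
	unsigned char *buf;
	size_t done = 0;
	size_t i;
	int result = 1;

	assert(len > 0);
	buf = malloc(len);
	if (buf == NULL)
		return -1;

	/* pread() may return short counts, so loop until the block is read */
	while (done < len) {
		ssize_t n = pread(fd, buf + done, len - done, offset + (off_t)done);
		if (n < 0) {
			result = -1;
			break;
		}
		if (n == 0) {	/* reached EOF before the block was complete */
			result = 0;
			break;
		}
		done += (size_t)n;
	}

	for (i = 0; result == 1 && i < len; i++) {
		if (buf[i] != expected)
			result = 0;
	}

	free(buf);
	return result;
}
```

For the test file in this thread, the check would be called with a length of 512KB and the value 0xFF, once at the 5GB offset and once at the 6GB offset. A return of 0 at the 5GB offset corresponds to the all-zero hexdump output described in the report.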


* Re: Possible ext4 data corruption with large files and async I/O
  2010-01-29 14:54 Possible ext4 data corruption with large files and async I/O Giel de Nijs
  2010-01-29 15:30 ` Nick Dokos
@ 2010-01-29 15:30 ` Eric Sandeen
  2010-01-29 18:18 ` Eric Sandeen
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Sandeen @ 2010-01-29 15:30 UTC
  To: Giel de Nijs; +Cc: linux-ext4

Giel de Nijs wrote:
> Dear ext4 devs,
> 
> Today I hit a situation where seemingly blocks did not get written to
> disk. I've narrowed it down to the following test case.
> 
> Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an
> i7 920 and a Core2 Q6600, I executed the following steps:
> 
> - create a file
> - with kernel async i/o, write a 512kb (haven't tried other sizes) block
> to an offset >4GB, effectively creating a large sparse file
> - again with async i/o, write a 512kb block to an offset smaller than
> the previous write, but >4GB
> - wait for the kernel async i/o to tell you the writes have succeeded
> 
> Now, looking at the file, the second write never seems to have happened.
> When doing this on the same machines on ext3, the behavior is as expected.
> 
> As far as I can tell (from the bigger program that triggered this), all
> writes >4GB but < EOF to a sparse file with async i/o aren't executed.
> When creating a large file first (i.e., with dd), everything does work
> as expected.
> 
> Attached is some C code that triggers this bug for me.
> 
> If you need more information or want me to test some more things, please
> do ask.

Thanks, I can reproduce this as well - and yep works ok on ext3 & xfs,
so looks like an ext4 bug indeed.  I'll look into it.

-Eric

> Thanks,
> Giel de Nijs
> VectorWise
> 



* Re: Possible ext4 data corruption with large files and async I/O
  2010-01-29 15:30 ` Nick Dokos
@ 2010-01-29 15:49   ` Nick Dokos
  0 siblings, 0 replies; 5+ messages in thread
From: Nick Dokos @ 2010-01-29 15:49 UTC
  To: nicholas.dokos; +Cc: Giel de Nijs, linux-ext4

> > 
> > Dear ext4 devs,
> > 
> > Today I hit a situation where seemingly blocks did not get written to 
> > disk. I've narrowed it down to the following test case.
> > 
> > Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an 
> > i7 920 and a Core2 Q6600, I executed the following steps:
> > 
> > - create a file
> > - with kernel async i/o, write a 512kb (haven't tried other sizes) block 
> > to an offset >4GB, effectively creating a large sparse file
> > - again with async i/o, write a 512kb block to an offset smaller than 
> > the previous write, but >4GB
> > - wait for the kernel async i/o to tell you the writes have succeeded
> > 
> > Now, looking at the file, the second write never seems to have happened. 
> > When doing this on the same machines on ext3, the behavior is as expected.
> > 
> > As far as I can tell (from the bigger program that triggered this), all 
> > writes >4GB but < EOF to a sparse file with async i/o aren't executed. 
> > When creating a large file first (i.e., with dd), everything does work 
> > as expected.
> > 
> > Attached is some C code that triggers this bug for me.
> > 
> > If you need more information or want me to test some more things, please 
> > do ask.
> > 
> 
> I ran your program on FC-11 with a 2.6.33-rc4 upstream kernel: it worked fine.
> Both dd's gave the expected output.
> 

Scratch that: I goofed. I can reproduce it too.

Sorry for the confusion.

Nick


* Re: Possible ext4 data corruption with large files and async I/O
  2010-01-29 14:54 Possible ext4 data corruption with large files and async I/O Giel de Nijs
  2010-01-29 15:30 ` Nick Dokos
  2010-01-29 15:30 ` Eric Sandeen
@ 2010-01-29 18:18 ` Eric Sandeen
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Sandeen @ 2010-01-29 18:18 UTC
  To: Giel de Nijs; +Cc: linux-ext4

Giel de Nijs wrote:
> Dear ext4 devs,
> 
> Today I hit a situation where seemingly blocks did not get written to
> disk. I've narrowed it down to the following test case.
> 
> Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an
> i7 920 and a Core2 Q6600, I executed the following steps:
> 
> - create a file
> - with kernel async i/o, write a 512kb (haven't tried other sizes) block
> to an offset >4GB, effectively creating a large sparse file
> - again with async i/o, write a 512kb block to an offset smaller than
> the previous write, but >4GB
> - wait for the kernel async i/o to tell you the writes have succeeded
> 
> Now, looking at the file, the second write never seems to have happened.
> When doing this on the same machines on ext3, the behavior is as expected.
> 
> As far as I can tell (from the bigger program that triggered this), all
> writes >4GB but < EOF to a sparse file with async i/o aren't executed.
> When creating a large file first (i.e., with dd), everything does work
> as expected.
> 
> Attached is some C code that triggers this bug for me.
> 
> If you need more information or want me to test some more things, please
> do ask.
> 
> Thanks,
> Giel de Nijs
> VectorWise
> 

Ok, got it, will send a patch - thanks.

-Eric
