* Possible ext4 data corruption with large files and async I/O
@ 2010-01-29 14:54 Giel de Nijs
2010-01-29 15:30 ` Nick Dokos
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Giel de Nijs @ 2010-01-29 14:54 UTC
To: linux-ext4
Dear ext4 devs,
Today I hit a situation where blocks seemingly did not get written to
disk. I've narrowed it down to the following test case.
Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an
i7 920 and a Core2 Q6600, I executed the following steps:
- create a file
- with kernel async i/o, write a 512KB (haven't tried other sizes) block
to an offset >4GB, effectively creating a large sparse file
- again with async i/o, write a 512KB block to an offset smaller than
that of the previous write, but still >4GB
- wait for the kernel async i/o to tell you the writes have succeeded
Now, looking at the file, the second write never seems to have happened.
When doing this on the same machines on ext3, the behavior is as expected.
As far as I can tell (from the bigger program that triggered this), no
async i/o write to a sparse file at an offset >4GB but below EOF is executed.
When creating a large file first (i.e., with dd), everything does work
as expected.
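The dd case above can be sketched in C: fill the file with real data up to the highest offset first, so that no later write lands in a hole. This is only an illustrative sketch; `prefill_file` is a hypothetical helper that is not part of the attached test program, and whether a pre-filled file avoids the problem on a given kernel rests solely on the observation reported above.

```c
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original test program): create `path`
 * and fill it with `total` bytes of zeros, written in pieces of at most
 * `chunk` bytes, so the file has no holes. Returns 0 on success or an
 * errno value on failure.
 */
static int prefill_file(const char *path, off_t total, size_t chunk)
{
    assert(chunk > 0 && total >= 0);

    char *zeros = calloc(1, chunk);
    if (zeros == NULL)
        return ENOMEM;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
        int saved = errno;
        free(zeros);
        return saved;
    }

    int rc = 0;
    off_t done = 0;
    while (done < total) {
        size_t want = chunk;
        if ((off_t)want > total - done)
            want = (size_t)(total - done);

        ssize_t n = pwrite(fd, zeros, want, done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same piece */
            rc = errno;
            break;
        }
        if (n == 0) {           /* no progress: give up instead of spinning */
            rc = EIO;
            break;
        }
        done += n;
    }

    /* Flush to disk so later direct I/O sees allocated, written blocks. */
    if (rc == 0 && fsync(fd) < 0)
        rc = errno;
    if (close(fd) < 0 && rc == 0)
        rc = errno;

    free(zeros);
    return rc;
}
```

For the scenario in this report one would call something like `prefill_file("ext4_bug.testfile", (off_t)6 * 1024 * 1024 * 1024 + 512 * 1024, 1024 * 1024)` before the async writes. Note that the test program opens its file with O_EXCL, so it would need that flag dropped to reuse an existing file.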
Attached is some C code that triggers this bug for me.
If you need more information or want me to test some more things, please
do ask.
Thanks,
Giel de Nijs
VectorWise
[-- Attachment #2: ext4_bug_2.c --]
[-- Type: text/x-csrc, Size: 3680 bytes --]
/*
  Author: Giel de Nijs, VectorWise B.V. <giel@vectorwise.com>

  Running Fedora Core 12 kernel
  2.6.31.9-174.fc12.x86_64 #1 SMP Mon Dec 21 05:33:33 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux

  When writing with kernel asynchronous I/O to an ext4 partition, to a sparse
  file at an offset >4GB that is not the end of the file, writes don't happen.

  Compile with -laio.
  Run ext4_bug_2 on a filesystem with >6GB free space;
  it writes a 512KB block at 6GB, then one at 5GB.

    dd if=ext4_bug.testfile bs=512k count=1 skip=12K|hexdump
    dd if=ext4_bug.testfile bs=512k count=1 skip=10K|hexdump

  should both give:

    0000000 ffff ffff ffff ffff ffff ffff ffff ffff
    *
    0080000

  on ext4, second one gives:

    0000000 0000 0000 0000 0000 0000 0000 0000 0000
    *
    0080000

  i.e., no data written
*/

#define _GNU_SOURCE
#define _LARGEFILE64_SOURCE
#define _FILE_OFFSET_BITS 64

#include <features.h>
#include <libaio.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <error.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>   /* strerror, memset */
#include <ctype.h>

int main(void)
{
    char *filename = "ext4_bug.testfile";
    size_t blocksize = (size_t)512 * 1024;
    off_t offset1 = (off_t)6 * 1024 * 1024 * 1024;
    off_t offset2 = (off_t)5 * 1024 * 1024 * 1024;
    int queue_depth = 8;
    int err;
    char *buf;
    io_context_t io_ctx;
    struct iocb iocb;
    struct iocb *iocblist[1];
    struct io_event events[1];
    int fd;

    /* allocate aligned memory (for direct i/o) */
    err = posix_memalign((void **)&buf, getpagesize(), blocksize);
    if (err) {
        printf("error allocating memory: %s\n", strerror(err));
        return(err);
    }
    memset(buf, 255, blocksize);

    /* initialize async i/o */
    err = io_queue_init(queue_depth, &io_ctx);
    if (err < 0) {
        printf("error initializing I/O queue: %s\n", strerror(-err));
        return(-err);
    }

    /* create file */
    printf("opening file %s\n", filename);
    fd = open(filename, O_DIRECT|O_RDWR|O_EXCL|O_LARGEFILE|O_CREAT, 0666);
    if (fd < 0) {
        perror("error opening file");
        return(errno);
    }

    /* write at offset 6GB, i.e., create a sparse file >4GB */
    io_prep_pwrite(&iocb, fd, buf, blocksize, offset1);
    iocblist[0] = &iocb;
    printf("submitting write of %zu bytes at offset %lld\n", blocksize, (long long)offset1);
    err = io_submit(io_ctx, 1, iocblist);
    if (err < 0) {
        printf("error submitting I/O requests: %s\n", strerror(-err));
        return(-err);
    }
    printf("waiting for write to be finished\n");
    err = io_getevents(io_ctx, 1, 1, events, NULL);
    if (err < 0) {
        printf("error getting I/O events: %s\n", strerror(-err));
        return(-err);
    }
    printf("got %d events\n", err);
    err = events[0].res;
    if (err < 0) {
        printf("error writing buffer: %s\n", strerror(-err));
        return(-err);
    }
    printf("written %lu bytes\n", events[0].res);

    /* write at offset 5GB, i.e., in sparse file >4GB but not at EOF */
    io_prep_pwrite(&iocb, fd, buf, blocksize, offset2);
    iocblist[0] = &iocb;
    printf("submitting write of %zu bytes at offset %lld\n", blocksize, (long long)offset2);
    err = io_submit(io_ctx, 1, iocblist);
    if (err < 0) {
        printf("error submitting I/O requests: %s\n", strerror(-err));
        return(-err);
    }
    printf("waiting for write to be finished\n");
    err = io_getevents(io_ctx, 1, 1, events, NULL);
    if (err < 0) {
        printf("error getting I/O events: %s\n", strerror(-err));
        return(-err);
    }
    printf("got %d events\n", err);
    err = events[0].res;
    if (err < 0) {
        printf("error writing buffer: %s\n", strerror(-err));
        return(-err);
    }
    printf("written %lu bytes\n", events[0].res);

    close(fd);
    io_destroy(io_ctx);

    /* You _should_ have a 6GB sparse file now with two 512KB blocks of 0xFF
     * at 5GB and at 6GB.
     */
    free(buf);
    return 0;
}
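The dd|hexdump check described in the comment at the top of the attachment can also be done programmatically. The following is a rough sketch and not part of the original attachment: `dd_skip_for_offset` and `block_is_filled` are hypothetical helper names, and the check uses plain buffered `pread` rather than direct I/O.

```c
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: the value to pass to dd's skip= for a byte offset,
 * given dd's bs= block size. With bs=512k, 6GB maps to 12288 (12K) and
 * 5GB maps to 10240 (10K), matching the dd commands in the attachment.
 */
static long long dd_skip_for_offset(off_t offset, size_t bs)
{
    assert(bs > 0 && offset % (off_t)bs == 0);
    return (long long)(offset / (off_t)bs);
}

/*
 * Hypothetical helper: returns 1 if all `len` bytes at `offset` in `path`
 * equal `expected`, 0 if any byte differs or the file ends early, and -1
 * on an I/O or allocation error.
 */
static int block_is_filled(const char *path, off_t offset, size_t len,
                           unsigned char expected)
{
    assert(len > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char *buf = malloc(len);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    int result = 1;
    size_t got = 0;
    while (got < len) {
        ssize_t n = pread(fd, buf + got, len - got, offset + (off_t)got);
        if (n < 0) {            /* read error */
            result = -1;
            break;
        }
        if (n == 0) {           /* hit EOF before the block was complete */
            result = 0;
            break;
        }
        got += (size_t)n;
    }

    if (result == 1) {
        for (size_t i = 0; i < len; i++) {
            if (buf[i] != expected) {
                result = 0;
                break;
            }
        }
    }

    free(buf);
    close(fd);
    return result;
}
```

After running the test program, `block_is_filled("ext4_bug.testfile", (off_t)5 * 1024 * 1024 * 1024, 512 * 1024, 0xFF)` would be expected to return 1 on a filesystem that behaves correctly and 0 where the second write was lost, since an unwritten hole reads back as zeros.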
* Re: Possible ext4 data corruption with large files and async I/O
From: Nick Dokos @ 2010-01-29 15:30 UTC
To: Giel de Nijs; +Cc: linux-ext4, nicholas.dokos
I ran your program on FC-11 with a 2.6.33-rc4 upstream kernel: it worked fine.
Both dd's gave the expected output.
Thanks,
Nick
Transcript:
root@shifter:~/src/ext4/giel-de-nijs# ./a.out
opening file ext4_bug.testfile
submitting write of 524288 bytes at offset 6442450944
waiting for write to be finished
got 1 events
written 524288 bytes
submitting write of 524288 bytes at offset 5368709120
waiting for write to be finished
got 1 events
written 524288 bytes
root@shifter:~/src/ext4/giel-de-nijs# dd if=ext4_bug.testfile bs=512k count=1 skip=10K|hexdump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
1+0 records in
1+0 records out
524288 bytes (524 kB) copied, 0.0045471 s, 115 MB/s
0080000
root@shifter:~/src/ext4/giel-de-nijs# dd if=ext4_bug.testfile bs=512k count=1 skip=12K|hexdump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
1+0 records in
1+0 records out
524288 bytes (524 kB) copied, 0.00474075 s, 111 MB/s
0080000
* Re: Possible ext4 data corruption with large files and async I/O
From: Eric Sandeen @ 2010-01-29 15:30 UTC
To: Giel de Nijs; +Cc: linux-ext4
Thanks, I can reproduce this as well - and yep works ok on ext3 & xfs,
so looks like an ext4 bug indeed. I'll look into it.
-Eric
* Re: Possible ext4 data corruption with large files and async I/O
From: Nick Dokos @ 2010-01-29 15:49 UTC
To: nicholas.dokos; +Cc: Giel de Nijs, linux-ext4
> I ran your program on FC-11 with a 2.6.33-rc4 upstream kernel: it worked fine.
> Both dd's gave the expected output.
>
Scratch that: I goofed. I can reproduce it too.
Sorry for the confusion.
Nick
* Re: Possible ext4 data corruption with large files and async I/O
From: Eric Sandeen @ 2010-01-29 18:18 UTC
To: Giel de Nijs; +Cc: linux-ext4
Ok, got it, will send a patch - thanks.
-Eric