ll_ver_fs data verification failure

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* ll_ver_fs data verification failure - 96TB fs
@ 2009-08-03 13:54 Nick Dokos
  2009-08-03 14:22 ` Nick Dokos
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Nick Dokos @ 2009-08-03 13:54 UTC (permalink / raw)
  To: linux-ext4; +Cc: nicholas.dokos

Just a heads-up for now. I ran ll_ver_fs on a 96TB fs - the write phase
finished without problems, but the read phase encountered a problem:

...
read File name: /mnt/dir00373/file026

liverfs: verify /mnt/dir00373/file026 failed offset/timestamp/inode 3244298240/1248819541/1096796: found 3243249664/1248819541/1096796 instead

liverfs: Data verification failed
770.45user 218639.65system 67:38:18elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
100357573552inputs+195522668184outputs (1major+414minor)pagefaults 0swaps
make: *** [llver] Error 2

The offset difference is exactly 1M, and it occurs about 3GB into the file.

In total, there are 726 directories, each with 32 4GB files (except the last,
which only has 12 files). So directory 373 is roughly half-way. I'll take a look
at the block allocation of both the directory and the file and see if they are
straddling the 16TB boundary (or other such).

Running a fsck now to see if there are other problems.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ll_ver_fs data verification failure - 96TB fs
  2009-08-03 13:54 ll_ver_fs data verification failure - 96TB fs Nick Dokos
@ 2009-08-03 14:22 ` Nick Dokos
  2009-08-06 20:04 ` Valerie Aurora
  2009-08-06 20:36 ` Valerie Aurora
  2 siblings, 0 replies; 11+ messages in thread
From: Nick Dokos @ 2009-08-03 14:22 UTC (permalink / raw)
  To: nicholas.dokos; +Cc: linux-ext4, Valerie Aurora

Nick Dokos <nicholas.dokos@hp.com> wrote:

> Just a heads-up for now. I ran ll_ver_fs on a 96TB fs - the write phase
> finished without problems, but the read phase encountered a problem:
> 
> ...

> Running a fsck now to see if there are other problems.
> 

The fsck did not find any other problems:


sudo time e2fsck  -t -t -F -n -f /dev/mapper/bigvg-bigvol
e2fsck 1.41.8 (20-Jul-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 284k/1630436k (131k/154k), time: 1055.90/880.05/ 6.00
Pass 1: I/O read: 190MB, write: 0MB, rate: 0.18MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 284k/18014398508112064k (94k/191k), time:  2.33/ 0.93/ 0.92
Pass 2: I/O read: 3MB, write: 0MB, rate: 1.29MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 284k/18014398508635724k (94k/191k), time: 1067.60/888.18/ 9.10
Pass 3: Memory used: 284k/18014398508112064k (86k/199k), time:  0.58/ 0.26/ 0.32
Pass 3: I/O read: 1MB, write: 0MB, rate: 1.71MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 284k/18014398509394812k (85k/200k), time: 147.37/147.08/ 0.27
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 336k/18014398508871152k (86k/251k), time: 1124.59/707.93/18.27
Pass 5: I/O read: 2934MB, write: 0MB, rate: 2.61MB/s
/dev/mapper/bigvg-bigvol: 23950/4289791968 files (0.0% non-contiguous), 24609217723/24609226752 blocks
Memory used: 336k/18014398508871152k (86k/251k), time: 2339.61/1743.19/27.68
I/O read: 3173MB, write: 0MB, rate: 1.36MB/s
1743.18user 31.66system 39:03.59elapsed 75%CPU (0avgtext+0avgdata 0maxresident)k
32603472inputs+0outputs (0major+4403060minor)pagefaults 0swaps


Nick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ll_ver_fs data verification failure - 96TB fs
  2009-08-03 13:54 ll_ver_fs data verification failure - 96TB fs Nick Dokos
  2009-08-03 14:22 ` Nick Dokos
@ 2009-08-06 20:04 ` Valerie Aurora
  2009-08-06 20:37   ` Nick Dokos
  2009-08-06 20:36 ` Valerie Aurora
  2 siblings, 1 reply; 11+ messages in thread
From: Valerie Aurora @ 2009-08-06 20:04 UTC (permalink / raw)
  To: Nick Dokos; +Cc: linux-ext4

On Mon, Aug 03, 2009 at 09:54:36AM -0400, Nick Dokos wrote:
> Just a heads-up for now. I ran ll_ver_fs on a 96TB fs - the write phase
> finished without problems, but the read phase encountered a problem:
> 
> ...
> read File name: /mnt/dir00373/file026
> 
> liverfs: verify /mnt/dir00373/file026 failed offset/timestamp/inode 3244298240/1248819541/1096796: found 3243249664/1248819541/1096796 instead
> 
> liverfs: Data verification failed
> 770.45user 218639.65system 67:38:18elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
> 100357573552inputs+195522668184outputs (1major+414minor)pagefaults 0swaps
> make: *** [llver] Error 2
> 
> 
> The offset difference is exactly 1M, and it occurs about 3GB into the file.

Interesting - exactly 1M off.  Does this correspond to anything
interesting in extent layout or block allocation boundaries?

Any chance you can patch ll_ver_fs to continue after the first error?
I'd be happy to write the patch for you.

> In total, there are 726 directories, each with 32 4GB files (except the last,
> which only has 12 files). So directory 373 is roughly half-way. I'll take a look
> at the block allocation of both the directory and the file and see if they are
> straddling the 16TB boundary (or other such).

Did you have a chance to look at what falls before and after the 16TB
boundary?

Thanks,

-VAL

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ll_ver_fs data verification failure - 96TB fs
  2009-08-03 13:54 ll_ver_fs data verification failure - 96TB fs Nick Dokos
  2009-08-03 14:22 ` Nick Dokos
  2009-08-06 20:04 ` Valerie Aurora
@ 2009-08-06 20:36 ` Valerie Aurora
  2 siblings, 0 replies; 11+ messages in thread
From: Valerie Aurora @ 2009-08-06 20:36 UTC (permalink / raw)
  To: Nick Dokos; +Cc: linux-ext4

On Mon, Aug 03, 2009 at 09:54:36AM -0400, Nick Dokos wrote:
> Just a heads-up for now. I ran ll_ver_fs on a 96TB fs - the write phase
> finished without problems, but the read phase encountered a problem:
> 
> ...
> read File name: /mnt/dir00373/file026
> 
> liverfs: verify /mnt/dir00373/file026 failed offset/timestamp/inode 3244298240/1248819541/1096796: found 3243249664/1248819541/1096796 instead
> 
> liverfs: Data verification failed
> 770.45user 218639.65system 67:38:18elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
> 100357573552inputs+195522668184outputs (1major+414minor)pagefaults 0swaps
> make: *** [llver] Error 2
> 
> 
> The offset difference is exactly 1M, and it occurs about 3GB into the file.

Just a note, this is an off-by-one rather than a bit flip:

obase=2
3244298240
11000001011000000001000000000000
3243249664
11000001010100000001000000000000

-VAL

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ll_ver_fs data verification failure - 96TB fs
  2009-08-06 20:04 ` Valerie Aurora
@ 2009-08-06 20:37   ` Nick Dokos
  2009-08-06 20:50     ` Andreas Dilger
  0 siblings, 1 reply; 11+ messages in thread
From: Nick Dokos @ 2009-08-06 20:37 UTC (permalink / raw)
  To: Valerie Aurora; +Cc: Nick Dokos, linux-ext4

Valerie Aurora <vaurora@redhat.com> wrote:

> On Mon, Aug 03, 2009 at 09:54:36AM -0400, Nick Dokos wrote:
> > Just a heads-up for now. I ran ll_ver_fs on a 96TB fs - the write phase
> > finished without problems, but the read phase encountered a problem:
> > 
> > ...
> > read File name: /mnt/dir00373/file026
> > 
> > liverfs: verify /mnt/dir00373/file026 failed offset/timestamp/inode 3244298240/1248819541/1096796: found 3243249664/1248819541/1096796 instead
> > 
> > liverfs: Data verification failed
> > 770.45user 218639.65system 67:38:18elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
> > 100357573552inputs+195522668184outputs (1major+414minor)pagefaults 0swaps
> > make: *** [llver] Error 2
> > 
> > 
> > The offset difference is exactly 1M, and it occurs about 3GB into the file.
> 
> Interesting - exactly 1M off.  Does this correspond to anything
> interesting in extent layout or block allocation boundaries?
> 
> Any chance you can patch ll_ver_fs to continue after the first error?
> I'd be happy to write the patch for you.

I did that to begin with but the problem turns out to be much more
mundane: there was an IO error on one of the volumes. It wasn't quite
obvious (no red lights going off) but there *was* a message in
/var/log/messages - unfortunately I missed it. I eventually recreated
the error by trying to read the file with ``od -c'' and then went back
and found the original error. I don't know why/how ll_ver_fs managed to
read the offset and come up with a 1M difference[1] -- ``od -c'' failed with
a big thud.

We have now replaced the disk and I'm doing the test again: it should be
done (barring further problems) by sometime next week.

> 
> > In total, there are 726 directories, each with 32 4GB files (except the last,
> > which only has 12 files). So directory 373 is roughly half-way. I'll take a look
> > at the block allocation of both the directory and the file and see if they are
> > straddling the 16TB boundary (or other such).
> 
> Did you have a chance to look at what falls before and after the 16TB
> boundary?
> 

I did go barking up the wrong tree for a while :-) (or should that be :-( ?)

Thanks,
Nick

[1] that's a 2-bit flip:

3244298240 = 2#11000001011000000001000000000000
3243249664 = 2#11000001010100000001000000000000

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ll_ver_fs data verification failure - 96TB fs
  2009-08-06 20:37   ` Nick Dokos
@ 2009-08-06 20:50     ` Andreas Dilger
  2009-08-06 21:28       ` Nick Dokos
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Dilger @ 2009-08-06 20:50 UTC (permalink / raw)
  To: Nick Dokos; +Cc: Valerie Aurora, linux-ext4

On Aug 06, 2009  16:37 -0400, Nick Dokos wrote:
> I did that to begin with but the problem turns out to be much more
> mundane: there was an IO error on one of the volumes. It wasn't quite
> obvious (no red lights going off) but there *was* a message in
> /var/log/messages - unfortunately I missed it. I eventually recreated
> the error by trying to read the file with ``od -c'' and then went back
> and found the original error. I don't know why/how ll_ver_fs managed to
> read the offset and come up with a 1M difference[1] -- ``od -c'' failed with
> a big thud.

Can you have a look at the error handling in ll_ver_fs at that point?
It seems that it might just have re-used the previous 1MB buffer, but
didn't detect/report the error from the read, which would itself be bad.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ll_ver_fs data verification failure - 96TB fs
  2009-08-06 20:50     ` Andreas Dilger
@ 2009-08-06 21:28       ` Nick Dokos
  2009-08-06 21:43         ` Nick Dokos
  2009-08-06 22:19         ` Andreas Dilger
  0 siblings, 2 replies; 11+ messages in thread
From: Nick Dokos @ 2009-08-06 21:28 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Nick Dokos, Valerie Aurora, linux-ext4

> On Aug 06, 2009  16:37 -0400, Nick Dokos wrote:
> > I did that to begin with but the problem turns out to be much more
> > mundane: there was an IO error on one of the volumes. It wasn't quite
> > obvious (no red lights going off) but there *was* a message in
> > /var/log/messages - unfortunately I missed it. I eventually recreated
> > the error by trying to read the file with ``od -c'' and then went back
> > and found the original error. I don't know why/how ll_ver_fs managed to
> > read the offset and come up with a 1M difference[1] -- ``od -c'' failed with
> > a big thud.
> 
> Can you have a look at the error handling in ll_ver_fs at that point?
> It seems that it might just have re-used the previous 1MB buffer, but
> didn't detect/report the error from the read, which would itself be bad.
> 

It looks right to me:

,----
|               ...
| 		if (read(fd, chunk_buf, chunksize) < 0) {
| 			fprintf(stderr, "\n%s: read %s+%llu failed: %s\n",
| 				progname, file, offset, strerror(errno));
| 			return 1;
| 		}
| 		if (verify_chunk(chunk_buf, chunksize, offset, time_st,
| 				 inode_st, file) != 0)
| 			return 1;
|               ...
`----

The read() should have failed (and I should have gotten a different error
message) but somehow it didn't - instead, verify_chunk() was called and
*that* detected the mismatch.

Thanks,
Nick




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ll_ver_fs data verification failure - 96TB fs
  2009-08-06 21:28       ` Nick Dokos
@ 2009-08-06 21:43         ` Nick Dokos
  2009-08-06 22:01           ` Nick Dokos
  2009-08-06 22:19         ` Andreas Dilger
  1 sibling, 1 reply; 11+ messages in thread
From: Nick Dokos @ 2009-08-06 21:43 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Valerie Aurora, linux-ext4, nicholas.dokos

Nick Dokos <nicholas.dokos@hp.com> wrote:

> > On Aug 06, 2009  16:37 -0400, Nick Dokos wrote:
> > > I did that to begin with but the problem turns out to be much more
> > > mundane: there was an IO error on one of the volumes. It wasn't quite
> > > obvious (no red lights going off) but there *was* a message in
> > > /var/log/messages - unfortunately I missed it. I eventually recreated
> > > the error by trying to read the file with ``od -c'' and then went back
> > > and found the original error. I don't know why/how ll_ver_fs managed to
> > > read the offset and come up with a 1M difference[1] -- ``od -c'' failed with
> > > a big thud.
> > 
> > Can you have a look at the error handling in ll_ver_fs at that point?
> > It seems that it might just have re-used the previous 1MB buffer, but
> > didn't detect/report the error from the read, which would itself be bad.
> > 
> 
> It looks right to me:
> 
> ,----
> |               ...
> | 		if (read(fd, chunk_buf, chunksize) < 0) {
> | 			fprintf(stderr, "\n%s: read %s+%llu failed: %s\n",
> | 				progname, file, offset, strerror(errno));
> | 			return 1;
> | 		}
> | 		if (verify_chunk(chunk_buf, chunksize, offset, time_st,
> | 				 inode_st, file) != 0)
> | 			return 1;
> |               ...
> `----
> 
> The read() should have failed (and I should have gotten a different error
> message) but somehow it didn't - instead, verify_chunk() was called and
> *that* detected the mismatch.
> 

The offset starts at 0 and is marched along at a 1MB stride, so it's
conceivable that the read() returned 0 (avoiding the error return) and
did not touch chunk_buf (thereby leading to the verify_chunk() failure).

That would be consistent with what I got, but if so, it would be a kernel
bug somewhere in the read() calling sequence, right?

Thanks,
Nick

PS. For the record, this is the error from /var/log/messages:

,----
| ...
| Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Unhandled sense code
| Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
| Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Sense Key : Medium Error [current] 
| Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Add. Sense: No additional sense information
| Aug  5 18:14:11 shifter kernel: end_request: I/O error, dev sdn, sector 2238819328
| Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Unhandled sense code
| Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
| Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Sense Key : Medium Error [current] 
| Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Add. Sense: No additional sense information
| Aug  5 18:16:03 shifter kernel: end_request: I/O error, dev sdn, sector 2238819592
`----

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ll_ver_fs data verification failure - 96TB fs
  2009-08-06 21:43         ` Nick Dokos
@ 2009-08-06 22:01           ` Nick Dokos
  2009-08-06 22:54             ` Nick Dokos
  0 siblings, 1 reply; 11+ messages in thread
From: Nick Dokos @ 2009-08-06 22:01 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Valerie Aurora, linux-ext4, nicholas.dokos

Nick Dokos <nicholas.dokos@hp.com> wrote:

> Nick Dokos <nicholas.dokos@hp.com> wrote:
> 
> > > On Aug 06, 2009  16:37 -0400, Nick Dokos wrote:
> > > > I did that to begin with but the problem turns out to be much more
> > > > mundane: there was an IO error on one of the volumes. It wasn't quite
> > > > obvious (no red lights going off) but there *was* a message in
> > > > /var/log/messages - unfortunately I missed it. I eventually recreated
> > > > the error by trying to read the file with ``od -c'' and then went back
> > > > and found the original error. I don't know why/how ll_ver_fs managed to
> > > > read the offset and come up with a 1M difference[1] -- ``od -c'' failed with
> > > > a big thud.
> > > 
> > > Can you have a look at the error handling in ll_ver_fs at that point?
> > > It seems that it might just have re-used the previous 1MB buffer, but
> > > didn't detect/report the error from the read, which would itself be bad.
> > > 
> > 
> > It looks right to me:
> > 
> > ,----
> > |               ...
> > | 		if (read(fd, chunk_buf, chunksize) < 0) {
> > | 			fprintf(stderr, "\n%s: read %s+%llu failed: %s\n",
> > | 				progname, file, offset, strerror(errno));
> > | 			return 1;
> > | 		}
> > | 		if (verify_chunk(chunk_buf, chunksize, offset, time_st,
> > | 				 inode_st, file) != 0)
> > | 			return 1;
> > |               ...
> > `----
> > 
> > The read() should have failed (and I should have gotten a different error
> > message) but somehow it didn't - instead, verify_chunk() was called and
> > *that* detected the mismatch.
> > 
> 
> The offset starts at 0 and is marched along at a 1MB stride, so it's
> conceivable that the read() returned 0 (avoiding the error return) and
> did not touch chunk_buf (thereby leading to the verify_chunk() failure).
> 
> That would be consistent with what I got, but if so, it would be a kernel
> bug somewhere in the read() calling sequence, right?
> 
> Thanks,
> Nick
> 
> PS. For the record, this is the error from /var/log/messages:
> 
> ,----
> | ...
> | Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Unhandled sense code
> | Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> | Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Sense Key : Medium Error [current] 
> | Aug  5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Add. Sense: No additional sense information
> | Aug  5 18:14:11 shifter kernel: end_request: I/O error, dev sdn, sector 2238819328
> | Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Unhandled sense code
> | Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> | Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Sense Key : Medium Error [current] 
> | Aug  5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Add. Sense: No additional sense information
> | Aug  5 18:16:03 shifter kernel: end_request: I/O error, dev sdn, sector 2238819592
> `----
> 

Jim Owens pointed out to me that the code doesn't deal correctly with
short reads.  The verification information is at the beginning of every
4kB block, so the sector error causes a short read with exactly the
symptoms I see.  The next read() would get the error.

The code should  perhaps read something like this:

 		if ((nread = read(fd, chunk_buf, chunksize)) < 0) {
 			fprintf(stderr, "\n%s: read %s+%llu failed: %s\n",
 				progname, file, offset, strerror(errno));
 			return 1;
 		}
                if (nread < chunksize) {
                        fprintf(stderr, "short read etc");
                        /* or force the next read() to check for errors? */
                        return 1;
                }
 		if (verify_chunk(chunk_buf, chunksize, offset, time_st,
 				 inode_st, file) != 0)
 			return 1;

Thanks,
Nick

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ll_ver_fs data verification failure - 96TB fs
  2009-08-06 21:28       ` Nick Dokos
  2009-08-06 21:43         ` Nick Dokos
@ 2009-08-06 22:19         ` Andreas Dilger
  1 sibling, 0 replies; 11+ messages in thread
From: Andreas Dilger @ 2009-08-06 22:19 UTC (permalink / raw)
  To: Nick Dokos; +Cc: Valerie Aurora, linux-ext4

On Aug 06, 2009  17:28 -0400, Nick Dokos wrote:
> > On Aug 06, 2009  16:37 -0400, Nick Dokos wrote:
> > Can you have a look at the error handling in ll_ver_fs at that point?
> > It seems that it might just have re-used the previous 1MB buffer, but
> > didn't detect/report the error from the read, which would itself be bad.
> 
> It looks right to me:
> 
> ,----
> |               ...
> | 		if (read(fd, chunk_buf, chunksize) < 0) {
> | 			fprintf(stderr, "\n%s: read %s+%llu failed: %s\n",
> | 				progname, file, offset, strerror(errno));
> | 			return 1;
> | 		}
> | 		if (verify_chunk(chunk_buf, chunksize, offset, time_st,
> | 				 inode_st, file) != 0)
> | 			return 1;
> |               ...
> `----
> 
> The read() should have failed (and I should have gotten a different error
> message) but somehow it didn't - instead, verify_chunk() was called and
> *that* detected the mismatch.

Well, it seems possible that read() returned less than chunksize bytes,
and the buffer compared correctly up to the 4kB chunk that is beyond the
read data.  That looks like a small bug in llverfs, since it is legal
for read() to return less than the requested bytes.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: ll_ver_fs data verification failure - 96TB fs
  2009-08-06 22:01           ` Nick Dokos
@ 2009-08-06 22:54             ` Nick Dokos
  0 siblings, 0 replies; 11+ messages in thread
From: Nick Dokos @ 2009-08-06 22:54 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Valerie Aurora, linux-ext4, nicholas.dokos

> ... 
> The code should  perhaps read something like this:
> 
>  		if ((nread = read(fd, chunk_buf, chunksize)) < 0) {
>  			fprintf(stderr, "\n%s: read %s+%llu failed: %s\n",
>  				progname, file, offset, strerror(errno));
>  			return 1;
>  		}
>                 if (nread < chunksize) {
>                         fprintf(stderr, "short read etc");
>                         /* or force the next read() to check for errors? */
>                         return 1;
>                 }
>  		if (verify_chunk(chunk_buf, chunksize, offset, time_st,
>  				 inode_st, file) != 0)
>  			return 1;
> 

... except that the write() side also does not deal with short writes,
and for a full run, the last file of the last directory will probably be
short even if no errors occur. In that case, the short read is not an
error.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2009-08-06 22:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-03 13:54 ll_ver_fs data verification failure - 96TB fs Nick Dokos
2009-08-03 14:22 ` Nick Dokos
2009-08-06 20:04 ` Valerie Aurora
2009-08-06 20:37   ` Nick Dokos
2009-08-06 20:50     ` Andreas Dilger
2009-08-06 21:28       ` Nick Dokos
2009-08-06 21:43         ` Nick Dokos
2009-08-06 22:01           ` Nick Dokos
2009-08-06 22:54             ` Nick Dokos
2009-08-06 22:19         ` Andreas Dilger
2009-08-06 20:36 ` Valerie Aurora

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).