linux-ext4.vger.kernel.org archive mirror
From: Andreas Dilger <adilger@sun.com>
To: Nick Dokos <nicholas.dokos@hp.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: Large volume ll_ver_fs results (w/ short read/write patch).
Date: Wed, 19 Aug 2009 19:04:14 -0600
Message-ID: <20090820010414.GA649@webber.adilger.int>
In-Reply-To: <20654.1250520912@gamaville.dokosmarshall.org>

On Aug 17, 2009  10:55 -0400, Nick Dokos wrote:
> There were two disk errors encountered that resulted in short reads, but
> the patched ll_ver_fs continued on (patch attached). So two chunks (1MB
> each) were not completely verified (only the part that was read
> successfully was), but the rest of the fs checked out OK. Luckily, the
> fsck did not find any errors: both disk errors were in file data.
> We have replaced the disks but are not planning to repeat the test: it's
> not clear that it would tell us anything more at this point.

> write File name: /mnt/dir00725/file011          
> write complete
> 
> read File name: /mnt/dir00725/file010          
> read complete

Nick, thanks for the patch.  I'm incorporating the fixes upstream,
but one question that was raised is that (in essence) this allows
IO errors to be hit, and yet the return code from llverfs is still 0.
The llverdev/llverfs tools are used not only for finding software
data corruption bugs, but also to verify the underlying media.

It was definitely a bug in the original code that there was no
error reported during the write phase if there was a short write,
but this was at least caught during the read phase because the
data would be incorrect.

What I've done is to count errors hit during read and write, and
then exit with a non-zero value if there were any IO errors hit
(as happened in your case), even if the rest of the data was
verified correctly.  This allows scanning the whole disk in a
single pass (if there are not too many underlying errors) while
still avoiding a false sense of security from the program
exiting with 0.
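
For illustration only, here is a minimal sketch of that approach in C.
It is not the actual llverfs patch: the struct io_stats type and the
write_chunk(), read_chunk() and exit_status() helpers are hypothetical
names invented for this example. The idea is to count each failed or
short read/write, carry on with the next chunk, and fold the counts
into the exit code at the end.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical error tally; the real patch may track this differently. */
struct io_stats {
	unsigned long read_errors;
	unsigned long write_errors;
};

/*
 * Write one chunk.  A failed or short write is counted once and the
 * number of bytes actually written is returned, so the caller can
 * move on to the next chunk instead of aborting the whole run.
 */
static size_t write_chunk(int fd, const char *buf, size_t len,
			  struct io_stats *st)
{
	size_t done = 0;

	assert(st != NULL);
	while (done < len) {
		ssize_t n = write(fd, buf + done, len - done);

		if (n < 0 && errno == EINTR)
			continue;		/* interrupted, retry */
		if (n <= 0) {
			st->write_errors++;	/* IO error or short write */
			break;
		}
		done += (size_t)n;
	}
	return done;
}

/*
 * Read one chunk.  Only the bytes returned were read successfully;
 * the caller verifies that part and skips the rest of the chunk.
 */
static size_t read_chunk(int fd, char *buf, size_t len,
			 struct io_stats *st)
{
	size_t done = 0;

	assert(st != NULL);
	while (done < len) {
		ssize_t n = read(fd, buf + done, len - done);

		if (n < 0 && errno == EINTR)
			continue;		/* interrupted, retry */
		if (n <= 0) {
			st->read_errors++;	/* IO error or short read */
			break;
		}
		done += (size_t)n;
	}
	return done;
}

/* Non-zero exit status if any IO error was hit during the run. */
static int exit_status(const struct io_stats *st)
{
	return (st->read_errors || st->write_errors) ? 1 : 0;
}
```

A caller would loop over all chunks with these helpers and finish with
something like "return exit_status(&st);" from main(), so a run that
verified every readable byte still exits non-zero if any IO failed.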

The current patch is available at:

https://bugzilla.lustre.org/attachment.cgi?id=25407&action=edit

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



