From: Nick Dokos <nicholas.dokos@hp.com>
To: Andreas Dilger <adilger@sun.com>
Cc: Valerie Aurora <vaurora@redhat.com>, linux-ext4@vger.kernel.org
Cc: nicholas.dokos@hp.com
Subject: Re: ll_ver_fs data verification failure - 96TB fs
Date: Thu, 06 Aug 2009 17:43:00 -0400 [thread overview]
Message-ID: <18822.1249594980@alphaville.usa.hp.com> (raw)
In-Reply-To: Message from Nick Dokos <nicholas.dokos@hp.com> of "Thu, 06 Aug 2009 17:28:08 EDT." <18690.1249594088@alphaville.usa.hp.com>
Nick Dokos <nicholas.dokos@hp.com> wrote:
> > On Aug 06, 2009 16:37 -0400, Nick Dokos wrote:
> > > I did that to begin with but the problem turns out to be much more
> > > mundane: there was an IO error on one of the volumes. It wasn't quite
> > > obvious (no red lights going off) but there *was* a message in
> > > /var/log/messages - unfortunately I missed it. I eventually recreated
> > > the error by trying to read the file with ``od -c'' and then went back
> > > and found the original error. I don't know why/how ll_ver_fs managed to
> > > read the offset and come up with a 1M difference[1] -- ``od -c'' failed with
> > > a big thud.
> >
> > Can you have a look at the error handling in ll_ver_fs at that point?
> > It seems that it might just have re-used the previous 1MB buffer, but
> > didn't detect/report the error from the read, which would itself be bad.
> >
>
> It looks right to me:
>
> ,----
> | ...
> | if (read(fd, chunk_buf, chunksize) < 0) {
> | fprintf(stderr, "\n%s: read %s+%llu failed: %s\n",
> | progname, file, offset, strerror(errno));
> | return 1;
> | }
> | if (verify_chunk(chunk_buf, chunksize, offset, time_st,
> | inode_st, file) != 0)
> | return 1;
> | ...
> `----
>
> The read() should have failed (and I should have gotten a different error
> message) but somehow it didn't - instead, verify_chunk() was called and
> *that* detected the mismatch.
>
The offset starts at 0 and is marched along at a 1MB stride, so it's
conceivable that the read() returned 0 (avoiding the error return) and
did not touch chunk_buf (thereby leading to the verify_chunk() failure).
That would be consistent with what I got, but if so, it would be a kernel
bug somewhere in the read() calling sequence, right?
Thanks,
Nick
PS. For the record, this is the error from /var/log/messages:
,----
| ...
| Aug 5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Unhandled sense code
| Aug 5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
| Aug 5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Sense Key : Medium Error [current]
| Aug 5 18:14:11 shifter kernel: sd 10:0:7:3: [sdn] Add. Sense: No additional sense information
| Aug 5 18:14:11 shifter kernel: end_request: I/O error, dev sdn, sector 2238819328
| Aug 5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Unhandled sense code
| Aug 5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
| Aug 5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Sense Key : Medium Error [current]
| Aug 5 18:16:03 shifter kernel: sd 10:0:7:3: [sdn] Add. Sense: No additional sense information
| Aug 5 18:16:03 shifter kernel: end_request: I/O error, dev sdn, sector 2238819592
`----
next prev parent reply other threads:[~2009-08-06 21:42 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-03 13:54 ll_ver_fs data verification failure - 96TB fs Nick Dokos
2009-08-03 14:22 ` Nick Dokos
2009-08-06 20:04 ` Valerie Aurora
2009-08-06 20:37 ` Nick Dokos
2009-08-06 20:50 ` Andreas Dilger
2009-08-06 21:28 ` Nick Dokos
2009-08-06 21:43 ` Nick Dokos [this message]
2009-08-06 22:01 ` Nick Dokos
2009-08-06 22:54 ` Nick Dokos
2009-08-06 22:19 ` Andreas Dilger
2009-08-06 20:36 ` Valerie Aurora
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=18822.1249594980@alphaville.usa.hp.com \
--to=nicholas.dokos@hp.com \
--cc=adilger@sun.com \
--cc=linux-ext4@vger.kernel.org \
--cc=vaurora@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.