* Re: [Jfs-discussion] Re: Question about file system failure
From: Dave Kleikamp @ 2005-06-27 14:45 UTC (permalink / raw)
To: penney; +Cc: linux-kernel, jfs-discussion
On Mon, 2005-06-27 at 09:41 -0500, Dave Kleikamp wrote:
> On Mon, 2005-06-27 at 10:10 -0400, Chris Penney wrote:
> > I had an NFS file server using JFS fail this weekend. A reboot, which
> > made fsck do a full check, seems to have cleared everything up. The
> > initial errors I got were:
> >
> > Jun 25 09:27:04 nicfs2 kernel: Incorrect number of segments after building list
> > Jun 25 09:27:04 nicfs2 kernel: counted 16, received 15
> > Jun 25 09:27:04 nicfs2 kernel: req nr_sec 320, cur_nr_sec 8
>
> These are coming from scsi_init_io() in drivers/scsi/scsi_lib.c. I
> don't know what it means, but I'm inclined to think that it indicates a
> software bug rather than a hardware error.
>
> > Jun 25 09:27:04 nicfs2 kernel: device-mapper: dm-multipath: Failing path 8:96.
> > Jun 25 09:27:04 nicfs2 kernel: cfq: depth 4 reached, tagging now on
> > Jun 25 09:27:04 nicfs2 kernel: end_request: I/O error, dev sdc, sector 1592060824
> > Jun 25 09:27:04 nicfs2 kernel: device-mapper: dm-multipath: Failing path 8:32.
> > Jun 25 09:27:04 nicfs2 kernel: end_request: I/O error, dev sdc, sector 1592062936
>
> I'm not sure whether dm-multipath is responsible.
>
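dm-multipath identifies each failing path by its block major:minor pair. Under the standard Linux numbering for SCSI disks (block major 8, 16 minors per disk, covering sda through sdp), path 8:32 decodes to sdc, which lines up with the end_request errors on sdc, and 8:96 decodes to sdg. The short Python sketch below does that decoding; it assumes the static major-8 layout only and is no substitute for checking /proc/partitions or `ls -l /dev/sd*` on the host itself.

```python
def sd_name(major: int, minor: int) -> str:
    """Map a block major:minor pair to an sd device name.

    Assumes the classic static layout for block major 8: 16 minors per
    disk, covering sda..sdp. Disks beyond sdp use other majors and are
    not handled by this sketch.
    """
    if major != 8:
        raise ValueError("only block major 8 (sda..sdp) is handled")
    if not 0 <= minor <= 255:
        raise ValueError("minor out of range for major 8")
    disk, partition = divmod(minor, 16)  # 16 minors per whole disk
    name = "sd" + chr(ord("a") + disk)
    return name if partition == 0 else f"{name}{partition}"


def parse_path(path: str) -> str:
    """Decode a 'major:minor' string as printed by dm-multipath."""
    major, minor = (int(x) for x in path.split(":"))
    return sd_name(major, minor)


for path in ("8:32", "8:96"):
    print(path, "->", parse_path(path))
# 8:32 -> sdc
# 8:96 -> sdg
```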
> > Following that was a flurry of JFS errors. I assume these messages
> > have nothing at all to do with JFS, but I wanted to make certain.
>
> I don't think that JFS is the cause.
>
> > I can't turn up much by googling that error. If anyone has any idea what
> > caused that I'd love to hear it.
>
> I'm copying this to linux-kernel in the hopes that someone there will be
> able to help. It would be useful to know what kernel you are running.
Well, I meant to cc linux-kernel. :-)
> > One last question, for an NFS server is it better to mount the volume
> > with errors=panic? It seems like that would keep I/Os from failing
> > due to it being a read-only file system on error. In this case it
> > would seem like a panic + boot would have let a lot of processes (this
> > is used in a batch environment) resume.
>
> Seems reasonable, but I'll let others comment.
>
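For reference, JFS accepts `errors=continue`, `errors=remount-ro` (the default) and `errors=panic`. The default is what turns the volume read-only and makes later I/O fail. A minimal Python sketch that reports which behavior each jfs entry in fstab-format text would get; the device and mount point in the example are hypothetical, and keeping the last `errors=` value when it is repeated is an assumption of this sketch.

```python
JFS_ERROR_MODES = {"continue", "remount-ro", "panic"}


def jfs_error_mode(options: str) -> str:
    """Return the errors= behavior from a comma-separated option string."""
    mode = "remount-ro"  # JFS default when no errors= option is given
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        if key == "errors" and sep:
            if value not in JFS_ERROR_MODES:
                raise ValueError(f"unknown errors= value: {value}")
            mode = value  # this sketch keeps the last value if repeated
    return mode


def jfs_mounts(fstab_text: str):
    """Yield (mount_point, error_mode) for each jfs line in fstab-format text."""
    for line in fstab_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if len(fields) >= 4 and fields[2] == "jfs":
            yield fields[1], jfs_error_mode(fields[3])


# Hypothetical fstab contents for an NFS export on a DM stripe
fstab = """\
/dev/mapper/stripe0  /export  jfs   defaults,errors=panic  0 2
/dev/sda1            /        ext3  defaults               1 1
"""
print(list(jfs_mounts(fstab)))
# [('/export', 'panic')]
```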
> > Chris
>
> Thanks,
> Shaggy
--
David Kleikamp
IBM Linux Technology Center
* Re: [Jfs-discussion] Re: Question about file system failure
From: Chris Penney @ 2005-06-27 15:10 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: linux-kernel, jfs-discussion
On 6/27/05, Dave Kleikamp <shaggy@austin.ibm.com> wrote:
> On Mon, 2005-06-27 at 09:41 -0500, Dave Kleikamp wrote:
> > On Mon, 2005-06-27 at 10:10 -0400, Chris Penney wrote:
> > > I had an NFS file server using JFS fail this weekend. A reboot, which
> > > made fsck do a full check, seems to have cleared everything up. The
> > > initial errors I got were:
> > >
> > > Jun 25 09:27:04 nicfs2 kernel: Incorrect number of segments after building list
> > > Jun 25 09:27:04 nicfs2 kernel: counted 16, received 15
> > > Jun 25 09:27:04 nicfs2 kernel: req nr_sec 320, cur_nr_sec 8
> >
> > These are coming from scsi_init_io() in drivers/scsi/scsi_lib.c. I
> > don't know what it means, but I'm inclined to think that it indicates a
> > software bug rather than a hardware error.
> >
> > > Jun 25 09:27:04 nicfs2 kernel: device-mapper: dm-multipath: Failing path 8:96.
> > > Jun 25 09:27:04 nicfs2 kernel: cfq: depth 4 reached, tagging now on
> > > Jun 25 09:27:04 nicfs2 kernel: end_request: I/O error, dev sdc, sector 1592060824
> > > Jun 25 09:27:04 nicfs2 kernel: device-mapper: dm-multipath: Failing path 8:32.
> > > Jun 25 09:27:04 nicfs2 kernel: end_request: I/O error, dev sdc, sector 1592062936
> >
> > I'm not sure whether dm-multipath is responsible.
> >
> > > Following that was a flurry of JFS errors. I assume these messages
> > > have nothing at all to do with JFS, but I wanted to make certain.
> >
> > I don't think that JFS is the cause.
> >
> > > I can't turn up much by googling that error. If anyone has any idea what
> > > caused that I'd love to hear it.
> >
> > I'm copying this to linux-kernel in the hopes that someone there will be
> > able to help. It would be useful to know what kernel you are running.
>
> Well, I meant to cc linux-kernel. :-)
>
I'm running:
* SMP Pentium 4 w/ hyperthreading enabled (IBM x345)
* Dual QLogic 2340 HBAs
* STK D280 Disk Array (nothing in logs)
* Kernel 2.6.11.5 (built around March 24th)
* 2.6.11-rc3-udm2 patch
* linux-2.6.11-NFS_ALL patch (this was critical for me)
I'm using DM to build four multipath devices (each one a 1 TB LUN)
and then one stripe across the four multipath devices. Because
multipathd doesn't support the array (at least in the code I have),
it can only fail over once. If that happens (it hasn't outside of
testing), we manually fail it back to the primary path.
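The layout above (one DM stripe over four multipath devices) corresponds to a device-mapper `striped` table of the form `<start> <length> striped <#stripes> <chunk_size> <dev> <offset> ...`, with sizes in 512-byte sectors. A minimal Python sketch that builds such a line; the `/dev/mapper/mpath0..3` names, the 1 TiB (2^31-sector) LUN size and the 128-sector (64 KiB) chunk are assumptions for illustration, not the actual configuration. The sketch insists on a power-of-two chunk size to stay on the safe side with older dm-stripe versions.

```python
def striped_table(devices, device_sectors, chunk_sectors=128):
    """Build a one-line device-mapper 'striped' table over equal-sized devices.

    devices        -- paths of the underlying (multipath) devices
    device_sectors -- usable size of each device in 512-byte sectors
    chunk_sectors  -- stripe chunk size in sectors (128 = 64 KiB, assumed)
    """
    if chunk_sectors <= 0 or chunk_sectors & (chunk_sectors - 1):
        raise ValueError("chunk size must be a power of two")
    if not devices:
        raise ValueError("need at least one device")
    # Use only whole chunks on each device so the total length divides evenly
    per_device = device_sectors - device_sectors % chunk_sectors
    length = per_device * len(devices)
    pairs = " ".join(f"{dev} 0" for dev in devices)  # each device from offset 0
    return f"0 {length} striped {len(devices)} {chunk_sectors} {pairs}"


# Hypothetical names for the four multipath devices
mpaths = [f"/dev/mapper/mpath{i}" for i in range(4)]
print(striped_table(mpaths, 2**31))
# 0 8589934592 striped 4 128 /dev/mapper/mpath0 0 /dev/mapper/mpath1 0 /dev/mapper/mpath2 0 /dev/mapper/mpath3 0
```

A line like this would be handed to `dmsetup create` as the table for the stripe device.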
> > > One last question, for an NFS server is it better to mount the volume
> > > with errors=panic? It seems like that would keep I/Os from failing
> > > due to it being a read-only file system on error. In this case it
> > > would seem like a panic + boot would have let a lot of processes (this
> > > is used in a batch environment) resume.
> >
> > Seems reasonable, but I'll let others comment.
> >
> > > Chris
> >
> > Thanks,
> > Shaggy
> --
> David Kleikamp
> IBM Linux Technology Center
>
>
Thanks,
Chris