* fsck.jfs segfaults on x86_64
From: Alex Deucher @ 2005-06-10 14:00 UTC
To: jfs-discussion, Linux Kernel Mailing List; +Cc: ag
We have a large lvm2 logical volume (6.91T) which contains a JFS
filesystem. The volume is accessed via Emulex FC HBAs connected to a
Nexsan SAN. There was a bug in the SAN firmware that caused the
primary controller to lose sync with the other controller and go down.
Normally when this happens we are able to reboot the SAN and the
server and then run fsck on the volume, and everything is fine (on a
side note, we have updated the SAN firmware to fix the sync problem).
However, fsck now segfaults and the volume is dirty, so it can't be
mounted. lvdisplay and vgdisplay seem to work fine displaying the
correct info. Does anyone know what may be causing the problem or how
we can fix it? If possible I'd like to save the data on the volumes.
#> time fsck.jfs /dev/vg00/lvol0
fsck.jfs version 1.1.4, 30-Oct-2003
processing started: 6/8/2005 18.1.19
Using default parameter: -p
The current device is: /dev/vg00/lvol0
Block size in bytes: 4096
Filesystem size in blocks: 1855561728
**Phase 0 - Replay Journal Log
Segmentation fault
real 1m40.396s
user 0m0.038s
sys 0m0.297s
strace:
<snip>
lseek(3, 7600357904384, SEEK_SET) = 7600357904384
read(3, "8h\36\0\0\0t\17\0\206\1\0\0\0\0\0\0\10\204\0\0\0\0\0\1"..., 4096) = 4096
lseek(3, 4679075332096, SEEK_SET) = 4679075332096
read(3, "\301\v\201B\20\0\0\0 \357\20\0\336\\\24\0\4\0\0\0\370\351"..., 4096) = 4096
lseek(3, 4677018386432, SEEK_SET) = 4677018386432
read(3, "\0\0\0D\0\0\0\0\16\1\0\0\377\377\377\377\377\377\377\377"..., 4096) = 4096
lseek(3, 7600347168768, SEEK_SET) = 7600347168768
write(3, "D\0LOGREDO: Allocating for IMap:"..., 8192) = 8192
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
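One way to read the lseek offsets in the strace excerpt is to divide them by the 4096-byte block size that fsck.jfs reports. The Python sketch below does only that arithmetic. The constants are copied from the output above; the function name describe is made up for illustration.

```python
# Values copied from the fsck.jfs banner and the strace excerpt in this message.
BLOCK_SIZE = 4096          # "Block size in bytes"
FS_BLOCKS = 1855561728     # "Filesystem size in blocks"
OFFSETS = [7600357904384, 4679075332096, 4677018386432, 7600347168768]


def describe(offset, block_size=BLOCK_SIZE, fs_blocks=FS_BLOCKS):
    """Return (block number, byte remainder, blocks before end of filesystem)."""
    block, remainder = divmod(offset, block_size)
    return block, remainder, fs_blocks - block


for off in OFFSETS:
    block, rem, from_end = describe(off)
    print(f"offset {off} -> block {block} (+{rem} bytes), "
          f"{from_end} blocks before the end")

# Total size implied by the block count, in TiB.
print(f"filesystem size: {FS_BLOCKS * BLOCK_SIZE / 2**40:.2f} TiB")
```

All four offsets are block-aligned, and the implied size agrees with the 6.91T quoted for the logical volume. The first seek and the final write land within a few thousand blocks of the end of the filesystem. That would fit journal I/O if the log is stored inline near the end of the volume, but the thread does not confirm the log location, so treat that reading as an assumption.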
/var/log/messages
Jun 8 17:34:11 nutcracker fsck.jfs[12223]: segfault at 0000000000000490 rip 00000000004178f1 rsp 00007fffff996f40 error 6
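The kernel log line above can be decoded a little further. The sketch below is a rough Python helper, not part of jfsutils, that splits the record and interprets the "error" field using the x86 page-fault error-code bits. It assumes the error value is printed in hexadecimal; the near-NULL check is only a heuristic, and the function name decode_segfault is hypothetical.

```python
import re

# x86 page-fault error-code bits as reported in the kernel's "error N" field.
PF_PROT = 0x1   # 0 = page not present, 1 = protection violation
PF_WRITE = 0x2  # 0 = read access, 1 = write access
PF_USER = 0x4   # 0 = kernel mode, 1 = user mode

LOG_LINE = ("Jun 8 17:34:11 nutcracker fsck.jfs[12223]: segfault at "
            "0000000000000490 rip 00000000004178f1 rsp 00007fffff996f40 error 6")

# Assumption: all four numeric fields are hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at ([0-9a-f]+) rip ([0-9a-f]+) rsp ([0-9a-f]+) error ([0-9a-f]+)")


def decode_segfault(line):
    """Parse one 'segfault at ...' record into a dict of decoded fields."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("no segfault record found in line")
    addr, rip, rsp, err = (int(field, 16) for field in match.groups())
    return {
        "address": addr,
        "rip": rip,
        "rsp": rsp,
        "write": bool(err & PF_WRITE),
        "user_mode": bool(err & PF_USER),
        "page_present": bool(err & PF_PROT),
        "near_null": addr < 4096,  # heuristic: fault inside the first page
    }


info = decode_segfault(LOG_LINE)
print(info)
```

Read this way, error 6 is a user-mode write to a page that is not mapped, at address 0x490. A faulting address that small often points to a store through a NULL structure pointer, with 0x490 as the field offset, though only a debugger session against the fsck.jfs binary could confirm that.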
mount /mnt/san
mount: wrong fs type, bad option, bad superblock on /dev/vg00/lvol0,
or too many mounted file systems
Additional info:
Linux nutcracker 2.6.12-rc6 #1 SMP Wed Jun 8 16:46:17 EDT 2005 x86_64 AMD Opteron(tm) Processor 244 AuthenticAMD GNU/Linux
2.6.12 is the only kernel version with support for the Emulex Fibre
Channel card (the lpfc driver).
Thanks in advance for any help or advice.
Alex
* Re: [Jfs-discussion] fsck.jfs segfaults on x86_64
From: Dave Kleikamp @ 2005-06-10 14:14 UTC
To: Alex Deucher; +Cc: jfs-discussion, Linux Kernel Mailing List, ag
On Fri, 2005-06-10 at 10:00 -0400, Alex Deucher wrote:
> We have a large lvm2 logical volume (6.91T) which contains a JFS
> filesystem. The volumes accessed via emulex FC HBAs connected to a
> nexsan SAN. There was a bug in the SAN firmware that caused the
> primary controller to lose sync with the other controller and go down.
> Normally when this happens we are able to reboot the SAN and the
> server and then run fsck on the volume, and everything is fine (on a
> side note, we have updated the SAN firmware to fix the sync problem).
> however, fsck now segfaults and the volume is dirty so it can't be
> mounted. lvdisplay and vgdisplay seem to work fine displaying the
> correct info. Does anyone know what may be causing the problem or how
> we can fix it? If possible I'd like to save the data on the volumes.
>
> #> time fsck.jfs /dev/vg00/lvol0
> fsck.jfs version 1.1.4, 30-Oct-2003
1.1.4 is quite old. Can you try a recent version of jfsutils?
http://jfs.sourceforge.net/project/pub/jfsutils-1.1.8.tar.gz
If that doesn't work, you can try running "fsck.jfs
--omit_journal_replay", since it is trapping while replaying the
journal. If all else fails, you should be able to mount it read-only
(mount -oro) to recover the data.
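The fallback order suggested above can be written down as a small Python sketch. The device and mount point are the ones from the original report; the helper names (dry_runner, real_runner, first_working_step) are hypothetical. Treating fsck exit codes 0 and 1 as success follows the usual fsck convention and is an assumption here. The default runner only prints the commands, so nothing touches the volume unless real_runner is passed in deliberately.

```python
import subprocess

DEVICE = "/dev/vg00/lvol0"   # from the original report
MOUNTPOINT = "/mnt/san"      # from the original report

# Each step: (description, argv, exit codes treated as success).
STEPS = [
    ("normal fsck with journal replay",
     ["fsck.jfs", DEVICE], (0, 1)),
    ("fsck without journal replay",
     ["fsck.jfs", "--omit_journal_replay", DEVICE], (0, 1)),
    ("read-only mount to copy the data off",
     ["mount", "-o", "ro", DEVICE, MOUNTPOINT], (0,)),
]


def dry_runner(argv):
    """Print the command instead of running it and pretend it succeeded."""
    print("would run:", " ".join(argv))
    return 0


def real_runner(argv):
    """Run the command; a process killed by SIGSEGV returns -11."""
    return subprocess.run(argv).returncode


def first_working_step(steps, runner=dry_runner):
    """Try each step in order and return the description of the first success."""
    for description, argv, ok_codes in steps:
        if runner(argv) in ok_codes:
            return description
    return None


print(first_working_step(STEPS))
```

Upgrading jfsutils, the first suggestion, is left out because it is not a single command. Each later step gives up a little more (first the journal replay, then write access), which is why the order matters.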
Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center
* Re: [Jfs-discussion] fsck.jfs segfaults on x86_64
From: Alex Deucher @ 2005-06-10 14:21 UTC
To: Dave Kleikamp; +Cc: jfs-discussion, Linux Kernel Mailing List, ag
On 6/10/05, Dave Kleikamp <shaggy@austin.ibm.com> wrote:
> On Fri, 2005-06-10 at 10:00 -0400, Alex Deucher wrote:
> > We have a large lvm2 logical volume (6.91T) which contains a JFS
> > filesystem. The volumes accessed via emulex FC HBAs connected to a
> > nexsan SAN. There was a bug in the SAN firmware that caused the
> > primary controller to lose sync with the other controller and go down.
> > Normally when this happens we are able to reboot the SAN and the
> > server and then run fsck on the volume, and everything is fine (on a
> > side note, we have updated the SAN firmware to fix the sync problem).
> > however, fsck now segfaults and the volume is dirty so it can't be
> > mounted. lvdisplay and vgdisplay seem to work fine displaying the
> > correct info. Does anyone know what may be causing the problem or how
> > we can fix it? If possible I'd like to save the data on the volumes.
> >
> > #> time fsck.jfs /dev/vg00/lvol0
> > fsck.jfs version 1.1.4, 30-Oct-2003
>
> 1.1.4 is quite old. Can you try a recent version of jfsutils?
> http://jfs.sourceforge.net/project/pub/jfsutils-1.1.8.tar.gz
Sorry, I should have mentioned that. We also tried 1.1.7 with the
same result. I can try 1.1.8 too.
>
> If that doesn't work, you can try running "fsck.jfs
> --omit_journal_replay", since it is trapping while replaying the
> journal. If all else fails, you should be able to mount it read-only
> (mount -oro) to recover the data.
Cool, I'll give that a try.
Thanks!
Alex
>
> Thanks,
> Shaggy
> --
> David Kleikamp
> IBM Linux Technology Center
>
>
* Re: [Jfs-discussion] fsck.jfs segfaults on x86_64
From: Alex Deucher @ 2005-06-10 16:09 UTC
To: Dave Kleikamp; +Cc: jfs-discussion, Linux Kernel Mailing List, ag
On 6/10/05, Alex Deucher <alexdeucher@gmail.com> wrote:
> On 6/10/05, Dave Kleikamp <shaggy@austin.ibm.com> wrote:
> > On Fri, 2005-06-10 at 10:00 -0400, Alex Deucher wrote:
> > > We have a large lvm2 logical volume (6.91T) which contains a JFS
> > > filesystem. The volumes accessed via emulex FC HBAs connected to a
> > > nexsan SAN. There was a bug in the SAN firmware that caused the
> > > primary controller to lose sync with the other controller and go down.
> > > Normally when this happens we are able to reboot the SAN and the
> > > server and then run fsck on the volume, and everything is fine (on a
> > > side note, we have updated the SAN firmware to fix the sync problem).
> > > however, fsck now segfaults and the volume is dirty so it can't be
> > > mounted. lvdisplay and vgdisplay seem to work fine displaying the
> > > correct info. Does anyone know what may be causing the problem or how
> > > we can fix it? If possible I'd like to save the data on the volumes.
> > >
> > > #> time fsck.jfs /dev/vg00/lvol0
> > > fsck.jfs version 1.1.4, 30-Oct-2003
> >
> > 1.1.4 is quite old. Can you try a recent version of jfsutils?
> > http://jfs.sourceforge.net/project/pub/jfsutils-1.1.8.tar.gz
>
> sorry, I should have mentioned that. we also tried 1.1.7 with the
> same result. I can try 1.1.8 too.
>
1.1.8 segfaulted as well.
> >
> > If that doesn't work, you can try running "fsck.jfs
> > --omit_journal_replay", since it is trapping while replaying the
> > journal. If all else fails, you should be able to mount it read-only
> > (mount -oro) to recover the data.
>
> cool I'll give that a try.
Running fsck.jfs --omit_journal_replay did the trick! Thanks,
Alex
>
> Thanks!
>
> Alex
>
> >
> > Thanks,
> > Shaggy
> > --
> > David Kleikamp
> > IBM Linux Technology Center
> >
> >
>
* Re: [Jfs-discussion] fsck.jfs segfaults on x86_64
From: Dave Kleikamp @ 2005-06-10 16:16 UTC
To: Alex Deucher; +Cc: jfs-discussion, Linux Kernel Mailing List, ag
On Fri, 2005-06-10 at 12:09 -0400, Alex Deucher wrote:
> 1.1.8 segfaulted as well.
Hmm. This bothers me.
> running fsck.jfs --omit_journal_replay did the trick! thanks,
Well, at least it got you going again. :^)
Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center
* Re: [Jfs-discussion] fsck.jfs segfaults on x86_64
From: Alex Deucher @ 2005-06-10 16:22 UTC
To: Dave Kleikamp; +Cc: jfs-discussion, Linux Kernel Mailing List, ag
On 6/10/05, Dave Kleikamp <shaggy@austin.ibm.com> wrote:
> On Fri, 2005-06-10 at 12:09 -0400, Alex Deucher wrote:
>
> > 1.1.8 segfaulted as well.
>
> Hmm. This bothers me.
Let me know if there's anything else you need me to test. I suppose
it must have been something odd in the journal, because fsck had never
segfaulted on any run before that last SAN failure.
>
> > running fsck.jfs --omit_journal_replay did the trick! thanks,
>
> Well, at least it got you going again. :^)
Thanks for your help!
Alex
>
> Thanks,
> Shaggy
> --
> David Kleikamp
> IBM Linux Technology Center
>
>