* xfs_force_shutdown
@ 2009-10-12 10:29 Hieu Le Trung
2009-10-12 13:23 ` xfs_force_shutdown Eric Sandeen
0 siblings, 1 reply; 8+ messages in thread
From: Hieu Le Trung @ 2009-10-12 10:29 UTC (permalink / raw)
To: xfs
Hi,
What can cause metadata to become bad? I got an xfs_force_shutdown with the 0x2
parameter.
How can I analyze the metadata dump file?
Thanks,
-Hieu
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs_force_shutdown
2009-10-12 10:29 xfs_force_shutdown Hieu Le Trung
@ 2009-10-12 13:23 ` Eric Sandeen
2009-10-13 8:43 ` xfs_force_shutdown Hieu Le Trung
0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-10-12 13:23 UTC (permalink / raw)
To: Hieu Le Trung; +Cc: xfs
Hieu Le Trung wrote:
> Hi,
>
> What may cause metadata becomes bad? I got xfs_force_shutdown with 0x2
> parameter.
Software bugs or hardware problems. If you provide the actual kernel
message we can offer more info on what xfs saw and why it shut down.
> How can I analyze the metadata dump file?
the metadump file is just the metadata skeleton of the filesystem; you
can mount it, repair it, point xfs_db at it to debug it, etc.
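
A minimal sketch of that workflow, written as a Python dry run that only builds and prints the command lines. It assumes the xfsprogs tools xfs_mdrestore, xfs_db, and xfs_repair are installed; the file names fs.metadump and fs.img are hypothetical placeholders.

```python
import shlex


def metadump_inspection_commands(dump="fs.metadump", image="fs.img"):
    """Return argv lists for restoring a metadump and inspecting it read-only."""
    return [
        # Expand the metadump into a filesystem image file.
        ["xfs_mdrestore", dump, image],
        # Open the image read-only (-r) as a regular file (-f), print superblock 0.
        ["xfs_db", "-r", "-f", "-c", "sb 0", "-c", "print", image],
        # Check the image without modifying it (-n); -f marks it as a regular file.
        ["xfs_repair", "-n", "-f", image],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing touches a disk.
    for argv in metadump_inspection_commands():
        print(shlex.join(argv))
```

Working on a restored image rather than the original device keeps any experiments with xfs_db or xfs_repair away from the real filesystem.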
-Eric
* RE: xfs_force_shutdown
2009-10-12 13:23 ` xfs_force_shutdown Eric Sandeen
@ 2009-10-13 8:43 ` Hieu Le Trung
2009-10-13 14:51 ` xfs_force_shutdown Eric Sandeen
0 siblings, 1 reply; 8+ messages in thread
From: Hieu Le Trung @ 2009-10-13 8:43 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
Eric Sandeen wrote:
> Hieu Le Trung wrote:
> > Hi,
> >
> > What may cause metadata becomes bad? I got xfs_force_shutdown with 0x2
> > parameter.
>
> Software bugs or hardware problems. If you provide the actual kernel
> message we can offer more info on what xfs saw and why it shut down.
I'm not sure which one it is, but the issue is hard to reproduce.
I have the following in dmesg, but I'm not sure it's the right message:
<1>I/O error in filesystem ("sda2") meta-data dev sda2 block 0xf054f4
("xlog_iodone") error 5 buf count 32768
<5>xfs_force_shutdown(sda2,0x2) called from line 956 of file
fs/xfs/xfs_log.c. Return address = 0x801288d8
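
The second argument in that message is a bit mask. A minimal sketch of decoding it, assuming the flag values from fs/xfs/xfs_mount.h in kernels of roughly that period (recalled from memory, so they should be checked against the kernel source actually in use):

```python
# Assumed flag values from fs/xfs/xfs_mount.h; verify against your kernel tree.
SHUTDOWN_FLAGS = {
    0x0001: "SHUTDOWN_META_IO_ERROR (write attempt to metadata failed)",
    0x0002: "SHUTDOWN_LOG_IO_ERROR (write attempt to the log failed)",
    0x0004: "SHUTDOWN_FORCE_UMOUNT (shutdown from a forced unmount)",
    0x0008: "SHUTDOWN_CORRUPT_INCORE (corrupt in-memory data structures)",
    0x0010: "SHUTDOWN_REMOTE_REQ (shutdown came from remote cell)",
    0x0020: "SHUTDOWN_DEVICE_REQ (failed all paths to the device)",
}


def decode_shutdown_flags(value):
    """Return the names of all known flags set in the shutdown argument."""
    names = [name for bit, name in sorted(SHUTDOWN_FLAGS.items()) if value & bit]
    return names or ["no known flag set"]


for name in decode_shutdown_flags(0x2):
    print(name)
```

Under that assumption, 0x2 points at a failed log write rather than at bad metadata, which fits the call site in fs/xfs/xfs_log.c.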
Furthermore, the drive's write cache is set to write back:
<5>SCSI device sda: drive cache: write back
xfs_logprint shows 'Bad log record header':
xfs_logprint: /dev/sda2 contains a mounted and writable filesystem
data device: 0x802
log device: 0x802 daddr: 15735648 length: 20480
Header 0xa4 wanted 0xfeedbabe
**********************************************************************
* ERROR: header cycle=164 block=14634 *
**********************************************************************
Bad log record header
So I wonder what may cause a bad record header?
>
> > How can I analyze the metadata dump file?
>
> the metadump file is just the metadata skeleton of the filesystem; you
> can mount it, repair it, point xfs_db at it to debug it, etc.
Are there any tutorials or guidelines on using xfs_db to debug the issue?
>
> -Eric
>
Thanks,
-Hieu
* Re: xfs_force_shutdown
2009-10-13 8:43 ` xfs_force_shutdown Hieu Le Trung
@ 2009-10-13 14:51 ` Eric Sandeen
2009-10-13 15:15 ` xfs_force_shutdown Hieu Le Trung
0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-10-13 14:51 UTC (permalink / raw)
To: Hieu Le Trung; +Cc: xfs
Hieu Le Trung wrote:
> Eric Sandeen wrote:
>> Hieu Le Trung wrote:
>>> Hi,
>>>
>>> What may cause metadata becomes bad? I got xfs_force_shutdown with 0x2
>>> parameter.
>> Software bugs or hardware problems. If you provide the actual kernel
>> message we can offer more info on what xfs saw and why it shut down.
>
> I'm not sure which one is it but the issue is hard to reproduce.
> I have following in the dmesg but I'm not sure it's the right one
> <1>I/O error in filesystem ("sda2") meta-data dev sda2 block 0xf054f4
> ("xlog_iodone") error 5 buf count 32768
Were there IO errors from the storage before this? I.e., did some lower
layer go bad?
> <5>xfs_force_shutdown(sda2,0x2) called from line 956 of file
> fs/xfs/xfs_log.c. Return address = 0x801288d8
>
> Furthermore, the driver's write cache is
> <5>SCSI device sda: drive cache: write back
That's fine...
> The xfs_logprint shows 'Bad log record header'
> xfs_logprint: /dev/sda2 contains a mounted and writable filesystem
> data device: 0x802
> log device: 0x802 daddr: 15735648 length: 20480
>
> Header 0xa4 wanted 0xfeedbabe
> **********************************************************************
> * ERROR: header cycle=164 block=14634 *
> **********************************************************************
> Bad log record header
>
> So I wonder what may cause bad record header?
Probably the IO errors when attempting to write to the log ...
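
To illustrate the check behind "Header 0xa4 wanted 0xfeedbabe": xfs_logprint expects a log record header to start with the magic number 0xfeedbabe. The sketch below assumes the usual XFS convention that on-disk values are big-endian and that ordinary log blocks begin with the cycle number instead of the magic; the blocks are synthetic, not read from a device.

```python
import struct

XLOG_HEADER_MAGIC_NUM = 0xFEEDBABE  # the value xfs_logprint reports as "wanted"


def first_word(block: bytes) -> int:
    """Return the first 32-bit big-endian word of a log block."""
    return struct.unpack(">I", block[:4])[0]


def looks_like_record_header(block: bytes) -> bool:
    """True if the block starts with the log record header magic."""
    return first_word(block) == XLOG_HEADER_MAGIC_NUM


# Synthetic 512-byte blocks for illustration only.
header_block = struct.pack(">I", XLOG_HEADER_MAGIC_NUM) + bytes(508)
stamped_block = struct.pack(">I", 164) + bytes(508)  # starts with cycle 164

print(looks_like_record_header(header_block))   # True
print(looks_like_record_header(stamped_block))  # False
print(hex(first_word(stamped_block)))           # 0xa4
```

Note that 0xa4 is 164 in decimal, the same number reported as cycle=164, so the block found where a header was expected looks like an ordinary block of that cycle. That would be consistent with a header write that never reached the disk.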
>>> How can I analyze the metadata dump file?
>> the metadump file is just the metadata skeleton of the filesystem; you
>> can mount it, repair it, point xfs_db at it to debug it, etc.
>
> Is there any tutorials or guideline in using xfs_db to debug the issue?
xfs_db has a manpage, but I'm not sure the answer will be found by using
it. It will only look at what data made it to the disk, and you had an
IO error.
-Eric
* RE: xfs_force_shutdown
2009-10-13 14:51 ` xfs_force_shutdown Eric Sandeen
@ 2009-10-13 15:15 ` Hieu Le Trung
2009-10-13 15:31 ` xfs_force_shutdown Eric Sandeen
0 siblings, 1 reply; 8+ messages in thread
From: Hieu Le Trung @ 2009-10-13 15:15 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
Eric Sandeen wrote:
> Hieu Le Trung wrote:
> > Eric Sandeen wrote:
> >> Hieu Le Trung wrote:
> >>> Hi,
> >>>
> >>> What may cause metadata becomes bad? I got xfs_force_shutdown with 0x2
> >>> parameter.
> >> Software bugs or hardware problems. If you provide the actual kernel
> >> message we can offer more info on what xfs saw and why it shut down.
> >
> > I'm not sure which one is it but the issue is hard to reproduce.
> > I have following in the dmesg but I'm not sure it's the right one
> > <1>I/O error in filesystem ("sda2") meta-data dev sda2 block 0xf054f4
> > ("xlog_iodone") error 5 buf count 32768
>
> Were there IO errors from the storage before this? i.e. did some lower
> layer go bad.
Before that there is a bunch of speed-down requests; maybe the real error has
been truncated:
<3>ata1.00: speed down requested but no transfer mode left
<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x10c00000 action 0x2
<3>ata1.00: tag 0 cmd 0x30 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
> > <5>xfs_force_shutdown(sda2,0x2) called from line 956 of file
> > fs/xfs/xfs_log.c. Return address = 0x801288d8
> >
> > Furthermore, the driver's write cache is
> > <5>SCSI device sda: drive cache: write back
>
> That's fine...
But the XFS FAQ says the drive write cache should be turned off:
http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F
> > The xfs_logprint shows 'Bad log record header'
> > xfs_logprint: /dev/sda2 contains a mounted and writable filesystem
> > data device: 0x802
> > log device: 0x802 daddr: 15735648 length: 20480
> >
> > Header 0xa4 wanted 0xfeedbabe
> > **********************************************************************
> > * ERROR: header cycle=164 block=14634 *
> > **********************************************************************
> > Bad log record header
> >
> > So I wonder what may cause bad record header?
>
> Probably the IO errors when attempting to write to the log ...
What can I do with the log? Can I debug the issue using the log?
> >>> How can I analyze the metadata dump file?
> >> the metadump file is just the metadata skeleton of the filesystem; you
> >> can mount it, repair it, point xfs_db at it to debug it, etc.
> >
> > Is there any tutorials or guideline in using xfs_db to debug the issue?
>
> xfs_db has a manpage, but I'm not sure the answer will be found by using
> it. It will only look at what data made it to the disk, and you had an
> IO error.
Maybe I can use the log to find out which operation failed and made the
log go bad, then use xfs_db to examine the inode or block and find the
filename. After that I may know what is going on with my code.
Is that possible? How would I do it? How can I find the inode or block from
the log, and how can I map the inode to a filename using xfs_db?
-Hieu
* Re: xfs_force_shutdown
2009-10-13 15:15 ` xfs_force_shutdown Hieu Le Trung
@ 2009-10-13 15:31 ` Eric Sandeen
2009-10-13 15:39 ` xfs_force_shutdown Hieu Le Trung
0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-10-13 15:31 UTC (permalink / raw)
To: Hieu Le Trung; +Cc: xfs
Hieu Le Trung wrote:
> Eric Sandeen wrote:
>> Hieu Le Trung wrote:
>>> Eric Sandeen wrote:
>>>> Hieu Le Trung wrote:
>>>>> Hi,
>>>>>
>>>>> What may cause metadata becomes bad? I got xfs_force_shutdown with 0x2
>>>>> parameter.
>>>> Software bugs or hardware problems. If you provide the actual kernel
>>>> message we can offer more info on what xfs saw and why it shut down.
>>> I'm not sure which one is it but the issue is hard to reproduce.
>>> I have following in the dmesg but I'm not sure it's the right one
>>> <1>I/O error in filesystem ("sda2") meta-data dev sda2 block 0xf054f4
>>> ("xlog_iodone") error 5 buf count 32768
>> Were there IO errors from the storage before this? i.e. did some lower
>> layer go bad.
>
> Before that is bunch of speed down request, maybe the real error has
> been truncated
> <3>ata1.00: speed down requested but no transfer mode left
> <3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x10c00000 action 0x2
> <3>ata1.00: tag 0 cmd 0x30 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
Ok, so you have a storage error, and XFS is just reacting to that condition.
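
The reasoning here is about ordering: the ATA errors come first, and the XFS shutdown follows. A minimal sketch of checking that order in a saved kernel log; the regular expressions are assumptions tuned to the messages quoted in this thread, and the sample list is assembled from those messages in the order described.

```python
import re

# Patterns are assumptions based on the messages quoted in this thread.
ATA_ERROR = re.compile(r"ata\d+(\.\d+)?: .*(exception|error|speed down)", re.I)
XFS_SHUTDOWN = re.compile(r"xfs_force_shutdown\((\w+),(0x[0-9a-fA-F]+)\)")


def storage_error_precedes_shutdown(lines):
    """True if an ATA error line appears before the first XFS shutdown line."""
    saw_ata_error = False
    for line in lines:
        if ATA_ERROR.search(line):
            saw_ata_error = True
        elif XFS_SHUTDOWN.search(line):
            return saw_ata_error
    return False


sample = [
    '<3>ata1.00: speed down requested but no transfer mode left',
    '<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x10c00000 action 0x2',
    '<3>ata1.00: tag 0 cmd 0x30 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)',
    '<1>I/O error in filesystem ("sda2") meta-data dev sda2 block 0xf054f4 '
    '("xlog_iodone") error 5 buf count 32768',
    '<5>xfs_force_shutdown(sda2,0x2) called from line 956 of file fs/xfs/xfs_log.c.',
]

print(storage_error_precedes_shutdown(sample))  # True
```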
>
>>> <5>xfs_force_shutdown(sda2,0x2) called from line 956 of file
>>> fs/xfs/xfs_log.c. Return address = 0x801288d8
>>>
>>> Furthermore, the driver's write cache is
>>> <5>SCSI device sda: drive cache: write back
>> That's fine...
>
> But in the XFS FAQ, they require to turn off the driver write cache
> http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F
Either turning off write caches, or using barrier support is fine:
> With a single hard disk and barriers turned on (on=default), the
> drive write cache is flushed before and after a barrier is issued. A
> powerfail "only" loses data in the cache but no essential ordering is
> violated, and corruption will not occur.
...
>>> The xfs_logprint shows 'Bad log record header'
>>> xfs_logprint: /dev/sda2 contains a mounted and writable filesystem
>>> data device: 0x802
>>> log device: 0x802 daddr: 15735648 length: 20480
>>>
>>> Header 0xa4 wanted 0xfeedbabe
>>> **********************************************************************
>>> * ERROR: header cycle=164 block=14634 *
>>> **********************************************************************
>>> Bad log record header
>>>
>>> So I wonder what may cause bad record header?
>> Probably the IO errors when attempting to write to the log ...
>
> What can I do with the log? Can I debug the issue using the log?
No; your hardware failed to write a requested log item, resulting in an
inconsistent log. This is not an xfs bug - you need to focus on fixing
the underlying hardware problem. XFS cannot guarantee a consistent
filesystem if the underlying storage hardware does not complete
requested IOs....
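
A worked check that the failed write really was a log write, using only numbers quoted in this thread. It assumes that the kernel's "block" and xfs_logprint's "daddr" and "length" are all counted in 512-byte sectors from the start of sda2; that unit is an assumption, not something stated in the messages.

```python
SECTOR = 512  # assumed unit for the kernel "block" and xfs_logprint daddr/length

log_start = 15735648      # "daddr" from xfs_logprint
log_length = 20480        # "length" from xfs_logprint
failed_block = 0xF054F4   # "block" from the kernel I/O error
failed_bytes = 32768      # "buf count" from the kernel I/O error

offset_in_log = failed_block - log_start
failed_sectors = failed_bytes // SECTOR
inside_log = 0 <= offset_in_log and offset_in_log + failed_sectors <= log_length

print(failed_block)    # 15750388
print(offset_in_log)   # 14740
print(failed_sectors)  # 64
print(inside_log)      # True
```

Under those assumptions the failed 32 KiB write sits at offset 14740 inside the 20480-sector log, 106 sectors past the block=14634 that xfs_logprint complains about.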
>>>>> How can I analyze the metadata dump file?
>>>> the metadump file is just the metadata skeleton of the filesystem; you
>>>> can mount it, repair it, point xfs_db at it to debug it, etc.
>>> Is there any tutorials or guideline in using xfs_db to debug the issue?
>>
>> xfs_db has a manpage, but I'm not sure the answer will be found by using
>> it. It will only look at what data made it to the disk, and you had an
>> IO error.
>
> Maybe I can use the log to find out what operation is failed and make
> the log becomes bad then using xfs_db to analyze on the inode or
> block to find out the filename. After that I may know what's going
> with my code. Is it possible? How to do that? How to find out the
> inode or block from the log, and how to map the inode into filename
> using xfs_db?
What is your goal here?
All I see is "drive died, xfs stopped, filesystem was left in
inconsistent state due to hardware error" - I don't think there's
anything more to debug about what -happened-
If your goal is trying to get the filesystem back online (i.e. if it is
currently failing to mount), I'd probably suggest clearing out the log
and repairing the resulting fs with xfs_repair -L, and see what's left.
-Eric
> -Hieu
>
* RE: xfs_force_shutdown
2009-10-13 15:31 ` xfs_force_shutdown Eric Sandeen
@ 2009-10-13 15:39 ` Hieu Le Trung
2009-10-13 15:48 ` xfs_force_shutdown Eric Sandeen
0 siblings, 1 reply; 8+ messages in thread
From: Hieu Le Trung @ 2009-10-13 15:39 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
Eric Sandeen wrote:
> Hieu Le Trung wrote:
> > Eric Sandeen wrote:
> >> Hieu Le Trung wrote:
> >>> Eric Sandeen wrote:
> >>>> Hieu Le Trung wrote:
> >>>>> Hi,
> >>>>>
> >>>>> What may cause metadata becomes bad? I got xfs_force_shutdown with 0x2
> >>>>> parameter.
> >>>> Software bugs or hardware problems. If you provide the actual kernel
> >>>> message we can offer more info on what xfs saw and why it shut down.
> >>> I'm not sure which one is it but the issue is hard to reproduce.
> >>> I have following in the dmesg but I'm not sure it's the right one
> >>> <1>I/O error in filesystem ("sda2") meta-data dev sda2 block 0xf054f4
> >>> ("xlog_iodone") error 5 buf count 32768
> >> Were there IO errors from the storage before this? i.e. did some lower
> >> layer go bad.
> >
> > Before that is bunch of speed down request, maybe the real error has
> > been truncated
> > <3>ata1.00: speed down requested but no transfer mode left
> > <3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x10c00000 action 0x2
> > <3>ata1.00: tag 0 cmd 0x30 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
>
> Ok, so you have a storage error, and XFS is just reacting to that
> condition.
>
> >
> >>> <5>xfs_force_shutdown(sda2,0x2) called from line 956 of file
> >>> fs/xfs/xfs_log.c. Return address = 0x801288d8
> >>>
> >>> Furthermore, the driver's write cache is
> >>> <5>SCSI device sda: drive cache: write back
> >> That's fine...
> >
> > But in the XFS FAQ, they require to turn off the driver write cache
> > http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F
>
> Either turning off write caches, or using barrier support is fine:
>
> > With a single hard disk and barriers turned on (on=default), the
> > drive write cache is flushed before and after a barrier is issued. A
> > powerfail "only" loses data in the cache but no essential ordering is
> > violated, and corruption will not occur.
How can I check whether barriers are supported in my environment?
> >>> The xfs_logprint shows 'Bad log record header'
> >>> xfs_logprint: /dev/sda2 contains a mounted and writable filesystem
> >>> data device: 0x802
> >>> log device: 0x802 daddr: 15735648 length: 20480
> >>>
> >>> Header 0xa4 wanted 0xfeedbabe
> >>> **********************************************************************
> >>> * ERROR: header cycle=164 block=14634 *
> >>> **********************************************************************
> >>> Bad log record header
> >>>
> >>> So I wonder what may cause bad record header?
> >> Probably the IO errors when attempting to write to the log ...
> >> Probably the IO errors when attempting to write to the log ...
> >
> > What can I do with the log? Can I debug the issue using the log?
>
> No; your hardware failed to write a requested log item, resulting in an
> inconsistent log. This is not an xfs bug - you need to focus on fixing
> the underlying hardware problem. XFS cannot guarantee a consistent
> filesystem if the underlying storage hardware does not complete
> requested IOs....
>
> >>>>> How can I analyze the metadata dump file?
> >>>> the metadump file is just the metadata skeleton of the filesystem; you
> >>>> can mount it, repair it, point xfs_db at it to debug it, etc.
> >>> Is there any tutorials or guideline in using xfs_db to debug the issue?
> >>
> >> xfs_db has a manpage, but I'm not sure the answer will be found by using
> >> it. It will only look at what data made it to the disk, and you had an
> >> IO error.
> >
> > Maybe I can use the log to find out what operation is failed and make
> > the log becomes bad then using xfs_db to analyze on the inode or
> > block to find out the filename. After that I may know what's going
> > with my code. Is it possible? How to do that? How to find out the
> > inode or block from the log, and how to map the inode into filename
> > using xfs_db?
>
> What is your goal here?
>
> All I see is "drive died, xfs stopped, filesystem was left in
> inconsistent state due to hardware error" - I don't think there's
> anything more to debug about what -happened-
Actually, I'm implementing a filesystem that extends XFS. So the error may
be inside my FS, inside XFS, or inside the code that reads from and writes
to my FS.
I want to find the root cause so that I can fix it.
If it is a hardware-related issue, it's fine to ignore. But currently
nothing indicates that it is a hardware issue.
My FS runs well; the issue seems to occur only after the system has been
running for a long time, and it is hard to reproduce.
> If your goal is trying to get the filesystem back online (i.e. if it is
> currently failing to mount), I'd probably suggest clearing out the log
> and repairing the resulting fs with xfs_repair -L, and see what's left.
Yes, xfs_repair -L works well, but I need to find out what put the
disk into such a state ;(
Regards,
-Hieu
* Re: xfs_force_shutdown
2009-10-13 15:39 ` xfs_force_shutdown Hieu Le Trung
@ 2009-10-13 15:48 ` Eric Sandeen
0 siblings, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2009-10-13 15:48 UTC (permalink / raw)
To: Hieu Le Trung; +Cc: xfs
Hieu Le Trung wrote:
> How to check if barrier is supported in my environment?
There will be a mount-time message if barrier IO fails.
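
A rough sketch of looking for that message in a saved kernel log, together with the mount options. The "Disabling barriers" wording is an assumption recalled from kernels of that period and may differ between versions; the sample log line and option strings are hypothetical.

```python
def barrier_status(dmesg_lines, mount_options):
    """Rough classification from kernel messages and the XFS mount options."""
    if "nobarrier" in mount_options.split(","):
        return "barriers disabled by mount option"
    for line in dmesg_lines:
        # Assumed wording; the exact message varies between kernel versions.
        if "Disabling barriers" in line:
            return "barriers disabled by kernel: " + line.strip()
    return "no barrier failure reported"


# Hypothetical inputs for illustration.
sample_dmesg = ['Filesystem "sda2": Disabling barriers, trial barrier write failed']
print(barrier_status(sample_dmesg, "rw,noatime"))
print(barrier_status([], "rw,nobarrier"))
print(barrier_status([], "rw"))
```

In practice the lines would come from the output of dmesg and the options from the sda2 entry in /proc/mounts.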
> If it is hardware related issue, it's fine to ignore. But currently
> there's no point to say that it is a hardware issue.
Yes, I think there is:
<3>ata1.00: speed down requested but no transfer mode left
<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x10c00000 action 0x2
<3>ata1.00: tag 0 cmd 0x30 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
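
The hex fields in those lines can be decoded bit by bit. The sketch below uses a subset of bit names from the ATA Status and Error registers and from libata's AC_ERR_* masks, as recalled from memory; treat the tables as assumptions and verify them against the kernel source in use.

```python
# Assumed bit meanings (subset); verify against libata and the ATA specification.
EMASK = {0x01: "device error", 0x02: "HSM violation", 0x04: "timeout",
         0x08: "media error", 0x10: "ATA bus error", 0x20: "host bus error",
         0x40: "system error", 0x80: "invalid argument"}
STATUS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
          0x08: "DRQ", 0x01: "ERR"}
ERROR = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


print(decode(0x10, EMASK))   # ['ATA bus error']
print(decode(0x51, STATUS))  # ['DRDY', 'DSC', 'ERR']
print(decode(0x84, ERROR))   # ['ICRC', 'ABRT']
```

Read this way, the drive reported an error with the interface CRC and abort bits set, which points at the link between controller and drive rather than at the filesystem.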
-Eric