* xfs_force_shutdown
From: Hieu Le Trung @ 2009-10-12 10:29 UTC
To: xfs

Hi,

What can cause metadata to go bad? I got an xfs_force_shutdown with the
0x2 parameter.

How can I analyze the metadata dump file?

Thanks,
-Hieu

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs_force_shutdown
From: Eric Sandeen @ 2009-10-12 13:23 UTC
To: Hieu Le Trung; +Cc: xfs

Hieu Le Trung wrote:
> What can cause metadata to go bad? I got an xfs_force_shutdown with the
> 0x2 parameter.

Software bugs or hardware problems. If you provide the actual kernel
message, we can offer more info on what XFS saw and why it shut down.

> How can I analyze the metadata dump file?

The metadump file is just the metadata skeleton of the filesystem; you
can mount it, repair it, point xfs_db at it to debug it, etc.

-Eric
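
For reference, the second argument of xfs_force_shutdown is a bitmask of
shutdown reasons. The following is a minimal sketch for decoding it,
assuming the SHUTDOWN_* values found in fs/xfs/xfs_mount.h in kernels of
this era; check them against your own tree. The helper name
decode_shutdown_flags is made up for illustration.

```python
# Assumed values, taken from fs/xfs/xfs_mount.h in 2.6-era kernels.
# They may differ in other kernel versions; verify before relying on them.
SHUTDOWN_FLAGS = {
    0x0001: "SHUTDOWN_META_IO_ERROR",   # write attempt to metadata failed
    0x0002: "SHUTDOWN_LOG_IO_ERROR",    # write attempt to the log failed
    0x0004: "SHUTDOWN_FORCE_UMOUNT",    # shutdown from a forced unmount
    0x0008: "SHUTDOWN_CORRUPT_INCORE",  # corrupt in-memory data structures
    0x0010: "SHUTDOWN_REMOTE_REQ",      # shutdown came from a remote cell
    0x0020: "SHUTDOWN_DEVICE_REQ",      # failed all paths to the device
}


def decode_shutdown_flags(flags):
    """Return the names of all known bits set in flags (hypothetical helper)."""
    names = [name for bit, name in sorted(SHUTDOWN_FLAGS.items()) if flags & bit]
    unknown = flags & ~sum(SHUTDOWN_FLAGS)  # bits not covered by the table
    if unknown:
        names.append(hex(unknown))
    return names


print(decode_shutdown_flags(0x2))
```

Under those assumptions, 0x2 decodes to a log I/O error, which is a
symptom of a failed write rather than a statement about its cause.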
* RE: xfs_force_shutdown
From: Hieu Le Trung @ 2009-10-13 8:43 UTC
To: Eric Sandeen; +Cc: xfs

Eric Sandeen wrote:
> Software bugs or hardware problems. If you provide the actual kernel
> message, we can offer more info on what XFS saw and why it shut down.

I'm not sure which one it is, and the issue is hard to reproduce.
I have the following in dmesg, but I'm not sure it's the right message:

<1>I/O error in filesystem ("sda2") meta-data dev sda2 block 0xf054f4
("xlog_iodone") error 5 buf count 32768
<5>xfs_force_shutdown(sda2,0x2) called from line 956 of file
fs/xfs/xfs_log.c. Return address = 0x801288d8

Furthermore, the drive's write cache is set to write back:

<5>SCSI device sda: drive cache: write back

xfs_logprint shows 'Bad log record header':

xfs_logprint: /dev/sda2 contains a mounted and writable filesystem
    data device: 0x802
    log device: 0x802 daddr: 15735648 length: 20480

Header 0xa4 wanted 0xfeedbabe
**********************************************************************
* ERROR: header cycle=164 block=14634                                *
**********************************************************************
Bad log record header

So I wonder: what can cause a bad record header?

> The metadump file is just the metadata skeleton of the filesystem; you
> can mount it, repair it, point xfs_db at it to debug it, etc.

Are there any tutorials or guidelines for using xfs_db to debug the
issue?

Thanks,
-Hieu
* Re: xfs_force_shutdown
From: Eric Sandeen @ 2009-10-13 14:51 UTC
To: Hieu Le Trung; +Cc: xfs

Hieu Le Trung wrote:
> I'm not sure which one it is, and the issue is hard to reproduce.
> I have the following in dmesg, but I'm not sure it's the right message:
> <1>I/O error in filesystem ("sda2") meta-data dev sda2 block 0xf054f4
> ("xlog_iodone") error 5 buf count 32768

Were there IO errors from the storage before this? I.e., did some lower
layer go bad?

> <5>xfs_force_shutdown(sda2,0x2) called from line 956 of file
> fs/xfs/xfs_log.c. Return address = 0x801288d8
>
> Furthermore, the drive's write cache is set to write back:
> <5>SCSI device sda: drive cache: write back

That's fine...

> xfs_logprint shows 'Bad log record header':
> [...]
> Header 0xa4 wanted 0xfeedbabe
> [...]
> So I wonder: what can cause a bad record header?

Probably the IO errors when attempting to write to the log ...

> Are there any tutorials or guidelines for using xfs_db to debug the
> issue?

xfs_db has a manpage, but I'm not sure the answer will be found by using
it. It will only look at what data made it to the disk, and you had an
IO error.

-Eric
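
The numbers already posted can be cross-checked with a little arithmetic.
The sketch below tests whether the block named in the I/O error falls
inside the log region reported by xfs_logprint. It assumes that the dmesg
block number and the xfs_logprint daddr/length are all counted in 512-byte
sectors relative to the same device (sda2); the function names are
hypothetical.

```python
import re

# Values copied from the dmesg and xfs_logprint output quoted in this thread.
DMESG_LINE = ('I/O error in filesystem ("sda2") meta-data dev sda2 '
              'block 0xf054f4 ("xlog_iodone") error 5 buf count 32768')
LOG_DADDR = 15735648   # "daddr" printed by xfs_logprint
LOG_LENGTH = 20480     # "length" printed by xfs_logprint
SECTOR_BYTES = 512     # assumption: all block counts are 512-byte sectors


def failing_block(line):
    """Extract the hexadecimal block number from the XFS I/O error line."""
    match = re.search(r"block (0x[0-9a-fA-F]+)", line)
    return int(match.group(1), 16)


def offset_in_log(block, log_start, log_len):
    """Return the offset of block inside the log, or None if it is outside."""
    offset = block - log_start
    return offset if 0 <= offset < log_len else None


block = failing_block(DMESG_LINE)
offset = offset_in_log(block, LOG_DADDR, LOG_LENGTH)
sectors = 32768 // SECTOR_BYTES  # size of the failed buffer in sectors

print(block)    # 15750388
print(offset)   # 14740
print(sectors)  # 64
```

Under those assumptions the failed 64-sector write starts 14740 sectors
into the log, so it was a log write. xfs_logprint reports its bad header
at block 14634, somewhat before that offset; this arithmetic alone cannot
establish whether the two are directly related.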
* RE: xfs_force_shutdown
From: Hieu Le Trung @ 2009-10-13 15:15 UTC
To: Eric Sandeen; +Cc: xfs

Eric Sandeen wrote:
> > <1>I/O error in filesystem ("sda2") meta-data dev sda2 block 0xf054f4
> > ("xlog_iodone") error 5 buf count 32768
>
> Were there IO errors from the storage before this? I.e., did some lower
> layer go bad?

Before that there is a bunch of speed-down requests; the real error may
have been truncated:

<3>ata1.00: speed down requested but no transfer mode left
<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x10c00000 action 0x2
<3>ata1.00: tag 0 cmd 0x30 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)

> > Furthermore, the drive's write cache is set to write back:
> > <5>SCSI device sda: drive cache: write back
>
> That's fine...

But the XFS FAQ says to turn off the drive write cache:
http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F

> > So I wonder: what can cause a bad record header?
>
> Probably the IO errors when attempting to write to the log ...

What can I do with the log? Can I debug the issue using the log?

> xfs_db has a manpage, but I'm not sure the answer will be found by using
> it. It will only look at what data made it to the disk, and you had an
> IO error.

Maybe I can use the log to find out which operation failed and made the
log go bad, then use xfs_db on the inode or block to find the filename.
After that I might know what's going on with my code. Is that possible?
How would I do it? How do I find the inode or block from the log, and how
do I map the inode to a filename using xfs_db?

-Hieu
* Re: xfs_force_shutdown
From: Eric Sandeen @ 2009-10-13 15:31 UTC
To: Hieu Le Trung; +Cc: xfs

Hieu Le Trung wrote:
> Before that there is a bunch of speed-down requests; the real error may
> have been truncated:
> <3>ata1.00: speed down requested but no transfer mode left
> <3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x10c00000 action 0x2
> <3>ata1.00: tag 0 cmd 0x30 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)

Ok, so you have a storage error, and XFS is just reacting to that
condition.

> But the XFS FAQ says to turn off the drive write cache:
> http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F

Either turning off write caches or using barrier support is fine. From
the FAQ:

> With a single hard disk and barriers turned on (on=default), the
> drive write cache is flushed before and after a barrier is issued. A
> powerfail "only" loses data in the cache but no essential ordering is
> violated, and corruption will not occur.

...

> What can I do with the log? Can I debug the issue using the log?

No; your hardware failed to write a requested log item, resulting in an
inconsistent log. This is not an XFS bug - you need to focus on fixing
the underlying hardware problem. XFS cannot guarantee a consistent
filesystem if the underlying storage hardware does not complete
requested IOs....

> Maybe I can use the log to find out which operation failed and made the
> log go bad, then use xfs_db on the inode or block to find the filename.
> [...]

What is your goal here?

All I see is "drive died, xfs stopped, filesystem was left in
inconsistent state due to hardware error" - I don't think there's
anything more to debug about what -happened-

If your goal is trying to get the filesystem back online (i.e. if it is
currently failing to mount), I'd probably suggest clearing out the log
and repairing the resulting fs with xfs_repair -L, and see what's left.

-Eric
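
The ATA exception line quoted above already carries the device's status
and error registers. Here is a minimal sketch for decoding them, assuming
the standard ATA register bit layout (the bit names follow the ones used
in the kernel's include/linux/ata.h); the function names are
hypothetical.

```python
import re

# Assumed standard ATA status register bits, most significant first.
ATA_STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]
# Assumed standard ATA error register bits, most significant first.
ATA_ERROR_BITS = [
    (0x80, "ICRC"), (0x40, "UNC"), (0x20, "MC"), (0x10, "IDNF"),
    (0x08, "MCR"), (0x04, "ABRT"), (0x02, "TK0NF"), (0x01, "AMNF"),
]

# Line copied from the dmesg output quoted in this thread.
ATA_LINE = ("ata1.00: tag 0 cmd 0x30 Emask 0x10 stat 0x51 err 0x84 "
            "(ATA bus error)")


def bit_names(value, table):
    """Return the names of the bits in table that are set in value."""
    return [name for bit, name in table if value & bit]


def decode_ata_line(line):
    """Pull stat/err out of a libata exception line and decode both."""
    match = re.search(r"stat (0x[0-9a-fA-F]+) err (0x[0-9a-fA-F]+)", line)
    stat, err = (int(group, 16) for group in match.groups())
    return bit_names(stat, ATA_STATUS_BITS), bit_names(err, ATA_ERROR_BITS)


status, error = decode_ata_line(ATA_LINE)
print(status)  # ['DRDY', 'DSC', 'ERR']
print(error)   # ['ICRC', 'ABRT']
```

ICRC is an interface CRC error, which commonly points at the link between
host and drive (cabling, connector, controller) and agrees with the
"ATA bus error" text the kernel printed.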
* RE: xfs_force_shutdown
From: Hieu Le Trung @ 2009-10-13 15:39 UTC
To: Eric Sandeen; +Cc: xfs

Eric Sandeen wrote:
> Either turning off write caches or using barrier support is fine. From
> the FAQ:
>
> > With a single hard disk and barriers turned on (on=default), the
> > drive write cache is flushed before and after a barrier is issued. A
> > powerfail "only" loses data in the cache but no essential ordering is
> > violated, and corruption will not occur.

How can I check whether barriers are supported in my environment?

> What is your goal here?
>
> All I see is "drive died, xfs stopped, filesystem was left in
> inconsistent state due to hardware error" - I don't think there's
> anything more to debug about what -happened-

Actually, I'm implementing a filesystem that is extended from XFS. So the
error may be inside my FS, inside XFS, or in the code that reads from and
writes to my FS. I want to find the root cause so that I can fix it.
If it is a hardware-related issue, it's fine to ignore. But currently
there's nothing that shows it is a hardware issue.
My FS runs well; the issue seems to occur only after the system has been
running for a long time, and it is hard to reproduce.

> If your goal is trying to get the filesystem back online (i.e. if it is
> currently failing to mount), I'd probably suggest clearing out the log
> and repairing the resulting fs with xfs_repair -L, and see what's left.

Yes, xfs_repair -L works well, but I need to find out what put the disk
into such a state ;(

Regards,
-Hieu
* Re: xfs_force_shutdown
From: Eric Sandeen @ 2009-10-13 15:48 UTC
To: Hieu Le Trung; +Cc: xfs

Hieu Le Trung wrote:
> How can I check whether barriers are supported in my environment?

There will be a mount-time message if barrier IO fails.

> If it is a hardware-related issue, it's fine to ignore. But currently
> there's nothing that shows it is a hardware issue.

Yes, I think there is:

<3>ata1.00: speed down requested but no transfer mode left
<3>ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x10c00000 action 0x2
<3>ata1.00: tag 0 cmd 0x30 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)

-Eric
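
On the mount-time message: one way to look for it is to scan the kernel
log for the barrier warning. This is only a sketch. It assumes the
warning contains the text "Disabling barriers", as in the example given
in the XFS FAQ; the exact wording may differ between kernel versions, and
the sample log lines below are illustrative, not taken from this system.

```python
# Assumed marker text; adjust it if your kernel words the warning differently.
BARRIER_MARKER = "Disabling barriers"


def barrier_warnings(dmesg_text, marker=BARRIER_MARKER):
    """Return every kernel log line that contains the barrier warning marker."""
    return [line.strip() for line in dmesg_text.splitlines() if marker in line]


# Illustrative sample only; this is not output from the system in this thread.
SAMPLE_DMESG = """\
<5>SCSI device sda: drive cache: write back
<4>Filesystem "sda2": Disabling barriers, not supported by the underlying device
<5>XFS mounting filesystem sda2
"""

hits = barrier_warnings(SAMPLE_DMESG)
print(len(hits))
for line in hits:
    print(line)
```

An empty result right after mounting suggests that barrier writes were
accepted, provided the kernel log has not already wrapped. To run it
against a live system, pass in the output of dmesg, for example via
subprocess.run(["dmesg"], capture_output=True, text=True).stdout.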