* Advice needed with file system corruption
@ 2016-07-14 12:27 Steve Brooks
  2016-07-14 13:05 ` Carlos Maiolino
  2016-08-08 14:11 ` Emmanuel Florac
  0 siblings, 2 replies; 13+ messages in thread
From: Steve Brooks @ 2016-07-14 12:27 UTC (permalink / raw)
To: xfs

Hi All,

We have a RAID system with file system issues, as follows:

50 TB in RAID 6 hosted on an Adaptec 71605 controller using WD4000FYYZ drives.

CentOS 6.7, kernel 2.6.32-642.el6.x86_64, xfsprogs-3.1.1-16.el6

While rebuilding a replaced disk, with the file system online and in use, the system logs showed multiple entries of:

    XFS (sde): Corruption detected. Unmount and run xfs_repair.

[See also the section of XFS-related errors from the log at the end of this post.]

I unmounted the filesystem and waited for the controller to finish rebuilding the array. I then moved the most important data to another RAID array on a different server. The data is generated from HPC simulations and is not backed up, but can be regenerated if needed.

The default el6 "xfs_repair" is in "xfsprogs-3.1.1-16.el6". I notice that the "elrepo_testing" repository has a much later version of "xfsprogs", namely xfsprogs.x86_64 4.3.0-1.el6.elrepo.

As far as I understand, the user-space tools are backwards compatible, so would it be better to use the "4.3" release of "xfsprogs" instead of the default "3.1.1" included in the el6 installation?

I ran "xfs_repair -nv /dev/sde" with both "3.1.1" and "4.3", and both completed successfully, showing the repairs that would have taken place. I can post the output if requested.

The "3.1.1" version of "xfs_repair -n" ran in 1 minute, 32 seconds.

The "4.3" version of "xfs_repair -n" ran in 50 seconds.

So my questions are:

[1] Which version of "xfs_repair" should I use to make the repair?

[2] Is there anything I should have done differently?

Many thanks for any advice given; it is much appreciated.

Thanks, Steve

Many blocks (about 20) of log output similar to this were repeated in the logs.
Jul 8 18:40:17 sraid1v kernel: ffff880dca95b000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Jul 8 18:40:17 sraid1v kernel: XFS (sde): Internal error xfs_da_do_buf(2) at line 2136 of file fs/xfs/xfs_da_btree.c.  Caller 0xffffffffa0e6e81a
Jul 8 18:40:17 sraid1v kernel:
Jul 8 18:40:17 sraid1v kernel: Pid: 8844, comm: idl Tainted: P -- ------------ 2.6.32-642.el6.x86_64 #1
Jul 8 18:40:17 sraid1v kernel: Call Trace:
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e7b68f>] ? xfs_error_report+0x3f/0x50 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e6e81a>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e7b6fe>] ? xfs_corruption_error+0x5e/0x90 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e6e6fc>] ? xfs_da_do_buf+0x6cc/0x770 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e6e81a>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffff810154e3>] ? native_sched_clock+0x13/0x80
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e6e81a>] ? xfs_da_read_buf+0x2a/0x30 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e74a21>] ? xfs_dir2_leaf_lookup_int+0x61/0x2c0 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e74a21>] ? xfs_dir2_leaf_lookup_int+0x61/0x2c0 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e74e05>] ? xfs_dir2_leaf_lookup+0x35/0xf0 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e71306>] ? xfs_dir2_isleaf+0x26/0x60 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e71ce4>] ? xfs_dir_lookup+0x174/0x190 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e9ea47>] ? xfs_lookup+0x87/0x110 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0eabd74>] ? xfs_vn_lookup+0x54/0xa0 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffff811a9ca5>] ? do_lookup+0x1a5/0x230
Jul 8 18:40:17 sraid1v kernel: [<ffffffff811aa823>] ? __link_path_walk+0x763/0x1060
Jul 8 18:40:17 sraid1v kernel: [<ffffffff811ab3da>] ? path_walk+0x6a/0xe0
Jul 8 18:40:17 sraid1v kernel: [<ffffffff811ab5eb>] ? filename_lookup+0x6b/0xc0
Jul 8 18:40:17 sraid1v kernel: [<ffffffff8123ac46>] ? security_file_alloc+0x16/0x20
Jul 8 18:40:17 sraid1v kernel: [<ffffffff811acac4>] ? do_filp_open+0x104/0xd20
Jul 8 18:40:17 sraid1v kernel: [<ffffffffa0e9a4fc>] ? _xfs_trans_commit+0x25c/0x310 [xfs]
Jul 8 18:40:17 sraid1v kernel: [<ffffffff812a749a>] ? strncpy_from_user+0x4a/0x90
Jul 8 18:40:17 sraid1v kernel: [<ffffffff811ba252>] ? alloc_fd+0x92/0x160
Jul 8 18:40:17 sraid1v kernel: [<ffffffff81196bd7>] ? do_sys_open+0x67/0x130
Jul 8 18:40:17 sraid1v kernel: [<ffffffff81196ce0>] ? sys_open+0x20/0x30
Jul 8 18:40:17 sraid1v kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
Jul 8 18:40:17 sraid1v kernel: XFS (sde): Corruption detected. Unmount and run xfs_repair
Jul 8 18:40:17 sraid1v kernel: ffff880dca95b000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Jul 8 18:40:17 sraid1v kernel: XFS (sde): Internal error xfs_da_do_buf(2) at line 2136 of file fs/xfs/xfs_da_btree.c.  Caller 0xffffffffa0e6e81a
Jul 8 18:40:17 sraid1v kernel:
Jul 8 18:40:17 sraid1v kernel: Pid: 8844, comm: idl Tainted: P -- ------------ 2.6.32-642.el6.x86_64 #1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
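The workflow discussed in this thread, a no-modify dry run followed by the actual repair of the unmounted filesystem, can be sketched as follows (run as root, and only after the array rebuild has finished; /dev/sde is the device from this thread, and the mount point is a placeholder):

```sh
# Unmount first; xfs_repair will not run on a mounted filesystem.
umount /dev/sde

# Dry run: -n (no-modify mode) reports what would be repaired
# without writing anything; -v adds verbose output.
xfs_repair -n -v /dev/sde

# If the dry-run output looks sane, do the real repair with defaults.
xfs_repair -v /dev/sde

# Remount and check lost+found for any orphaned files.
mount /dev/sde /mnt/raid
ls /mnt/raid/lost+found
```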
* Re: Advice needed with file system corruption
  2016-07-14 12:27 Advice needed with file system corruption Steve Brooks
@ 2016-07-14 13:05 ` Carlos Maiolino
  2016-07-14 13:57   ` Steve Brooks
  2016-08-08 14:11 ` Emmanuel Florac
  1 sibling, 1 reply; 13+ messages in thread
From: Carlos Maiolino @ 2016-07-14 13:05 UTC (permalink / raw)
To: Steve Brooks; +Cc: xfs

Hi Steve,

On Thu, Jul 14, 2016 at 01:27:22PM +0100, Steve Brooks wrote:
>
> The "3.1.1" version of "xfs_repair -n" ran in 1 minute, 32 seconds
>
> The "4.3" version of "xfs_repair -n" ran in 50 seconds
>

Yes, the later versions are compatible with the old on-disk filesystem format, and they also bring improvements in memory usage, speed, etc.

> So my questions are
>
> [1] Which version of "xfs_repair" should I use to make the repair?
>
> [2] Is there anything I should have done differently?
>

No, just use the latest stable one, with the defaults, unless you have a good reason not to use the default options; from your e-mail, I believe you don't have one.

The logs you sent below look like they come from a corrupted btree, but xfs_repair should be able to fix that for you.

Cheers.

> Many thanks for any advice given; it is much appreciated.
>
> Thanks, Steve
>
> Many blocks (about 20) of log output similar to this were repeated in the logs.
>
> [...]

--
Carlos
* Re: Advice needed with file system corruption
  2016-07-14 13:05 ` Carlos Maiolino
@ 2016-07-14 13:57   ` Steve Brooks
  2016-07-14 14:17     ` Carlos Maiolino
  0 siblings, 1 reply; 13+ messages in thread
From: Steve Brooks @ 2016-07-14 13:57 UTC (permalink / raw)
To: xfs

Hi Carlos,

Many thanks again for your good advice. I ran version 4.3 of "xfs_repair" as suggested below, and it did its job very quickly, in 50 seconds, exactly as reported in the "no modify" mode. Is the time reported at the end of the "no modify" mode always a good approximation of running in "modify" mode?

Anyway, all is good now, and it looks like any missing files are now in the "lost+found" directory.

Steve

On 14/07/16 14:05, Carlos Maiolino wrote:
> Hi Steve,
>
> On Thu, Jul 14, 2016 at 01:27:22PM +0100, Steve Brooks wrote:
>> The "3.1.1" version of "xfs_repair -n" ran in 1 minute, 32 seconds
>>
>> The "4.3" version of "xfs_repair -n" ran in 50 seconds
>>
> Yes, the later versions are compatible with the old on-disk filesystem
> format, and they also bring improvements in memory usage, speed, etc.
>
>> So my questions are
>>
>> [1] Which version of "xfs_repair" should I use to make the repair?
>>
>> [2] Is there anything I should have done differently?
>>
> No, just use the latest stable one, with the defaults, unless you have a
> good reason not to use the default options; from your e-mail, I believe
> you don't have one.
>
> The logs you sent below look like they come from a corrupted btree, but
> xfs_repair should be able to fix that for you.
>
> Cheers.
>
>> [...]

--
Dr Stephen Brooks

Solar MHD Theory Group
Tel :: 01334 463735
Fax :: 01334 463748
---------------------------------------
Mathematical Institute
North Haugh
University of St. Andrews
St Andrews, Fife KY16 9SS
SCOTLAND
---------------------------------------
* Re: Advice needed with file system corruption
  2016-07-14 13:57 ` Steve Brooks
@ 2016-07-14 14:17   ` Carlos Maiolino
  2016-07-14 23:33     ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Carlos Maiolino @ 2016-07-14 14:17 UTC (permalink / raw)
To: Steve Brooks; +Cc: xfs

On Thu, Jul 14, 2016 at 02:57:25PM +0100, Steve Brooks wrote:
> Hi Carlos,
>
> Many thanks again for your good advice. I ran version 4.3 of "xfs_repair"
> as suggested below, and it did its job very quickly, in 50 seconds,
> exactly as reported in the "no modify" mode. Is the time reported at the
> end of the "no modify" mode always a good approximation of running in
> "modify" mode?

Good to know. But I'm not sure the no-modify mode can be used as a good approximation of a real run. I would say not to take it as a given, since xfs_repair can't predict how much time it will need to write all the modifications it has to make to the filesystem's metadata, and it can certainly take much more time, depending on how corrupted the filesystem is.

> Anyway, all is good now, and it looks like any missing files are now in
> the "lost+found" directory.
>
> Steve
>
> [...]

--
Carlos
* Re: Advice needed with file system corruption
  2016-07-14 14:17 ` Carlos Maiolino
@ 2016-07-14 23:33   ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Chinner @ 2016-07-14 23:33 UTC (permalink / raw)
To: Steve Brooks, xfs

On Thu, Jul 14, 2016 at 04:17:51PM +0200, Carlos Maiolino wrote:
> On Thu, Jul 14, 2016 at 02:57:25PM +0100, Steve Brooks wrote:
> > Hi Carlos,
> >
> > Many thanks again for your good advice. I ran version 4.3 of
> > "xfs_repair" as suggested below, and it did its job very quickly, in 50
> > seconds, exactly as reported in the "no modify" mode. Is the time
> > reported at the end of the "no modify" mode always a good approximation
> > of running in "modify" mode?
>
> Good to know. But I'm not sure the no-modify mode can be used as a good
> approximation of a real run. [...]

Yup, the no-modify mode skips a couple of steps in repair - phase 5, which rebuilds the freespace btrees, and phase 7, which corrects link counts - and so it can only be considered the minimum runtime of the "fix it all up" mode.

FWIW, phase 6 can also blow out massively in runtime if there's significant directory damage that results in needing to move lots of inodes to the lost+found directory.

> > > > The "3.1.1" version of "xfs_repair -n" ran in 1 minute, 32 seconds
> > > >
> > > > The "4.3" version of "xfs_repair -n" ran in 50 seconds

And it's good to know that the recent performance improvements show real-world benefits, not just on the badly broken filesystems I used for testing.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
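The phase structure Dave describes can be seen on a scratch filesystem image without touching real hardware (a sketch, assuming xfsprogs is installed; the image path is arbitrary, and root is not needed for image files):

```sh
# Create a small scratch XFS image.
truncate -s 512M /tmp/scratch.img
mkfs.xfs /tmp/scratch.img

# No-modify mode: the phase list is printed, but phase 5 (freespace
# btree rebuild) and phase 7 (link-count correction) do no writing,
# so its runtime is only a lower bound for a real repair.
xfs_repair -n -v /tmp/scratch.img

# Full mode: all phases run and modifications are written back,
# which on a badly corrupted filesystem can take much longer.
xfs_repair -v /tmp/scratch.img
```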
* Re: Advice needed with file system corruption
  2016-07-14 12:27 Advice needed with file system corruption Steve Brooks
  2016-07-14 13:05 ` Carlos Maiolino
@ 2016-08-08 14:11 ` Emmanuel Florac
  2016-08-08 15:38   ` Roger Willcocks
  2016-08-08 16:16   ` Steve Brooks
  1 sibling, 2 replies; 13+ messages in thread
From: Emmanuel Florac @ 2016-08-08 14:11 UTC (permalink / raw)
To: Steve Brooks; +Cc: xfs

On Thu, 14 Jul 2016 13:27:22 +0100, Steve Brooks <sjb14@st-andrews.ac.uk> wrote:

> We have a RAID system with file system issues, as follows:
>
> 50 TB in RAID 6 hosted on an Adaptec 71605 controller using
> WD4000FYYZ drives.
>
> CentOS 6.7, kernel 2.6.32-642.el6.x86_64, xfsprogs-3.1.1-16.el6
>
> While rebuilding a replaced disk, with the file system online and in
> use, the system logs showed multiple entries of:
>
> XFS (sde): Corruption detected. Unmount and run xfs_repair.
>

Late to the game, but I just wanted to remark that I have unfortunately verified, many times, that write activity during rebuilds on Adaptec RAID controllers often creates corruption. I've reported that to Adaptec, but they don't seem to care much...

--
------------------------------------------------------------------------
Emmanuel Florac  |  Direction technique
                 |  Intellique
                 |  <eflorac@intellique.com>
                 |  +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: Advice needed with file system corruption
  2016-08-08 14:11 ` Emmanuel Florac
@ 2016-08-08 15:38   ` Roger Willcocks
  2016-08-08 15:44     ` Emmanuel Florac
  2016-08-08 16:16   ` Steve Brooks
  1 sibling, 1 reply; 13+ messages in thread
From: Roger Willcocks @ 2016-08-08 15:38 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: Steve Brooks, xfs

On Mon, 2016-08-08 at 16:11 +0200, Emmanuel Florac wrote:
> On Thu, 14 Jul 2016 13:27:22 +0100, Steve Brooks <sjb14@st-andrews.ac.uk> wrote:
>
> > We have a RAID system with file system issues, as follows:
> >
> > 50 TB in RAID 6 hosted on an Adaptec 71605 controller using
> > WD4000FYYZ drives.
> >
> > CentOS 6.7, kernel 2.6.32-642.el6.x86_64, xfsprogs-3.1.1-16.el6
> >
> > While rebuilding a replaced disk, with the file system online and in
> > use, the system logs showed multiple entries of:
> >
> > XFS (sde): Corruption detected. Unmount and run xfs_repair.
> >
>
> Late to the game, but I just wanted to remark that I have unfortunately
> verified, many times, that write activity during rebuilds on Adaptec RAID
> controllers often creates corruption. I've reported that to Adaptec,
> but they don't seem to care much...
>

It rather depends on why the disk was replaced in the first place...

--
Roger
* Re: Advice needed with file system corruption
  2016-08-08 15:38 ` Roger Willcocks
@ 2016-08-08 15:44   ` Emmanuel Florac
  2016-08-09  4:02     ` Gim Leong Chin
  0 siblings, 1 reply; 13+ messages in thread
From: Emmanuel Florac @ 2016-08-08 15:44 UTC (permalink / raw)
To: Roger Willcocks; +Cc: Steve Brooks, xfs

On Mon, 08 Aug 2016 16:38:11 +0100, Roger Willcocks <roger@filmlight.ltd.uk> wrote:

> >
> > Late to the game, but I just wanted to remark that I have unfortunately
> > verified, many times, that write activity during rebuilds on Adaptec
> > RAID controllers often creates corruption. I've reported that to
> > Adaptec, but they don't seem to care much...
> >
>
> It rather depends on why the disk was replaced in the first place...

Well, given that I always use RAID-6, it shouldn't matter: a failed drive shouldn't alter the array's behaviour significantly, as the array simply falls back to something like RAID-5 (any bad block read or write should be corrected on the fly).

It seems that explicitly disabling the individual disk drives' write-back cache somewhat mitigates the effect.

--
------------------------------------------------------------------------
Emmanuel Florac  |  Direction technique
                 |  Intellique
                 |  <eflorac@intellique.com>
                 |  +33 1 78 94 84 02
------------------------------------------------------------------------
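The mitigation described above, turning off the drives' own write-back caches, can be done from the Linux host when the drives are directly visible (a sketch; /dev/sdX is a placeholder, and drives hidden behind a hardware RAID controller such as the Adaptec 71605 usually have to be configured through the vendor's management utility instead):

```sh
# SATA drive: disable the on-drive write cache (0 = off, 1 = on).
hdparm -W0 /dev/sdX

# SAS/SCSI drive: clear the WCE (write cache enable) bit in the
# caching mode page.
sdparm --clear=WCE /dev/sdX

# Verify the current settings.
hdparm -W /dev/sdX
sdparm --get=WCE /dev/sdX
```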
* Re: Advice needed with file system corruption
  2016-08-08 15:44 ` Emmanuel Florac
@ 2016-08-09  4:02   ` Gim Leong Chin
  2016-08-09 12:40     ` Carlos E. R.
  0 siblings, 1 reply; 13+ messages in thread
From: Gim Leong Chin @ 2016-08-09 4:02 UTC (permalink / raw)
To: Emmanuel Florac, Roger Willcocks; +Cc: Steve Brooks, xfs@oss.sgi.com

From: Emmanuel Florac <eflorac@intellique.com>
To: Roger Willcocks <roger@filmlight.ltd.uk>
Cc: Steve Brooks <sjb14@st-andrews.ac.uk>; xfs@oss.sgi.com
Sent: Monday, 8 August 2016, 23:44
Subject: Re: Advice needed with file system corruption

On Mon, 08 Aug 2016 16:38:11 +0100, Roger Willcocks <roger@filmlight.ltd.uk> wrote:

> >
> > Late to the game, but I just wanted to remark that I have unfortunately
> > verified, many times, that write activity during rebuilds on Adaptec
> > RAID controllers often creates corruption. I've reported that to
> > Adaptec, but they don't seem to care much...
> >
>
> It seems that explicitly disabling the individual disk drives' write-back
> cache somewhat mitigates the effect.

Drives connected to RAID controllers with battery-backed cache should have their caches "disabled" (they are really set to write-through mode instead). By the way, I found out in lab testing that 7200 RPM SATA drives suffer a big performance loss when doing sequential writes in cache write-through mode.
* Re: Advice needed with file system corruption
  2016-08-09  4:02 ` Gim Leong Chin
@ 2016-08-09 12:40   ` Carlos E. R.
  2016-08-09 15:43     ` Gim Leong Chin
  2016-08-09 21:26     ` Dave Chinner
  1 sibling, 2 replies; 13+ messages in thread
From: Carlos E. R. @ 2016-08-09 12:40 UTC (permalink / raw)
To: XFS mail list

On 2016-08-09 06:02, Gim Leong Chin wrote:
> Drives connected to RAID controllers with battery-backed cache should
> have their caches "disabled" (they are really set to write-through mode
> instead). By the way, I found out in lab testing that 7200 RPM SATA
> drives suffer a big performance loss when doing sequential writes in
> cache write-through mode.

If you disable the disk's internal cache, you also disable the disk's internal write optimizations as a consequence. It has to be much slower at writing; that seems obvious to me.

--
Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
* Re: Advice needed with file system corruption
  2016-08-09 12:40 ` Carlos E. R.
@ 2016-08-09 15:43   ` Gim Leong Chin
  2016-08-09 21:26   ` Dave Chinner
  0 siblings, 0 replies; 13+ messages in thread
From: Gim Leong Chin @ 2016-08-09 15:43 UTC (permalink / raw)
To: Carlos E. R., XFS mail list

On 2016-08-09 06:02, Gim Leong Chin wrote:
>> Drives connected to RAID controllers with battery-backed cache should
>> have their caches "disabled" (they are really set to write-through mode
>> instead). By the way, I found out in lab testing that 7200 RPM SATA
>> drives suffer a big performance loss when doing sequential writes in
>> cache write-through mode.

> If you disable the disk's internal cache, you also disable the disk's
> internal write optimizations as a consequence. It has to be much slower
> at writing; that seems obvious to me.
>
> --
> Cheers / Saludos,
> Carlos E. R.
> (from 13.1 x86_64 "Bottle" at Telcontar)

The drop in sequential write data rate for 3.5" 7200 RPM SATA drives was around 50% (I cannot remember the exact numbers); that was not obvious to me.

As a reminder, the drive cache is really set to write-through mode; it is not possible to actually disable the cache, as an application engineer from HGST told me. So the drive's internal write optimizations are still there, it is just that the IO command is reported as completed only when the data has been written to the drive platter.

10k and 15k RPM SAS drives connected to LSI internal RAID controllers have their drive cache "disabled" automatically. I wonder how much the data rate drops compared to drive cache "enabled", considering that LSI IR controllers do not have cache.

GL
* Re: Advice needed with file system corruption
  2016-08-09 12:40 ` Carlos E. R.
  2016-08-09 15:43 ` Gim Leong Chin
@ 2016-08-09 21:26 ` Dave Chinner
  1 sibling, 0 replies; 13+ messages in thread

From: Dave Chinner @ 2016-08-09 21:26 UTC (permalink / raw)
To: Carlos E. R.; +Cc: XFS mail list

On Tue, Aug 09, 2016 at 02:40:26PM +0200, Carlos E. R. wrote:
> On 2016-08-09 06:02, Gim Leong Chin wrote:
>> Drives connected to RAID controllers with battery backed cache should
>> have their caches "disabled" (they are really set to write through mode
>> instead). By the way, I found out in lab testing that 7200 RPM SATA
>> drives suffer a big performance loss when doing sequential writes in
>> cache write through mode.
>
> If you disable the disk internal cache, as a consequence you also
> disable the disk internal write optimizations. It has to be much slower
> at writing. It seems to me obvious.

This is why decent HW RAID controllers have a large non-volatile write
cache - the caching is done in the controller, where it is safe from
power loss, not in the drive, where it is unsafe. Write optimisations
happen at the RAID controller level, not at the individual drive level.

As for 10k/15k RPM SAS drive performance, they generally are only slower
in microbenchmark situations (e.g. sequential single-sector writes) when
the write cache is disabled. These sorts of loads aren't typically seen
in the real world, so for most people there is little difference in
performance on high-end enterprise SAS drives when changing the cache
mode....

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
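Dave's "sequential single-sector writes" microbenchmark can be sketched with a toy example: the same write stream issued once through the page cache, and once with each write forced to stable storage before the next is issued, which is roughly the behaviour write-through mode imposes. The file path is illustrative, and on a local filesystem the absolute numbers say nothing about any particular drive.

```shell
# 2048 sequential 512-byte writes through the page cache:
dd if=/dev/zero of=/tmp/seqwrite.test bs=512 count=2048 2>&1 | tail -n 1

# The same stream, but each write must reach stable storage before the
# next is issued (oflag=dsync), mimicking write-through behaviour:
dd if=/dev/zero of=/tmp/seqwrite.test bs=512 count=2048 oflag=dsync 2>&1 | tail -n 1
```

The dsync variant is typically far slower; that gap is exactly what a drive's write cache, or the controller's battery-backed cache, exists to hide.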
* Re: Advice needed with file system corruption
  2016-08-08 14:11 ` Emmanuel Florac
  2016-08-08 15:38 ` Roger Willcocks
@ 2016-08-08 16:16 ` Steve Brooks
  1 sibling, 0 replies; 13+ messages in thread

From: Steve Brooks @ 2016-08-08 16:16 UTC (permalink / raw)
To: Emmanuel Florac, Steve Brooks; +Cc: xfs

Hi,

I chose the words "rebuilding a replaced disk" deliberately: I removed a
disk that (according to Adaptec's software) had some "media errors", even
though the SMART attributes showed there were no "pending sectors" or
"reallocated sectors" - in fact, all the SMART attributes were clean. As
I was also using RAID 6, I did not expect any issues from leaving the
filesystem online while rebuilding. Prior to this, the RAID had been
running live 24/7 for over three years.

Steve

On 08/08/2016 15:11, Emmanuel Florac wrote:
> Le Thu, 14 Jul 2016 13:27:22 +0100
> Steve Brooks <sjb14@st-andrews.ac.uk> wrote:
>
>> We have a RAID system with file system issues as follows,
>>
>> 50 TB in RAID 6 hosted on an Adaptec 71605 controller using
>> WD4000FYYZ drives.
>>
>> Centos 6.7 2.6.32-642.el6.x86_64 : xfsprogs-3.1.1-16.el6
>>
>> While rebuilding a replaced disk, with the file system online and in
>> use, the system logs showed multiple entries of;
>>
>> XFS (sde): Corruption detected. Unmount and run xfs_repair.
>
> Late to the game, I just wanted to remark that I've unfortunately
> verified many times that write activity during rebuilds on Adaptec RAID
> controllers often creates corruption. I've reported that to Adaptec,
> but they don't seem to care much...
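For completeness, the repair sequence Steve asked about at the start of the thread follows the usual dry-run-first pattern. The device name is as in his original report; the mount point is a placeholder.

```shell
# Dry run: report what xfs_repair would change without modifying anything.
# The filesystem must be unmounted first.
umount /dev/sde
xfs_repair -n -v /dev/sde

# If the dry-run output looks sane, run the actual repair:
xfs_repair -v /dev/sde

# If xfs_repair complains about a dirty log, mount and cleanly unmount
# once so the log replays; zeroing the log (xfs_repair -L) discards
# recent metadata updates and should be a last resort.
mount /dev/sde /mnt && umount /mnt
```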
end of thread, other threads: [~2016-08-09 21:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-14 12:27 Advice needed with file system corruption Steve Brooks
2016-07-14 13:05 ` Carlos Maiolino
2016-07-14 13:57 ` Steve Brooks
2016-07-14 14:17 ` Carlos Maiolino
2016-07-14 23:33 ` Dave Chinner
2016-08-08 14:11 ` Emmanuel Florac
2016-08-08 15:38 ` Roger Willcocks
2016-08-08 15:44 ` Emmanuel Florac
2016-08-09  4:02 ` Gim Leong Chin
2016-08-09 12:40 ` Carlos E. R.
2016-08-09 15:43 ` Gim Leong Chin
2016-08-09 21:26 ` Dave Chinner
2016-08-08 16:16 ` Steve Brooks