From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Rabbitson
Subject: Re: Redundancy check using "echo check > sync_action": error reporting?
Date: Sun, 04 May 2008 09:30:02 +0200
Message-ID: <481D65FA.4090107@rabbit.us>
References: <47DD2CD7.2090802@tuxes.nl> <20080316161451.0d17fd22@szpak>
 <47E26775.3000500@tuxes.nl> <20080320134747.GA28114@cthulhu.home.robinhill.me.uk>
 <47E2725C.1020206@tuxes.nl> <20080320163551.GG13719@mit.edu>
 <47E2EE64.5080101@rabbit.us> <47E3C504.3010700@tmr.com>
 <47E3CBAF.4090808@rabbit.us> <47E43E57.5010409@tmr.com>
 <20080321235557.GA11801@cthulhu.home.robinhill.me.uk>
 <47E4D95A.9000505@rabbit.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <47E4D95A.9000505@rabbit.us>
Sender: linux-raid-owner@vger.kernel.org
To: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Peter Rabbitson wrote:
> Robin Hill wrote:
>> On Fri Mar 21, 2008 at 07:01:43PM -0400, Bill Davidsen wrote:
>>
>>> Peter Rabbitson wrote:
>>>> I was actually specifically advocating that md must _not_ do
>>>> anything on its own. Just provide the hooks to get information (what
>>>> is the current stripe state) and update information (the described
>>>> repair extension). The logic that you are describing can live only
>>>> in an external app, it has no place in-kernel.
>>> So you advocate the current code being in the kernel, which absent a
>>> hardware error makes blind assumptions about which data is valid and
>>> which is not and in all cases hides the problem, instead of the code
>>> I proposed, which in some cases will be able to avoid action which is
>>> provably wrong and never be less likely to do the wrong thing than
>>> the current code?
>>>
>> I would certainly advocate that the current (entirely automatic) code
>> belongs in the kernel whereas any code requiring user
>> intervention/decision making belongs in a user process, yes. That's not
>> to say that the former should be preferred over the latter though, but
>> there's really no reason to remove the in-kernel automated process until
>> (or even after) a user-side repair process has been coded.
>
> I am asserting that automatic repair is infeasible in most
> highly-redundant cases. Lets take the root raid1 of one of my busiest
> servers:
>
> /dev/md0:
>         Version : 00.90.03
>   Creation Time : Tue Mar 20 21:58:54 2007
>      Raid Level : raid1
>      Array Size : 6000128 (5.72 GiB 6.14 GB)
>   Used Dev Size : 6000128 (5.72 GiB 6.14 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Mar 22 05:55:08 2008
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>            UUID : b6a11a74:8b069a29:6e26228f:2ab99bd0 (local to host Arzamas)
>          Events : 0.183270
>
> As you can see it is pretty old, and does not have many events to speak
> of. Yet every month when the automatic check is issued I get between 512
> and 2048 in mismatch_cnt. I maintain md5sums of all files on this
> filesystem, and there were no deviations for the lifetime of the array
> (of course there are mismatches after upgrades, after log appends etc,
> but they are all expected). So all I can do with this array is issue a
> blind repair, without even having the chance to find what exactly is
> causing this. Yes, it is raid1 and I could do 1:1 comparison to find
> which is the offending block. How about raid10 -n f3? There is no way I
> can figure out _what_ is giving me a problem. I do not know if it is a
> hardware error (the md5 sums speak against it), some process with weird
> write patterns resulting in heavy DMA, or a bug in md itself.
>
> By the way there is no swap file on this array. Just / and /var, with a
> moderately busy mail spool on top.
>

I want to resurrect this discussion with a peculiar observation: the
above mismatch was caused by GRUB.

I had some time this weekend and decided to take device snapshots of the
4 array members as listed above while / was mounted ro. After stripping
the md superblock, the data from slots 1, 2 and 3 turned out to be
identical, while slot 0 (my primary boot device) differed by about
10 bytes.
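For anyone who wants to repeat the exercise, the comparison boils down
to something like the sketch below. The member device names are only
placeholders (substitute whatever mdadm --detail /dev/md0 lists); since
a 0.90 superblock sits at the end of each member, hashing the first
"Used Dev Size" worth of data is enough to leave the metadata out of
the comparison:

    # hash the data area of every member while the array is quiescent
    # (6000128 KiB = Used Dev Size reported above; device names are examples)
    for dev in /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1; do
        echo -n "$dev  "
        dd if="$dev" bs=1024 count=6000128 2>/dev/null | md5sum
    done

    # then list the differing byte offsets between the odd one out
    # and a known-good member
    cmp -l <(dd if=/dev/sda1 bs=1024 count=6000128 2>/dev/null) \
           <(dd if=/dev/sdb1 bs=1024 count=6000128 2>/dev/null)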
Hexediting revealed that the bytes in question belong to
/boot/grub/default. I realized that my grub config contains a
savedefault clause, which updates that file on the raw ext3 volume
before any raid assembly has taken place. Executing grub-set-default
from within the booted system (with the assembled raid mounted) made
the subsequent md check return 0 mismatches.

To add insult to injury, savedefault and grub-set-default update said
file differently (comments vs. empty lines). So even if one
savedefault's the same entry as the one initially set by
grub-set-default, the result will still be a raid1 mismatch.

I assume that this condition is benign, but wanted to bring it to the
attention of the masses anyway.

Cheers

Peter