From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Price Date: Mon, 21 Mar 2016 16:44:55 +0000 Subject: [Cluster-devel] [gfs2-utils PATCH 00/11] Misc patches regarding corrupt rindex handling In-Reply-To: <647027937.41101966.1458572006123.JavaMail.zimbra@redhat.com> References: <56E84AC0.4020202@redhat.com> <647027937.41101966.1458572006123.JavaMail.zimbra@redhat.com> Message-ID: <56F02507.2070604@redhat.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit On 21/03/16 14:53, Bob Peterson wrote: > Hi Andy, > > ----- Original Message ----- >> Hi Bob, >> > (snip) >> Consider these ACKed. I've taken a look through them and they look fine, >> though I'm not hugely familiar with that part of the fsck code. Coverity >> is happy with them too. > > Does that mean I can/should push them? If you're happy with them, go right ahead. We can add tests and any tweaks that might be needed on top. > I thought you said you had some > suggestions for them. Well I thought there might be an issue with using an int for the counter in patch 10 but that seems like it should be sufficient on closer inspection. There was also one piece of code that was commented out in patch 7 instead of removed which seemed odd but ultimately not a bug. >> Outside of the scope of this set, I noticed in patch 11 that parts of >> the existing fsck.gfs2 code assume that the first rgrp lies in the first >> block after the superblock. That's likely to get confused when the file >> system is RAID stripe aligned so that's something that may need >> addressing at some point (mkfs.gfs2 -b 4096 -o swidth=16k,sunit=8k >> should mock it up for a test). > > Good to know. This could be a problem for even today's rindex repair code. > We may need to do some testing to figure out what's broken and what's not. As I understand it, the rg repair code is just using the layout assumptions for quickly discovering where the rgrps are in order to rebuild the rindex, so in most cases it shouldn't be a problem (in theory), but we'll need to add heuristics based on the new layout strategy if it gets confused. Which is where the testing comes in... >> In general it would be good to have more fsck.gfs2 tests in the >> testsuite so if there's a simple way to test these patches without >> mounting (scribbling over some rindex entries?) then I'll get that added. >> Cheers, >> Andy > > This is an excellent idea. A very long time ago, I used to have a different > fsck "nightmare" test that used a little "nukerg2.c" program (source attached) Thanks, it could live in the tests directory to make use of in tests/fsck.at and perhaps gfs2l can be extended to obsolete it later. > to nuke an rgrp or an rindex entry. Then I'd force fsck to fix it. > I may have originally written it for gfs1 rather than gfs2, and I know > I was especially targeting post-gfs2_grow rindexes. > > I can no longer find the original script, but the original sequence went > something like: > > mkfs.gfs2 > ./nukerg2 -rg X > fsck.gfs2 > # make sure $? is 1 > fsck.gfs2 > # make sure $? is 0 > mount > lvextend > gfs2_grow > ./nukerg2 -rg X > fsck.gfs2 > # make sure $? is 1 > fsck.gfs2 > # make sure $? is 0 > > For X, I'd systematically do: > 1. First rgrp > 2. Second rgrp > 3. Third rgrp > 4. Half-way point > 5. Last rgrp > 6. I'd also nuke two and three rgrps in a row IIRC. > > There's also an "-rgindex" parameter to kill the rindex entries themselves, > so I'd repeat each of the above steps, but killing the rindex entry rather > than the rgrp itself. > > This test really ought to be resurrected and added to the test suite. Ok, sounds like a plan. We can do everything up to the mount in the gfs2-utils testsuite. Perhaps we can expand on it with some gfs2-specific xfstests tests for the mount+grow cases in future. Cheers, Andy