From: "Jon Nelson"
Subject: Re: weird issues with raid1
Date: Wed, 17 Dec 2008 22:55:04 -0600
To: Neil Brown
Cc: LinuxRaid

On Wed, Dec 17, 2008 at 10:50 PM, Jon Nelson wrote:
> On Wed, Dec 17, 2008 at 10:42 PM, Neil Brown wrote:
>> On Tuesday December 16, neilb@suse.de wrote:
>>> On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>>> > On Mon, Dec 15, 2008 at 3:33 PM, Neil Brown wrote:
>>> > > On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>>> > >>
>>> > >> Aha! This explains a question I raised in another email. What
>>> > >> happened there is a previously fully active member of the raid got
>>> > >> added, somehow, as a spare, via --incremental. That's when the entire
>>> > >> raid thought it needed to be rebuilt. How did that (the device being
>>> > >> treated as a spare instead of as a previously fully active member)
>>> > >> happen?
>>> > >
>>> > > It is hard to guess without details, and they might be hard to collect
>>> > > after the fact.
>>> > > Maybe if you have the kernel logs of when the server rebooted and the
>>> > > recovery started, that might contain some hints.
>>> >
>>> > I hope this helps.
>>>
>>> Yes it does, though I generally prefer to get more complete logs. If
>>> I get the surrounding log lines then I know what isn't there as well
>>> as what is - and it isn't always clear at first which bits will be
>>> important.
>>>
>>> The problem here is that --incremental doesn't provide the --re-add
>>> functionality that you are depending on. That was an oversight on my
>>> part. I'll see if I can get it fixed.
>>> In the meantime, you'll need to use --re-add (or --add, it does the
>>> same thing in your situation) to add nbd0 to the array.
>>
>> Actually, I'm wrong.
>> --incremental does do the right thing w.r.t. --re-add.
>> I couldn't reproduce your symptoms.
>
> OK.
>
>> It could be that you are hitting the bug fixed by
>> commit a0da84f35b25875870270d16b6eccda4884d61a7
>
> That sure sounds like it. I'd have to log to see what happened,
> exactly, but I've added substantial logging around the device
> discovery and addition section which manages this particular raid.
>
>> You would need 2.6.26 or later to have that fixed.
>> Can you try with a newer kernel???
>
> I hope to be giving openSUSE 11.1 a try soon, which uses 2.6.27.X
> AFAIK. I suspect I can also backport that patch to 2.6.25 easily.

The kernel source for 2.6.25.18-0.2 (from SUSE) has this patch
already, so I was already using it.

Perhaps this weekend or some night this week I'll find time to try
to break things again.

-- 
Jon