From mboxrd@z Thu Jan  1 00:00:00 1970
From: Alexander Lyakas
Subject: Re: dirty chunks on bitmap not clearing (RAID1)
Date: Mon, 26 Dec 2011 20:07:16 +0200
References: <20110831173842.44ab5b03@notabene.brown>
 <20111223094815.6beaf413@notabene.brown>
In-Reply-To: <20111223094815.6beaf413@notabene.brown>
To: NeilBrown
Cc: Chris Pearson, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Hello Neil,
from the patch it looks like, for RAID levels with more than a single
drive of redundancy (such as 3-way RAID1 or RAID6), the bits will
still not be cleared while there is an additional missing drive,
correct? That case seems to be covered by the !bitmap->mddev->degraded
check. Is that because those bits are still needed to rebuild the
future drive(s)?

Thanks,
Alex.

On Fri, Dec 23, 2011 at 12:48 AM, NeilBrown wrote:
> On Wed, 31 Aug 2011 13:23:01 -0500 (CDT) Chris Pearson wrote:
>
>> I'm happy to apply a patch to whichever kernel you like, but the
>> blocks have since cleared, so I will try and reproduce it first.
>
> I have finally identified the problem here.  I was looking into a
> different but related problem and saw what was happening.  I don't
> know why I didn't notice it before.
>
> You can easily reproduce the problem by writing to an array with a
> bitmap while a spare is recovering. Any bits that get set in the
> section that has already been recovered will stay set.
>
> This patch fixes it and will - with luck - be in 3.2.
>
> Thanks,
> NeilBrown
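
Neil's reproduction recipe is easy to try on a pair of scratch loop
devices. The transcript below is an illustrative sketch rather than
part of the thread: the device names, sizes and resync throttling are
arbitrary, and /dev/md0 is assumed to be free.

  # truncate -s 1G d0.img d1.img
  # losetup -f --show d0.img                 (prints e.g. /dev/loop0)
  # losetup -f --show d1.img                 (prints e.g. /dev/loop1)
  # mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          --bitmap=internal /dev/loop0 /dev/loop1
  # mdadm --wait /dev/md0                    (let the initial sync finish)
  # echo 1000 > /proc/sys/dev/raid/speed_limit_max
                                             (throttle so recovery is slow)
  # mdadm /dev/md0 --fail /dev/loop1
  # mdadm /dev/md0 --remove /dev/loop1
  # mdadm /dev/md0 --add /dev/loop1          (recovery of the "spare" starts)
  # dd if=/dev/urandom of=/dev/md0 bs=1M count=100 oflag=direct
  # mdadm --wait /dev/md0                    (recovery complete)
  # sleep 15                                 (a couple of 5s flush periods)
  # mdadm -X /dev/loop0 | grep -i dirty

On an affected kernel the dirty count never returns to zero; with the
patch below applied it should read "0 dirty (0.0%)" again.
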
> From b9664495d2a884fbf7195e1abe4778cc6c3ae9b7 Mon Sep 17 00:00:00 2001
> From: NeilBrown
> Date: Fri, 23 Dec 2011 09:42:52 +1100
> Subject: [PATCH] md/bitmap: It is OK to clear bits during recovery.
>
> commit d0a4bb492772ce5c4bdfba3744a99ed6f6fb238f introduced a
> regression which is annoying but fairly harmless.
>
> When writing to an array that is undergoing recovery (a spare
> is being integrated into the array), writing to the array will
> set bits in the bitmap, but they will not be cleared when the
> write completes.
>
> For bits covering areas that have not been recovered yet this is not a
> problem as the recovery will clear the bits.  However bits set in an
> already-recovered region will stay set and never be cleared.
> This doesn't risk data integrity.  The only negatives are:
>  - next time there is a crash, more resyncing than necessary will
>    be done.
>  - the bitmap doesn't look clean, which is confusing.
>
> While an array is recovering we don't want to update the
> 'events_cleared' setting in the bitmap, but we do still want to clear
> bits that have very recently been set - providing they were written to
> the recovering device.
>
> So split those two needs - which previously both depended on
> 'success' - and always clear the bit if the write went to all devices.
>
> Signed-off-by: NeilBrown
>
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index b690711..6d03774 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -1393,9 +1393,6 @@ void bitmap_endwrite(struct bitmap *bitmap, sector_t offset, unsigned long sectors,
>                             atomic_read(&bitmap->behind_writes),
>                             bitmap->mddev->bitmap_info.max_write_behind);
>         }
> -       if (bitmap->mddev->degraded)
> -               /* Never clear bits or update events_cleared when degraded */
> -               success = 0;
>
>         while (sectors) {
>                 sector_t blocks;
> @@ -1409,7 +1406,7 @@ void bitmap_endwrite(struct bitmap *bitmap, sector_t offset, unsigned long sectors,
>                         return;
>                 }
>
> -               if (success &&
> +               if (success && !bitmap->mddev->degraded &&
>                     bitmap->events_cleared < bitmap->mddev->events) {
>                         bitmap->events_cleared = bitmap->mddev->events;
>                         bitmap->need_sync = 1;
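
To spell out what the patch separates (a sketch in C, since the
surrounding code is C; the names below only mirror the kernel's
fields, this is an illustration rather than kernel code):

/* The two decisions bitmap_endwrite() makes for each completed
 * write, modelled as standalone predicates. */
#include <stdbool.h>

struct write_outcome {
	bool reached_all_devices;   /* the kernel's 'success' flag */
	bool array_degraded;        /* bitmap->mddev->degraded != 0 */
};

/* May the bits set for this write be cleared again? */
static bool may_clear_bits(const struct write_outcome *w)
{
	/* After the patch: yes, whenever the write reached every
	 * device - a recovering spare receives writes too, so such
	 * a region is in sync on all devices. */
	return w->reached_all_devices;
}

/* May events_cleared advance to the current event count? */
static bool may_advance_events_cleared(const struct write_outcome *w)
{
	/* Never while degraded: a degraded array may still need its
	 * bitmap history to bring a missing or rebuilding device
	 * back into sync. */
	return w->reached_all_devices && !w->array_degraded;
}

Before the patch, both decisions were forced negative whenever the
array was degraded, which is what left bits stuck after writes to
already-recovered regions.
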
>
>>
>> On Wed, 31 Aug 2011, NeilBrown wrote:
>>
>> > Date: Wed, 31 Aug 2011 17:38:42 +1000
>> > From: NeilBrown
>> > To: Chris Pearson
>> > Cc: linux-raid@vger.kernel.org
>> > Subject: Re: dirty chunks on bitmap not clearing (RAID1)
>> >
>> > On Mon, 29 Aug 2011 11:30:56 -0500 Chris Pearson wrote:
>> >
>> >> I have the same problem.  3 chunks are always dirty.
>> >>
>> >> I'm using 2.6.38-8-generic and mdadm - v3.1.4 - 31st August 2010
>> >>
>> >> If that's not normal, then maybe what I've done differently is that I
>> >> created the array, RAID1, with one live and one missing disk, then
>> >> added the second one later after writing a lot of data.
>> >>
>> >> Also, though probably not the cause, I continued writing data while it
>> >> was syncing, and a couple of times during the syncing both drives
>> >> stopped responding and I had to power off.
>> >>
>> >> # cat /proc/mdstat
>> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> >> [raid4] [raid10]
>> >> md127 : active raid1 sdd1[0] sdc1[2]
>> >>       1904568184 blocks super 1.2 [2/2] [UU]
>> >>       bitmap: 3/15 pages [12KB], 65536KB chunk
>> >>
>> >> unused devices: <none>
>> >>
>> >> # mdadm -X /dev/sd[dc]1
>> >>         Filename : /dev/sdc1
>> >>            Magic : 6d746962
>> >>          Version : 4
>> >>             UUID : 43761dc5:4383cf0f:41ef2dab:43e6d74e
>> >>           Events : 40013
>> >>   Events Cleared : 40013
>> >>            State : OK
>> >>        Chunksize : 64 MB
>> >>           Daemon : 5s flush period
>> >>       Write Mode : Allow write behind, max 256
>> >>        Sync Size : 1904568184 (1816.34 GiB 1950.28 GB)
>> >>           Bitmap : 29062 bits (chunks), 3 dirty (0.0%)
>> >>         Filename : /dev/sdd1
>> >>            Magic : 6d746962
>> >>          Version : 4
>> >>             UUID : 43761dc5:4383cf0f:41ef2dab:43e6d74e
>> >>           Events : 40013
>> >>   Events Cleared : 40013
>> >>            State : OK
>> >>        Chunksize : 64 MB
>> >>           Daemon : 5s flush period
>> >>       Write Mode : Allow write behind, max 256
>> >>        Sync Size : 1904568184 (1816.34 GiB 1950.28 GB)
>> >>           Bitmap : 29062 bits (chunks), 3 dirty (0.0%)
>> >
>> > I cannot see how this would be happening.  If any bits are set, then
>> > they will be cleared after 5 seconds, and then 5 seconds later the
>> > block holding the bits will be written out, so that on disk they will
>> > appear to be cleared.
>> >
>> > I assume that if you write to the array, the 'dirty' count increases,
>> > but always goes back to three?
>> >
>> > And if you stop the array and start it again, the '3' stays there?
>> >
>> > If I sent you a patch to add some tracing information, would you be
>> > able to compile a new kernel with that patch applied and see what it
>> > says?
>> >
>> > Thanks,
>> >
>> > NeilBrown
>> >
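
Neil's first two questions can be checked without a tracing patch. A
sketch, assuming the array from the report above is mounted at /mnt/md
(the mount point and file name are made up):

  # watch -n 1 'mdadm --examine-bitmap /dev/sdd1 | grep -i dirty'

and, from another shell:

  # dd if=/dev/zero of=/mnt/md/probe bs=1M count=64 conv=fsync

The dirty count should rise while the dd runs and, on a healthy
kernel, fall back to its previous value within roughly two 5-second
flush periods. If it keeps settling at 3, that matches the stuck bits
explained by the December patch above.
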
>> >>
>> >> Quoting NeilBrown:
>> >>
>> >> > On Thu, October 15, 2009 9:39 am, aristizb@ualberta.ca wrote:
>> >> >> Hello,
>> >> >>
>> >> >> I have a RAID1 with 2 LVM disks, and I am running into a strange
>> >> >> situation where, with both disks connected to the array, the bitmap
>> >> >> never clears the dirty chunks.
>> >> >
>> >> > That shouldn't happen...
>> >> > What versions of mdadm and the Linux kernel are you using?
>> >> >
>> >> > NeilBrown
>> >> >
>> >> >> I am also assuming that when a RAID1 is in write-through mode, the
>> >> >> bitmap (as shown by mdadm --examine-bitmap) indicates that all the
>> >> >> data has made it to all the disks if there are no dirty chunks.
>> >> >>
>> >> >> The output of cat /proc/mdstat is:
>> >> >>
>> >> >> md2060 : active raid1 dm-5[1] dm-6[0]
>> >> >>       2252736 blocks [2/2] [UU]
>> >> >>       bitmap: 1/275 pages [12KB], 4KB chunk, file: /tmp/md2060bm
>> >> >>
>> >> >> The output of mdadm --examine-bitmap /tmp/md2060bm is:
>> >> >>
>> >> >>         Filename : md2060bm
>> >> >>            Magic : 6d746962
>> >> >>          Version : 4
>> >> >>             UUID : ad5fb74c:bb1c654a:087b2595:8a5d04a9
>> >> >>           Events : 12
>> >> >>   Events Cleared : 12
>> >> >>            State : OK
>> >> >>        Chunksize : 4 KB
>> >> >>           Daemon : 5s flush period
>> >> >>       Write Mode : Normal
>> >> >>        Sync Size : 2252736 (2.15 GiB 2.31 GB)
>> >> >>           Bitmap : 563184 bits (chunks), 3 dirty (0.0%)
>> >> >>
>> >> >> With the array under no I/O, I waited 30 minutes, but the dirty
>> >> >> chunks never get cleared from the bitmap, so I presumed the disks
>> >> >> were not in sync; yet after running a block-by-block comparison of
>> >> >> the two devices, I found that they are equal.
>> >> >>
>> >> >> The superblocks and the external bitmap tell me that all the events
>> >> >> are cleared, so I am confused about why the bitmap never goes to 0
>> >> >> dirty chunks.
>> >> >>
>> >> >> How can I tell if the disks are in sync?
>> >> >>
>> >> >> Thank you in advance for any help
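
For that closing question, the block-by-block comparison described
above can be done with GNU cmp once the array is stopped. A sketch
using the device names from that report; it assumes a 0.90 superblock
(data starting at offset zero of each member) and limits the compare
to the synced data area (Sync Size, in 1K blocks) so that per-device
metadata cannot cause a false mismatch:

  # mdadm --stop /dev/md2060
  # cmp -n $((2252736 * 1024)) /dev/dm-5 /dev/dm-6 && echo "in sync"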