linux-raid.vger.kernel.org archive mirror
* [bug?] raid1 integrity checking is broken on 2.6.18-rc4
@ 2006-08-12  6:49 Chuck Ebbert
  2006-08-12  9:13 ` Justin Piszcz
  2006-08-14  6:14 ` Neil Brown
  0 siblings, 2 replies; 7+ messages in thread
From: Chuck Ebbert @ 2006-08-12  6:49 UTC
  To: linux-raid; +Cc: Neil Brown, linux-kernel

Doing this on a raid1 array:
        echo "check" >/sys/block/md0/md/sync_action

On 2.6.16.27:
        Activity lights on both mirrors show activity for a while,
        then the array status prints on the console.

On 2.6.18-rc4 + the below patch:
        Drive activity light blinks once on one drive, then the
        array status prints (obviously no checking takes place.)
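The same trigger can also be issued from a program instead of the shell. This is a minimal sketch that assumes the array's sysfs directory is /sys/block/md0/md, as in the report, and that the kernel exposes the mismatch_cnt attribute described in Documentation/md.txt. The helper names are hypothetical. The directory is passed in, so the code can be pointed at a scratch directory for testing.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write one keyword such as "check" to <dir>/sync_action.
 * Returns 0 on success, -1 on any error. */
static int md_write_sync_action(const char *dir, const char *action)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/sync_action", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%s\n", action) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read the first line of <dir>/<name> into buf,
 * with the trailing newline stripped. */
static int md_read_attr(const char *dir, const char *name, char *buf, size_t len)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Return the value of <dir>/mismatch_cnt, or -1 if it cannot be read. */
static long md_mismatch_count(const char *dir)
{
    char buf[64];
    char *end;

    if (md_read_attr(dir, "mismatch_cnt", buf, sizeof buf) != 0)
        return -1;
    long v = strtol(buf, &end, 10);
    return end == buf ? -1 : v;
}
```

A caller would write "check", poll sync_action with md_read_attr() until it reads "idle" again, and only then look at md_mismatch_count(). If the pass ends immediately, as on 2.6.18-rc4 here, the count is not meaningful because no blocks were compared.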


Applied hotfix on 2.6.18-rc4:

--- .prev/drivers/md/md.c       2006-08-08 09:00:44.000000000 +1000
+++ ./drivers/md/md.c   2006-08-08 09:04:04.000000000 +1000
@@ -1597,6 +1597,19 @@ void md_update_sb(mddev_t * mddev)
 
 repeat:
        spin_lock_irq(&mddev->write_lock);
+
+       if (mddev->degraded && mddev->sb_dirty == 3)
+               /* If the array is degraded, then skipping spares is both
+                * dangerous and fairly pointless.
+                * Dangerous because a device that was removed from the array
+                * might have a event_count that still looks up-to-date,
+                * so it can be re-added without a resync.
+                * Pointless because if there are any spares to skip,
+                * then a recovery will happen and soon that array won't
+                * be degraded any more and the spare can go back to sleep then.
+                */
+               mddev->sb_dirty = 1;
+
        sync_req = mddev->in_sync;
        mddev->utime = get_seconds();
        if (mddev->sb_dirty == 3)
-- 
Chuck
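The hotfix above only changes which superblock update is performed. Value 3 of sb_dirty means "write the superblocks, but spares may be left alone". On a degraded array, that shortcut is downgraded to an ordinary update (value 1), so that every device gets the new event count. The following is a minimal model of that decision. The constant and function names are hypothetical, and only the values 1 and 3 come from the patch.

```c
#include <stdbool.h>

/* Hypothetical names for the two sb_dirty values the patch uses. */
enum { SB_DIRTY_ALL = 1, SB_DIRTY_SKIP_SPARES = 3 };

/* Return the sb_dirty value md_update_sb() would act on after the hotfix. */
static int effective_sb_dirty(bool degraded, int sb_dirty)
{
    /* Degraded array: skipping spares is dangerous (a removed device may
     * still look up to date) and pointless (recovery will start soon),
     * so fall back to updating every device. */
    if (degraded && sb_dirty == SB_DIRTY_SKIP_SPARES)
        return SB_DIRTY_ALL;
    return sb_dirty;
}
```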


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4
  2006-08-12  6:49 [bug?] raid1 integrity checking is broken on 2.6.18-rc4 Chuck Ebbert
@ 2006-08-12  9:13 ` Justin Piszcz
  2006-08-12 11:59   ` Michael Tokarev
  2006-08-14  6:14 ` Neil Brown
  1 sibling, 1 reply; 7+ messages in thread
From: Justin Piszcz @ 2006-08-12  9:13 UTC
  To: Chuck Ebbert; +Cc: linux-raid, Neil Brown, linux-kernel



On Sat, 12 Aug 2006, Chuck Ebbert wrote:

> Doing this on a raid1 array:
>        echo "check" >/sys/block/md0/md/sync_action
>
> On 2.6.16.27:
>        Activity lights on both mirrors show activity for a while,
>        then the array status prints on the console.
>
> On 2.6.18-rc4 + the below patch:
>        Drive activity light blinks once on one drive, then the
>        array status prints (obviously no checking takes place.)
>
>
> Applied hotfix on 2.6.18-rc4:
>
> --- .prev/drivers/md/md.c       2006-08-08 09:00:44.000000000 +1000
> +++ ./drivers/md/md.c   2006-08-08 09:04:04.000000000 +1000
> @@ -1597,6 +1597,19 @@ void md_update_sb(mddev_t * mddev)
>
> repeat:
>        spin_lock_irq(&mddev->write_lock);
> +
> +       if (mddev->degraded && mddev->sb_dirty == 3)
> +               /* If the array is degraded, then skipping spares is both
> +                * dangerous and fairly pointless.
> +                * Dangerous because a device that was removed from the array
> +                * might have a event_count that still looks up-to-date,
> +                * so it can be re-added without a resync.
> +                * Pointless because if there are any spares to skip,
> +                * then a recovery will happen and soon that array won't
> +                * be degraded any more and the spare can go back to sleep then.
> +                */
> +               mddev->sb_dirty = 1;
> +
>        sync_req = mddev->in_sync;
>        mddev->utime = get_seconds();
>        if (mddev->sb_dirty == 3)
> -- 
> Chuck
>
>

Is there a doc for all of the options you can echo into the sync_action? 
I'm assuming mdadm does these as well, and echo is just another way to
work with the array?


* Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4
  2006-08-12  9:13 ` Justin Piszcz
@ 2006-08-12 11:59   ` Michael Tokarev
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Tokarev @ 2006-08-12 11:59 UTC
  To: Justin Piszcz; +Cc: Chuck Ebbert, linux-raid, Neil Brown, linux-kernel

Justin Piszcz wrote:
> Is there a doc for all of the options you can echo into the sync_action?
> I'm assuming mdadm does these as well, and echo is just another way to
> work with the array?

How about the obvious one, Documentation/md.txt?

And no, mdadm does not perform or trigger data integrity checking.

/mjt
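As far as I recall, the Documentation/md.txt of this period describes sync_action as holding one word. That word is one of "resync", "recover", "idle", "check" or "repair". The same words can be written back: "idle" stops a running pass, and "check" and "repair" start only if the array is currently idle. Please verify this list against the md.txt in your own tree. The sketch below simply maps those words to an enum. The enum and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical enum for the sync_action words listed in md.txt. */
enum md_sync_action {
    MD_ACTION_UNKNOWN = -1,
    MD_ACTION_IDLE,
    MD_ACTION_RESYNC,
    MD_ACTION_RECOVER,
    MD_ACTION_CHECK,
    MD_ACTION_REPAIR
};

static const struct {
    const char *word;
    enum md_sync_action action;
} md_actions[] = {
    { "idle",    MD_ACTION_IDLE },
    { "resync",  MD_ACTION_RESYNC },
    { "recover", MD_ACTION_RECOVER },
    { "check",   MD_ACTION_CHECK },
    { "repair",  MD_ACTION_REPAIR },
};

/* Map a word, optionally newline-terminated as read from sysfs, to an action. */
static enum md_sync_action md_parse_sync_action(const char *s)
{
    size_t len = strcspn(s, "\n");

    for (size_t i = 0; i < sizeof md_actions / sizeof md_actions[0]; i++) {
        if (strlen(md_actions[i].word) == len &&
            strncmp(s, md_actions[i].word, len) == 0)
            return md_actions[i].action;
    }
    return MD_ACTION_UNKNOWN;
}
```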


* Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4
  2006-08-12  6:49 [bug?] raid1 integrity checking is broken on 2.6.18-rc4 Chuck Ebbert
  2006-08-12  9:13 ` Justin Piszcz
@ 2006-08-14  6:14 ` Neil Brown
  1 sibling, 0 replies; 7+ messages in thread
From: Neil Brown @ 2006-08-14  6:14 UTC
  To: Chuck Ebbert; +Cc: linux-raid, linux-kernel

On Saturday August 12, 76306.1226@compuserve.com wrote:
> Doing this on a raid1 array:
>         echo "check" >/sys/block/md0/md/sync_action
> 
> On 2.6.16.27:
>         Activity lights on both mirrors show activity for a while,
>         then the array status prints on the console.
> 
> On 2.6.18-rc4 + the below patch:
>         Drive activity light blinks once on one drive, then the
>         array status prints (obviously no checking takes place.)
> 

Thanks for the report.
Easily duplicated, easily fixed.
I'll make sure this patch gets into 2.6.18.

Thanks again,
NeilBrown

Signed-off-by: Neil Brown <neilb@suse.de>

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c	2006-07-31 17:24:36.000000000 +1000
+++ ./drivers/md/raid1.c	2006-08-14 15:52:48.000000000 +1000
@@ -1644,15 +1644,16 @@ static sector_t sync_request(mddev_t *md
 		return 0;
 	}
 
-	/* before building a request, check if we can skip these blocks..
-	 * This call the bitmap_start_sync doesn't actually record anything
-	 */
 	if (mddev->bitmap == NULL &&
 	    mddev->recovery_cp == MaxSector &&
+	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
 	    conf->fullsync == 0) {
 		*skipped = 1;
 		return max_sector - sector_nr;
 	}
+	/* before building a request, check if we can skip these blocks..
+	 * This call the bitmap_start_sync doesn't actually record anything
+	 */
 	if (!bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) &&
 	    !conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
 		/* We can skip this block, and probably several more */
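The fix turns on the first "skip everything" test in sync_request(). Before the patch, a clean array without a bitmap (recovery_cp == MaxSector, no fullsync) returned the whole remaining range as skipped. That happened even when the pass had been requested by the user. This matches the single blink and immediate completion in the report. Writing "check" sets MD_RECOVERY_REQUESTED, and the patch makes that flag defeat the shortcut. The model below uses plain booleans with hypothetical names in place of the real mddev and conf fields.

```c
#include <stdbool.h>

/* Hypothetical boolean model of the fields tested in raid1 sync_request(). */
struct sync_state {
    bool has_bitmap;      /* mddev->bitmap != NULL */
    bool in_sync;         /* mddev->recovery_cp == MaxSector */
    bool user_requested;  /* MD_RECOVERY_REQUESTED, set for "check"/"repair" */
    bool fullsync;        /* conf->fullsync != 0 */
};

/* Early-skip condition before the patch. */
static bool skip_all_before(const struct sync_state *s)
{
    return !s->has_bitmap && s->in_sync && !s->fullsync;
}

/* Early-skip condition after the patch: a requested pass is never skipped. */
static bool skip_all_after(const struct sync_state *s)
{
    return !s->has_bitmap && s->in_sync && !s->user_requested && !s->fullsync;
}
```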



* Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4
@ 2006-08-17 20:03 Chuck Ebbert
  2006-08-28  3:47 ` Neil Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Ebbert @ 2006-08-17 20:03 UTC
  To: Neil Brown; +Cc: linux-kernel, linux-raid

In-Reply-To: <17632.5294.559058.66914@cse.unsw.edu.au>

On Mon, 14 Aug 2006 16:14:06 +1000, Neil Brown wrote:

> > On 2.6.18-rc4 + the below patch:
> >         Drive activity light blinks once on one drive, then the
> >         array status prints (obviously no checking takes place.)
> > 
> 
> Thanks for the report.
> Easily duplicated, easily fixed.
> I'll make sure this patch gets into 2.6.18.
> 
> Thanks again,
> NeilBrown
> 

I just tried the patch and now it seems to be syncing the drives instead
of only checking them?  (At the very least the message is misleading.)

 # echo "check" >/sys/block/md0/md/sync_action
 # dmesg | tail -9
 md: syncing RAID array md0
 md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
 md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
 md: using 128k window, over a total of 104256 blocks.
 md: md0: sync done.
 RAID1 conf printout:
  --- wd:2 rd:2
  disk 0, wo:0, o:1, dev:hda9
  disk 1, wo:0, o:1, dev:sda5


> Signed-off-by: Neil Brown <neilb@suse.de>
> 
> diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
> --- .prev/drivers/md/raid1.c  2006-07-31 17:24:36.000000000 +1000
> +++ ./drivers/md/raid1.c      2006-08-14 15:52:48.000000000 +1000
> @@ -1644,15 +1644,16 @@ static sector_t sync_request(mddev_t *md
>               return 0;
>       }
>  
> -     /* before building a request, check if we can skip these blocks..
> -      * This call the bitmap_start_sync doesn't actually record anything
> -      */
>       if (mddev->bitmap == NULL &&
>           mddev->recovery_cp == MaxSector &&
> +         !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
>           conf->fullsync == 0) {
>               *skipped = 1;
>               return max_sector - sector_nr;
>       }
> +     /* before building a request, check if we can skip these blocks..
> +      * This call the bitmap_start_sync doesn't actually record anything
> +      */
>       if (!bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) &&
>           !conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
>               /* We can skip this block, and probably several more */

-- 
Chuck


* Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4
@ 2006-08-18 16:38 raid
  0 siblings, 0 replies; 7+ messages in thread
From: raid @ 2006-08-18 16:38 UTC
  To: linux-raid

Neil introduced read-checking in 2.6.16. In earlier versions, mirror copies were overwritten instead of checked.

I'm running 2.6.17-rc4:
# echo "check" > /sys/block/md0/md/sync_action
# dmesg
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 15542784 blocks.

# dstat -d -D sda,sdg
--disk/sda----disk/sdg-
_read write:_read write
  81k   30k:  81k   30k
  58M    0 :  58M    0
  58M    0 :  57M    0
  57M    0 :  58M    0
  58M    0 :  57M    0

Although the message uses the word "reconstruction," the drives are being checked for consistency: dstat shows both disks reading at about 57-58M per second with no writes.
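The difference between checking and the older overwrite behaviour can be shown with a toy two-way mirror. In check mode, both copies are read and compared, and nothing is written. This is consistent with the read-only dstat output above. In repair mode, a mismatching block is also rewritten. The sketch assumes fixed-size in-memory blocks and always copies disk 0 over disk 1. That choice of source copy is a simplification for illustration, not a description of how raid1 picks one.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8  /* toy block size, in bytes */

/* Compare two mirrors block by block and return the number of mismatching
 * blocks.  If repair is non-zero, copy each mismatching block from disk0
 * to disk1; otherwise leave both copies untouched (read-only check). */
static size_t mirror_scan(unsigned char disk0[][BLOCK_SIZE],
                          unsigned char disk1[][BLOCK_SIZE],
                          size_t nblocks, int repair)
{
    size_t mismatches = 0;

    for (size_t i = 0; i < nblocks; i++) {
        if (memcmp(disk0[i], disk1[i], BLOCK_SIZE) != 0) {
            mismatches++;
            if (repair)
                memcpy(disk1[i], disk0[i], BLOCK_SIZE);
        }
    }
    return mismatches;
}
```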



-----------------
> I just tried the patch and now it seems to be syncing the drives instead
> of only checking them?  (At the very least the message is misleading.)
>
>  # echo "check" >/sys/block/md0/md/sync_action
>  # dmesg | tail -9
>  md: syncing RAID array md0
>  md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
>  md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
>  md: using 128k window, over a total of 104256 blocks.
>  md: md0: sync done.
>  RAID1 conf printout:
>   --- wd:2 rd:2
>   disk 0, wo:0, o:1, dev:hda9
>   disk 1, wo:0, o:1, dev:sda5


* Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4
  2006-08-17 20:03 Chuck Ebbert
@ 2006-08-28  3:47 ` Neil Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Neil Brown @ 2006-08-28  3:47 UTC
  To: Chuck Ebbert; +Cc: linux-kernel, linux-raid

On Thursday August 17, 76306.1226@compuserve.com wrote:
> 
> I just tried the patch and now it seems to be syncing the drives instead
> of only checking them?  (At the very least the message is misleading.)
> 

Yes, the message is misleading.  I should fix that.

NeilBrown
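One way to make the message less misleading is to choose its wording from the same recovery flags that a write to sync_action sets. This is a sketch of that idea. The flags are modelled as plain booleans with hypothetical names. To my knowledge, later kernels log wording along these lines, for example "md: data-check of RAID array md0", but treat the exact strings as an assumption.

```c
#include <stdbool.h>

/* Hypothetical boolean model of the bits kept in mddev->recovery. */
struct recovery_flags {
    bool sync;       /* MD_RECOVERY_SYNC: a resync-type pass */
    bool requested;  /* MD_RECOVERY_REQUESTED: "check" or "repair" */
    bool check;      /* MD_RECOVERY_CHECK: "check" only, no writes */
    bool reshape;    /* MD_RECOVERY_RESHAPE */
};

/* Choose the word used in "md: <desc> of RAID array mdX". */
static const char *sync_description(const struct recovery_flags *f)
{
    if (f->sync) {
        if (f->check)
            return "data-check";        /* read and compare only */
        if (f->requested)
            return "requested-resync";  /* user asked for "repair" */
        return "resync";
    }
    if (f->reshape)
        return "reshape";
    return "recovery";                  /* rebuilding onto a spare */
}
```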


>  # echo "check" >/sys/block/md0/md/sync_action
>  # dmesg | tail -9
>  md: syncing RAID array md0
>  md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
>  md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
>  md: using 128k window, over a total of 104256 blocks.
>  md: md0: sync done.
>  RAID1 conf printout:
>   --- wd:2 rd:2
>   disk 0, wo:0, o:1, dev:hda9
>   disk 1, wo:0, o:1, dev:sda5
> 

