linux-raid.vger.kernel.org archive mirror
* RAID1 recovery fails with 2.6 kernel
@ 2003-10-19 14:27 Dick Streefland
  2003-10-20  6:15 ` Neil Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Dick Streefland @ 2003-10-19 14:27 UTC (permalink / raw)
  To: linux-raid

When I mark one of the devices in a RAID1 array faulty, remove it, and
re-add it, recovery starts, but stops too early, leaving the array in
degraded mode. I'm seeing this in the latest 2.6.0 kernels, including
2.6.0-test8. The 2.4.22 kernel works OK, although the array is marked
"dirty" (?). Below is a script to reproduce the problem, followed by
the output of the script. I'm using mdadm-1.3.0.

# cat raid-recovery
#!/bin/sh
# Reproduce the problem on two 20 MB loopback devices.

# Create the backing files and attach them to loop devices.
dd bs=1024k count=20 if=/dev/zero of=/tmp/img1 2> /dev/null
dd bs=1024k count=20 if=/dev/zero of=/tmp/img2 2> /dev/null
sync
losetup /dev/loop1 /tmp/img1
losetup /dev/loop2 /tmp/img2
sync

# Create the RAID1 array and wait for the initial resync to complete.
mdadm -C -n 2 -l 1 /dev/md0 /dev/loop1 /dev/loop2
sleep 1
while grep resync /proc/mdstat; do sleep 1; done
sleep 1
cat /proc/mdstat
mdadm -QD /dev/md0

# Mark one half of the mirror faulty and remove it.
mdadm /dev/md0 -f /dev/loop2
mdadm /dev/md0 -r /dev/loop2
mdadm -QD /dev/md0

# Re-add it and wait for recovery; the array should end up [2/2] [UU] again.
mdadm /dev/md0 -a /dev/loop2
while grep recovery /proc/mdstat; do sleep 1; done
sleep 1
cat /proc/mdstat
mdadm -QD /dev/md0

# Clean up.
mdadm -S /dev/md0
losetup -d /dev/loop1
losetup -d /dev/loop2
rm /tmp/img1
rm /tmp/img2
# ./raid-recovery 
mdadm: array /dev/md0 started.
      [=>...................]  resync =  5.0% (1280/20416) finish=0.2min speed=1280K/sec
      [==>..................]  resync = 10.0% (2176/20416) finish=0.2min speed=1088K/sec
      [===>.................]  resync = 15.0% (3200/20416) finish=0.2min speed=1066K/sec
      [====>................]  resync = 20.0% (4224/20416) finish=0.2min speed=1056K/sec
      [=====>...............]  resync = 25.0% (5248/20416) finish=0.2min speed=1049K/sec
      [======>..............]  resync = 30.0% (6144/20416) finish=0.2min speed=1024K/sec
      [=======>.............]  resync = 35.0% (8064/20416) finish=0.1min speed=1152K/sec
      [========>............]  resync = 40.0% (9088/20416) finish=0.1min speed=1136K/sec
      [=========>...........]  resync = 45.0% (10112/20416) finish=0.1min speed=1123K/sec
      [==========>..........]  resync = 50.0% (11008/20416) finish=0.1min speed=1100K/sec
      [===========>.........]  resync = 55.0% (12032/20416) finish=0.1min speed=1093K/sec
      [============>........]  resync = 60.0% (13056/20416) finish=0.1min speed=1088K/sec
      [=============>.......]  resync = 65.0% (14080/20416) finish=0.0min speed=1083K/sec
      [==============>......]  resync = 70.0% (15104/20416) finish=0.0min speed=1078K/sec
      [===============>.....]  resync = 75.0% (16000/20416) finish=0.0min speed=1066K/sec
      [================>....]  resync = 80.0% (17024/20416) finish=0.0min speed=1064K/sec
      [=================>...]  resync = 85.0% (18048/20416) finish=0.0min speed=1061K/sec
      [==================>..]  resync = 90.0% (19072/20416) finish=0.0min speed=1059K/sec
      [===================>.]  resync = 95.0% (20096/20416) finish=0.0min speed=1057K/sec
Personalities : [raid1] 
md0 : active raid1 loop2[1] loop1[0]
      20416 blocks [2/2] [UU]
      
unused devices: <none>
/dev/md0:
        Version : 00.90.01
  Creation Time : Sun Oct 19 15:53:13 2003
     Raid Level : raid1
     Array Size : 20416 (19.94 MiB 20.91 MB)
    Device Size : 20416 (19.94 MiB 20.91 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Oct 19 15:53:34 2003
          State : clean, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       7        2        1      active sync   /dev/loop2
           UUID : 102b76c8:1af754b8:5c8d47a0:fe849836
         Events : 0.1
mdadm: set device faulty failed for /dev/loop2:  Success
mdadm: hot removed /dev/loop2
/dev/md0:
        Version : 00.90.01
  Creation Time : Sun Oct 19 15:53:13 2003
     Raid Level : raid1
     Array Size : 20416 (19.94 MiB 20.91 MB)
    Device Size : 20416 (19.94 MiB 20.91 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Oct 19 15:53:35 2003
          State : clean, no-errors
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       0        0       -1      removed
           UUID : 102b76c8:1af754b8:5c8d47a0:fe849836
         Events : 0.3
mdadm: hot added /dev/loop2
      [=>...................]  recovery =  5.0% (1024/20416) finish=0.2min speed=1024K/sec
      [==>..................]  recovery = 10.0% (2048/20416) finish=0.1min speed=2048K/sec
Personalities : [raid1] 
md0 : active raid1 loop2[2] loop1[0]
      20416 blocks [2/1] [U_]
      
unused devices: <none>
/dev/md0:
        Version : 00.90.01
  Creation Time : Sun Oct 19 15:53:13 2003
     Raid Level : raid1
     Array Size : 20416 (19.94 MiB 20.91 MB)
    Device Size : 20416 (19.94 MiB 20.91 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Oct 19 15:53:37 2003
          State : clean, no-errors
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1


    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       0        0       -1      removed
       2       7        2        1      spare   /dev/loop2
           UUID : 102b76c8:1af754b8:5c8d47a0:fe849836
         Events : 0.5

-- 
Dick Streefland                    ////               De Bilt
dick.streefland@xs4all.nl         (@ @)       The Netherlands
------------------------------oOO--(_)--OOo------------------



* Re: RAID1 recovery fails with 2.6 kernel
  2003-10-19 14:27 RAID1 recovery fails with 2.6 kernel Dick Streefland
@ 2003-10-20  6:15 ` Neil Brown
  2003-10-20  8:43   ` Dick Streefland
  2003-10-22 22:54   ` Kernel OOps: bad magic 0 while RAID5 resync operation Bo Moon
  0 siblings, 2 replies; 7+ messages in thread
From: Neil Brown @ 2003-10-20  6:15 UTC (permalink / raw)
  To: Dick Streefland; +Cc: linux-raid

On Sunday October 19, spam@streefland.xs4all.nl wrote:
> When I mark one of the devices in a RAID1 array faulty, remove it, and
> re-add it, recovery starts, but stops too early, leaving the array in
> degraded mode. I'm seeing this in the latest 2.6.0 kernels, including
> 2.6.0-test8. The 2.4.22 kernel works OK, although the array is marked
> "dirty" (?). Below is a script to reproduce the problem, followed by
> the output of the script. I'm using mdadm-1.3.0.

Thanks for providing a script...
It works fine for me (2.6.0-test8).

I don't suppose there is anything in the kernel logs about write
errors on loop2 ???

Does it fail consistently for you, or only occasionally?

NeilBrown


* Re: RAID1 recovery fails with 2.6 kernel
  2003-10-20  6:15 ` Neil Brown
@ 2003-10-20  8:43   ` Dick Streefland
  2003-10-22 17:43     ` Mike Tran
  2003-10-22 22:54   ` Kernel OOps: bad magic 0 while RAID5 resync operation Bo Moon
  1 sibling, 1 reply; 7+ messages in thread
From: Dick Streefland @ 2003-10-20  8:43 UTC (permalink / raw)
  To: linux-raid

Neil Brown <neilb@cse.unsw.edu.au> wrote:
| Thanks for providing a script...
| It works fine for me (2.6.0-test8).
| 
| I don't suppose there is anything in the kernel logs about write
| errors on loop2 ???

No, there was nothing unusual in the log files. I have no access to
the test machine at the moment, but there is a message when the
recovery starts, and a few seconds later the message "sync done".

| Does it fail consistently for you, or only occasionally?

It fails every time. This test was on a dual PIII 450 system, but it
also fails on a VIA C6 system with the 2.6.0-test5 kernel. Both
kernels are compiled without CONFIG_PREEMPT, because I had other
problems that might be related to this option:

  http://www.spinics.net/lists/raid/msg03507.html

Could this be related to CONFIG_DM_IOCTL_V4? I was not sure about this
option, and have not enabled it. Otherwise, I think it is time to put
in some printk's. Do you have suggestions where to start looking?

-- 
Dick Streefland                      ////                      Altium BV
dick.streefland@altium.nl           (@ @)          http://www.altium.com
--------------------------------oOO--(_)--OOo---------------------------



* Re: RAID1 recovery fails with 2.6 kernel
  2003-10-20  8:43   ` Dick Streefland
@ 2003-10-22 17:43     ` Mike Tran
  2003-10-22 18:59       ` Dick Streefland
  2003-10-26 20:36       ` Dick Streefland
  0 siblings, 2 replies; 7+ messages in thread
From: Mike Tran @ 2003-10-22 17:43 UTC (permalink / raw)
  To: Dick Streefland; +Cc: linux-raid

On Mon, 2003-10-20 at 03:43, Dick Streefland wrote:
> Neil Brown <neilb@cse.unsw.edu.au> wrote:
> | Thanks for providing a script...
> | It works fine for me (2.6.0-test8).
> | 
> | I don't suppose there is anything in the kernel logs about write
> | errors on loop2 ???
> 
> No, there was nothing unusual in the log files. I have no access to
> the test machine at the moment, but there is a message when the
> recovery starts, and a few seconds later the message "sync done".
> 
> | Does it fail consistently for you, or only occasionally?
> 
> It fails every time. This test was on a dual PIII 450 system, but it
> also fails on a VIA C6 system with the 2.6.0-test5 kernel. Both
> kernels are compiled without CONFIG_PREEMPT, because I had other
> problems that might be related to this option:
> 
>   http://www.spinics.net/lists/raid/msg03507.html
> 
> Could this be related to CONFIG_DM_IOCTL_V4? I was not sure about this
> option, and have not enabled it. Otherwise, I think it is time to put
> in some printk's. Do you have suggestions where to start looking?

I have been experiencing the same problem on my test machine.  I found
out that the resync terminated early because of the MD_RECOVERY_ERR bit
set by raid1's sync_write_request().  I don't understand why it fails
the sync when all the writes have already completed successfully and
quickly.  If there is a need to check for "nowhere to write this to" as
in the 2.4.x kernel, I think we need a different check.

The following patch for 2.6.0-test8 kernel seems to fix it.

--- a/raid1.c   2003-10-17 16:43:14.000000000 -0500
+++ b/raid1.c   2003-10-22 11:57:59.350900256 -0500
@@ -841,7 +841,7 @@
        }
 
        if (atomic_dec_and_test(&r1_bio->remaining)) {
-               md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 0);
+               md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 1);
                put_buf(r1_bio);
        }
 }
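
For reference, the third argument of md_done_sync() is what kills the
recovery here.  Roughly, md.c handles it like this (a simplified sketch,
not the verbatim 2.6.0-test8 source; names and details may differ
slightly):

/*
 * Sketch of md_done_sync() in drivers/md/md.c: raid personalities call
 * this to report a chunk of completed resync I/O.  Passing ok == 0
 * flags the resync as failed via MD_RECOVERY_ERR, so md_do_sync()
 * gives up early and the array is left degraded.
 */
void md_done_sync(mddev_t *mddev, int blocks, int ok)
{
	/* credit the completed blocks against the active resync window */
	atomic_sub(blocks, &mddev->recovery_active);
	wake_up(&mddev->recovery_wait);
	if (!ok) {
		/* stop the recovery and signal the md thread */
		set_bit(MD_RECOVERY_ERR, &mddev->recovery);
		md_wakeup_thread(mddev->thread);
	}
}

So reporting 0 from raid1's sync completion path tells md that the
resync failed even though every write finished without error; the patch
above simply reports success instead.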

* Re: RAID1 recovery fails with 2.6 kernel
  2003-10-22 17:43     ` Mike Tran
@ 2003-10-22 18:59       ` Dick Streefland
  2003-10-26 20:36       ` Dick Streefland
  1 sibling, 0 replies; 7+ messages in thread
From: Dick Streefland @ 2003-10-22 18:59 UTC (permalink / raw)
  To: linux-raid

Mike Tran <mhtran@us.ibm.com> wrote:
| I have been experiencing the same problem on my test machine.  I found
| out that the resync terminated early because of the MD_RECOVERY_ERR bit
| set by raid1's sync_write_request().  I don't understand why it fails
| the sync when all the writes have already completed successfully and
| quickly.  If there is a need to check for "nowhere to write this to" as
| in the 2.4.x kernel, I think we need a different check.
| 
| The following patch for 2.6.0-test8 kernel seems to fix it.
| 
| --- a/raid1.c   2003-10-17 16:43:14.000000000 -0500
| +++ b/raid1.c   2003-10-22 11:57:59.350900256 -0500
| @@ -841,7 +841,7 @@
|         }
|  
|         if (atomic_dec_and_test(&r1_bio->remaining)) {
| -               md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 0);
| +               md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 1);
|                 put_buf(r1_bio);
|         }
|  }

This is exactly the spot where I interrupted my investigations last
night to get some sleep. I can confirm that your patch fixes the
problem. Thanks!

-- 
Dick Streefland                    ////               De Bilt
dick.streefland@xs4all.nl         (@ @)       The Netherlands
------------------------------oOO--(_)--OOo------------------



* Kernel OOps: bad magic 0 while RAID5 resync operation
  2003-10-20  6:15 ` Neil Brown
  2003-10-20  8:43   ` Dick Streefland
@ 2003-10-22 22:54   ` Bo Moon
  1 sibling, 0 replies; 7+ messages in thread
From: Bo Moon @ 2003-10-22 22:54 UTC (permalink / raw)
  To: linux-raid

Hello!

Has anyone seen this Oops before? Is it harmful?

I am using Debian ARM Linux 2.4.20.

Thanks in advance,



Bo

-------------------------- console log--------------------------------------

PRAETORIAN:/home/bmoon# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 4096 sectors
md0 : active raid5 hde4[3] hdc4[1] hda4[0]
20479360 blocks chunks 64k algorithm 2 [3/2] [UU_]
[=================>...] recovery = 86.2% (8838224/10239680) finish=5.4min speed=4272K/sec

unused devices: <none>

-----------------------------------------------------Kernel OOPS-----------------------------------------------------

PRAETORIAN:/home/bmoon# bad magic 0 (should be 304a03c), <2>kernel BUG at /home/bo/queengate2/linux/include/linux/wait.h:229!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
mm = c000e78c pgd = c37a8000
*pgd = c37ac001, *pmd = c37ac001, *pte = 00000000, *ppte = 00000000
Internal error: Oops: ffffffff
CPU: 0
pc : [<c0025844>] lr : [<c002dbf4>] Not tainted
sp : c2a1fec4 ip : c2a1fe74 fp : c2a1fed4
r10: 00000009 r9 : 00000104 r8 : c2a1ff64
r7 : 00000000 r6 : 00000100 r5 : a0000013 r4 : 00000000
r3 : 00000000 r2 : c38b3f38 r1 : c38b3f38 r0 : 00000001
Flags: nZCv IRQs off FIQs on Mode SVC_32 Segment user
Control: C37AB17F Table: C37AB17F DAC: 00000015
Process winbindd (pid: 527, stack limit = 0xc2a1e37c)
Stack: (0xc2a1fec4 to 0xc2a20000)
fec0: 0304a02c c2a1feec c2a1fed8 c002b718 c002580c 0304a028 c088a000
fee0: c2a1ff04 c2a1fef0 c005eed8 c002b6c8 00000000 c2b0b734 c2a1ff50 c2a1ff08
ff00: c005f308 c005eeb8 c2a1e000 c2a1ff20 00000000 00000000 c2a1ff60 00000009
ff20: 00000000 c088a000 00000004 00000001 c2a1e000 00000004 c1faa924 00000009
ff40: bfffe9bc c2a1ffa4 c2a1ff54 c005f648 c005f0e8 00000028 00000000 00000000
ff60: 00000009 c1faa924 c1faa928 c1faa92c c1faa930 c1faa934 c1faa938 bfffe8b4
ff80: bfffe9bc 00000009 0000008e c00206e4 c2a1e000 00000000 00000000 c2a1ffa8
ffa0: c0020540 c005f35c bfffe8b4 c00299ec 00000009 bfffe9bc 00000000 00000000
ffc0: bfffe8b4 bfffe9bc 00000009 00000000 00000000 00000000 00000000 bfffe9bc
ffe0: 0010c4d8 bfffe804 000591b0 4014482c a0000010 00000009 00000000 00000000
Backtrace:
Function entered at [<c0025800>] from [<c002b718>]
r4 = 0304A02C
Function entered at [<c002b6bc>] from [<c005eed8>]
r5 = C088A000 r4 = 0304A028
Function entered at [<c005eeac>] from [<c005f308>]
r5 = C2B0B734 r4 = 00000000
Function entered at [<c005f0dc>] from [<c005f648>]
Function entered at [<c005f350>] from [<c0020540>]
Code: eb002013 e59f0014 eb002011 e3a03000 (e5833000)
md: hde4 [events: 00000002](write) hde4's sb offset: 10249344
md: hdc4 [events: 00000002](write) hdc4's sb offset: 10239680
md: hda4 [events: 00000002](write) hda4's sb offset: 10484096

-----------------------------------------------------after OOPs------------------------------------------------------

PRAETORIAN:/home/bmoon# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 4096 sectors
md0 : active raid5 hde4[2] hdc4[1] hda4[0]
20479360 blocks chunks 64k algorithm 2 [3/3] [UUU]

unused devices: <none>


* Re: RAID1 recovery fails with 2.6 kernel
  2003-10-22 17:43     ` Mike Tran
  2003-10-22 18:59       ` Dick Streefland
@ 2003-10-26 20:36       ` Dick Streefland
  1 sibling, 0 replies; 7+ messages in thread
From: Dick Streefland @ 2003-10-26 20:36 UTC (permalink / raw)
  To: linux-raid

Mike Tran <mhtran@us.ibm.com> wrote:
| I have been experiencing the same problem on my test machine.  I found
| out that the resync terminated early because of the MD_RECOVERY_ERR bit
| set by raid1's sync_write_request().  I don't understand why it fails
| the sync when all the writes have already completed successfully and
| quickly.  If there is a need to check for "nowhere to write this to" as
| in the 2.4.x kernel, I think we need a different check.
| 
| The following patch for 2.6.0-test8 kernel seems to fix it.
| 
| --- a/raid1.c   2003-10-17 16:43:14.000000000 -0500
| +++ b/raid1.c   2003-10-22 11:57:59.350900256 -0500
| @@ -841,7 +841,7 @@
|         }
|  
|         if (atomic_dec_and_test(&r1_bio->remaining)) {
| -               md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 0);
| +               md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 1);
|                 put_buf(r1_bio);
|         }
|  }

Has this patch been forwarded to Linus already? It would be nice to
have this fixed before the final 2.6.0 is released.

-- 
Dick Streefland                    ////               De Bilt
dick.streefland@xs4all.nl         (@ @)       The Netherlands
------------------------------oOO--(_)--OOo------------------


