Raid1 element stuck in (S) state

linux-raid.vger.kernel.org archive mirror

* Raid1 element stuck in (S) state
From: micah anderson @ 2014-10-27 14:18 UTC
  To: linux-raid


Hi,

I've got a RAID1 setup where one drive died and was replaced with a new
one, but it's stuck in the (S) state and I can't seem to get it added into
the array. /proc/mdstat looks like this:

md3 : active raid1 sdc1[2](S) sdd1[1]
      976759672 blocks super 1.2 [2/1] [_U]

where sdc1 is the replaced drive.

What is the right way to get this added back?

thanks!
micah
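
For reference, a minimal sketch of how the (S) flag shown above could be
picked out of /proc/mdstat with a script. The function name list_spares is
made up for illustration, and the parsing assumes the mdstat layout quoted
in this message (array name, a colon, then members such as sdc1[2](S)).

```shell
#!/bin/sh
# list_spares [FILE]: print "array member" for every member carrying the
# (S) flag in mdstat-formatted text. FILE defaults to /proc/mdstat.
# Hypothetical helper; it only reads text and never touches the array.
list_spares() {
    awk '$2 == ":" {
        # Fields after "mdX :" are the state, the level, then the members.
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(S\)$/) {
                dev = $i
                # Strip the "[N](S)" suffix to leave the bare device name.
                sub(/\[[0-9]+\]\(S\)$/, "", dev)
                print $1, dev
            }
    }' "${1:-/proc/mdstat}"
}
```

Calling list_spares with no argument reads the live /proc/mdstat; passing a
file name lets the same check run against a saved copy.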



* Re: Raid1 element stuck in (S) state
From: Joe Lawrence @ 2014-10-27 20:57 UTC
  To: micah anderson; +Cc: linux-raid

On Mon, 27 Oct 2014 10:18:47 -0400
micah anderson <micah@debian.org> wrote:

> 
> Hi,
> 
> I've got a RAID1 setup where one drive died and was replaced with a new
> one, but it's stuck in the (S) state and I can't seem to get it added into
> the array. /proc/mdstat looks like this:
> 
> md3 : active raid1 sdc1[2](S) sdd1[1]
>       976759672 blocks super 1.2 [2/1] [_U]
> 
> where sdc1 is the replaced drive.

Hi Micah,

What does the output from mdadm --detail /dev/md3 look like?

-- Joe


* Re: Raid1 element stuck in (S) state
From: micah @ 2014-10-28  4:45 UTC
  To: Joe Lawrence, micah anderson; +Cc: linux-raid


Hi Joe,

Joe Lawrence <joe.lawrence@stratus.com> writes:

> Hi Micah,
>
> What does the output from mdadm --detail /dev/md3 look like?

# mdadm --detail /dev/md3
/dev/md3:
        Version : 1.2
  Creation Time : Fri Oct 21 12:22:03 2011
     Raid Level : raid1
     Array Size : 976759672 (931.51 GiB 1000.20 GB)
  Used Dev Size : 976759672 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon Oct 27 21:45:01 2014
          State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

           Name : unassigned-hostname:3
           UUID : 736c4da2:1e53a976:6b0ff39a:b0ca93c2
         Events : 2459508

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       49        1      active sync   /dev/sdd1

       2       8       33        -      spare   /dev/sdc1
# 
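
The symptom in the output above (State "clean, degraded" with one spare and
no recovery running) can be tested for in a script. This is only a sketch:
the function name is_stuck_spare is hypothetical, and it assumes the field
labels printed by mdadm --detail exactly as shown here, plus the word
"recovering" appearing in the State line while a rebuild is in progress.

```shell
#!/bin/sh
# is_stuck_spare: read `mdadm --detail` output on stdin and succeed (exit 0)
# when the array is degraded, has at least one spare, and is not rebuilding.
# Hypothetical helper; it parses text only and issues no mdadm commands.
is_stuck_spare() {
    awk -F' : ' '
        /^ *State : /         { state = $2 }
        /^ *Spare Devices : / { spares = $2 + 0 }
        END {
            stuck = (state ~ /degraded/ && state !~ /recovering/ && spares > 0)
            exit !stuck
        }
    '
}
```

A typical use would be: mdadm --detail /dev/md3 | is_stuck_spare && echo "spare is idle"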



* Re: Raid1 element stuck in (S) state
From: NeilBrown @ 2014-10-28 21:42 UTC
  To: micah anderson; +Cc: linux-raid

On Mon, 27 Oct 2014 10:18:47 -0400 micah anderson <micah@debian.org> wrote:

> 
> Hi,
> 
> I've got a RAID1 setup where one drive died and was replaced with a new
> one, but it's stuck in the (S) state and I can't seem to get it added into
> the array. /proc/mdstat looks like this:
> 
> md3 : active raid1 sdc1[2](S) sdd1[1]
>       976759672 blocks super 1.2 [2/1] [_U]
> 
> where sdc1 is the replaced drive.
> 
> What is the right way to get this added back?
>

I've a feeling this bug might have been fixed.
What versions of mdadm and Linux are you using?

Are there any errors in the kernel logs when you --add the device?

NeilBrown


* Re: Raid1 element stuck in (S) state
From: micah @ 2014-10-29 14:03 UTC
  To: NeilBrown, micah anderson; +Cc: linux-raid

NeilBrown <neilb@suse.de> writes:

> I've a feeling this bug might have been fixed.
> What versions of mdadm and Linux are you using?

I'm using squeeze here, and had 3.1.4-1+8efb9d1+squeeze1 installed, I
just installed the backport, which is 3.2.5-3~bpo60+1.

> Are there any errors in the kernel logs when you --add the device?

After installing the backported 3.2.5, I tried to add it, and it said:

# mdadm --add /dev/md3 /dev/sdc1
mdadm: Cannot open /dev/sdc1: Device or resource busy

So I did a --remove of the drive and then added it. It then proceeded
to sync the array, and now that the sync has finished, it is back in
the (S) state.

Can I just zero the superblock of that device and re-add it in order to
resolve this?

thanks!
micah
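
The sequence being asked about (remove, zero the superblock, add) could be
written out as below. This is a sketch only: nobody in this thread confirms
that it resolves the problem, --zero-superblock wipes the md metadata on the
named device, and the wrapper name readd_member and the RUN variable are
made up for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# readd_member ARRAY MEMBER: print the remove / zero-superblock / add
# sequence for one array member. Set RUN=1 to execute instead of print.
# Hypothetical wrapper; double-check MEMBER before running for real, since
# --zero-superblock destroys the md metadata on that device.
readd_member() {
    array=$1
    member=$2
    run() {
        if [ "${RUN:-0}" = 1 ]; then
            "$@"
        else
            echo "$*"
        fi
    }
    run mdadm --remove "$array" "$member" &&
    run mdadm --zero-superblock "$member" &&
    run mdadm --add "$array" "$member"
}
```

For the array in this thread, readd_member /dev/md3 /dev/sdc1 prints the
three commands so they can be reviewed before anything is changed.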


* Re: Raid1 element stuck in (S) state
From: NeilBrown @ 2014-10-29 20:10 UTC
  To: micah; +Cc: micah anderson, linux-raid

On Wed, 29 Oct 2014 10:03:16 -0400 micah <micah@riseup.net> wrote:

> NeilBrown <neilb@suse.de> writes:
> 
> > I've a feeling this bug might have been fixed.
> > What versions of mdadm and Linux are you using?
> 
> I'm using squeeze here, and had 3.1.4-1+8efb9d1+squeeze1 installed, I
> just installed the backport, which is 3.2.5-3~bpo60+1.

I assume that is the version of mdadm.  You didn't say what version of Linux.


> 
> > Are there any errors in the kernel logs when you --add the device?

You didn't answer this question either.  Are there any messages in the
kernel log: /var/log/kern.log on debian.
Or in the output of "dmesg".

> 
> After installing the backported 3.2.5, I tried to add it, and it said:
> 
> # mdadm --add /dev/md3 /dev/sdc1
> mdadm: Cannot open /dev/sdc1: Device or resource busy
> 
> So I did a --remove of the drive and then added it. It then proceeded
> to sync the array, and now that the sync has finished, it is back in
> the (S) state.
> 
> Can I just zero the superblock of that device and re-add it in order to
> resolve this?


If it resyncs and is then still a spare, there was almost certainly some sort of
failure.  There really must be something in the kernel logs at that time.

NeilBrown
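
One way to narrow the kernel log down to md-related lines is sketched
below. The function name md_log_filter is hypothetical, and the patterns
are guesses at typical md and disk-error wording, not messages taken from
this thread, so they may need adjusting.

```shell
#!/bin/sh
# md_log_filter: keep only the kernel-log lines on stdin that mention md or
# RAID activity, or that look like disk I/O trouble.
# Hypothetical helper; the pattern list is an assumption, not exhaustive.
md_log_filter() {
    grep -iE 'md: |md/raid|raid1|I/O error|medium error|ata[0-9]+'
}
```

It can be fed from either source mentioned above, for example
dmesg | md_log_filter or md_log_filter < /var/log/kern.log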


* Re: Raid1 element stuck in (S) state
From: micah @ 2014-10-29 21:32 UTC
  To: NeilBrown; +Cc: micah anderson, linux-raid

NeilBrown <neilb@suse.de> writes:

> On Wed, 29 Oct 2014 10:03:16 -0400 micah <micah@riseup.net> wrote:
>
>> NeilBrown <neilb@suse.de> writes:
>> 
>> > I've a feeling this bug might have been fixed.
>> > What versions of mdadm and Linux are you using?
>> 
>> I'm using squeeze here, and had 3.1.4-1+8efb9d1+squeeze1 installed, I
>> just installed the backport, which is 3.2.5-3~bpo60+1.
>
> I assume that is the version of mdadm.  You didn't say what version of Linux.

Yes, that is the version of mdadm. I am running squeeze, which is a
2.6.32-5 version of the kernel, and it is an amd64 machine.

>> > Are there any errors in the kernel logs when you --add the device?
>
> You didn't answer this question either.  Are there any messages in the
> kernel log: /var/log/kern.log on debian.
> Or in the output of "dmesg".

The only thing I see in the log is:

[307932.328420] mdadm: sending ioctl 1261 to a partition!
[307932.328425] mdadm: sending ioctl 1261 to a partition!
[307932.346642] mdadm: sending ioctl 1261 to a partition!
[307932.346648] mdadm: sending ioctl 1261 to a partition!
[307932.352466] mdadm: sending ioctl 1261 to a partition!
[307932.352468] mdadm: sending ioctl 1261 to a partition!
[307932.376821] mdadm: sending ioctl 1261 to a partition!
[307932.376824] mdadm: sending ioctl 1261 to a partition!
[307932.377623] mdadm: sending ioctl 1261 to a partition!
[307932.377630] mdadm: sending ioctl 1261 to a partition!
[307932.467292] md: bind<sdc1>
[307932.588154] RAID1 conf printout:
[307932.588159]  --- wd:1 rd:2
[307932.588164]  disk 0, wo:1, o:1, dev:sdc1
[307932.588167]  disk 1, wo:0, o:1, dev:sdd1
[307932.588248] md: recovery of RAID array md3
[307932.588251] md: minimum _guaranteed_  speed: 50000 KB/sec/disk.
[307932.588254] md: using maximum available idle IO bandwidth (but not more than 2000000 KB/sec) for recovery.
[307932.588260] md: using 128k window, over a total of 976759672 blocks.

But this is just from when the device was added. After that, it appears that
log rotation failed and left me with a zero-byte kern.log, and firewall spew
has filled up my dmesg ring buffer.

>> Can I just zero the superblock of that device and re-add it in order to
>> resolve this?
>
>
> If it resyncs and is then still a spare, there was almost certainly some sort of
> failure.  There really must be something in the kernel logs at that time.

It did resync, and is still a spare.... Now that I've fixed the logs,
I'm going to try it again to see if there is any error that happens
after the sync finishes.

micah
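
To catch whatever is logged at the moment the sync ends, the retry can be
scripted so the log is captured as soon as recovery stops. A minimal
sketch follows; the function name recovery_running is made up, and it
assumes /proc/mdstat shows a "recovery =" or "resync =" progress line
under the array while a rebuild is running.

```shell
#!/bin/sh
# recovery_running ARRAY [FILE]: succeed (exit 0) while mdstat-formatted
# text shows a recovery or resync progress line under ARRAY (e.g. md3).
# FILE defaults to /proc/mdstat. Hypothetical helper, read-only.
recovery_running() {
    awk -v a="$1" '
        $1 == a && $2 == ":"           { inblk = 1; next }  # start of our array
        inblk && NF == 0               { inblk = 0 }        # blank line ends it
        inblk && /recovery =|resync =/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mdstat}"
}
```

A loop such as "while recovery_running md3; do sleep 60; done; dmesg | tail -n 50"
would then print the most recent kernel messages right after the rebuild
stops. mdadm also has a --wait option that blocks until resync or recovery
finishes, which could replace the polling loop.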


* Re: Raid1 element stuck in (S) state
From: NeilBrown @ 2014-10-29 22:47 UTC
  To: micah; +Cc: micah anderson, linux-raid

On Wed, 29 Oct 2014 17:32:43 -0400 micah <micah@riseup.net> wrote:

> NeilBrown <neilb@suse.de> writes:
> 
> > On Wed, 29 Oct 2014 10:03:16 -0400 micah <micah@riseup.net> wrote:
> >
> >> NeilBrown <neilb@suse.de> writes:
> >> 
> >> > I've a feeling this bug might have been fixed.
> >> > What versions of mdadm and Linux are you using?
> >> 
> >> I'm using squeeze here, and had 3.1.4-1+8efb9d1+squeeze1 installed, I
> >> just installed the backport, which is 3.2.5-3~bpo60+1.
> >
> > I assume that is the version of mdadm.  You didn't say what version of Linux.
> 
> Yes, that is the version of mdadm. I am running squeeze, which is a
> 2.6.32-5 version of the kernel, and it is an amd64 machine.

Wow.... a 5-year-old kernel.

I suspect this is a kernel bug you are hitting.  I vaguely remember something
like that - spares not becoming properly activated after recovery.
I don't remember the details and a quick look at commit logs doesn't show
anything obvious.
And maybe Debian has backported something which broke something.

Can you try a newer kernel at all?


NeilBrown



* Re: Raid1 element stuck in (S) state
From: micah @ 2014-11-02 15:45 UTC
  To: NeilBrown; +Cc: micah anderson, linux-raid

NeilBrown <neilb@suse.de> writes:

>> >> > Are there any errors in the kernel logs when you --add the device?
>> >
>> > You didn't answer this question either.  Are there any messages in the
>> > kernel log: /var/log/kern.log on debian.
>> > Or in the output of "dmesg".
>> 
>> The only thing I see in the log is:
>> 
>> [307932.328420] mdadm: sending ioctl 1261 to a partition!
>> [307932.328425] mdadm: sending ioctl 1261 to a partition!
>> [307932.346642] mdadm: sending ioctl 1261 to a partition!
>> [307932.346648] mdadm: sending ioctl 1261 to a partition!
>> [307932.352466] mdadm: sending ioctl 1261 to a partition!
>> [307932.352468] mdadm: sending ioctl 1261 to a partition!
>> [307932.376821] mdadm: sending ioctl 1261 to a partition!
>> [307932.376824] mdadm: sending ioctl 1261 to a partition!
>> [307932.377623] mdadm: sending ioctl 1261 to a partition!
>> [307932.377630] mdadm: sending ioctl 1261 to a partition!
>> [307932.467292] md: bind<sdc1>
>> [307932.588154] RAID1 conf printout:
>> [307932.588159]  --- wd:1 rd:2
>> [307932.588164]  disk 0, wo:1, o:1, dev:sdc1
>> [307932.588167]  disk 1, wo:0, o:1, dev:sdd1
>> [307932.588248] md: recovery of RAID array md3
>> [307932.588251] md: minimum _guaranteed_  speed: 50000 KB/sec/disk.
>> [307932.588254] md: using maximum available idle IO bandwidth (but not more than 2000000 KB/sec) for recovery.
>> [307932.588260] md: using 128k window, over a total of 976759672 blocks.
>> 
>> But this is just from when the device was added. After that, it appears that
>> log rotation failed and left me with a zero-byte kern.log, and firewall spew
>> has filled up my dmesg ring buffer.

I fixed my logging and re-added the device, and found there was a
hardware error preventing things from syncing properly. I've resolved
that error and now things are fine. Thanks for the push to look closer
there!

micah
