* Raid1 element stuck in (S) state
@ 2014-10-27 14:18 micah anderson
2014-10-27 20:57 ` Joe Lawrence
2014-10-28 21:42 ` NeilBrown
0 siblings, 2 replies; 9+ messages in thread
From: micah anderson @ 2014-10-27 14:18 UTC (permalink / raw)
To: linux-raid
Hi,
I've got a RAID1 setup where one drive died; it was replaced with a new
one, but it's stuck in the (S) state and I can't seem to get it added into
the array. /proc/mdstat looks like this:
md3 : active raid1 sdc1[2](S) sdd1[1]
976759672 blocks super 1.2 [2/1] [_U]
where sdc1 is the replaced drive.
What is the right way to get this added back?
thanks!
micah
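[The (S) suffix in the /proc/mdstat line above marks a spare device that has not been activated into the array. As a quick illustration, a small Python sketch (a hypothetical helper, not part of mdadm) can pull the device slots and spare flags out of such a line:]

```python
import re

def mdstat_devices(line):
    """Return (name, slot, is_spare) for each device in an mdstat array line.

    Hypothetical helper for illustration; it only understands the simple
    'name[slot](S)' tokens seen in /proc/mdstat.
    """
    return [(m.group(1), int(m.group(2)), m.group(3) is not None)
            for m in re.finditer(r"(\w+)\[(\d+)\](\(S\))?", line)]

line = "md3 : active raid1 sdc1[2](S) sdd1[1]"
print(mdstat_devices(line))  # [('sdc1', 2, True), ('sdd1', 1, False)]
```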
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Raid1 element stuck in (S) state
2014-10-27 14:18 Raid1 element stuck in (S) state micah anderson
@ 2014-10-27 20:57 ` Joe Lawrence
2014-10-28 4:45 ` micah
2014-10-28 21:42 ` NeilBrown
1 sibling, 1 reply; 9+ messages in thread
From: Joe Lawrence @ 2014-10-27 20:57 UTC (permalink / raw)
To: micah anderson; +Cc: linux-raid
On Mon, 27 Oct 2014 10:18:47 -0400
micah anderson <micah@debian.org> wrote:
>
> Hi,
>
> I've got a RAID1 setup where one drive died; it was replaced with a new
> one, but it's stuck in the (S) state and I can't seem to get it added into
> the array. /proc/mdstat looks like this:
>
> md3 : active raid1 sdc1[2](S) sdd1[1]
> 976759672 blocks super 1.2 [2/1] [_U]
>
> where sdc1 is the replaced drive.
Hi Micah,
What does the output from mdadm --detail /dev/md3 look like?
-- Joe
* Re: Raid1 element stuck in (S) state
2014-10-27 20:57 ` Joe Lawrence
@ 2014-10-28 4:45 ` micah
0 siblings, 0 replies; 9+ messages in thread
From: micah @ 2014-10-28 4:45 UTC (permalink / raw)
To: Joe Lawrence, micah anderson; +Cc: linux-raid
Hi Joe,
Joe Lawrence <joe.lawrence@stratus.com> writes:
> On Mon, 27 Oct 2014 10:18:47 -0400
> micah anderson <micah@debian.org> wrote:
>
>>
>> Hi,
>>
>> I've got a RAID1 setup where one drive died; it was replaced with a new
>> one, but it's stuck in the (S) state and I can't seem to get it added into
>> the array. /proc/mdstat looks like this:
>>
>> md3 : active raid1 sdc1[2](S) sdd1[1]
>> 976759672 blocks super 1.2 [2/1] [_U]
>>
>> where sdc1 is the replaced drive.
>
> Hi Micah,
>
> What does the output from mdadm --detail /dev/md3 look like?
# mdadm --detail /dev/md3
/dev/md3:
Version : 1.2
Creation Time : Fri Oct 21 12:22:03 2011
Raid Level : raid1
Array Size : 976759672 (931.51 GiB 1000.20 GB)
Used Dev Size : 976759672 (931.51 GiB 1000.20 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Mon Oct 27 21:45:01 2014
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Name : unassigned-hostname:3
UUID : 736c4da2:1e53a976:6b0ff39a:b0ca93c2
Events : 2459508
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 49 1 active sync /dev/sdd1
2 8 33 - spare /dev/sdc1
#
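[The --detail output above already shows the inconsistency: the array wants 2 raid devices but has only 1 active, while 1 device sits as a spare that never gets promoted. A short Python sketch (a hypothetical parser, assuming only the 'Key : Value' layout shown above) that flags that condition:]

```python
def parse_detail(text):
    """Collect the 'Key : Value' fields from mdadm --detail style output.

    Hypothetical illustration; real --detail output has more fields,
    but the counters used below appear exactly as quoted above.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, val = line.partition(" : ")
        if sep:
            fields[key.strip()] = val.strip()
    return fields

detail = """\
   Raid Devices : 2
 Active Devices : 1
  Spare Devices : 1
          State : clean, degraded
"""
f = parse_detail(detail)
# Degraded with an idle spare: fewer active devices than raid slots,
# yet at least one spare is present.
stuck = int(f["Active Devices"]) < int(f["Raid Devices"]) and int(f["Spare Devices"]) > 0
print(stuck)  # True
```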
* Re: Raid1 element stuck in (S) state
2014-10-27 14:18 Raid1 element stuck in (S) state micah anderson
2014-10-27 20:57 ` Joe Lawrence
@ 2014-10-28 21:42 ` NeilBrown
2014-10-29 14:03 ` micah
1 sibling, 1 reply; 9+ messages in thread
From: NeilBrown @ 2014-10-28 21:42 UTC (permalink / raw)
To: micah anderson; +Cc: linux-raid
On Mon, 27 Oct 2014 10:18:47 -0400 micah anderson <micah@debian.org> wrote:
>
> Hi,
>
> I've got a RAID1 setup where one drive died; it was replaced with a new
> one, but it's stuck in the (S) state and I can't seem to get it added into
> the array. /proc/mdstat looks like this:
>
> md3 : active raid1 sdc1[2](S) sdd1[1]
> 976759672 blocks super 1.2 [2/1] [_U]
>
> where sdc1 is the replaced drive.
>
> What is the right way to get this added back?
>
I've a feeling this bug might have been fixed.
What versions of mdadm and Linux are you using?
Are there any errors in the kernel logs when you --add the device?
NeilBrown
* Re: Raid1 element stuck in (S) state
2014-10-28 21:42 ` NeilBrown
@ 2014-10-29 14:03 ` micah
2014-10-29 20:10 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: micah @ 2014-10-29 14:03 UTC (permalink / raw)
To: NeilBrown, micah anderson; +Cc: linux-raid
NeilBrown <neilb@suse.de> writes:
> On Mon, 27 Oct 2014 10:18:47 -0400 micah anderson <micah@debian.org> wrote:
>
>>
>> Hi,
>>
>> I've got a RAID1 setup where one drive died; it was replaced with a new
>> one, but it's stuck in the (S) state and I can't seem to get it added into
>> the array. /proc/mdstat looks like this:
>>
>> md3 : active raid1 sdc1[2](S) sdd1[1]
>> 976759672 blocks super 1.2 [2/1] [_U]
>>
>> where sdc1 is the replaced drive.
>>
>> What is the right way to get this added back?
>>
>
> I've a feeling this bug might have been fixed.
> What versions of mdadm and Linux are you using?
I'm using squeeze here and had 3.1.4-1+8efb9d1+squeeze1 installed; I
just installed the backport, which is 3.2.5-3~bpo60+1.
> Are there any errors in the kernel logs when you --add the device?
After installing the backported 3.2.5, I tried to add it, and it said:
# mdadm --add /dev/md3 /dev/sdc1
mdadm: Cannot open /dev/sdc1: Device or resource busy
so I did a --remove of the drive and then added it; it proceeded
to sync the array, and after that finished, it is now back in the (S)
state.
Can I just zero the superblock of that device and re-add it in order to
resolve this?
thanks!
micah
* Re: Raid1 element stuck in (S) state
2014-10-29 14:03 ` micah
@ 2014-10-29 20:10 ` NeilBrown
2014-10-29 21:32 ` micah
0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2014-10-29 20:10 UTC (permalink / raw)
To: micah; +Cc: micah anderson, linux-raid
On Wed, 29 Oct 2014 10:03:16 -0400 micah <micah@riseup.net> wrote:
> NeilBrown <neilb@suse.de> writes:
>
> > On Mon, 27 Oct 2014 10:18:47 -0400 micah anderson <micah@debian.org> wrote:
> >
> >>
> >> Hi,
> >>
> >> I've got a RAID1 setup where one drive died; it was replaced with a new
> >> one, but it's stuck in the (S) state and I can't seem to get it added into
> >> the array. /proc/mdstat looks like this:
> >>
> >> md3 : active raid1 sdc1[2](S) sdd1[1]
> >> 976759672 blocks super 1.2 [2/1] [_U]
> >>
> >> where sdc1 is the replaced drive.
> >>
> >> What is the right way to get this added back?
> >>
> >
> > I've a feeling this bug might have been fixed.
> > What versions of mdadm and Linux are you using?
>
> I'm using squeeze here and had 3.1.4-1+8efb9d1+squeeze1 installed; I
> just installed the backport, which is 3.2.5-3~bpo60+1.
I assume that is the version of mdadm. You didn't say what version of Linux.
>
> > Are there any errors in the kernel logs when you --add the device?
You didn't answer this question either. Are there any messages in the
kernel log (/var/log/kern.log on Debian) or in the output of "dmesg"?
>
> After installing the backported 3.2.5, I tried to add it, and it said:
>
> # mdadm --add /dev/md3 /dev/sdc1
> mdadm: Cannot open /dev/sdc1: Device or resource busy
>
> so I did a --remove of the drive and then added it; it proceeded
> to sync the array, and after that finished, it is now back in the (S)
> state.
>
> Can I just zero the superblock of that device and re-add it in order to
> resolve this?
If it resyncs and then is still a spare, there was almost certainly some sort of
failure. There really must be something in the kernel logs at that time.
NeilBrown
* Re: Raid1 element stuck in (S) state
2014-10-29 20:10 ` NeilBrown
@ 2014-10-29 21:32 ` micah
2014-10-29 22:47 ` NeilBrown
0 siblings, 1 reply; 9+ messages in thread
From: micah @ 2014-10-29 21:32 UTC (permalink / raw)
To: NeilBrown; +Cc: micah anderson, linux-raid
NeilBrown <neilb@suse.de> writes:
> On Wed, 29 Oct 2014 10:03:16 -0400 micah <micah@riseup.net> wrote:
>
>> NeilBrown <neilb@suse.de> writes:
>>
>> > On Mon, 27 Oct 2014 10:18:47 -0400 micah anderson <micah@debian.org> wrote:
>> >
>> >>
>> >> Hi,
>> >>
>> >> I've got a RAID1 setup where one drive died; it was replaced with a new
>> >> one, but it's stuck in the (S) state and I can't seem to get it added into
>> >> the array. /proc/mdstat looks like this:
>> >>
>> >> md3 : active raid1 sdc1[2](S) sdd1[1]
>> >> 976759672 blocks super 1.2 [2/1] [_U]
>> >>
>> >> where sdc1 is the replaced drive.
>> >>
>> >> What is the right way to get this added back?
>> >>
>> >
>> > I've a feeling this bug might have been fixed.
>> > What versions of mdadm and Linux are you using?
>>
>> I'm using squeeze here and had 3.1.4-1+8efb9d1+squeeze1 installed; I
>> just installed the backport, which is 3.2.5-3~bpo60+1.
>
> I assume that is the version of mdadm. You didn't say what version of Linux.
Yes, that is the version of mdadm. I am running squeeze, which uses a
2.6.32-5 kernel, and it is an amd64 machine.
>> > Are there any errors in the kernel logs when you --add the device?
>
> You didn't answer this question either. Are there any messages in the
> kernel log (/var/log/kern.log on Debian) or in the output of "dmesg"?
The only thing I see in the log is:
[307932.328420] mdadm: sending ioctl 1261 to a partition!
[307932.328425] mdadm: sending ioctl 1261 to a partition!
[307932.346642] mdadm: sending ioctl 1261 to a partition!
[307932.346648] mdadm: sending ioctl 1261 to a partition!
[307932.352466] mdadm: sending ioctl 1261 to a partition!
[307932.352468] mdadm: sending ioctl 1261 to a partition!
[307932.376821] mdadm: sending ioctl 1261 to a partition!
[307932.376824] mdadm: sending ioctl 1261 to a partition!
[307932.377623] mdadm: sending ioctl 1261 to a partition!
[307932.377630] mdadm: sending ioctl 1261 to a partition!
[307932.467292] md: bind<sdc1>
[307932.588154] RAID1 conf printout:
[307932.588159] --- wd:1 rd:2
[307932.588164] disk 0, wo:1, o:1, dev:sdc1
[307932.588167] disk 1, wo:0, o:1, dev:sdd1
[307932.588248] md: recovery of RAID array md3
[307932.588251] md: minimum _guaranteed_ speed: 50000 KB/sec/disk.
[307932.588254] md: using maximum available idle IO bandwidth (but not more than 2000000 KB/sec) for recovery.
[307932.588260] md: using 128k window, over a total of 976759672 blocks.
but this is just from when the device was added; after that, it appears that
log rotation failed and I have a zero-byte kern.log, and firewall spew
has filled up my dmesg ring.
>> Can I just zero the superblock of that device and re-add it in order to
>> resolve this?
>
>
> If it resyncs and then is still a spare, there was almost certainly some sort of
> failure. There really must be something in the kernel logs at that time.
It did resync, and is still a spare.... Now that I've fixed the logs,
I'm going to try it again to see if there is any error that happens
after the sync finishes.
micah
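[The "RAID1 conf printout" quoted above is the md driver's snapshot of array membership: the line "--- wd:1 rd:2" reports 1 working disk against 2 configured raid disks, i.e. still degraded at the moment sdc1 was bound. A small Python sketch (a hypothetical decoder; wd/rd are assumed to mean working disks and configured raid disks) of that line:]

```python
import re

def parse_conf_printout(line):
    """Decode 'wd:<n> rd:<n>' from a kernel RAID1 conf printout line.

    Hypothetical decoder for illustration; wd is taken to be the count
    of working disks and rd the number of raid slots the array wants.
    """
    m = re.search(r"wd:(\d+)\s+rd:(\d+)", line)
    if m is None:
        return None
    wd, rd = int(m.group(1)), int(m.group(2))
    return {"working": wd, "wanted": rd, "degraded": wd < rd}

print(parse_conf_printout(" --- wd:1 rd:2"))  # {'working': 1, 'wanted': 2, 'degraded': True}
```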
* Re: Raid1 element stuck in (S) state
2014-10-29 21:32 ` micah
@ 2014-10-29 22:47 ` NeilBrown
2014-11-02 15:45 ` micah
0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2014-10-29 22:47 UTC (permalink / raw)
To: micah; +Cc: micah anderson, linux-raid
On Wed, 29 Oct 2014 17:32:43 -0400 micah <micah@riseup.net> wrote:
> NeilBrown <neilb@suse.de> writes:
>
> > On Wed, 29 Oct 2014 10:03:16 -0400 micah <micah@riseup.net> wrote:
> >
> >> NeilBrown <neilb@suse.de> writes:
> >>
> >> > On Mon, 27 Oct 2014 10:18:47 -0400 micah anderson <micah@debian.org> wrote:
> >> >
> >> >>
> >> >> Hi,
> >> >>
> >> >> I've got a RAID1 setup where one drive died; it was replaced with a new
> >> >> one, but it's stuck in the (S) state and I can't seem to get it added into
> >> >> the array. /proc/mdstat looks like this:
> >> >>
> >> >> md3 : active raid1 sdc1[2](S) sdd1[1]
> >> >> 976759672 blocks super 1.2 [2/1] [_U]
> >> >>
> >> >> where sdc1 is the replaced drive.
> >> >>
> >> >> What is the right way to get this added back?
> >> >>
> >> >
> >> > I've a feeling this bug might have been fixed.
> >> > What versions of mdadm and Linux are you using?
> >>
> >> I'm using squeeze here and had 3.1.4-1+8efb9d1+squeeze1 installed; I
> >> just installed the backport, which is 3.2.5-3~bpo60+1.
> >
> > I assume that is the version of mdadm. You didn't say what version of Linux.
>
> Yes, that is the version of mdadm. I am running squeeze, which uses a
> 2.6.32-5 kernel, and it is an amd64 machine.
Wow... a 5-year-old kernel.
I suspect this is a kernel bug you are hitting. I vaguely remember something
like that - spares not becoming properly activated after recovery.
I don't remember the details and a quick look at commit logs doesn't show
anything obvious.
And maybe Debian has backported something which broke something.
Can you try a newer kernel at all?
NeilBrown
>
> >> > Are there any errors in the kernel logs when you --add the device?
> >
> > You didn't answer this question either. Are there any messages in the
> > kernel log (/var/log/kern.log on Debian) or in the output of "dmesg"?
>
> The only thing I see in the log is:
>
> [307932.328420] mdadm: sending ioctl 1261 to a partition!
> [307932.328425] mdadm: sending ioctl 1261 to a partition!
> [307932.346642] mdadm: sending ioctl 1261 to a partition!
> [307932.346648] mdadm: sending ioctl 1261 to a partition!
> [307932.352466] mdadm: sending ioctl 1261 to a partition!
> [307932.352468] mdadm: sending ioctl 1261 to a partition!
> [307932.376821] mdadm: sending ioctl 1261 to a partition!
> [307932.376824] mdadm: sending ioctl 1261 to a partition!
> [307932.377623] mdadm: sending ioctl 1261 to a partition!
> [307932.377630] mdadm: sending ioctl 1261 to a partition!
> [307932.467292] md: bind<sdc1>
> [307932.588154] RAID1 conf printout:
> [307932.588159] --- wd:1 rd:2
> [307932.588164] disk 0, wo:1, o:1, dev:sdc1
> [307932.588167] disk 1, wo:0, o:1, dev:sdd1
> [307932.588248] md: recovery of RAID array md3
> [307932.588251] md: minimum _guaranteed_ speed: 50000 KB/sec/disk.
> [307932.588254] md: using maximum available idle IO bandwidth (but not more than 2000000 KB/sec) for recovery.
> [307932.588260] md: using 128k window, over a total of 976759672 blocks.
>
> but this is just from when the device was added; after that, it appears that
> log rotation failed and I have a zero-byte kern.log, and firewall spew
> has filled up my dmesg ring.
>
> >> Can I just zero the superblock of that device and re-add it in order to
> >> resolve this?
> >
> >
> > If it resyncs and then is still a spare, there was almost certainly some sort of
> > failure. There really must be something in the kernel logs at that time.
>
> It did resync, and is still a spare.... Now that I've fixed the logs,
> I'm going to try it again to see if there is any error that happens
> after the sync finishes.
>
> micah
* Re: Raid1 element stuck in (S) state
2014-10-29 22:47 ` NeilBrown
@ 2014-11-02 15:45 ` micah
0 siblings, 0 replies; 9+ messages in thread
From: micah @ 2014-11-02 15:45 UTC (permalink / raw)
To: NeilBrown; +Cc: micah anderson, linux-raid
NeilBrown <neilb@suse.de> writes:
>> >> > Are there any errors in the kernel logs when you --add the device?
>> >
>> > You didn't answer this question either. Are there any messages in the
>> > kernel log (/var/log/kern.log on Debian) or in the output of "dmesg"?
>>
>> The only thing I see in the log is:
>>
>> [307932.328420] mdadm: sending ioctl 1261 to a partition!
>> [307932.328425] mdadm: sending ioctl 1261 to a partition!
>> [307932.346642] mdadm: sending ioctl 1261 to a partition!
>> [307932.346648] mdadm: sending ioctl 1261 to a partition!
>> [307932.352466] mdadm: sending ioctl 1261 to a partition!
>> [307932.352468] mdadm: sending ioctl 1261 to a partition!
>> [307932.376821] mdadm: sending ioctl 1261 to a partition!
>> [307932.376824] mdadm: sending ioctl 1261 to a partition!
>> [307932.377623] mdadm: sending ioctl 1261 to a partition!
>> [307932.377630] mdadm: sending ioctl 1261 to a partition!
>> [307932.467292] md: bind<sdc1>
>> [307932.588154] RAID1 conf printout:
>> [307932.588159] --- wd:1 rd:2
>> [307932.588164] disk 0, wo:1, o:1, dev:sdc1
>> [307932.588167] disk 1, wo:0, o:1, dev:sdd1
>> [307932.588248] md: recovery of RAID array md3
>> [307932.588251] md: minimum _guaranteed_ speed: 50000 KB/sec/disk.
>> [307932.588254] md: using maximum available idle IO bandwidth (but not more than 2000000 KB/sec) for recovery.
>> [307932.588260] md: using 128k window, over a total of 976759672 blocks.
>>
>> but this is just from when the device was added; after that, it appears that
>> log rotation failed and I have a zero-byte kern.log, and firewall spew
>> has filled up my dmesg ring.
I fixed my logging and re-added the device, and found there was a
hardware error preventing things from syncing properly. I've resolved
that error and now things are fine. Thanks for the push to look closer
there!
micah