* OK, Now this is really weird
@ 2011-02-26 7:00 Leslie Rhorer
2011-02-26 7:36 ` Jeff Woods
0 siblings, 1 reply; 8+ messages in thread
From: Leslie Rhorer @ 2011-02-26 7:00 UTC (permalink / raw)
To: 'Linux RAID'
I have a pair of drives each of whose 3 partitions are members of a
set of 3 RAID arrays. One of the two drives had a flaky power connection
which I thought I had fixed, but I guess not, because the drive was taken
offline again on Tuesday. The significant issue, however, is that both
times the drive failed, mdadm behaved really oddly. The first time I
thought it might just be some odd anomaly, but the second time it did
precisely the same thing. Both times, when the drive was de-registered by
udev, the first two arrays properly responded to the failure, but the third
array did not. Here is the layout:
ARRAY /dev/md1 metadata=0.90 UUID=4cde286c:0687556a:4d9996dd:dd23e701
ARRAY /dev/md2 metadata=1.2 name=Backup:2 UUID=d45ff663:9e53774c:6fcf9968:21692025
ARRAY /dev/md3 metadata=1.2 name=Backup:3 UUID=51d22c47:10f58974:0b27ef04:5609d357
Here is the result from examining the live partitions:
/dev/sdl1:
Magic : a92b4efc
Version : 0.90.00
UUID : 4cde286c:0687556a:4d9996dd:dd23e701 (local to host Backup)
Creation Time : Fri Jun 11 20:45:51 2010
Raid Level : raid1
Used Dev Size : 6144704 (5.86 GiB 6.29 GB)
Array Size : 6144704 (5.86 GiB 6.29 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Update Time : Sat Feb 26 00:47:19 2011
State : clean
Internal Bitmap : present
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Checksum : c127a1bf - correct
Events : 1014
Number Major Minor RaidDevice State
this 1 8 177 1 active sync /dev/sdl1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
/dev/sdl2:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : d45ff663:9e53774c:6fcf9968:21692025
Name : Backup:2 (local to host Backup)
Creation Time : Sat Dec 19 22:59:43 2009
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 554884828 (264.59 GiB 284.10 GB)
Array Size : 554884828 (264.59 GiB 284.10 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : e0896263:c0f95d43:9c0cb92a:79a95210
Internal Bitmap : 8 sectors from superblock
Update Time : Sat Feb 26 00:47:18 2011
Checksum : 41881e60 - correct
Events : 902752
Device Role : Active device 1
Array State : .A ('A' == active, '.' == missing)
/dev/sdl3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 51d22c47:10f58974:0b27ef04:5609d357
Name : Backup:3 (local to host Backup)
Creation Time : Sat May 29 14:16:22 2010
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 409593096 (195.31 GiB 209.71 GB)
Array Size : 409593096 (195.31 GiB 209.71 GB)
Data Offset : 144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 982c9519:48d21940:3720b6d5:dfb0a312
Internal Bitmap : 8 sectors from superblock
Update Time : Wed Feb 9 20:02:26 2011
Checksum : 6c78f4a2 - correct
Events : 364740
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing)
Here are the array details:
/dev/md1:
Version : 0.90
Creation Time : Fri Jun 11 20:45:51 2010
Raid Level : raid1
Array Size : 6144704 (5.86 GiB 6.29 GB)
Used Dev Size : 6144704 (5.86 GiB 6.29 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Feb 26 00:53:23 2011
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
UUID : 4cde286c:0687556a:4d9996dd:dd23e701 (local to host Backup)
Events : 0.1016
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 177 1 active sync /dev/sdl1
2 8 161 - faulty spare
/dev/md2:
Version : 1.2
Creation Time : Sat Dec 19 22:59:43 2009
Raid Level : raid1
Array Size : 277442414 (264.59 GiB 284.10 GB)
Used Dev Size : 277442414 (264.59 GiB 284.10 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Feb 26 00:53:47 2011
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Name : Backup:2 (local to host Backup)
UUID : d45ff663:9e53774c:6fcf9968:21692025
Events : 902890
Number Major Minor RaidDevice State
0 0 0 0 removed
3 8 178 1 active sync /dev/sdl2
2 8 162 - faulty spare
/dev/md3:
Version : 1.2
Creation Time : Sat May 29 14:16:22 2010
Raid Level : raid1
Array Size : 204796548 (195.31 GiB 209.71 GB)
Used Dev Size : 204796548 (195.31 GiB 209.71 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Wed Feb 9 20:02:26 2011
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Name : Backup:3 (local to host Backup)
UUID : 51d22c47:10f58974:0b27ef04:5609d357
Events : 364740
Number Major Minor RaidDevice State
2 8 163 0 faulty spare rebuilding
3 8 179 1 active sync /dev/sdl3
So what gives? /dev/sdk3 no longer even exists, so why hasn't it
been failed and removed on /dev/md3 like it has on /dev/md1 and /dev/md2?
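
[Editorial sketch, not part of the original message.] One quick way to see which members md currently flags as failed across all arrays is to scan /proc/mdstat for the "(F)" marker. The helper below is a minimal sketch; the name list_failed_members is hypothetical, and the optional file argument is an assumption added so the function can be tried on saved output rather than the live system.

```shell
#!/bin/sh
# Print "array member" for every member carrying the (F) flag in
# /proc/mdstat-style text. list_failed_members is a hypothetical helper
# name, not part of mdadm.
list_failed_members() {
    # $1: path to an mdstat-format file (defaults to /proc/mdstat)
    awk '
        $2 == ":" && $1 ~ /^md/ {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)/) {
                    name = $i
                    sub(/\[.*/, "", name)   # strip the "[N](F)" suffix
                    print $1, name
                }
            }
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling list_failed_members with no argument reads the live /proc/mdstat. The member names it prints are the bare kernel names (no /dev/ prefix).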
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: OK, Now this is really weird
2011-02-26 7:00 OK, Now this is really weird Leslie Rhorer
@ 2011-02-26 7:36 ` Jeff Woods
2011-02-26 11:20 ` Leslie Rhorer
0 siblings, 1 reply; 8+ messages in thread
From: Jeff Woods @ 2011-02-26 7:36 UTC (permalink / raw)
To: lrhorer; +Cc: 'Linux RAID'
Quoting Leslie Rhorer <lrhorer@satx.rr.com>:
> I have a pair of drives each of whose 3 partitions are members of a
> set of 3 RAID arrays. One of the two drives had a flaky power connection
> which I thought I had fixed, but I guess not, because the drive was taken
> offline again on Tuesday. The significant issue, however, is that both
> times the drive failed, mdadm behaved really oddly. The first time I
> thought it might just be some odd anomaly, but the second time it did
> precisely the same thing. Both times, when the drive was de-registered by
> udev, the first two arrays properly responded to the failure, but the third
> array did not. Here is the layout:
[snip lots of technical details]
> So what gives? /dev/sdk3 no longer even exists, so why hasn't it
> been failed and removed on /dev/md3 like it has on /dev/md1 and /dev/md2?
Is it possible there has been no I/O request for /dev/md3 since
/dev/sdk failed?
--
Jeff Woods <jeff@jeffwoods.us>
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: OK, Now this is really weird
2011-02-26 7:36 ` Jeff Woods
@ 2011-02-26 11:20 ` Leslie Rhorer
2011-02-26 11:35 ` Mathias Burén
0 siblings, 1 reply; 8+ messages in thread
From: Leslie Rhorer @ 2011-02-26 11:20 UTC (permalink / raw)
To: 'Jeff Woods'; +Cc: 'Linux RAID'
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Jeff Woods
> Sent: Saturday, February 26, 2011 1:36 AM
> To: lrhorer@satx.rr.com
> Cc: 'Linux RAID'
> Subject: Re: OK, Now this is really weird
>
> Quoting Leslie Rhorer <lrhorer@satx.rr.com>:
> > I have a pair of drives each of whose 3 partitions are members of a
> > set of 3 RAID arrays. One of the two drives had a flaky power connection
> > which I thought I had fixed, but I guess not, because the drive was taken
> > offline again on Tuesday. The significant issue, however, is that both
> > times the drive failed, mdadm behaved really oddly. The first time I
> > thought it might just be some odd anomaly, but the second time it did
> > precisely the same thing. Both times, when the drive was de-registered by
> > udev, the first two arrays properly responded to the failure, but the third
> > array did not. Here is the layout:
>
> [snip lots of technical details]
>
> > So what gives? /dev/sdk3 no longer even exists, so why hasn't it
> > been failed and removed on /dev/md3 like it has on /dev/md1 and /dev/md2?
>
> Is it possible there has been no I/O request for /dev/md3 since
> /dev/sdk failed?
Well, I thought about that. It's swap space, so I suppose it's
possible. I would have thought, however, that mdadm would fail a missing
member whether there is any I/O or not.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: OK, Now this is really weird
2011-02-26 11:20 ` Leslie Rhorer
@ 2011-02-26 11:35 ` Mathias Burén
2011-02-26 21:34 ` NeilBrown
2011-02-27 7:15 ` Leslie Rhorer
0 siblings, 2 replies; 8+ messages in thread
From: Mathias Burén @ 2011-02-26 11:35 UTC (permalink / raw)
To: lrhorer; +Cc: Jeff Woods, Linux RAID
On 26 February 2011 11:20, Leslie Rhorer <lrhorer@satx.rr.com> wrote:
>
>
>> -----Original Message-----
>> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
>> owner@vger.kernel.org] On Behalf Of Jeff Woods
>> Sent: Saturday, February 26, 2011 1:36 AM
>> To: lrhorer@satx.rr.com
>> Cc: 'Linux RAID'
>> Subject: Re: OK, Now this is really weird
>>
>> Quoting Leslie Rhorer <lrhorer@satx.rr.com>:
>> > I have a pair of drives each of whose 3 partitions are members of a
>> > set of 3 RAID arrays. One of the two drives had a flaky power connection
>> > which I thought I had fixed, but I guess not, because the drive was taken
>> > offline again on Tuesday. The significant issue, however, is that both
>> > times the drive failed, mdadm behaved really oddly. The first time I
>> > thought it might just be some odd anomaly, but the second time it did
>> > precisely the same thing. Both times, when the drive was de-registered by
>> > udev, the first two arrays properly responded to the failure, but the third
>> > array did not. Here is the layout:
>>
>> [snip lots of technical details]
>>
>> > So what gives? /dev/sdk3 no longer even exists, so why hasn't it
>> > been failed and removed on /dev/md3 like it has on /dev/md1 and /dev/md2?
>>
>> Is it possible there has been no I/O request for /dev/md3 since
>> /dev/sdk failed?
>
> Well, I thought about that. It's swap space, so I suppose it's
> possible. I would have thought, however, that mdadm would fail a missing
> member whether there is any I/O or not.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
I thought so as well. But how will mdadm know if the device is faulty,
unless the device is generating errors? (which usually only happens on
read and/or write)
// Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: OK, Now this is really weird
2011-02-26 11:35 ` Mathias Burén
@ 2011-02-26 21:34 ` NeilBrown
2011-02-27 7:22 ` Leslie Rhorer
2011-02-27 7:15 ` Leslie Rhorer
1 sibling, 1 reply; 8+ messages in thread
From: NeilBrown @ 2011-02-26 21:34 UTC (permalink / raw)
To: Mathias Burén; +Cc: lrhorer, Jeff Woods, Linux RAID
On Sat, 26 Feb 2011 11:35:11 +0000 Mathias Burén <mathias.buren@gmail.com>
wrote:
> On 26 February 2011 11:20, Leslie Rhorer <lrhorer@satx.rr.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> >> owner@vger.kernel.org] On Behalf Of Jeff Woods
> >> Sent: Saturday, February 26, 2011 1:36 AM
> >> To: lrhorer@satx.rr.com
> >> Cc: 'Linux RAID'
> >> Subject: Re: OK, Now this is really weird
> >>
> >> Quoting Leslie Rhorer <lrhorer@satx.rr.com>:
> >> > I have a pair of drives each of whose 3 partitions are members of a
> >> > set of 3 RAID arrays. One of the two drives had a flaky power connection
> >> > which I thought I had fixed, but I guess not, because the drive was taken
> >> > offline again on Tuesday. The significant issue, however, is that both
> >> > times the drive failed, mdadm behaved really oddly. The first time I
> >> > thought it might just be some odd anomaly, but the second time it did
> >> > precisely the same thing. Both times, when the drive was de-registered by
> >> > udev, the first two arrays properly responded to the failure, but the third
> >> > array did not. Here is the layout:
> >>
> >> [snip lots of technical details]
> >>
> >> > So what gives? /dev/sdk3 no longer even exists, so why hasn't it
> >> > been failed and removed on /dev/md3 like it has on /dev/md1 and /dev/md2?
> >>
> >> Is it possible there has been no I/O request for /dev/md3 since
> >> /dev/sdk failed?
> >
> > Well, I thought about that. It's swap space, so I suppose it's
> > possible. I would have thought, however, that mdadm would fail a missing
> > member whether there is any I/O or not.
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> I thought so as well. But how will mdadm know if the device is faulty,
> unless the device is generating errors? (which usually only happens on
> read and/or write)
With very recent mdadm the command
mdadm -If sdXX
will find any md array that has /dev/sdXX as a member and will fail and
remove it.
Note that the device name is 'sdXX', not '/dev/something'. This is because,
by the time you want to do this, udev has probably removed all trace of the
device from /dev, so you need to use the name mentioned in /proc/mdstat
or /sys/block/mdXX/md/dev-$DEVNAME.
You can set up a udev rule to run mdadm like this automatically when a device
is hot-unplugged.
something like
SUBSYSTEM=="block", ACTION=="remove", RUN+="/sbin/mdadm -If $name --path $env{ID_PATH}"
NeilBrown
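
[Editorial sketch, not part of the original message.] The bare member names mentioned above can be read straight from sysfs even after udev has removed the /dev node. The following is a minimal sketch, assuming the usual md sysfs layout (/sys/block/mdX/md/dev-NAME/state); list_member_states is a hypothetical helper name, and the optional root argument is an assumption added only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Print "array member state" for every member md still tracks, using the
# dev-NAME directories under sysfs. list_member_states is a hypothetical
# helper name, not part of mdadm.
list_member_states() {
    sysroot="${1:-/sys}"
    for dev in "$sysroot"/block/md*/md/dev-*; do
        [ -e "$dev/state" ] || continue     # glob matched nothing
        array=${dev#"$sysroot"/block/}      # e.g. md3/md/dev-sdk3
        array=${array%%/*}                  # e.g. md3
        member=${dev##*/dev-}               # e.g. sdk3
        printf '%s %s %s\n' "$array" "$member" "$(cat "$dev/state")"
    done
}
```

The second column is the name to hand to "mdadm -If"; a member whose state includes "faulty" is one md has already marked as failed.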
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: OK, Now this is really weird
2011-02-26 21:34 ` NeilBrown
@ 2011-02-27 7:22 ` Leslie Rhorer
2011-02-27 7:57 ` NeilBrown
0 siblings, 1 reply; 8+ messages in thread
From: Leslie Rhorer @ 2011-02-27 7:22 UTC (permalink / raw)
To: 'NeilBrown'; +Cc: 'Linux RAID'
> > >> > So what gives? /dev/sdk3 no longer even exists, so why hasn't it
> > >> > been failed and removed on /dev/md3 like it has on /dev/md1 and /dev/md2?
> > >>
> > >> Is it possible there has been no I/O request for /dev/md3 since
> > >> /dev/sdk failed?
> > >
> > > Well, I thought about that. It's swap space, so I suppose it's
> > > possible. I would have thought, however, that mdadm would fail a missing
> > > member whether there is any I/O or not.
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-raid"
> in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >
> >
> > I thought so as well. But how will mdadm know if the device is faulty,
> > unless the device is generating errors? (which usually only happens on
> > read and/or write)
>
> With very recent mdadm the command
>
> mdadm -If sdXX
>
> will find any md array that has /dev/sdXX as a member and will fail and
> remove it.
No, it's version 3.1.4, and that gives me a "Device or Resource
busy" error. It does report that it set sdk3 faulty, but the hot remove
fails.
So how can I remove the drive (so I can add it back)?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: OK, Now this is really weird
2011-02-27 7:22 ` Leslie Rhorer
@ 2011-02-27 7:57 ` NeilBrown
0 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2011-02-27 7:57 UTC (permalink / raw)
To: lrhorer; +Cc: 'Linux RAID'
On Sun, 27 Feb 2011 01:22:41 -0600 "Leslie Rhorer" <lrhorer@satx.rr.com>
wrote:
>
> > > >> > So what gives? /dev/sdk3 no longer even exists, so why hasn't it
> > > >> > been failed and removed on /dev/md3 like it has on /dev/md1 and /dev/md2?
> > > >>
> > > >> Is it possible there has been no I/O request for /dev/md3 since
> > > >> /dev/sdk failed?
> > > >
> > > > Well, I thought about that. It's swap space, so I suppose it's
> > > > possible. I would have thought, however, that mdadm would fail a missing
> > > > member whether there is any I/O or not.
> > > >
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-raid"
> > in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > >
> > >
> > > I thought so as well. But how will mdadm know if the device is faulty,
> > > unless the device is generating errors? (which usually only happens on
> > > read and/or write)
> >
> > With very recent mdadm the command
> >
> > mdadm -If sdXX
> >
> > will find any md array that has /dev/sdXX as a member and will fail and
> > remove it.
>
> No, it's version 3.1.4, and that gives me a "Device or Resource
> busy" error. It does report that it set sdk3 faulty, but the hot remove
> fails.
>
> So how can I remove the drive (so I can add it back)?
Maybe:
mdadm /dev/md2 --remove failed
NeilBrown
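
[Editorial sketch, not part of the original message.] For the array under discussion in this thread (/dev/md3), the removal and re-add sequence might look like the commands printed by the helper below. This is a sketch under assumptions: plan_readd is a hypothetical name, the "detached" keyword and --re-add are assumed to be supported by the installed mdadm, and the member path is whatever name the drive receives once it is visible again. The function only prints commands and runs nothing.

```shell
#!/bin/sh
# Print the mdadm commands for clearing failed or vanished members from an
# array and then re-adding a partition. Nothing is executed here; review
# the output before running any of it as root. plan_readd is a
# hypothetical helper name, not part of mdadm.
plan_readd() {
    array=$1     # e.g. /dev/md3
    member=$2    # e.g. /dev/sdk3, once the drive is visible again
    printf 'mdadm %s --remove failed\n' "$array"
    printf 'mdadm %s --remove detached\n' "$array"
    printf 'mdadm %s --re-add %s\n' "$array" "$member"
}
```

For example, "plan_readd /dev/md3 /dev/sdk3" prints three lines. If the re-add is refused, a plain --add (which triggers a full resync) is the usual fallback.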
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: OK, Now this is really weird
2011-02-26 11:35 ` Mathias Burén
2011-02-26 21:34 ` NeilBrown
@ 2011-02-27 7:15 ` Leslie Rhorer
1 sibling, 0 replies; 8+ messages in thread
From: Leslie Rhorer @ 2011-02-27 7:15 UTC (permalink / raw)
To: 'Mathias Burén'; +Cc: 'Linux RAID'
> >> > So what gives? /dev/sdk3 no longer even exists, so why hasn't it
> >> > been failed and removed on /dev/md3 like it has on /dev/md1 and /dev/md2?
> >>
> >> Is it possible there has been no I/O request for /dev/md3 since
> >> /dev/sdk failed?
> >
> > Well, I thought about that. It's swap space, so I suppose it's
> > possible. I would have thought, however, that mdadm would fail a missing
> > > member whether there is any I/O or not.
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> I thought so as well. But how will mdadm know if the device is faulty,
> unless the device is generating errors? (which usually only happens on
> read and/or write)
Well, reading here, I believe I have seen posts talking about mdadm waking
up sleeping spindles periodically, thereby killing part of the power saving
functions of "green" drives. Have those posts been in error? It's been
days since the drive "failed".
^ permalink raw reply [flat|nested] 8+ messages in thread