* Linux RAID autodetect partitions go missing from /dev, but fdisk can see them
@ 2008-12-10 20:56 Tor Arne Vestbø
2008-12-16 3:27 ` Neil Brown
0 siblings, 1 reply; 10+ messages in thread
From: Tor Arne Vestbø @ 2008-12-10 20:56 UTC
To: linux-raid
Hi all,
I have a very strange problem that I've been trying to debug for days
now. I'm hoping someone on this list may have run into this before, or
has any tips on how I can continue debugging this, because I have to
admit I'm a little lost...
I had a RAID5 with four drives and one spare, /dev/sd[bcde]1 +
/dev/sdf1, and everything was working fine, until one day one of the
drives in the array (sdb) no longer had a partition (sdb1). Letting the
spare take over I ignored this for a few days, but then it happened
again, this time with sdc. Kernel is 2.6.22.17, and I've compiled all
raid support in. The symptoms are:
- The kernel seems to detect the partitions (lines 396 and 407 in the
dmesg [1])
- But once the boot process finishes and the RAID is started, there is
no longer any sdc1 or sdb1, so the RAID fails to start (lines 550-576 in
dmesg [1])
- Running fdisk -l shows that the drives in question (sdb and sdc) do
have partitions similar to those on the working drives, namely one Linux
RAID autodetect partition each (see command output [2])
- But, the partitions are missing from /proc/partitions (see [3])
- Manually adding device nodes using mknod works, but doing file -sL
on the device gives "writable, no read permission", even though
permissions are the same as the other sd* nodes in /dev
- Running 'partprobe -s' successfully finds the two missing partitions
and adds device nodes, and the nodes can be 'file -sL'ed, but when
trying to assemble the array again with these new nodes in the system,
I'm told that sdc1 is not found, and after the --assemble is done, the
device nodes are once again missing (!) see [4]
- I've tried using the 'dmraid' command to look for fakeraid
partitions or metadata on the drives, which I was told could mess up
the auto-detection of Linux software RAID partitions, but could not find
any issues.
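The mismatch described in the list above (fdisk sees a partition that the kernel does not list) can be checked with a small helper. This is only a sketch: the function name in_proc_partitions is made up for illustration, and it assumes the usual four-column layout of /proc/partitions (major, minor, #blocks, name).

```shell
# in_proc_partitions NAME [FILE]
# Succeeds if NAME appears in the name column (4th field) of FILE.
# FILE defaults to /proc/partitions; it is a parameter only so the
# check can also be run against a saved copy.
in_proc_partitions() {
    awk -v name="$1" '$4 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/partitions}"
}

# Device names below are the ones mentioned in this thread.
for p in sdb1 sdc1 sdd1 sde1 sdf1; do
    if in_proc_partitions "$p"; then
        echo "$p: known to the kernel"
    else
        echo "$p: missing from /proc/partitions"
    fi
done
```

A partition that fdisk -l reports but that this loop flags as missing is in the state described above.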
As you can tell I've exhausted all my current options, so any help on
what I could try next would be very much appreciated. I am especially
curious as to why I lose the partitions when mdadm tries to assemble the
array?
Thanks!
Tor Arne Vestbø
[1] http://pastebin.com/m15b9c275 dmesg
[2] http://pastebin.com/f50fb323a fdisk -l
[3] http://pastebin.com/f4547c2ca cat /proc/partitions
[4] http://pastebin.com/m4475c9ae partprobe + mdadm --assemble
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Linux RAID autodetect partitions go missing from /dev, but fdisk can see them
@ 2008-12-11 6:41 Tor Arne Vestbø
0 siblings, 0 replies; 10+ messages in thread
From: Tor Arne Vestbø @ 2008-12-11 6:41 UTC
To: linux-raid
* Re: Linux RAID autodetect partitions go missing from /dev, but fdisk can see them
2008-12-10 20:56 Linux RAID autodetect partitions go missing from /dev, but fdisk can see them Tor Arne Vestbø
@ 2008-12-16 3:27 ` Neil Brown
2008-12-18 22:03 ` Tor Arne Vestbø
2008-12-18 22:22 ` Tor Arne Vestbø
0 siblings, 2 replies; 10+ messages in thread
From: Neil Brown @ 2008-12-16 3:27 UTC
To: Tor Arne Vestbø; +Cc: linux-raid
On Wednesday December 10, torarnv@gmail.com wrote:
> Hi all,
>
> I have a very strange problem that I've been trying to debug for days
> now. I'm hoping someone on this list may have ran into this before, or
> have any tips on how I can continue debugging this, because I have to
> admit I'm a little lost...
Yes, it does sound rather weird.
Can you:
mdadm -Esv
and
mdadm --stop /dev/md0
strace -o /tmp/str -s 200 mdadm --assemble --scan --verbose /dev/md0
and send me the output and /tmp/str.
Also the contents of /etc/mdadm.conf might help.
Thanks,
NeilBrown
* Re: Linux RAID autodetect partitions go missing from /dev, but fdisk can see them
2008-12-16 3:27 ` Neil Brown
@ 2008-12-18 22:03 ` Tor Arne Vestbø
2008-12-18 22:19 ` Tor Arne Vestbø
[not found] ` <18762.53424.819087.495066@notabene.brown>
2008-12-18 22:22 ` Tor Arne Vestbø
1 sibling, 2 replies; 10+ messages in thread
From: Tor Arne Vestbø @ 2008-12-18 22:03 UTC
To: Neil Brown; +Cc: linux-raid
Hi Neil!
Neil Brown wrote:
> On Wednesday December 10, torarnv@gmail.com wrote:
>> I have a very strange problem that I've been trying to debug for
>> days now. I had a RAID5 with four drives and one spare,
>> /dev/sd[bcde]1 + /dev/sdf1, and everything was working fine, until
>> one day one of the drives in the array (sdb) no longer had a
>> partition (sdb1). Letting the spare take over I ignored this for a
>> few days, but then it happened again, this time with sdc1.
>> I'm hoping someone on this list may have ran into this before, or
>> have any tips on how I can continue debugging this, because I have to
>> admit I'm a little lost...
>
> Yes, it does sound rather weird.
First of all, thank you so much for helping me out with this, as I'm
still very lost :)
In addition to the things listed in the first e-mail, I've also tried
installing the latest kernel from kernel.org, but that did not solve
anything. Also, in case it's relevant, I'm running openSUSE 10.3.
> Can you:
>
> mdadm -Esv
http://pastebin.com/d7b14d14e
For some reason it seems to think that /dev/sdc and /dev/sdb are part of
the array, while it really is /dev/sdc1 and /dev/sdb1. I'm guessing
since they are missing somehow from the device nodes in /dev mdadm
assumes the disk itself is the member?
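One way to see whether the whole disk and its partition report the same superblock is to compare the UUID that mdadm -E prints for each. The sketch below assumes 0.90-style output, where the line reads "UUID : <value>"; the helper name md_uuid is hypothetical.

```shell
# md_uuid: read `mdadm -E` output on stdin and print the array UUID.
# Assumes the 0.90-style "UUID : <value>" line; any trailing text after
# the UUID on that line is ignored.
md_uuid() {
    awk '$1 == "UUID" && $2 == ":" { print $3; exit }'
}

# Usage on a live system (needs root, so left as a comment here):
#   [ "$(mdadm -E /dev/sdc | md_uuid)" = "$(mdadm -E /dev/sdc1 | md_uuid)" ] \
#       && echo "sdc and sdc1 report the same array UUID"
```

If both report the same UUID, mdadm has no way to tell from the UUID alone which of the two is the intended member.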
> and
> mdadm --stop /dev/md0
> strace -o /tmp/str -s 200 mdadm --assemble --scan --verbose /dev/md0
http://pastebin.com/f2c1db2e4
The original array had sd[bcde]1 + sdf1 as spare. Then sdb1 went missing
and the spare kicked in, and then sdc1 went missing, leaving me with a
degraded array.
> Also the contents of /etc/mdadm.conf might help.
http://pastebin.com/f573346ef
Is there anything else I can run, cat, and/or paste that would shed
light on what's going on?
> Thanks,
Thank _you_ :)
Tor Arne
* Re: Linux RAID autodetect partitions go missing from /dev, but fdisk can see them
2008-12-18 22:03 ` Tor Arne Vestbø
@ 2008-12-18 22:19 ` Tor Arne Vestbø
[not found] ` <18762.53424.819087.495066@notabene.brown>
1 sibling, 0 replies; 10+ messages in thread
From: Tor Arne Vestbø @ 2008-12-18 22:19 UTC
To: linux-raid
Tor Arne Vestbø wrote:
> The original array had sd[bcde]1 + sdf1 as spare. Then sdb1 went missing
> and the spare kicked in, and then sdc1 went missing, leaving me with a
> degraded array.
FYI, I decided to delete the Linux RAID autodetect partition on /dev/sdb so
I could re-create it and add /dev/sdb1 to the array again, so I at least
have four drives and not a degraded array (don't want to lose all my data).
This worked fine; after fdisk'ing /dev/sdb, deleting the partition and
recreating it with the same parameters, it showed up as /dev/sdb1, and I
was able to add it to the array using mdadm /dev/md0 -a /dev/sdb1. The
array is now rebuilding.
We still have /dev/sdc1 as a test case though, so the original problem
is still there, just for one drive instead of two. I would really like
to figure out why it's happening, so that it does not suddenly happen to
two drives at the same time, rendering my array dead :/
Tor Arne
* Re: Linux RAID autodetect partitions go missing from /dev, but fdisk can see them
2008-12-16 3:27 ` Neil Brown
2008-12-18 22:03 ` Tor Arne Vestbø
@ 2008-12-18 22:22 ` Tor Arne Vestbø
1 sibling, 0 replies; 10+ messages in thread
From: Tor Arne Vestbø @ 2008-12-18 22:22 UTC
To: Neil Brown; +Cc: linux-raid
>> On Wednesday December 10, torarnv@gmail.com wrote:
>>> Running 'partprobe -s' successfully finds the two missing partitions
>>> and adds device nodes, and the nodes can be 'file -sL'ed, but when
>>> trying to assemble the array again with these new nodes in the system,
>>> I'm told that sdc1 is not found, and after the --assemble is done, the
>>> device nodes are once again missing (!) see [4]
Neil Brown wrote:
> Can you:
> strace -o /tmp/str -s 200 mdadm --assemble --scan --verbose /dev/md0
Should I try this after doing partprobe first? In that case /dev/sdb1
and /dev/sdc1 exist, and we might see from the trace what causes them to
disappear?
>> [4] http://pastebin.com/m4475c9ae partprobe + mdadm --assemble
Tor Arne
* Re: Linux RAID autodetect partitions go missing from /dev, but fdisk can see them
[not found] ` <18762.53424.819087.495066@notabene.brown>
@ 2009-01-18 17:51 ` Tor Arne Vestbø
2009-01-19 16:18 ` Tor Arne Vestbø
2009-01-31 13:19 ` Tor Arne Vestbø
0 siblings, 2 replies; 10+ messages in thread
From: Tor Arne Vestbø @ 2009-01-18 17:51 UTC
To: Neil Brown; +Cc: linux-raid
Hi again,
Sorry for the delay (Christmas vacation, ++). I still have the problem
though.
My attempt to re-create /dev/sdb1 by fdisk'ing it again and adding it to
the array failed. That is, mdadm started re-constructing the array, and
that succeeded, but when I rebooted /dev/sdb1 was missing again :/
Neil Brown wrote:
> If a partition starts at an offset in the device which is a multiple of
> 64K, and ends at the end of the device, then both the partition and
> the whole device will appear to have the same superblock. This can
> cause confusion.
I think you are on to something here! I did partition all my disks with
just one partition, starting on the first block and taking up all the
remaining space, and set it to type Linux RAID auto detect.
Here's the output of the fdisk -pu and mdadm -E commands before running
blockdev:
http://pastebin.com/f4cad67ea
http://pastebin.com/f34be2b86
It seems that mdadm thinks sd[bc] are part of the array directly
somehow, but for sd[def] the partition itself is in the array, which I
guess supports your suspicion?
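The arithmetic behind the quoted explanation can be sketched as follows. This assumes the array uses the 0.90 metadata format, where the superblock lives in the last 64K-aligned 64K block of the device; the thread does not state the metadata version, so that is an assumption. The function names and the 10 MiB disk size are made up for illustration.

```shell
# Byte offset of a 0.90 superblock inside a device of SIZE bytes:
# round SIZE down to a multiple of 64 KiB, then step back one 64 KiB block.
sb_offset() {
    echo $(( $1 / 65536 * 65536 - 65536 ))
}

# Absolute on-disk position of the superblock as seen through a partition
# that starts at START bytes and runs to the end of a DISK_SIZE-byte disk.
part_sb_abs() {  # START DISK_SIZE
    echo $(( $1 + $(sb_offset $(( $2 - $1 ))) ))
}

disk=10485760   # hypothetical 10 MiB disk
echo "whole disk:                     $(sb_offset "$disk")"
echo "partition starting at 1 MiB:    $(part_sb_abs 1048576 "$disk")"
echo "partition starting at sector 63: $(part_sb_abs 32256 "$disk")"
```

With these numbers the whole disk and the 1 MiB-aligned partition both land on byte 10420224, so the same superblock is visible through both. The partition starting at sector 63 (32256 bytes, not a multiple of 64K) lands on 10386944, so there is no overlap.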
Here's the output after running blockdev --rereadpt /dev/sdc :
http://pastebin.com/f725ee0c3
Suddenly mdadm -E /dev/sdc* shows info for both sdc and sdc1, and guess
what? /dev/sdc1 is back as a device node :)
So is this a case of the problem you described about
superblock-confusion? If so, is it fixable?
Thank you so much for your help in this! I really do feel I'm one big
step closer to solving this!
Tor Arne
* Re: Linux RAID autodetect partitions go missing from /dev, but fdisk can see them
2009-01-18 17:51 ` Tor Arne Vestbø
@ 2009-01-19 16:18 ` Tor Arne Vestbø
2009-01-31 13:19 ` Tor Arne Vestbø
1 sibling, 0 replies; 10+ messages in thread
From: Tor Arne Vestbø @ 2009-01-19 16:18 UTC
To: Neil Brown; +Cc: linux-raid
Tor Arne Vestbø wrote:
> So is this a case of the problem you described about
> superblock-confusion? If so, is it fixable?
Btw, I cloned mdadm from your site, so I have an updated version now. I
haven't installed it yet, in case it can damage the existing RAID, but
if there are any tests I can run with the updated version I'm ready for
them :)
Tor Arne
* Re: Linux RAID autodetect partitions go missing from /dev, but fdisk can see them
2009-01-18 17:51 ` Tor Arne Vestbø
2009-01-19 16:18 ` Tor Arne Vestbø
@ 2009-01-31 13:19 ` Tor Arne Vestbø
2009-01-31 18:50 ` Richard Scobie
1 sibling, 1 reply; 10+ messages in thread
From: Tor Arne Vestbø @ 2009-01-31 13:19 UTC
To: Neil Brown; +Cc: linux-raid
Tor Arne Vestbø wrote:
> Neil Brown wrote:
>> If a partition starts at an offset in the device which is a multiple of
>> 64K, and ends at the end of the device, then both the partition and
>> the whole device will appear to have the same superblock. This can
>> cause confusion.
>
> I think you are on to something here! I did partition all my disks with
> just one partition, starting on the first block and taking up all the
> remaining space, and set it to type Linux RAID auto detect.
[snip]
> So is this a case of the problem you described about
> superblock-confusion? If so, is it fixable?
It appears it is :)
For googlability, here's what I did:
Changed the DEVICE line in my mdadm.conf from:
DEVICE partitions
to
DEVICE /dev/sd[bcdef]1
Now the array is assembled correctly at boot.
I'm still confused why the device nodes went missing as a result of the
failed assembly earlier, but that's a minor issue I can live with :)
Tor Arne
* Re: Linux RAID autodetect partitions go missing from /dev, but fdisk can see them
2009-01-31 13:19 ` Tor Arne Vestbø
@ 2009-01-31 18:50 ` Richard Scobie
0 siblings, 0 replies; 10+ messages in thread
From: Richard Scobie @ 2009-01-31 18:50 UTC
To: Tor Arne Vestbø; +Cc: Linux RAID Mailing List
Tor Arne Vestbø wrote:
> Changed the DEVICE line in my mdadm.conf from:
>
> DEVICE partitions
>
> to
>
> DEVICE /dev/sd[bcdef]1
>
> Now the array is assembled correctly at boot.
A safer way of achieving this is to list the array in mdadm.conf by
UUID, which will guarantee all the correct devices are used - sd devices
can move around under some conditions.
eg.
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=f6bd7495:52288189:40f44282:1c220686
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f7d720fe:a5d0724c:c9bd10fa:a496ed51
Note each ARRAY entry and UUID are entered as a single line.
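As a rough sketch of how such an entry is put together, the helper below prints a single-line ARRAY entry from an md device name and a UUID. The function name array_line is made up, and the UUID passed in is just the example value from this message, not the UUID of the array discussed earlier in the thread.

```shell
# array_line MD_DEVICE UUID
# Print one mdadm.conf ARRAY entry, on a single line, keyed by UUID.
array_line() {
    printf 'ARRAY %s UUID=%s\n' "$1" "$2"
}

array_line /dev/md0 f7d720fe:a5d0724c:c9bd10fa:a496ed51

# On a live system, `mdadm --detail --scan` prints ARRAY lines for the
# arrays that are currently running, which avoids copying UUIDs by hand.
```

Whatever the source of the line, it is worth comparing the UUID against the output of mdadm -E on a member device before relying on it at boot.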
Regards,
Richard
end of thread, other threads:[~2009-01-31 18:50 UTC | newest]
Thread overview: 10+ messages
2008-12-10 20:56 Linux RAID autodetect partitions go missing from /dev, but fdisk can see them Tor Arne Vestbø
2008-12-16 3:27 ` Neil Brown
2008-12-18 22:03 ` Tor Arne Vestbø
2008-12-18 22:19 ` Tor Arne Vestbø
[not found] ` <18762.53424.819087.495066@notabene.brown>
2009-01-18 17:51 ` Tor Arne Vestbø
2009-01-19 16:18 ` Tor Arne Vestbø
2009-01-31 13:19 ` Tor Arne Vestbø
2009-01-31 18:50 ` Richard Scobie
2008-12-18 22:22 ` Tor Arne Vestbø
-- strict thread matches above, loose matches on Subject: below --
2008-12-11 6:41 Tor Arne Vestbø