From: Pavel Hofman <pavel.hofman@ivitera.com>
To: Phil Turmel <philip@turmel.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid auto-assembly upon boot - device order
Date: Tue, 28 Jun 2011 14:01:25 +0200
Message-ID: <4E09C295.8040102@ivitera.com>
In-Reply-To: <4E09B50B.20306@turmel.org>
Hi Phil,
On 28.6.2011 13:03, Phil Turmel wrote:
> Good morning, Pavel,
>
> On 06/28/2011 06:18 AM, Pavel Hofman wrote:
>>
>>
>> Hi Phil,
>>
>> This is my rather complex setup:
>>
>> Personalities : [raid1] [raid0]
>> md4 : active raid0 sdb1[0] sdd3[1]
>>       2178180864 blocks 64k chunks
>>
>> md2 : active raid1 sdc2[0] sdd2[1]
>>       8787456 blocks [2/2] [UU]
>>
>> md3 : active raid0 sda1[0] sdc3[1]
>>       2178180864 blocks 64k chunks
>>
>> md7 : active raid1 md6[2] md5[1]
>>       2178180592 blocks super 1.0 [2/1] [_U]
>>       [===========>.........]  recovery = 59.3% (1293749868/2178180592) finish=164746.8min speed=87K/sec
>>
>> md6 : active raid1 md4[0]
>>       2178180728 blocks super 1.0 [2/1] [U_]
>>
>> md5 : active raid1 md3[2]
>>       2178180728 blocks super 1.0 [2/1] [U_]
>>       bitmap: 9/9 pages [36KB], 131072KB chunk
>>
>> md1 : active raid1 sdc1[0] sdd1[3]
>>       10739328 blocks [5/2] [U__U_]
>>
>>
>> You can see md7 recovering, even though both md5 and md6 were
>> present.
>
> Yes, but md5 & md6 are themselves degraded. Should not have started
> unless you are globally enabling it.
>
> ps. "lsdrv" would be really useful here to understand your layering
> setup.
>
> http://github.com/pturmel/lsdrv
Thanks a lot for your quick reply. And for your wonderful tool too.
orfeus:/boot# lsdrv
PCI [AMD_IDE] 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
 └─ide 2.0 HL-DT-ST RW/DVD GCC-H20N {[No Information Found]}
    └─hde: [33:0] Empty/Unknown 4.00g
PCI [sata_nv] 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
 ├─scsi 0:0:0:0 ATA SAMSUNG HD753LJ {S13UJDWQ912345}
 │  └─sda: [8:0] MD raid10 (4) 698.64g inactive {646f62e3:626d2cb3:05afacbb:371c5cc4}
 │     └─sda1: [8:1] MD raid0 (0/2) 698.64g md3 clean in_sync {8c9c28dd:ac12a9ef:a6200310:fe6d9686}
 │        └─md3: [9:3] MD raid1 (0/2) 2.03t md5 active in_sync 'orfeus:5' {2f88c280:3d7af418:e8d459c5:782e3ed2}
 │           └─md5: [9:5] MD raid1 (1/2) 2.03t md7 active in_sync 'orfeus:7' {dde16cd5:2e17c743:fcc7926c:fcf5081e}
 │              └─md7: [9:7] (xfs) 2.03t 'backup' {d987301b-dfb1-4c99-8f72-f4b400ba46c9}
 │                 └─Mounted as /dev/md7 @ /mnt/raid
 └─scsi 1:0:0:0 ATA ST3750330AS {9QK0VFJ9}
    └─sdb: [8:16] Empty/Unknown 698.64g
       └─sdb1: [8:17] MD raid0 (0/2) 698.64g md4 clean in_sync {ce213d01:e50809ed:a6200310:fe6d9686}
          └─md4: [9:4] MD raid1 (0/2) 2.03t md6 active in_sync ''orfeus':6' {1f83ea99:a9e4d498:a6543047:af0a3b38}
             └─md6: [9:6] MD raid1 (0/2) 2.03t md7 active spare ''orfeus':7' {dde16cd5:2e17c743:fcc7926c:fcf5081e}
PCI [sata_nv] 00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
 ├─scsi 2:0:0:0 ATA ST31500341AS {9VS15Y1L}
 │  └─sdc: [8:32] Empty/Unknown 1.36t
 │     ├─sdc1: [8:33] MD raid1 (0/5) 10.24g md1 clean in_sync {588cbbfd:4835b4da:0d7a0b1c:7bf552bb}
 │     │  └─md1: [9:1] (ext3) 10.24g {f620df1e-6dd6-43ab-b4e6-8e1fd4a447f7}
 │     │     └─Mounted as /dev/md1 @ /
 │     ├─sdc2: [8:34] MD raid1 (0/2) 8.38g md2 clean in_sync {28714b52:55b123f5:a6200310:fe6d9686}
 │     │  └─md2: [9:2] (swap) 8.38g {1804bbc6-a61b-44ea-9cc9-ac3ce6f17305}
 │     └─sdc3: [8:35] MD raid0 (1/2) 1.35t md3 clean in_sync {8c9c28dd:ac12a9ef:a6200310:fe6d9686}
 └─scsi 3:0:0:0 ATA ST31500341AS {9VS13H4N}
    └─sdd: [8:48] Empty/Unknown 1.36t
       ├─sdd1: [8:49] MD raid1 (3/5) 10.24g md1 clean in_sync {588cbbfd:4835b4da:0d7a0b1c:7bf552bb}
       ├─sdd2: [8:50] MD raid1 (1/2) 8.38g md2 clean in_sync {28714b52:55b123f5:a6200310:fe6d9686}
       └─sdd3: [8:51] MD raid0 (1/2) 1.35t md4 clean in_sync {ce213d01:e50809ed:a6200310:fe6d9686}
Still, you understood the setup correctly at first glance, even without
the visualisation :)
>
>
> I suspect it is merely timing. You are using degraded arrays
> deliberately as part of your backup scheme, which means you must be
> using "start_dirty_degraded" as a kernel parameter. That enables
> md7, which you don't want degraded, to start degraded when md6 is a
> hundred or so milliseconds late to the party.
Running rgrep on /etc and /boot reveals no such kernel parameter on this
system. I have never had problems with the arrays not starting; perhaps
it is compiled into the Debian kernel (lenny)? The config for the current
kernel in /boot does not list any such parameter either.
Would using this parameter just change the timing?
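
A grep over /etc and /boot only covers configuration files, not what the running kernel was actually booted with. The following is a minimal sketch of a more direct check. It assumes the parameter is spelled md-mod.start_dirty_degraded (or md_mod.…) on the kernel command line and that the md module exposes it under /sys/module/md_mod/parameters/; both should be verified on the kernel in question. The function names are made up for this illustration.

```python
import re
from pathlib import Path

PARAM = "start_dirty_degraded"


def cmdline_value(cmdline, param=PARAM):
    """Return the value given for md-mod.<param> (or md_mod.<param>) on a
    kernel command line, or None if the parameter is absent."""
    # The kernel accepts both '-' and '_' in the module name prefix.
    pattern = re.compile(r"(?:^|\s)md[-_]mod\." + re.escape(param) + r"=(\S+)")
    match = pattern.search(cmdline)
    return match.group(1) if match else None


def runtime_value(param=PARAM):
    """Return the live module parameter from sysfs, or None if unreadable.
    The sysfs location is an assumption about the running kernel."""
    path = Path("/sys/module/md_mod/parameters") / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("boot command line:", cmdline_value(cmdline))
    print("running kernel   :", runtime_value())
```

If both lines print None or 0, the parameter is neither passed at boot nor enabled at runtime, which would point away from it as the explanation.
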
>
> I think you have a couple options:
>
> 1) Don't run degraded arrays. Use other backup tools.
It took me several years to find a reasonably fast way to offline-backup
that partition with tens of millions of backuppc hardlinks :)
> 2) Remove md7
> from your mdadm.conf in your initramfs. Don't let early userspace
> assemble it. The extra time should then allow your initscripts on
> your real root fs to assemble it with both members. This only works
> if md7 does not contain your real root fs.
Fantastic, I will do so. I just have to find a way to keep a different
mdadm.conf in /etc and in the initramfs while preserving the useful
update-initramfs functionality :)
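
One way to keep the two files from drifting apart might be to generate the initramfs copy from the /etc one, dropping only the md7 entry. The sketch below shows just that filtering step. It assumes the Debian location /etc/mdadm/mdadm.conf, assumes the entry is written with the full ARRAY keyword and the exact device name /dev/md7, and does not cover how the result gets into the initramfs image. strip_arrays is a hypothetical helper name.

```python
def strip_arrays(conf_text, skip_devices):
    """Return mdadm.conf text without the ARRAY entries for skip_devices.

    Lines that start with whitespace continue the previous line in
    mdadm.conf, so they are dropped together with the entry they belong to.
    """
    kept = []
    dropping = False
    for line in conf_text.splitlines():
        words = line.split()
        if line[:1].isspace() and words:
            # Continuation line: shares the fate of the entry above it.
            if not dropping:
                kept.append(line)
            continue
        dropping = (
            len(words) >= 2
            and words[0].upper() == "ARRAY"
            and words[1] in skip_devices
        )
        if not dropping:
            kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Debian path; adjust for other distributions.
    with open("/etc/mdadm/mdadm.conf") as conf:
        print(strip_arrays(conf.read(), {"/dev/md7"}), end="")
```

The output could be redirected to a separate file that only the initramfs uses, leaving /etc/mdadm/mdadm.conf as the single place to edit.
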
>
>> Plus, how can a background reconstruction be started on md6, if
>> it is degraded and the other mirroring part is not even present?
>
> Don't know. Maybe one of your existing drives is occupying a
> major/minor combination that your esata drive occupied on your last
> backup. I'm pretty sure the message is harmless. I noticed that md5
> has a bitmap, but md6 does not. I wonder if adding a bitmap to md6
> would change the timing enough to help you.
Wow, the bitmap is indeed missing on md6. I swear it was there in the
past :) It significantly cuts down the synchronization time for offline
copies. I have two offline drive sets, each rotating every two weeks.
One offline set plugs into md5, the other one into md6. This way I can
have two bitmaps, one for each set. Apparently, not now :-)
>
> Relying on timing variations for successful boot doesn't sound great
> to me.
You are right. Hopefully the significantly delayed assembly will work OK.
I really appreciate your help, thanks a lot,
Pavel.