* In this partition scheme, grub does not find md information? @ 2008-01-29 4:44 Moshe Yudkowsky 2008-01-29 5:08 ` Neil Brown 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 4:44 UTC (permalink / raw) To: linux-raid I'm finding a problem that isn't covered by the usual FAQs and online recipes. Attempted setup: RAID 10 array with 4 disks. Because Debian doesn't include RAID10 in its installation disks, I created a Debian installation on the first partition of sda, in /dev/sda1. Eventually I'll probably convert it to swap, but in the meantime that 4G has a complete 2.6.18 install (Debian stable). I created a RAID 10 array of four partitions, /dev/md/all, out of /dev/sd[abcd]2. Using fdisk/cfdisk, I created the partition /dev/md/all1 (500 MB) for /boot, and the partition /dev/md/all2 with all remaining space in one large partition (about 850 GB). That larger partition contains /, /usr, /home, etc., each as a separate LVM volume. I copied usr, var, etc. (but not proc or sys, of course) files over to the raid array, mounted that array, did a chroot to its root, and started grub. I admit that I'm no grub expert, but it's clear that grub cannot "find" any of the information in /dev/md/all1. For example, grub> find /boot/grub/this_is_raid can't find a file that exists only on the raid array. Grub only searches /dev/sda1, not /dev/md/all1. Perhaps I'm mistaken but I thought it was possible to boot from /dev/md/all1. I've tried other attacks but without success. For example, also while in chroot, grub-install /dev/md/all2 does not work. (Nor does it work with the --root=/boot option.) I also tried modifications to menu.lst, adding root=/dev/md/all1 to the kernel command line, but the RAID array's version of menu.lst is never detected. What I do see is grub> find /boot/grub/stage1 (hd0,0) which indicates (as far as I can tell) that it's found the information written on /dev/sda1 and nothing in /dev/md/all1. Am I trying to do something that's basically impossible? -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 4:44 In this partition scheme, grub does not find md information? Moshe Yudkowsky @ 2008-01-29 5:08 ` Neil Brown 2008-01-29 11:02 ` Moshe Yudkowsky 0 siblings, 1 reply; 60+ messages in thread From: Neil Brown @ 2008-01-29 5:08 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: linux-raid On Monday January 28, moshe@pobox.com wrote: > > Perhaps I'm mistaken but I thought it was possible to boot from > /dev/md/all1. It is my understanding that grub cannot boot from RAID. You can boot from raid1 by the expedient of booting from one of the halves. A common approach is to make a small raid1 which contains /boot and boot from that. Then use the rest of your devices for raid10 or raid5 or whatever. > > Am I trying to do something that's basically impossible? I believe so. NeilBrown ^ permalink raw reply [flat|nested] 60+ messages in thread
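A minimal sketch of the layout Neil describes, assuming four disks that each carry a small first partition for /boot and a large second partition for everything else; the device names, sizes and metadata version below are illustrative, not taken from the thread:

  # /boot: a RAID1 across the small partitions, with the superblock at the end
  # of each member so the bootloader can read any member as a plain filesystem
  mdadm --create /dev/md0 --level=1 --metadata=0.90 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

  # everything else: a RAID10 across the large partitions
  mdadm --create /dev/md1 --level=10 --raid-devices=4 \
        /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2

  mkfs.ext3 /dev/md0      # filesystem for /boot
  pvcreate  /dev/md1      # optional: LVM on top of the big array for /, /usr, /home, ...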
* Re: In this partition scheme, grub does not find md information? 2008-01-29 5:08 ` Neil Brown @ 2008-01-29 11:02 ` Moshe Yudkowsky 2008-01-29 11:14 ` Peter Rabbitson 2008-01-29 14:04 ` Keld Jørn Simonsen 0 siblings, 2 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 11:02 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid Neil, thanks for writing. A couple of follow-up questions to you and the group: Neil Brown wrote: > On Monday January 28, moshe@pobox.com wrote: >> Perhaps I'm mistaken but I thought it was possible to boot from >> /dev/md/all1. > > It is my understanding that grub cannot boot from RAID. Ah. Well, even though LILO seems to be less classy and in current disfavor, can I boot RAID10/RAID5 from LILO? > You can boot from raid1 by the expedient of booting from one of the > halves. One of the puzzling things about this is that I conceive of RAID10 as two RAID1 pairs, with RAID0 on top to join them into a large drive. However, when I use --level=10 to create my md drive, I cannot find out which two pairs are the RAID1's: the --detail doesn't give that information. Re-reading the md(4) man page, I think I'm badly mistaken about RAID10. Furthermore, since grub cannot find the /boot on the md drive, I deduce that RAID10 isn't what the 'net descriptions say it is. > A common approach is to make a small raid1 which contains /boot and > boot from that. Then use the rest of your devices for raid10 or raid5 > or whatever. Ah. My understanding from a previous question to this group was that using one partition of the drive for RAID1 and the other for RAID5 would (a) create inefficiencies in read/write cycles as the two different md drives maintained conflicting internal tables of the overall physical drive state and (b) would create problems if one or the other failed. Under the alternative solution (booting from half of a raid1), since I'm booting from just one of the halves of the raid1, I would have to set up grub on both halves. If one physical drive fails, grub would fail over to the next device. (My original question was prompted by my theory that multiple RAID5s, built out of different partitions, would be faster than a single large drive -- more threads to perform calculations during writes to different parts of the physical drives.) >> Am I trying to do something that's basically impossible? > > I believe so. If the answers above don't lead to a resolution, I can create two RAID1 pairs and join them using LVM. I would take a hit by using LVM to tie the pairs instead of RAID0, I suppose, but I would avoid the performance hit of multiple md drives on a single physical drive, and I could even run a hot spare through a sparing group. Any comments on the performance hit -- is this RAID1-plus-LVM arrangement a really bad idea for some reason? -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "It's a sobering thought, for example, to realize that by the time he was my age, Mozart had been dead for two years." -- Tom Lehrer ^ permalink raw reply [flat|nested] 60+ messages in thread
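One common way to do what Moshe describes -- putting grub on every member of the /boot mirror so the BIOS can boot from whichever drive survives -- is a grub-legacy session along these lines; the disk names are illustrative, and the trick is mapping each disk to (hd0) in turn so every copy is installed as though its disk were the first BIOS drive:

  grub> device (hd0) /dev/sda
  grub> root (hd0,0)
  grub> setup (hd0)
  grub> device (hd0) /dev/sdb      # repeat for each remaining member disk
  grub> root (hd0,0)
  grub> setup (hd0)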
* Re: In this partition scheme, grub does not find md information? 2008-01-29 11:02 ` Moshe Yudkowsky @ 2008-01-29 11:14 ` Peter Rabbitson 2008-01-29 11:29 ` Moshe Yudkowsky 2008-01-29 14:07 ` Michael Tokarev 1 sibling, 2 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-29 11:14 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: linux-raid Moshe Yudkowsky wrote: > > One of the puzzling things about this is that I conceive of RAID10 as > two RAID1 pairs, with RAID0 on top to join them into a large drive. > However, when I use --level=10 to create my md drive, I cannot find out > which two pairs are the RAID1's: the --detail doesn't give that > information. Re-reading the md(4) man page, I think I'm badly mistaken > about RAID10. > > Furthermore, since grub cannot find the /boot on the md drive, I deduce > that RAID10 isn't what the 'net descriptions say it is. > It is exactly what the name implies - a new kind of RAID :) The setup you describe is not RAID10, it is RAID1+0. As far as how linux RAID10 works - here is an excellent article: http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10 Peter ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 11:14 ` Peter Rabbitson @ 2008-01-29 11:29 ` Moshe Yudkowsky 2008-01-29 14:09 ` Michael Tokarev 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 11:29 UTC (permalink / raw) To: Peter Rabbitson; +Cc: linux-raid Peter Rabbitson wrote: > It is exactly what the name implies - a new kind of RAID :) The setup > you describe is not RAID10, it is RAID1+0. As far as how linux RAID10 > works - here is an excellent article: > http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10 Thanks. Let's just say that the md(4) man page was finally penetrating my brain, but the Wikipedia article helped a great deal. I had thought md's RAID10 was more "standard." -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "Rumor is information distilled so finely that it can filter through anything." -- Terry Pratchett, _Feet of Clay_ ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 11:29 ` Moshe Yudkowsky @ 2008-01-29 14:09 ` Michael Tokarev 0 siblings, 0 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 14:09 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, linux-raid Moshe Yudkowsky wrote: > Peter Rabbitson wrote: > >> It is exactly what the name implies - a new kind of RAID :) The setup >> you describe is not RAID10, it is RAID1+0. As far as how linux RAID10 >> works - here is an excellent article: >> http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10 > > Thanks. Let's just say that the md(4) man page was finally penetrating > my brain, but the Wikipedia article helped a great deal. I had thought > md's RAID10 was more "standard." It is exactly "standard" - when you create it with default settings and with an even number of drives (2, 4, 6, 8, ...), it will be exactly "standard" raid10 (or raid1+0, whatever) as described in various places on the net. But if you use an odd number of drives, or if you pass some fancy --layout option, it will look different. Still not suitable for lilo or grub, at least their current versions. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 11:14 ` Peter Rabbitson 2008-01-29 11:29 ` Moshe Yudkowsky @ 2008-01-29 14:07 ` Michael Tokarev 2008-01-29 14:47 ` Peter Rabbitson 2008-01-29 14:48 ` Keld Jørn Simonsen 1 sibling, 2 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 14:07 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Moshe Yudkowsky, linux-raid Peter Rabbitson wrote: > Moshe Yudkowsky wrote: >> >> One of the puzzling things about this is that I conceive of RAID10 as >> two RAID1 pairs, with RAID0 on top to join them into a large drive. >> However, when I use --level=10 to create my md drive, I cannot find >> out which two pairs are the RAID1's: the --detail doesn't give that >> information. Re-reading the md(4) man page, I think I'm badly mistaken >> about RAID10. >> >> Furthermore, since grub cannot find the /boot on the md drive, I >> deduce that RAID10 isn't what the 'net descriptions say it is. In fact, everything matches. For lilo to work, it basically needs a whole filesystem on the same physical drive. That's exactly the case with raid1 (and only raid1). With raid10, half of the filesystem is on one mirror, and another half is on another mirror. Like this:

  filesystem          raid0 blocks
  blocks            DiskA      DiskB
     0                0
     1                           1
     2                2
     3                           3
     4                4
     5                           5
    ..
  (this is what     (this is the
   LILO expects)     actual layout)

(The difference between raid10 and raid0 is that each of DiskA and DiskB is in fact composed of two identical devices.) If your kernel is located in filesystem blocks 2 and 3, for example, lilo has to read BOTH halves, but it is not smart enough to figure it out - it can only read everything from a single drive. > It is exactly what the name implies - a new kind of RAID :) The setup > you describe is not RAID10, it is RAID1+0. Raid10 IS RAID1+0 ;) It's just that the linux raid10 driver can utilize more.. interesting ways to lay out the data. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:07 ` Michael Tokarev @ 2008-01-29 14:47 ` Peter Rabbitson 2008-01-29 15:13 ` Michael Tokarev ` (2 more replies) 2008-01-29 14:48 ` Keld Jørn Simonsen 1 sibling, 3 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-29 14:47 UTC (permalink / raw) To: Michael Tokarev; +Cc: Moshe Yudkowsky, linux-raid Michael Tokarev wrote: > Raid10 IS RAID1+0 ;) > It's just that the linux raid10 driver can utilize more.. interesting ways > to lay out the data. This is misleading, and adds to the confusion existing even before linux raid10. When you say raid10 in the hardware raid world, what do you mean? Stripes of mirrors? Mirrors of stripes? Some proprietary extension? What Neil did was generalize the concept of N drives - M copies, and called it 10 because it could exactly mimic the layout of conventional 1+0 [*]. However thinking about md level 10 in terms of RAID 1+0 is wrong. Two examples (there are many more): * mdadm -C -l 10 -n 3 -p f2 /dev/md10 /dev/sda1 /dev/sdb1 /dev/sdc1 Odd number of drives, no parity calculation overhead, yet the setup can still survive the loss of a single drive * mdadm -C -l 10 -n 2 -p f2 /dev/md10 /dev/sda1 /dev/sdb1 This seems useless at first, as it effectively creates a RAID1 setup, without preserving the FS format on disk. However md10 has read balancing code, so one could get single-threaded sustained reads at twice the speed possible with md1 in the current implementation. I guess I will sit down tonight and craft some patches to the existing md* man pages. Some things are indeed left unsaid. Peter [*] The layout is the same but the functionality is different. If you have 1+0 on 4 drives, you can survive a loss of 2 drives as long as they are part of different mirrors. mdadm -C -l 10 -n 4 -p n2 <drives> however will _NOT_ survive a loss of 2 drives. ^ permalink raw reply [flat|nested] 60+ messages in thread
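For reference, an existing array's layout can be checked directly rather than inferred; this is a standard mdadm query, with the md device name purely illustrative:

  mdadm --detail /dev/md10 | grep -i layout    # reports the near/far/offset copy counts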
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:47 ` Peter Rabbitson @ 2008-01-29 15:13 ` Michael Tokarev 2008-01-29 15:41 ` Peter Rabbitson ` (2 more replies) 2008-01-29 15:57 ` In this partition scheme, grub does not find md information? Moshe Yudkowsky 2008-01-30 11:03 ` David Greaves 2 siblings, 3 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 15:13 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Moshe Yudkowsky, linux-raid Peter Rabbitson wrote: > Michael Tokarev wrote: > > Raid10 IS RAID1+0 ;) >> It's just that the linux raid10 driver can utilize more.. interesting ways >> to lay out the data. > > This is misleading, and adds to the confusion existing even before linux > raid10. When you say raid10 in the hardware raid world, what do you > mean? Stripes of mirrors? Mirrors of stripes? Some proprietary extension? Mirrors of stripes make no sense. > What Neil did was generalize the concept of N drives - M copies, and > called it 10 because it could exactly mimic the layout of conventional > 1+0 [*]. However thinking about md level 10 in terms of RAID 1+0 is > wrong. Two examples (there are many more): > > * mdadm -C -l 10 -n 3 -p f2 /dev/md10 /dev/sda1 /dev/sdb1 /dev/sdc1 ^^^^ ^^^^^ Those are "interesting ways" > Odd number of drives, no parity calculation overhead, yet the setup can > still survive the loss of a single drive > > * mdadm -C -l 10 -n 2 -p f2 /dev/md10 /dev/sda1 /dev/sdb1 ^^^^^ And this one too. There are more-or-less standard raid LEVELS, including raid10 (which is the same as raid1+0, or a stripe on top of mirrors - note it does not mean 4 drives, you can use 6 - stripe over 3 mirrors each of 2 components; or the reverse - stripe over 2 mirrors of 3 components each etc). Vendors often add their own extensions, sometimes calling them by the original level's name, and sometimes giving them new names, especially in marketing speak. Linux raid10 MODULE (which implements that standard raid10 LEVEL in full) adds some quite.. unusual extensions to that standard raid10 LEVEL. The resulting layout is also called raid10 in linux (ie, not giving new names), but it's not that raid10 (which is again the same as raid1+0) as commonly known in various literature and on the internet. Yet the raid10 module fully implements the STANDARD raid10 LEVEL. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:13 ` Michael Tokarev @ 2008-01-29 15:41 ` Peter Rabbitson 2008-01-29 16:51 ` Michael Tokarev 2008-01-29 16:16 ` Moshe Yudkowsky 2008-01-29 16:26 ` Keld Jørn Simonsen 2 siblings, 1 reply; 60+ messages in thread From: Peter Rabbitson @ 2008-01-29 15:41 UTC (permalink / raw) To: Michael Tokarev; +Cc: Moshe Yudkowsky, linux-raid Michael Tokarev wrote: > Linux raid10 MODULE (which implements that standard raid10 > LEVEL in full) adds some quite.. unusual extensions to that > standard raid10 LEVEL. The resulting layout is also called > raid10 in linux (ie, not giving new names), but it's not that > raid10 (which is again the same as raid1+0) as commonly known > in various literature and on the internet. Yet raid10 module > fully implements STANDARD raid10 LEVEL. I will let Neil speak about what he meant by RAID10: whether it is raid10 + weird extensions, or a generalization of drive/stripe layouts. However if you want to be so anal about names and specifications: md raid 10 is not a _full_ 1+0 implementation. Consider the textbook scenario with 4 drives: (A mirroring B) striped with (C mirroring D) When only drives A and C are present, md raid 10 with near offset will not start, whereas "standard" RAID 1+0 is expected to keep clunking away. Peter ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:41 ` Peter Rabbitson @ 2008-01-29 16:51 ` Michael Tokarev 2008-01-29 17:51 ` Keld Jørn Simonsen 0 siblings, 1 reply; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 16:51 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Moshe Yudkowsky, linux-raid Peter Rabbitson wrote: [] > However if you want to be so anal about names and specifications: md > raid 10 is not a _full_ 1+0 implementation. Consider the textbook > scenario with 4 drives: > > (A mirroring B) striped with (C mirroring D) > > When only drives A and C are present, md raid 10 with near offset will > not start, whereas "standard" RAID 1+0 is expected to keep clunking away. Ugh. Yes. Offset is a linux extension. But md raid 10 with the default n2 (without offset) configuration will behave exactly as in the "classic" docs. Again: the Linux md raid10 module implements standard raid10 as known in all widely used docs. And IN ADDITION, it can do OTHER FORMS, which differ from the "classic" variant. Much like a hardware raid card from a brand vendor implements its own variations of standard raid levels. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:51 ` Michael Tokarev @ 2008-01-29 17:51 ` Keld Jørn Simonsen 0 siblings, 0 replies; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 17:51 UTC (permalink / raw) To: Michael Tokarev; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid On Tue, Jan 29, 2008 at 07:51:07PM +0300, Michael Tokarev wrote: > Peter Rabbitson wrote: > [] > > However if you want to be so anal about names and specifications: md > > raid 10 is not a _full_ 1+0 implementation. Consider the textbook > > scenario with 4 drives: > > > > (A mirroring B) striped with (C mirroring D) > > > > When only drives A and C are present, md raid 10 with near offset will > > not start, whereas "standard" RAID 1+0 is expected to keep clunking away. > > Ugh. Yes. Offset is a linux extension. > > But md raid 10 with the default n2 (without offset) configuration will behave > exactly as in the "classic" docs. I would like to understand this fully. What Peter described for mdraid10: "md raid 10 with near offset" I believe is vanilla raid10 without any options (or near=2, far=1). Will that not start if we are unlucky enough to have 2 drives fail, but lucky enough that the two remaining drives actually hold all the data? Same question for a raid10,f2 array. I think it would be easy to investigate, when the number of drives is even, whether all data is present, and then happily run an array with some failing disks. Say for a 4-drive raid10,f2, disks A and D fail; then all data should be present on drives B and C, given that A and C have the even chunks and B and D have the odd chunks. Likewise for a 6-drive array, etc., for all multiples of 2, with f2. best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
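One low-risk way to investigate exactly this question is to build a throwaway array out of loop devices rather than real disks and see whether it assembles with two members missing; everything below (file names, sizes, device names) is illustrative:

  # four small backing files and a test raid10,f2 across them
  for i in 0 1 2 3; do dd if=/dev/zero of=/tmp/d$i bs=1M count=100; losetup /dev/loop$i /tmp/d$i; done
  mdadm --create /dev/md9 --level=10 --layout=f2 --raid-devices=4 \
        /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

  # stop it, then try to assemble with only two of the four members present
  mdadm --stop /dev/md9
  mdadm --assemble --run /dev/md9 /dev/loop0 /dev/loop2
  cat /proc/mdstat    # shows whether the array came up degraded or refused to start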
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:13 ` Michael Tokarev 2008-01-29 15:41 ` Peter Rabbitson @ 2008-01-29 16:16 ` Moshe Yudkowsky 2008-01-29 16:34 ` Peter Rabbitson 2008-01-29 16:42 ` Michael Tokarev 2008-01-29 16:26 ` Keld Jørn Simonsen 2 siblings, 2 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 16:16 UTC (permalink / raw) To: Michael Tokarev; +Cc: Peter Rabbitson, linux-raid Michael Tokarev wrote: > There are more-or-less standard raid LEVELS, including > raid10 (which is the same as raid1+0, or a stripe on top > of mirrors - note it does not mean 4 drives, you can > use 6 - stripe over 3 mirrors each of 2 components; or > the reverse - stripe over 2 mirrors of 3 components each > etc). Here's a baseline question: if I create a RAID10 array using default settings, what do I get? I thought I was getting RAID1+0; am I really? My superblocks, by the way, are marked version 01; my metadata in mdadm.conf asked for 1.2. I wonder what I really got. The real question in my mind now is why grub can't find the info, and whether it's because of 1.2 superblocks or because of sub-partitioning of components. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "You may not be interested in war, but war is interested in you." -- Leon Trotsky ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:16 ` Moshe Yudkowsky @ 2008-01-29 16:34 ` Peter Rabbitson 2008-01-29 19:34 ` Moshe Yudkowsky 2008-01-30 12:01 ` Peter Rabbitson 1 sibling, 2 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-29 16:34 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Michael Tokarev, linux-raid Moshe Yudkowsky wrote: > Here's a baseline question: if I create a RAID10 array using default > settings, what do I get? I thought I was getting RAID1+0; am I really? Maybe you are, depending on your settings, but this is beside the point. No matter what 1+0 you have (linux, classic, or otherwise) you can not boot from it, as there is no way to see the underlying filesystem without the RAID layer. With the current state of affairs (available mainstream bootloaders) the rule is: Block devices containing the kernel/initrd image _must_ be either: * a regular block device (/sda1, /hda, /fd0, etc.) * or a linux RAID 1 with the superblock at the end of the device (0.9 or 1.0) > My superblocks, by the way, are marked version 01; my metadata in > mdadm.conf asked for 1.2. I wonder what I really got. This is how you find the actual raid version: mdadm -D /dev/md[X] | grep Version This will return a string of the form XX.YY.ZZ. Your superblock version is XX.YY. ^ permalink raw reply [flat|nested] 60+ messages in thread
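Two standard mdadm queries make the metadata question concrete; the device names here are only examples:

  mdadm --detail  /dev/md0  | grep -i version    # superblock format of the assembled array
  mdadm --examine /dev/sda1 | grep -i version    # superblock format stored on one member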
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:34 ` Peter Rabbitson @ 2008-01-29 19:34 ` Moshe Yudkowsky 2008-01-29 20:21 ` Keld Jørn Simonsen ` (2 more replies) 0 siblings, 3 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 19:34 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Michael Tokarev, linux-raid I'd like to thank everyone who wrote in with comments and explanations. And in particular it's nice to see that I'm not the only one who's confused. I'm going to convert back to the RAID 1 setup I had before for /boot, 2 hot and 2 spare across four drives. No, that's wrong: 4 hot makes the most sense. And given that RAID 10 doesn't seem to confer (for me, as far as I can tell) advantages in speed or reliability -- or the ability to mount just one surviving disk of a mirrored pair -- over RAID 5, I think I'll convert back to RAID 5, put in a hot spare, and do regular backups (as always). Oh, and use reiserfs with data=journal. Comments back: Peter Rabbitson wrote: > Maybe you are, depending on your settings, but this is beside the point. > No matter what 1+0 you have (linux, classic, or otherwise) you can not > boot from it, as there is no way to see the underlying filesystem > without the RAID layer. Sir, thank you for this unequivocal comment. This comment clears up all my confusion. I had a wrong mental model of how file system maps work. > With the current state of affairs (available mainstream bootloaders) the > rule is: > Block devices containing the kernel/initrd image _must_ be either: > * a regular block device (/sda1, /hda, /fd0, etc.) > * or a linux RAID 1 with the superblock at the end of the device > (0.9 or 1.0) Thanks even more: 1.0 it is. > This is how you find the actual raid version: > > mdadm -D /dev/md[X] | grep Version > > This will return a string of the form XX.YY.ZZ. Your superblock version > is XX.YY. Ah hah! Mr. Tokarev wrote: > By the way, on all our systems I use small (256Mb for small-software systems, > sometimes 512M, but 1G should be sufficient) partition for a root filesystem > (/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all... > ... doing [it] > this way, you always have all the tools necessary to repair a damaged system > even in case your raid didn't start, or you forgot where your root disk is > etc etc. An excellent idea. I was going to put just /boot on the RAID 1, but there's no reason why I can't add a bit more room and put them all there. (Because I was having so much fun on the install, I'm using 4GB that I was going to use for swap space to mount base install and I'm working from there to build the RAID. Same idea.) Hmmm... I wonder if this more expansive /bin, /sbin, and /lib causes hits on the RAID1 drive which ultimately degrade overall performance? /lib is hit only at boot time to load the kernel, I'll guess, but /bin includes such common tools as bash and grep. > Also, placing /dev on a tmpfs helps alot to minimize number of writes > necessary for root fs. Another interesting idea. I'm not familiar with using tmpfs (no need, until now); but I wonder how you create the devices you need when you're doing a rescue. Again, my thanks to everyone who responded and clarified. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "Practically perfect people never permit sentiment to muddle their thinking." -- Mary Poppins ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 19:34 ` Moshe Yudkowsky @ 2008-01-29 20:21 ` Keld Jørn Simonsen 2008-01-29 22:14 ` Moshe Yudkowsky 0 siblings, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 20:21 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, Michael Tokarev, linux-raid On Tue, Jan 29, 2008 at 01:34:37PM -0600, Moshe Yudkowsky wrote: > > I'm going to convert back to the RAID 1 setup I had before for /boot, 2 > hot and 2 spare across four drives. No, that's wrong: 4 hot makes the > most sense. > > And given that RAID 10 doesn't seem to confer (for me, as far as I can > tell) advantages in speed or reliability -- or the ability to mount just > one surviving disk of a mirrored pair -- over RAID 5, I think I'll > convert back to RAID 5, put in a hot spare, and do regular backups (as > always). Oh, and use reiserfs with data=journal. Hmm, my idea was to use a 4-disk raid10,f2 for the /root, or an o2 layout. I think it would offer quite some speed advantage over raid5. At least, on a 4-disk raid5 I got only about 130 MB/s random performance, while the raid10 gave 180-200 MB/s. Also sequential read was significantly faster on raid10. I do think I can get about 320 MB/s on the raid10,f2, but I need to have a bigger power supply to support my disks before I can go on testing. The key here is bigger readahead. I only got 150 MB/s for raid5 sequential reads. I think the sequential read could be significant in the boot time, and then for the single user running on the system, namely the system administrator (=me), even under reasonable load. I would be interested if you would experiment with this wrt boot time, for example the difference between /root on a raid5, raid10,f2 and raid10,o2. > Comments back: > > Mr. Tokarev wrote: > > >By the way, on all our systems I use small (256Mb for small-software > >systems, > >sometimes 512M, but 1G should be sufficient) partition for a root > >filesystem > >(/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all... > >... doing [it] > >this way, you always have all the tools necessary to repair a damaged > >system > >even in case your raid didn't start, or you forgot where your root disk is > >etc etc. > > An excellent idea. I was going to put just /boot on the RAID 1, but > there's no reason why I can't add a bit more room and put them all > there. (Because I was having so much fun on the install, I'm using 4GB > that I was going to use for swap space to mount base install and I'm > working from there to build the RAID. Same idea.) If you put more than /boot on the raid1, then you will not get the added performance of raid10 for all your system utilities. I am not sure about redundancy, but a raid1 and a raid10 should be equally vulnerable to a 1 disk failure. If you use a 4 disk raid1 for /root, then of course you can survive 3 disk crashes. I am not sure that 4 disks in a raid1 for /root give added performance, as grub only sees the /root raid1 as a normal disk, but maybe some kind of remounting makes it get its raid behaviour. > >Also, placing /dev on a tmpfs helps alot to minimize number of writes > >necessary for root fs. I thought of using the noatime mount option for /root. best regards Keld ^ permalink raw reply [flat|nested] 60+ messages in thread
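For the noatime idea, the change is a single mount option in /etc/fstab; the device and filesystem type below are only an example, not taken from anyone's actual setup:

  /dev/md1   /   xfs   defaults,noatime   0   1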
* Re: In this partition scheme, grub does not find md information? 2008-01-29 20:21 ` Keld Jørn Simonsen @ 2008-01-29 22:14 ` Moshe Yudkowsky 2008-01-29 23:45 ` Bill Davidsen 2008-01-30 0:17 ` Keld Jørn Simonsen 0 siblings, 2 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 22:14 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: linux-raid Keld Jørn Simonsen wrote: Based on your reports of better performance on RAID10 -- which are more significant than I'd expected -- I'll just go with RAID10. The only question now is if LVM is worth the performance hit or not. > I would be interested if you would experiment with this wrt boot time, > for example the difference between /root on a raid5, raid10,f2 and raid10,o2. According to man md(4), the o2 is likely to offer the best combination of read and write performance. Why would you consider f2 instead? I'm unlikely to do any testing beyond running bonnie++ or something similar once it's installed. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 22:14 ` Moshe Yudkowsky @ 2008-01-29 23:45 ` Bill Davidsen 2008-01-30 0:13 ` Moshe Yudkowsky 1 sibling, 1 reply; 60+ messages in thread From: Bill Davidsen @ 2008-01-29 23:45 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Keld Jørn Simonsen, linux-raid Moshe Yudkowsky wrote: > Keld Jørn Simonsen wrote: > > Based on your reports of better performance on RAID10 -- which are > more significant than I'd expected -- I'll just go with RAID10. The > only question now is if LVM is worth the performance hit or not. > >> I would be interested if you would experiment with this wrt boot time, >> for example the difference between /root on a raid5, raid10,f2 and >> raid10,o2. > > According to man md(4), the o2 is likely to offer the best combination > of read and write performance. Why would you consider f2 instead? > f2 is faster for read, most systems spend more time reading than writing. > I'm unlikely to do any testing beyond running bonnie++ or something > similar once it's installed. > > -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 23:45 ` Bill Davidsen @ 2008-01-30 0:13 ` Moshe Yudkowsky 2008-01-30 22:36 ` Bill Davidsen 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 0:13 UTC (permalink / raw) To: Bill Davidsen; +Cc: Keld Jørn Simonsen, linux-raid Bill Davidsen wrote: >> According to man md(4), the o2 is likely to offer the best combination >> of read and write performance. Why would you consider f2 instead? >> > f2 is faster for read, most systems spend more time reading than writing. According to md(4), offset "should give similar read characteristics to 'far' if a suitably large chunk size is used, but without as much seeking for writes." Is the man page not correct, conditionally true, or simply not understood by me (most likely case)? I wonder what "suitably large" is... -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "The seconds marched past, transversing that mysterious boundary that separates the future from the past." -- Jack Vance, "The Face" ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:13 ` Moshe Yudkowsky @ 2008-01-30 22:36 ` Bill Davidsen 0 siblings, 0 replies; 60+ messages in thread From: Bill Davidsen @ 2008-01-30 22:36 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Keld Jørn Simonsen, linux-raid Moshe Yudkowsky wrote: > Bill Davidsen wrote: > >>> According to man md(4), the o2 is likely to offer the best >>> combination of read and write performance. Why would you consider f2 >>> instead? >>> >> f2 is faster for read, most systems spend more time reading than >> writing. > > According to md(4), offset "should give similar read characteristics > to 'far' if a suitably large chunk size is used, but without as much > seeking for writes." > > Is the man page not correct, conditionally true, or simply not > understood by me (most likely case)? > > I wonder what "suitably large" is... > My personal experience is that as chunk gets larger random write gets slower, sequential gets faster. I don't have numbers any more, but 20-30% is sort of the limit of what I saw for any chunk size I consider reasonable. f2 is faster for sequential reading, tune your system to annoy you least. ;-) -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark ^ permalink raw reply [flat|nested] 60+ messages in thread
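To experiment with the "suitably large chunk size" question raised above, the chunk size can be set explicitly at creation time; the 256 KiB value and the device names below are purely illustrative:

  mdadm --create /dev/md2 --level=10 --layout=f2 --chunk=256 \
        --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2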
* Re: In this partition scheme, grub does not find md information? 2008-01-29 22:14 ` Moshe Yudkowsky 2008-01-29 23:45 ` Bill Davidsen @ 2008-01-30 0:17 ` Keld Jørn Simonsen 1 sibling, 0 replies; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-30 0:17 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: linux-raid On Tue, Jan 29, 2008 at 04:14:24PM -0600, Moshe Yudkowsky wrote: > Keld Jørn Simonsen wrote: > > Based on your reports of better performance on RAID10 -- which are more > significant than I'd expected -- I'll just go with RAID10. The only > question now is if LVM is worth the performance hit or not. Hmm, LVM for what purpose? For the root system, I think it is not an issue. Just have a large enough partition, it is not more than 10-20 GB anyway, which is around 1 % of the disk sizes that we talk about today with new disks in raids. > >I would be interested if you would experiment with this wrt boot time, > >for example the difference between /root on a raid5, raid10,f2 and > >raid10,o2. > > According to man md(4), the o2 is likely to offer the best combination > of read and write performance. Why would you consider f2 instead? I have no experience with o2, and little experience with f2. But I kind of designed f2. I have not fully grasped o2 yet. But my take is that for writes, this would be random writes, and that is almost the same for all layouts. However, when/if a disk is faulty, then f2 has considerably worse performance for sequential reads, approximating the performance of random reads, which in some cases is about half the speed of sequential reads. For sequential reads and random reads I think f2 would be faster than o2, due to the smaller average seek times, and use of the faster part of the disk. I am still wondering how o2 gets to do striping; I don't understand it given the layout schemes I have seen. F2 OTOH is designed for striping. I would like to see some figures, tho. My testing environment is, as said, not operational right now, but will be OK possibly later this week. > I'm unlikely to do any testing beyond running bonnie++ or something > similar once it's installed. I do some crude testing like reading 1000 files of 20 MB concurrently, and then just cat file >/dev/null of a 4 GB file. The RAM cache needs to be incapable of holding the files. Looking at boot times could also be interesting. I would like as little downtime as possible. But it depends on your purpose and thus pattern of use. Many systems tend to be read oriented, and for that I think f2 is the better alternative. best regards keld - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
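A minimal version of the crude sequential-read test Keld describes, assuming a test file much larger than RAM so the page cache cannot serve the reads; the path and size are illustrative and the commands must be run as root:

  sync && echo 3 > /proc/sys/vm/drop_caches     # drop cached data first
  time cat /mnt/test/bigfile > /dev/null        # sequential read of a ~4 GB file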
* Re: In this partition scheme, grub does not find md information? 2008-01-29 19:34 ` Moshe Yudkowsky 2008-01-29 20:21 ` Keld Jørn Simonsen @ 2008-01-29 23:44 ` Bill Davidsen 2008-01-30 0:22 ` Keld Jørn Simonsen 2008-01-30 13:11 ` Michael Tokarev 2 siblings, 1 reply; 60+ messages in thread From: Bill Davidsen @ 2008-01-29 23:44 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, Michael Tokarev, linux-raid Moshe Yudkowsky wrote: > I'd like to thank everyone who wrote in with comments and > explanations. And in particular it's nice to see that I'm not the only > one who's confused. > > I'm going to convert back to the RAID 1 setup I had before for /boot, > 2 hot and 2 spare across four drives. No, that's wrong: 4 hot makes > the most sense. > > And given that RAID 10 doesn't seem to confer (for me, as far as I can > tell) advantages in speed or reliability -- or the ability to mount > just one surviving disk of a mirrored pair -- over RAID 5, I think > I'll convert back to RAID 5, put in a hot spare, and do regular > backups (as always). Oh, and use reiserfs with data=journal. > Depending on near/far choices, raid10 should be faster than raid5, with far read should be quite a bit faster. You can't boot off raid10, and if you put your swap on it many recovery CDs won't use it. But for general use and swap on a normally booted system it is quite fast. > Comments back: > > Peter Rabbitson wrote: > >> Maybe you are, depending on your settings, but this is beside the >> point. No matter what 1+0 you have (linux, classic, or otherwise) you >> can not boot from it, as there is no way to see the underlying >> filesystem without the RAID layer. > > Sir, thank you for this unequivocal comment. This comment clears up > all my confusion. I had a wrong mental model of how file system maps > work. > >> With the current state of affairs (available mainstream bootloaders) >> the rule is: >> Block devices containing the kernel/initrd image _must_ be either: >> * a regular block device (/sda1, /hda, /fd0, etc.) >> * or a linux RAID 1 with the superblock at the end of the device >> (0.9 or 1.0) > > Thanks even more: 1.0 it is. > >> This is how you find the actual raid version: >> >> mdadm -D /dev/md[X] | grep Version >> >> This will return a string of the form XX.YY.ZZ. Your superblock >> version is XX.YY. > > Ah hah! > > Mr. Tokarev wrote: > >> By the way, on all our systems I use small (256Mb for small-software >> systems, >> sometimes 512M, but 1G should be sufficient) partition for a root >> filesystem >> (/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all... >> ... doing [it] >> this way, you always have all the tools necessary to repair a damaged >> system >> even in case your raid didn't start, or you forgot where your root >> disk is >> etc etc. > > An excellent idea. I was going to put just /boot on the RAID 1, but > there's no reason why I can't add a bit more room and put them all > there. (Because I was having so much fun on the install, I'm using 4GB > that I was going to use for swap space to mount base install and I'm > working from there to build the RAID. Same idea.) > > Hmmm... I wonder if this more expansive /bin, /sbin, and /lib causes > hits on the RAID1 drive which ultimately degrade overall performance? > /lib is hit only at boot time to load the kernel, I'll guess, but /bin > includes such common tools as bash and grep. > >> Also, placing /dev on a tmpfs helps alot to minimize number of writes >> necessary for root fs. > > Another interesting idea.
I'm not familiar with using tmpfs (no need, > until now); but I wonder how you create the devices you need when > you're doing a rescue. > > Again, my thanks to everyone who responded and clarified. > -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 23:44 ` Bill Davidsen @ 2008-01-30 0:22 ` Keld Jørn Simonsen 2008-01-30 0:26 ` Peter Rabbitson 2008-01-30 0:32 ` Moshe Yudkowsky 0 siblings, 2 replies; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-30 0:22 UTC (permalink / raw) To: Bill Davidsen Cc: Moshe Yudkowsky, Peter Rabbitson, Michael Tokarev, linux-raid On Tue, Jan 29, 2008 at 06:44:20PM -0500, Bill Davidsen wrote: > Depending on near/far choices, raid10 should be faster than raid5, with > far read should be quite a bit faster. You can't boot off raid10, and if > you put your swap on it many recovery CDs won't use it. But for general > use and swap on a normally booted system it is quite fast. Hmm, why would you put swap on a raid10? I would in a production environment always put it on separate swap partitions, possibly a number, given that a number of drives are available. best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:22 ` Keld Jørn Simonsen @ 2008-01-30 0:26 ` Peter Rabbitson 2008-01-30 22:39 ` Bill Davidsen 2008-01-30 0:32 ` Moshe Yudkowsky 1 sibling, 1 reply; 60+ messages in thread From: Peter Rabbitson @ 2008-01-30 0:26 UTC (permalink / raw) To: Keld Jørn Simonsen Cc: Bill Davidsen, Moshe Yudkowsky, Michael Tokarev, linux-raid Keld Jørn Simonsen wrote: > On Tue, Jan 29, 2008 at 06:44:20PM -0500, Bill Davidsen wrote: > >> Depending on near/far choices, raid10 should be faster than raid5, with >> far read should be quite a bit faster. You can't boot off raid10, and if >> you put your swap on it many recovery CDs won't use it. But for general >> use and swap on a normally booted system it is quite fast. > > Hmm, why would you put swap on a raid10? I would in a production > environment always put it on separate swap partitions, possibly a number, > given that a number of drives are available. > Because you want some redundancy for the swap as well. A swap partition/file becoming inaccessible is equivalent to yanking out a stick of memory out of your motherboard. Peter - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:26 ` Peter Rabbitson @ 2008-01-30 22:39 ` Bill Davidsen 0 siblings, 0 replies; 60+ messages in thread From: Bill Davidsen @ 2008-01-30 22:39 UTC (permalink / raw) To: Peter Rabbitson Cc: Keld Jørn Simonsen, Moshe Yudkowsky, Michael Tokarev, linux-raid Peter Rabbitson wrote: > Keld Jørn Simonsen wrote: >> On Tue, Jan 29, 2008 at 06:44:20PM -0500, Bill Davidsen wrote: >> >>> Depending on near/far choices, raid10 should be faster than raid5, >>> with far read should be quite a bit faster. You can't boot off >>> raid10, and if you put your swap on it many recovery CDs won't use >>> it. But for general use and swap on a normally booted system it is >>> quite fast. >> >> Hmm, why would you put swap on a raid10? I would in a production >> environment always put it on separate swap partitions, possibly a >> number, >> given that a number of drives are available. >> > > Because you want some redundancy for the swap as well. A swap > partition/file becoming inaccessible is equivalent to yanking out a > stick of memory out of your motherboard. I can't say it better. Losing a swap area will make the system fail one way or another, in my systems typically expressed as a crash of varying severity. I use raid10 because it is the fastest reliable level I've found. -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:22 ` Keld Jørn Simonsen 2008-01-30 0:26 ` Peter Rabbitson @ 2008-01-30 0:32 ` Moshe Yudkowsky 2008-01-30 0:53 ` Keld Jørn Simonsen 1 sibling, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 0:32 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: linux-raid > Hmm, why would you put swap on a raid10? I would in a production > environment always put it on separate swap partitions, possibly a number, > given that a number of drives are available. I put swap onto non-RAID, separate partitions on all 4 disks. In a production server, however, I'd use swap on RAID in order to prevent server downtime if a disk fails -- a suddenly bad swap can easily (will absolutely?) cause the server to crash (even though you can boot the server up again afterwards on the surviving swap partitions). -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "She will have fun who knows when to work and when not to work." -- Segami ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:32 ` Moshe Yudkowsky @ 2008-01-30 0:53 ` Keld Jørn Simonsen 2008-01-30 1:00 ` Moshe Yudkowsky 0 siblings, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-30 0:53 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: linux-raid On Tue, Jan 29, 2008 at 06:32:54PM -0600, Moshe Yudkowsky wrote: > > >Hmm, why would you put swap on a raid10? I would in a production > >environment always put it on separate swap partitions, possibly a number, > >given that a number of drives are available. > > In a production server, however, I'd use swap on RAID in order to > prevent server downtime if a disk fails -- a suddenly bad swap can > easily (will absolutely?) cause the server to crash (even though you can > boot the server up again afterwards on the surviving swap partitions). I see. Which file system type would be good for this? I normally use XFS but maybe another FS is better, given that swap is used very randomly (read/write). Will a bad swap crash the system? best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:53 ` Keld Jørn Simonsen @ 2008-01-30 1:00 ` Moshe Yudkowsky 2008-01-31 14:40 ` Bill Davidsen 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 1:00 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: linux-raid Keld Jørn Simonsen wrote: > On Tue, Jan 29, 2008 at 06:32:54PM -0600, Moshe Yudkowsky wrote: >>> Hmm, why would you put swap on a raid10? I would in a production >>> environment always put it on separate swap partitions, possibly a number, >>> given that a number of drives are available. >> In a production server, however, I'd use swap on RAID in order to >> prevent server downtime if a disk fails -- a suddenly bad swap can >> easily (will absolutely?) cause the server to crash (even though you can >> boot the server up again afterwards on the surviving swap partitions). > > I see. Which file system type would be good for this? > I normally use XFS but maybe another FS is better, given that swap is used > very randomly (read/write). > > Will a bad swap crash the system? Well, Peter says it will, and that's good enough for me. :-) As for which file system: I would use fdisk to partition the md disk and then use mkswap on the partition to make it into a swap partition. It's a naive approach but I suspect it's almost certainly the correct one. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "There are more ways to skin a cat than nuking it from orbit -- but it's the only way to be sure." -- Eliezer Yudkowsky - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 1:00 ` Moshe Yudkowsky @ 2008-01-31 14:40 ` Bill Davidsen 0 siblings, 0 replies; 60+ messages in thread From: Bill Davidsen @ 2008-01-31 14:40 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Keld Jørn Simonsen, linux-raid Moshe Yudkowsky wrote: > Keld Jørn Simonsen wrote: >> On Tue, Jan 29, 2008 at 06:32:54PM -0600, Moshe Yudkowsky wrote: >>>> Hmm, why would you put swap on a raid10? I would in a production >>>> environment always put it on separate swap partitions, possibly a >>>> number, >>>> given that a number of drives are available. >>> In a production server, however, I'd use swap on RAID in order to >>> prevent server downtime if a disk fails -- a suddenly bad swap can >>> easily (will absolutely?) cause the server to crash (even though you >>> can boot the server up again afterwards on the surviving swap >>> partitions). >> >> I see. Which file system type would be good for this? >> I normally use XFS but maybe another FS is better, given that swap is used >> very randomly (read/write). >> >> Will a bad swap crash the system? > > Well, Peter says it will, and that's good enough for me. :-) > I've done unplanned research into this: it will crash the system, and if you're unlucky some part of what's needed for a graceful crash will be swapped out :-( > As for which file system: I would use fdisk to partition the md disk > and then use mkswap on the partition to make it into a swap partition. > It's a naive approach but I suspect it's almost certainly the correct > one. > I generally dedicate a partition of each drive to swap, but the type is "raid array." Then I create a raid10 on that set of partitions and mkswap on the md device. While raid10 is fast and reliable, raid[56] have similar reliability and more usable space from any given configuration. -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
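A sketch of the arrangement Bill describes, with one swap-sized partition per drive combined into a raid10 and swap made on the md device; every device name and number here is illustrative:

  mdadm --create /dev/md3 --level=10 --raid-devices=4 \
        /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
  mkswap /dev/md3
  swapon /dev/md3
  # plus an /etc/fstab entry so it survives a reboot:
  #   /dev/md3   none   swap   sw   0   0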
* Re: In this partition scheme, grub does not find md information? 2008-01-29 19:34 ` Moshe Yudkowsky 2008-01-29 20:21 ` Keld Jørn Simonsen 2008-01-29 23:44 ` Bill Davidsen @ 2008-01-30 13:11 ` Michael Tokarev 2008-01-30 14:10 ` Moshe Yudkowsky 2 siblings, 1 reply; 60+ messages in thread From: Michael Tokarev @ 2008-01-30 13:11 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, linux-raid Moshe Yudkowsky wrote: [] > Mr. Tokarev wrote: > >> By the way, on all our systems I use small (256Mb for small-software systems, >> sometimes 512M, but 1G should be sufficient) partition for a root filesystem >> (/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all... >> ... doing [it] >> this way, you always have all the tools necessary to repair a damaged system >> even in case your raid didn't start, or you forgot where your root disk is >> etc etc. > > An excellent idea. I was going to put just /boot on the RAID 1, but > there's no reason why I can't add a bit more room and put them all > there. (Because I was having so much fun on the install, I'm using 4GB > that I was going to use for swap space to mount base install and I'm > working from there to build the RAID. Same idea.) > > Hmmm... I wonder if this more expansive /bin, /sbin, and /lib causes > hits on the RAID1 drive which ultimately degrade overall performance? > /lib is hit only at boot time to load the kernel, I'll guess, but /bin > includes such common tools as bash and grep. You don't care about the speed of your root filesystem. Note there are two speeds - write and read. You only write to root (including /bin and /lib and so on) during software (re)install and during some configuration work (writing /etc/passwd and the like). The first is very infrequent, and both need only a few writes, -- so write speed isn't important. Read speed is also not that important, because most commonly used stuff from there will be cached anyway (like libc.so, bash and grep), and again, reading such tiny stuff - it doesn't matter if it's "fast" raid or a slow one. What you do care about is the speed of the devices where your large, commonly accessed/modified files reside - such as video files, especially when you want streaming video. And even here, unless you have a special requirement for speed, you will not notice any difference between "slow" and "fast" raid levels. For typical filesystem usage, raid5 works well for both reads and (cached, delayed) writes. It's workloads like databases where raid5 performs badly. What you do care about is your data integrity. It's not really interesting to reinstall a system or lose your data if something goes wrong, and it's best to have recovery tools as easily available as possible. Plus, amount of space you need. >> Also, placing /dev on a tmpfs helps alot to minimize number of writes >> necessary for root fs. > > Another interesting idea. I'm not familiar with using tmpfs (no need, > until now); but I wonder how you create the devices you need when you're > doing a rescue. When you start udev, your /dev will be on tmpfs. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 13:11 ` Michael Tokarev @ 2008-01-30 14:10 ` Moshe Yudkowsky 2008-01-30 14:41 ` Michael Tokarev 2008-01-31 14:59 ` Bill Davidsen 0 siblings, 2 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 14:10 UTC (permalink / raw) To: Michael Tokarev; +Cc: linux-raid Michael Tokarev wrote: > You only write to root (including /bin and /lib and so on) during > software (re)install and during some configuration work (writing > /etc/passwd and the like). The first is very infrequent, and both > need only a few writes, -- so write speed isn't important. Thanks, but I didn't make myself clear. The performance problem I'm concerned about was having different md drives accessing different partitions. For example, I can partition the drives as follows: /dev/sd[abcd]1 -- RAID1, /boot /dev/sd[abcd]2 -- RAID5, the rest of the file system I originally had asked, way back when, if having different md drives on different partitions of the *same* disk was a problem for performance -- or if, for some reason (e.g., threading) it was actually smarter to do it that way. The answer I received was from Iustin Pop, who said: Iustin Pop wrote: > md code works better if it's only one array per physical drive, > because it keeps statistics per array (like last accessed sector, > etc.) and if you combine two arrays on the same drive these > statistics are not exactly true anymore So if I use /boot on its own drive and it's only accessed at startup, the /boot will only be accessed that one time and afterwards won't cause problems for the drive statistics. However, if I put /boot, /bin, and /sbin on this RAID1 drive, it will always be accessed and it might create a performance issue. To return to that performance question, since I have to create at least 2 md drives using different partitions, I wonder if it's smarter to create multiple md drives for better performance. /dev/sd[abcd]1 -- RAID1, the /boot, /dev, /bin, /sbin /dev/sd[abcd]2 -- RAID5, most of the rest of the file system /dev/sd[abcd]3 -- RAID10 o2, a drive that does a lot of downloading (writes) > For typical filesystem usage, raid5 works well for both reads > and (cached, delayed) writes. It's workloads like databases > where raid5 performs badly. Ah, very interesting. Is this true even for (dare I say it?) bittorrent downloads? > What you do care about is your data integrity. It's not really > interesting to reinstall a system or lose your data if > something goes wrong, and it's best to have recovery tools as > easily available as possible. Plus, amount of space you need. Sure, I understand. And backing up in case someone steals your server. But did you have something specific in mind when you wrote this? Don't all these configurations (RAID5 vs. RAID10) have equal recovery tools? Or were you referring to the file system? Reiserfs and XFS both seem to have decent recovery tools. LVM is a little tempting because it allows for snapshots, but on the other hand I wonder if I'd find it useful. >>> Also, placing /dev on a tmpfs helps alot to minimize number of writes >>> necessary for root fs. >> Another interesting idea. I'm not familiar with using tmpfs (no need, >> until now); but I wonder how you create the devices you need when you're >> doing a rescue. > > When you start udev, your /dev will be on tmpfs. Sure, that's what mount shows me right now -- using a standard Debian install. What did you suggest I change?
-- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "Many that live deserve death. And some that die deserve life. Can you give it to them? Then do not be too eager to deal out death in judgement. For even the wise cannot see all ends." -- Gandalf (J.R.R. Tolkien) ^ permalink raw reply [flat|nested] 60+ messages in thread
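For readers reconstructing this at home, the three-array split proposed above would look roughly like the following mdadm invocations. This is only a sketch: the partition names, the four-drive count and the o2 layout for the download area come from the message above, but everything else (array numbers, superblock version, chunk sizes, filesystems) is left at defaults and is an assumption, not something specified in the thread.

  # small raid1 across all four drives for /boot, /dev, /bin, /sbin
  mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sd[abcd]1
  # raid5 across the second partitions for most of the filesystem
  mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sd[abcd]2
  # raid10 with the "offset" layout for the write-heavy download area
  mdadm --create /dev/md2 --level=10 --layout=o2 --raid-devices=4 /dev/sd[abcd]3

Note that the /boot array would still need a superblock format grub can cope with; that caveat is worked out later in the thread.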
* Re: In this partition scheme, grub does not find md information? 2008-01-30 14:10 ` Moshe Yudkowsky @ 2008-01-30 14:41 ` Michael Tokarev 2008-01-31 14:59 ` Bill Davidsen 0 siblings, 2 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-30 14:41 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: linux-raid Moshe Yudkowsky wrote: > Michael Tokarev wrote: > >> You only write to root (including /bin and /lib and so on) during >> software (re)install and during some configuration work (writing >> /etc/password and the like). First is very infrequent, and both >> needs only a few writes, -- so write speed isn't important. > > Thanks, but I didn't make myself clear. The preformance problem I'm > concerned about was having different md drives accessing different > partitions. > > For example, I can partition the drives as follows: > > /dev/sd[abcd]1 -- RAID1, /boot > > /dev/sd[abcd]2 -- RAID5, the rest of the file system > > I originally had asked, way back when, if having different md drives on > different partitions of the *same* disk was a problem for perfomance -- > or if, for some reason (e.g., threading) it was actually smarter to do > it that way. The answer I received was from Iustin Pop, who said : > > Iustin Pop wrote: >> md code works better if it's only one array per physical drive, >> because it keeps statistics per array (like last accessed sector, >> etc.) and if you combine two arrays on the same drive these >> statistics are not exactly true anymore > > So if I use /boot on its own drive and it's only accessed at startup, > the /boot will only be accessed that one time and afterwards won't cause > problems for the drive statistics. However, if I use put /boot, /bin, > and /sbin on this RAID1 drive, it will always be accessed and it might > create a performance issue. To be fair, I didn't notice any measurable difference in real-life usage - be it a single (possibly further partitioned) large raid array or several separate arrays on different partitions - at least when there are two components: the "core system" (root fs) and the rest. Sure, theoretically it should be different, but in practice it doesn't seem to make much of a difference. >> For typical filesystem usage, raid5 works good for both reads >> and (cached, delayed) writes. It's workloads like databases >> where raid5 performs badly. > > Ah, very interesting. Is this true even for (dare I say it?) bittorrent > downloads? I don't see why not. Bittorrent (and the like) writes quite intelligently, doing a lot of buffering of its own. It writes SLOWLY. And it allows the filesystem to cache and optimize writes. >> What you do care about is your data integrity. It's not really >> interesting to reinstall a system or lose your data in case if >> something goes wrong, and it's best to have recovery tools as >> easily available as possible. Plus, amount of space you need. > > Sure, I understand. And backing up in case someone steals your server. > But did you have something specific in mind when you wrote this? Don't > all these configurations (RAID5 vs. RAID10) have equal recovery tools? Well, I mean that if you have all the basic tools available on the system even without raid (i.e., root fs on raid1 without any fancy stuff), you have a better chance of recovery if it is ever necessary. Yes, reconstructing raid10 is a bit easier than raid5, when we are talking about MANUAL reconstruction. 
But that is something usually not done anyway, because of the complexity, because it is easy to throw away the data by mistake, and because mdadm mostly handles recovery on its own (there are cases where I know how to manually reconstruct the array data when mdadm can't help me - for example, a raid1 with two half-failed drives, i.e. the first half of driveA and the second half of driveB still work - mdadm won't let me recover from such a situation even though I know that all my data is there). So basically there's no difference in "recoverability" of raid5 vs raid10. >>>> Also, placing /dev on a tmpfs helps alot to minimize number of writes >>>> necessary for root fs. >>> Another interesting idea. I'm not familiar with using tmpfs (no need, >>> until now); but I wonder how you create the devices you need when you're >>> doing a rescue. >> >> When you start udev, your /dev will be on tmpfs. > > Sure, that's what mount shows me right now -- using a standard Debian > install. What did you suggest I change? I didn't suggest any change. I just pointed out that /dev on a tmpfs reduces writes to the root filesystem (as does mounting with -o noatime or -o nodiratime). With udev, /dev is already on a tmpfs. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
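As an illustration of the write-reducing mount option mentioned above, a hypothetical /etc/fstab entry might look like this (the device name and filesystem type are assumptions, not taken from the thread):

  # root filesystem without access-time updates
  /dev/md0  /  ext3  defaults,noatime,nodiratime  0  1

The same effect can be tried on a running system with something like "mount -o remount,noatime /"; with udev, /dev already lives on a tmpfs, so nothing extra is needed for that part.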
* Re: In this partition scheme, grub does not find md information? 2008-01-30 14:10 ` Moshe Yudkowsky 2008-01-30 14:41 ` Michael Tokarev @ 2008-01-31 14:59 ` Bill Davidsen 2008-02-02 20:17 ` Bill Davidsen 1 sibling, 1 reply; 60+ messages in thread From: Bill Davidsen @ 2008-01-31 14:59 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Michael Tokarev, linux-raid Moshe Yudkowsky wrote: > Michael Tokarev wrote: > >> You only write to root (including /bin and /lib and so on) during >> software (re)install and during some configuration work (writing >> /etc/password and the like). First is very infrequent, and both >> needs only a few writes, -- so write speed isn't important. > > Thanks, but I didn't make myself clear. The preformance problem I'm > concerned about was having different md drives accessing different > partitions. > > For example, I can partition the drives as follows: > > /dev/sd[abcd]1 -- RAID1, /boot > > /dev/sd[abcd]2 -- RAID5, the rest of the file system > > I originally had asked, way back when, if having different md drives > on different partitions of the *same* disk was a problem for > perfomance -- or if, for some reason (e.g., threading) it was > actually smarter to do it that way. The answer I received was from > Iustin Pop, who said : > > Iustin Pop wrote: >> md code works better if it's only one array per physical drive, >> because it keeps statistics per array (like last accessed sector, >> etc.) and if you combine two arrays on the same drive these >> statistics are not exactly true anymore > > So if I use /boot on its own drive and it's only accessed at startup, > the /boot will only be accessed that one time and afterwards won't > cause problems for the drive statistics. However, if I use put /boot, > /bin, and /sbin on this RAID1 drive, it will always be accessed and it > might create a performance issue. > I always put /boot on a separate partition, just to run raid1 which I don't use elsewhere. > To return to that peformance question, since I have to create at least > 2 md drives using different partitions, I wonder if it's smarter to > create multiple md drives for better performance. > > /dev/sd[abcd]1 -- RAID1, the /boot, /dev, /bin/, /sbin > > /dev/sd[abcd]2 -- RAID5, most of the rest of the file system > > /dev/sd[abcd]3 -- RAID10 o2, a drive that does a lot of downloading > (writes) > I think the speed of downloads is so far below the capacity of an array that you won't notice, and hopefully you will use things you download more than once, so you still get more reads than writes. >> For typical filesystem usage, raid5 works good for both reads >> and (cached, delayed) writes. It's workloads like databases >> where raid5 performs badly. > > Ah, very interesting. Is this true even for (dare I say it?) > bittorrent downloads? > What do you have for bandwidth? Probably not more than a T3 (145Mbit) which will max out at ~15MB/s, far below the write performance of a single drive, much less an array (even raid5). >> What you do care about is your data integrity. It's not really >> interesting to reinstall a system or lose your data in case if >> something goes wrong, and it's best to have recovery tools as >> easily available as possible. Plus, amount of space you need. > > Sure, I understand. And backing up in case someone steals your server. > But did you have something specific in mind when you wrote this? Don't > all these configurations (RAID5 vs. RAID10) have equal recovery tools? > > Or were you referring to the file system? 
Reiserfs and XFS both seem > to have decent recovery tools. LVM is a little tempting because it > allows for snapshots, but on the other hand I wonder if I'd find it > useful. > If you are worried about performance, perhaps some reading of comments on LVM would be in order. I personally view it as a trade-off of performance for flexibility. > >>>> Also, placing /dev on a tmpfs helps alot to minimize number of writes >>>> necessary for root fs. >>> Another interesting idea. I'm not familiar with using tmpfs (no need, >>> until now); but I wonder how you create the devices you need when >>> you're >>> doing a rescue. >> >> When you start udev, your /dev will be on tmpfs. > > Sure, that's what mount shows me right now -- using a standard Debian > install. What did you suggest I change? > > -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-31 14:59 ` Bill Davidsen @ 2008-02-02 20:17 ` Bill Davidsen 0 siblings, 0 replies; 60+ messages in thread From: Bill Davidsen @ 2008-02-02 20:17 UTC (permalink / raw) To: linux-raid; +Cc: Moshe Yudkowsky, Michael Tokarev Bill Davidsen wrote: > Moshe Yudkowsky wrote: >> Michael Tokarev wrote: >> > >> To return to that peformance question, since I have to create at >> least 2 md drives using different partitions, I wonder if it's >> smarter to create multiple md drives for better performance. >> >> /dev/sd[abcd]1 -- RAID1, the /boot, /dev, /bin/, /sbin >> >> /dev/sd[abcd]2 -- RAID5, most of the rest of the file system >> >> /dev/sd[abcd]3 -- RAID10 o2, a drive that does a lot of downloading >> (writes) >> > I think the speed of downloads is so far below the capacity of an > array that you won't notice, and hopefully you will use things you > download more than once, so you still get more reads than writes. > >>> For typical filesystem usage, raid5 works good for both reads >>> and (cached, delayed) writes. It's workloads like databases >>> where raid5 performs badly. >> >> Ah, very interesting. Is this true even for (dare I say it?) >> bittorrent downloads? >> > What do you have for bandwidth? Probably not more than a T3 (145Mbit) > which will max out at ~15MB/s, far below the write performance of a > single drive, much less an array (even raid5). It has been pointed out that I have a double typo there, I meant OC3 not T3, and 155Mbit. Still, the most someone is likely to have, even in a large company. Still not a large chance of being faster than the disk in raid-10 mode. -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:34 ` Peter Rabbitson 2008-01-29 19:34 ` Moshe Yudkowsky @ 2008-01-30 12:01 ` Peter Rabbitson 1 sibling, 0 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-30 12:01 UTC (permalink / raw) Cc: linux-raid Peter Rabbitson wrote: > Moshe Yudkowsky wrote: >> Here's a baseline question: if I create a RAID10 array using default >> settings, what do I get? I thought I was getting RAID1+0; am I really? > > Maybe you are, depending on your settings, but this is beyond the point. > No matter what 1+0 you have (linux, classic, or otherwise) you can not > boot from it, as there is no way to see the underlying filesystem > without the RAID layer. > > With the current state of affairs (available mainstream bootloaders) the > rule is: > Block devices containing the kernel/initrd image _must_ be either: > * a regular block device (/sda1, /hda, /fd0, etc.) > * or a linux RAID 1 with the superblock at the end of the device > (0.9 or 1.2) > > If any poor soul finds this in the mailing list archives, the above should read: ... * or a linux RAID 1 with the superblock at the end of the device (either version 0.9 or _1.0_) .... ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:16 ` Moshe Yudkowsky 2008-01-29 16:34 ` Peter Rabbitson @ 2008-01-29 16:42 ` Michael Tokarev 1 sibling, 0 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 16:42 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, linux-raid Moshe Yudkowsky wrote: > Michael Tokarev wrote: > >> There are more-or-less standard raid LEVELS, including >> raid10 (which is the same as raid1+0, or a stripe on top >> of mirrors - note it does not mean 4 drives, you can >> use 6 - stripe over 3 mirrors each of 2 components; or >> the reverse - stripe over 2 mirrors of 3 components each >> etc). > > Here's a baseline question: if I create a RAID10 array using default > settings, what do I get? I thought I was getting RAID1+0; am I really? ..default settings AND an even (4, 6, 8, 10, ...) number of drives. It will be "standard" raid10, or raid1+0 which is the same: as many stripes of mirrored (2-copy) data as fit the number of disks. With an odd number of disks it will obviously be something else, not a "standard" raid10. > My superblocks, by the way, are marked version 01; my metadata in > mdadm.conf asked for 1.2. I wonder what I really got. The real question Ugh. Another source of confusion. In superblock version 1.2, "1" stands for the format, and "2" stands for the placement. So it's really format version 1. From mdadm(8): 1, 1.0, 1.1, 1.2 Use the new version-1 format superblock. This has few restrictions. The different sub-versions store the superblock at different locations on the device, either at the end (for 1.0), at the start (for 1.1) or 4K from the start (for 1.2). > in my mind now is why grub can't find the info, and either it's because > of 1.2 superblocks or because of sub-partitioning of components. As has been said numerous times in this thread, grub can't be used with anything but raid1 to start with (the same is true for lilo). Raid10 (or raid1+0, which is the same) - be it standard or linux extension format - is NOT raid1. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
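A quick way to check which superblock format and placement an existing array member actually carries, and to ask for a specific one at creation time - a sketch only, with placeholder device names:

  # inspect a member device
  mdadm --examine /dev/sda2 | grep -i version
  # request an explicit format/placement when creating an array
  mdadm --create /dev/md0 --metadata=1.0 --level=1 --raid-devices=2 /dev/sd[ab]1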
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:13 ` Michael Tokarev 2008-01-29 15:41 ` Peter Rabbitson 2008-01-29 16:16 ` Moshe Yudkowsky @ 2008-01-29 16:26 ` Keld Jørn Simonsen 2008-01-29 16:46 ` Michael Tokarev 2 siblings, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 16:26 UTC (permalink / raw) To: Michael Tokarev; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid On Tue, Jan 29, 2008 at 06:13:41PM +0300, Michael Tokarev wrote: > > Linux raid10 MODULE (which implements that standard raid10 > LEVEL in full) adds some quite.. unusual extensions to that > standard raid10 LEVEL. The resulting layout is also called > raid10 in linux (ie, not giving new names), but it's not that > raid10 (which is again the same as raid1+0) as commonly known > in various literature and on the internet. Yet raid10 module > fully implements STANDARD raid10 LEVEL. My understanding is that you can have a linux raid10 of only 2 drives, while the standard RAID 1+0 requires 4 drives, so this is a huge difference. I am not sure what vanilla linux raid10 (near=2, far=1) has of properties. I think it can run with only 1 disk, but I think it does not have striping capabilities. It would be nice to have more info on this, eg in the man page. Is there an official web page for mdadm? And maybe the raid faq could be updated? best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:26 ` Keld Jørn Simonsen @ 2008-01-29 16:46 ` Michael Tokarev 2008-01-29 18:01 ` Keld Jørn Simonsen 0 siblings, 1 reply; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 16:46 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid Keld Jørn Simonsen wrote: > On Tue, Jan 29, 2008 at 06:13:41PM +0300, Michael Tokarev wrote: >> Linux raid10 MODULE (which implements that standard raid10 >> LEVEL in full) adds some quite.. unusual extensions to that >> standard raid10 LEVEL. The resulting layout is also called >> raid10 in linux (ie, not giving new names), but it's not that >> raid10 (which is again the same as raid1+0) as commonly known >> in various literature and on the internet. Yet raid10 module >> fully implements STANDARD raid10 LEVEL. > > My understanding is that you can have a linux raid10 of only 2 > drives, while the standard RAID 1+0 requires 4 drives, so this is a huge > difference. Ugh. 2-drive raid10 is effectively just a raid1. I.e, mirroring without any striping. (Or, backwards, striping without mirroring). So to say, raid1 is just one particular configuration of raid10 - with only one mirror. Pretty much like with raid5 of 2 disks - it's the same as raid1. > I am not sure what vanilla linux raid10 (near=2, far=1) > has of properties. I think it can run with only 1 disk, but I think it number of copies should be <= number of disks, so no. > does not have striping capabilities. It would be nice to have more > info on this, eg in the man page. It's all in there really. See md(4). Maybe it's not that verbose, but it's not a user's guide (as in: a large book), after all. /mjt - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:46 ` Michael Tokarev @ 2008-01-29 18:01 ` Keld Jørn Simonsen 2008-01-30 13:37 ` Michael Tokarev 0 siblings, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 18:01 UTC (permalink / raw) To: Michael Tokarev; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid On Tue, Jan 29, 2008 at 07:46:58PM +0300, Michael Tokarev wrote: > Keld Jørn Simonsen wrote: > > On Tue, Jan 29, 2008 at 06:13:41PM +0300, Michael Tokarev wrote: > >> Linux raid10 MODULE (which implements that standard raid10 > >> LEVEL in full) adds some quite.. unusual extensions to that > >> standard raid10 LEVEL. The resulting layout is also called > >> raid10 in linux (ie, not giving new names), but it's not that > >> raid10 (which is again the same as raid1+0) as commonly known > >> in various literature and on the internet. Yet raid10 module > >> fully implements STANDARD raid10 LEVEL. > > > > My understanding is that you can have a linux raid10 of only 2 > > drives, while the standard RAID 1+0 requires 4 drives, so this is a huge > > difference. > > Ugh. 2-drive raid10 is effectively just a raid1. I.e, mirroring > without any striping. (Or, backwards, striping without mirroring). OK, uhm, well, I did not understand "(Or, backwards, striping without mirroring)." I don't think a 2-drive vanilla raid10 will do striping. Please explain. > Pretty much like with raid5 of 2 disks - it's the same as raid1. I think in raid5 of 2 disks, half of the chunks are parity chunks which are evenly distributed over the two disks, and the parity chunk is the XOR of the data chunk. But maybe I am wrong. Also the behaviour of such a raid5 is different from a raid1, as the parity chunk is not used as data. > > > I am not sure what vanilla linux raid10 (near=2, far=1) > > has of properties. I think it can run with only 1 disk, but I think it > > number of copies should be <= number of disks, so no. I have a clear understanding that in a vanilla linux raid10 (near=2, far=1) you can run with one failing disk, that is, with only one working disk. Am I wrong? > > does not have striping capabilities. It would be nice to have more > > info on this, eg in the man page. > > It's all in there really. See md(4). Maybe it's not that > verbose, but it's not a user's guide (as in: a large book), > after all. Some man pages have examples. Or info could be written in the FAQ or on Wikipedia. Best regards keld - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 18:01 ` Keld Jørn Simonsen @ 2008-01-30 13:37 ` Michael Tokarev 2008-01-30 14:47 ` Peter Rabbitson 0 siblings, 1 reply; 60+ messages in thread From: Michael Tokarev @ 2008-01-30 13:37 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid Keld Jørn Simonsen wrote: [] >> Ugh. 2-drive raid10 is effectively just a raid1. I.e, mirroring >> without any striping. (Or, backwards, striping without mirroring). > > uhm, well, I did not understand: "(Or, backwards, striping without > mirroring)." I don't think a 2 drive vanilla raid10 will do striping. Please explain. I was referring to raid0+1 here - a mirror of stripes. Which makes no sense on its own, but when we create such a thing on only 2 drives, it becomes just raid0... "Backwards" as in raid1+0 vs raid0+1. This is just to show that various raid levels, in corner cases, tend to "transform" from one into another. >> Pretty much like with raid5 of 2 disks - it's the same as raid1. > > I think in raid5 of 2 disks, half of the chunks are parity chynks which > are evenly distributed over the two disks, and the parity chunk is the > XOR of the data chunk. But maybe I am wrong. Also the behaviour of suce > a raid5 is different from a raid1 as the parity chunk is not used as > data. With N-disk raid5, parity in a row is calculated by XORing together data from all the rest of the disks (N-1), i.e., P = D1 ^ ... ^ D(N-1). In the case of 2-disk raid5 (also a corner case), the above formula becomes just P = D1. So the parity block in each row contains exactly the same data as the data block, effectively turning the whole thing into a raid1 of two disks. Sure, in raid5 the parity blocks are called just that - parity - but in reality that parity is THE SAME as the data (again, in the case of a 2-disk raid5). >>> I am not sure what vanilla linux raid10 (near=2, far=1) >>> has of properties. I think it can run with only 1 disk, but I think it >> number of copies should be <= number of disks, so no. > > I have a clear understanding that in a vanilla linux raid10 (near=2, far=1) > you can run with one failing disk, that is with only one working disk. > Am I wrong? In fact, with all sorts of raid10, it's not only the number of drives that can fail that matters, but also WHICH drives fail. In classic raid10: DiskA DiskB DiskC DiskD 0 0 1 1 2 2 3 3 4 4 5 5 .... (where the numbers are the data blocks), you can run with only 2 working disks (i.e., 2 failed), but only if they are from different pairs. You can't have A and B failed and C and D working, for example - you'll lose half the data and thus the filesystem. You can have A and C failed however, or A and D, or B&C, or B&D. You see - in the above example, all numbers (data blocks) should be present at least once (after you pull a drive or two or more). If some numbers don't appear at all, your raid array is dead. Now write out the layout you want to use like the above, try "removing" some drives, and see if you still have all the numbers. For example, with 3-disk linux raid10: A B C 0 0 1 1 2 2 3 3 4 4 5 5 .... We can't pull 2 drives anymore here. E.g., pulling A&B removes 0 and 3. Pulling B&C removes 2 and 5. A&C = 1 and 4. With 5-drive linux raid10: A B C D E 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 ... A&B can't be removed - 0, 5. A&C CAN be removed, as can A&D. But not A&E - losing 2 and 7. And so on. 
6-disk raid10 with 3 copies of each (near=3 with linux): A B C D E F 0 0 0 1 1 1 2 2 2 3 3 3 It can run as long as from each triple (ABC and DEF), at least one disk is here. Ie, you can lose up to 4 drives, as far as the condition is true. But if you lose only 3 - A&B&C or D&E&F - it can't work anymore. The same goes for raid5 and raid6, but they're symmetric -- any single (raid5) or double (raid6) disk failure is Ok. The principle is this: raid5: P = D1^D2^D3^...^D(N-1) so, you either have all Di (nothing to reconstruct), or you have all but one Di AND P - in this case, missing Dm can be recalculated as Dm = P^D1^...^D(m-1)^D(m+1)^...^D(N-1) (ie, a XOR of all the remaining blocks including parity). (exactly the same applies to raid4, because each row in raid4 is identical to that of raid5, the difference is that parity disk is different in each row in raid5, while in raid4 it stays the same). I wont write the formula for raid6 as it's somewhat more complicated, but the effect is the same - any data block can be reconstructed from any N-2 drives. /mjt - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
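A throwaway script in the spirit of the tables above can help when working out which drive combinations an array survives. This sketch only prints the block placement for the plain near=2, far=1 layout and is not derived from the md code; cross out a column (a failed drive) and check that every block number still appears somewhere, exactly as described above.

  #!/bin/sh
  # print the first rows of a near=2 raid10 layout for a given drive count
  drives=${1:-5}; rows=${2:-5}
  for row in $(seq 0 $((rows - 1))); do
      line=""
      for col in $(seq 0 $((drives - 1))); do
          # each data block is stored twice, on adjacent column positions
          line="$line $(( (row * drives + col) / 2 ))"
      done
      echo "row $row:$line"
  done

Run as "sh layout.sh 5" it reproduces the 5-drive table above.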
* Re: In this partition scheme, grub does not find md information? 2008-01-30 13:37 ` Michael Tokarev @ 2008-01-30 14:47 ` Peter Rabbitson 2008-01-30 15:21 ` Keld Jørn Simonsen 0 siblings, 1 reply; 60+ messages in thread From: Peter Rabbitson @ 2008-01-30 14:47 UTC (permalink / raw) To: Michael Tokarev; +Cc: Keld Jørn Simonsen, Moshe Yudkowsky, linux-raid Michael Tokarev wrote: > With 5-drive linux raid10: > > A B C D E > 0 0 1 1 2 > 2 3 3 4 4 > 5 5 6 6 7 > 7 8 8 9 9 > 10 10 11 11 12 > ... > > A&B can't be removed - 0, 5. A&C CAN be removed, as > are A&D. But not A&E - losing 2 and 7. And so on. I stand corrected by Michael, this is indeed the case with the current state of md raid 10. Either my observations were incorrect when I made them a year and a half ago, or some fixes went into the kernel since then. In any way - linux md10 does behave exactly as a classic raid 1+0 when created with -n D -p nS where D and S are both even and D = 2S. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 14:47 ` Peter Rabbitson @ 2008-01-30 15:21 ` Keld Jørn Simonsen 2008-01-30 15:35 ` Peter Rabbitson 0 siblings, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-30 15:21 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Michael Tokarev, Moshe Yudkowsky, linux-raid On Wed, Jan 30, 2008 at 03:47:30PM +0100, Peter Rabbitson wrote: > Michael Tokarev wrote: > > >With 5-drive linux raid10: > > > > A B C D E > > 0 0 1 1 2 > > 2 3 3 4 4 > > 5 5 6 6 7 > > 7 8 8 9 9 > > 10 10 11 11 12 > > ... > > > >A&B can't be removed - 0, 5. A&C CAN be removed, as > >are A&D. But not A&E - losing 2 and 7. And so on. I see. Does the kernel code allow this? And mdadm? And can B+E be removed safely, and C+E and B+D? best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 15:21 ` Keld Jørn Simonsen @ 2008-01-30 15:35 ` Peter Rabbitson 2008-01-30 15:46 ` Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?) Moshe Yudkowsky 0 siblings, 1 reply; 60+ messages in thread From: Peter Rabbitson @ 2008-01-30 15:35 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: Michael Tokarev, Moshe Yudkowsky, linux-raid Keld Jørn Simonsen wrote: > On Wed, Jan 30, 2008 at 03:47:30PM +0100, Peter Rabbitson wrote: >> Michael Tokarev wrote: >> >>> With 5-drive linux raid10: >>> >>> A B C D E >>> 0 0 1 1 2 >>> 2 3 3 4 4 >>> 5 5 6 6 7 >>> 7 8 8 9 9 >>> 10 10 11 11 12 >>> ... >>> >>> A&B can't be removed - 0, 5. A&C CAN be removed, as >>> are A&D. But not A&E - losing 2 and 7. And so on. > > I see. Does the kernel code allow this? And mdadm? > > And can B+E be removed safely, and C+E and B+D? > It seems like it. I just created the above raid configuration with 5 loop devices. Everything behaved just like Michael described. When the wrong drives disappeared - I started getting IO errors. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?) 2008-01-30 15:35 ` Peter Rabbitson @ 2008-01-30 15:46 ` Moshe Yudkowsky 2008-01-30 15:56 ` Tim Southerwood 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 15:46 UTC (permalink / raw) To: Peter Rabbitson; +Cc: linux-raid Peter Rabbitson wrote: > It seems like it. I just created the above raid configuration with 5 > loop devices. Everything behaved just like Michael described. When the > wrong drives disappeared - I started getting IO errors. My mind boggles. I know how to mount an ISO as a loop device onto the file system, but if you'd be so kind, can you give a super-brief description on how to get a loop device to look like an actual partition that can be made into a RAID array? I can see this software-only solution as being quite interesting for testing in general. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "I'm very well aquainted/with the seven deadly sins/ I keep a busy schedule/ to try to fit them in." -- Warren Zevon, "Mr. Bad Example" ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?) 2008-01-30 15:46 ` Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?) Moshe Yudkowsky @ 2008-01-30 15:56 ` Tim Southerwood 0 siblings, 0 replies; 60+ messages in thread From: Tim Southerwood @ 2008-01-30 15:56 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, linux-raid Moshe Yudkowsky wrote: > My mind boggles. I know how to mount an ISO as a loop device onto the > file system, but if you'd be so kind, can you give a super-brief > description on how to get a loop device to look like an actual partition > that can be made into a RAID array? I can see this software-only > solution as being quite interesting for testing in general. > I tried this a while back, IIRC the procedure was: 1) Make some empty files of the required length each. 2) Use losetup to mount each one onto a loop device (loop0-3 say). 3) Use /dev/loop[0-3] as component devices to mdadm as you would use any other device or partition. It is not necessary to partition the loop devices, use them whole. HTH Tim ^ permalink raw reply [flat|nested] 60+ messages in thread
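Spelled out, the procedure Tim describes might look like this for a five-device test array. Sizes, file names and the md device number are arbitrary choices for illustration; run it as root, and only on a machine where scribbling on /dev/loop0-4 and /dev/md9 is safe.

  for i in 0 1 2 3 4; do
      dd if=/dev/zero of=/tmp/md-test-$i.img bs=1M count=0 seek=100   # 100 MB sparse file
      losetup /dev/loop$i /tmp/md-test-$i.img
  done
  mdadm --create /dev/md9 --level=10 --layout=n2 --raid-devices=5 /dev/loop[0-4]
  cat /proc/mdstat
  # ... fail/remove loop devices, observe the behaviour, then tear it down:
  mdadm --stop /dev/md9
  for i in 0 1 2 3 4; do losetup -d /dev/loop$i; done
  rm -f /tmp/md-test-*.img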
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:47 ` Peter Rabbitson 2008-01-29 15:13 ` Michael Tokarev @ 2008-01-29 15:57 ` Moshe Yudkowsky 2008-01-29 16:37 ` Keld Jørn Simonsen 2008-01-30 11:03 ` David Greaves 2008-01-30 11:03 ` David Greaves 2 siblings, 2 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 15:57 UTC (permalink / raw) To: Peter Rabbitson; +Cc: linux-raid Peter Rabbitson wrote: > [*] The layout is the same but the functionality is different. If you > have 1+0 on 4 drives, you can survive a loss of 2 drives as long as they > are part of different mirrors. mdadm -C -l 10 -n 4 -o n2 <drives> > however will _NOT_ survive a loss of 2 drives. In my 4 drive system, I'm clearly not getting 1+0's ability to use grub out of the RAID10. I expect it's because I used 1.2 superblocks (why not use the latest, I said, foolishly...) and therefore the RAID10 -- with even number of drives -- can't be read by grub. If you'd patch that information into the man pages that'd be very useful indeed. Thanks for your attention to this! -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "no user serviceable parts below this line" -- From a Perl program by mengwong@pobox.com ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:57 ` In this partition scheme, grub does not find md information? Moshe Yudkowsky @ 2008-01-29 16:37 ` Keld Jørn Simonsen 2008-01-29 16:57 ` Michael Tokarev 1 sibling, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 16:37 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, linux-raid On Tue, Jan 29, 2008 at 09:57:48AM -0600, Moshe Yudkowsky wrote: > > In my 4 drive system, I'm clearly not getting 1+0's ability to use grub > out of the RAID10. I expect it's because I used 1.2 superblocks (why > not use the latest, I said, foolishly...) and therefore the RAID10 -- > with even number of drives -- can't be read by grub. If you'd patch that > information into the man pages that'd be very useful indeed. If you have 4 drives, I think the right thing is to use a raid1 with 4 drives for your /boot partition. Then you can survive three disks crashing! If you want the extra performance, I think you should not worry too much about the kernel and initrd load time - which of course is not striped across the disks, but some performance improvement can be expected. Then you can have the rest of the system on a raid10,f2 with 4 disks. best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:37 ` Keld Jørn Simonsen @ 2008-01-29 16:57 ` Michael Tokarev 0 siblings, 0 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 16:57 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: Moshe Yudkowsky, Peter Rabbitson, linux-raid Keld Jørn Simonsen wrote: > On Tue, Jan 29, 2008 at 09:57:48AM -0600, Moshe Yudkowsky wrote: >> In my 4 drive system, I'm clearly not getting 1+0's ability to use grub >> out of the RAID10. I expect it's because I used 1.2 superblocks (why >> not use the latest, I said, foolishly...) and therefore the RAID10 -- >> with even number of drives -- can't be read by grub. If you'd patch that >> information into the man pages that'd be very useful indeed. > > If you have 4 drives, I think the right thing is to use a raid1 with 4 > drives, for your /boot partition. Then yo can survive that 3 disks > crash! By the way, on all our systems I use small (256Mb for small-software systems, sometimes 512M, but 1G should be sufficient) partition for a root filesystem (/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all (usually identical) drives - be it 4 or 6 or more of them. Root filesystem does not change often, or at least it's write speed isn't that important. But doing this way, you always have all the tools necessary to repair a damaged system even in case your raid didn't start, or you forgot where your root disk is etc etc. But in this setup, /usr, /home, /var and so on should be separate partitions. Also, placing /dev on a tmpfs helps alot to minimize number of writes necessary for root fs. /mjt - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:57 ` In this partition scheme, grub does not find md information? Moshe Yudkowsky 2008-01-29 16:37 ` Keld Jørn Simonsen @ 2008-01-30 11:03 ` David Greaves 2008-01-30 11:44 ` Moshe Yudkowsky 2008-02-04 16:49 ` In this partition scheme, grub does not find md information? John Stoffel 1 sibling, 2 replies; 60+ messages in thread From: David Greaves @ 2008-01-30 11:03 UTC (permalink / raw) To: Moshe Yudkowsky, Neil Brown; +Cc: Peter Rabbitson, linux-raid, Michael Tokarev On 26 Oct 2007, Neil Brown wrote: >On Thursday October 25, david@dgreaves.com wrote: >> I also suspect that a *lot* of people will assume that the highest superblock >> version is the best and should be used for new installs etc. > > Grumble... why can't people expect what I want them to expect? Moshe Yudkowsky wrote: > I expect it's because I used 1.2 superblocks (why > not use the latest, I said, foolishly...) and therefore the RAID10 -- Aha - an 'in the wild' example of why we should deprecate '0.9 1.0 1.1, 1.2' and rename the superblocks to data-version + on-disk-location :) David ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 11:03 ` David Greaves @ 2008-01-30 11:44 ` Moshe Yudkowsky 2008-01-30 12:00 ` WRONG INFO (was Re: In this partition scheme, grub does not find md information?) Peter Rabbitson 2008-02-04 16:49 ` In this partition scheme, grub does not find md information? John Stoffel 1 sibling, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 11:44 UTC (permalink / raw) To: David Greaves; +Cc: linux-raid David Greaves wrote: > Moshe Yudkowsky wrote: >> I expect it's because I used 1.2 superblocks (why >> not use the latest, I said, foolishly...) and therefore the RAID10 -- > > Aha - an 'in the wild' example of why we should deprecate '0.9 1.0 1.1, 1.2' and > rename the superblocks to data-version + on-disk-location :) Even if renamed, I'd still need a Clue as to why to prefer one scheme over the other. For example, I've now learned that if I want to set up a RAID1 /boot, it must actually be 1.2 or grub won't be able to read it. (I would therefore argue that if the new version ever becomes default, then the default sub-version ought to be 1.2.) As to the wiki: I am not certain I found the Wiki you're referring to; I did find others, and none had the ringing clarity of Peter's definitive "RAID10 won't work for /boot." The process I'm going through -- cloning an old amd-k7 server into a new amd64 server -- is something I will document, and this particular grub issue is one of the things I intend to mention. So, where is this Wiki of which you speak? -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "A kind word will go a long way, but a kind word and a gun will go even further." -- Al Capone ^ permalink raw reply [flat|nested] 60+ messages in thread
* WRONG INFO (was Re: In this partition scheme, grub does not find md information?) 2008-01-30 11:44 ` Moshe Yudkowsky @ 2008-01-30 12:00 ` Peter Rabbitson 2008-01-30 12:41 ` David Greaves 2008-01-30 13:39 ` Michael Tokarev 0 siblings, 2 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-30 12:00 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: David Greaves, linux-raid Moshe Yudkowsky wrote: > over the other. For example, I've now learned that if I want to set up a > RAID1 /boot, it must actually be 1.2 or grub won't be able to read it. > (I would therefore argue that if the new version ever becomes default, > then the default sub-version ought to be 1.2.) In the discussion yesterday I myself made a serious typo, that should not spread. The only superblock version that will work with current GRUB is 1.0 _not_ 1.2. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: WRONG INFO (was Re: In this partition scheme, grub does not find md information?) 2008-01-30 12:00 ` WRONG INFO (was Re: In this partition scheme, grub does not find md information?) Peter Rabbitson @ 2008-01-30 12:41 ` David Greaves 2008-01-30 13:39 ` Michael Tokarev 1 sibling, 0 replies; 60+ messages in thread From: David Greaves @ 2008-01-30 12:41 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Moshe Yudkowsky, linux-raid Peter Rabbitson wrote: > Moshe Yudkowsky wrote: >> over the other. For example, I've now learned that if I want to set up >> a RAID1 /boot, it must actually be 1.2 or grub won't be able to read >> it. (I would therefore argue that if the new version ever becomes >> default, then the default sub-version ought to be 1.2.) > > In the discussion yesterday I myself made a serious typo, that should > not spread. The only superblock version that will work with current GRUB > is 1.0 _not_ 1.2. Ah, the joys of consolidated and yet editable documentation - like a wiki.... David ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: WRONG INFO (was Re: In this partition scheme, grub does not find md information?) 2008-01-30 12:00 ` WRONG INFO (was Re: In this partition scheme, grub does not find md information?) Peter Rabbitson 2008-01-30 12:41 ` David Greaves @ 2008-01-30 13:39 ` Michael Tokarev 1 sibling, 0 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-30 13:39 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Moshe Yudkowsky, David Greaves, linux-raid Peter Rabbitson wrote: > Moshe Yudkowsky wrote: >> over the other. For example, I've now learned that if I want to set up >> a RAID1 /boot, it must actually be 1.2 or grub won't be able to read >> it. (I would therefore argue that if the new version ever becomes >> default, then the default sub-version ought to be 1.2.) > > In the discussion yesterday I myself made a serious typo, that should > not spread. The only superblock version that will work with current GRUB > is 1.0 _not_ 1.2. Ghrrm. 1.0, or 0.9. 0.9 is still the default with mdadm. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
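Putting the corrections together: a /boot raid1 that current grub can read needs its superblock at the end of the component devices, i.e. metadata 0.90 or 1.0. A sketch only, with the partition names and filesystem as assumptions:

  mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-devices=4 /dev/sd[abcd]1
  mkfs.ext3 /dev/md0

With the superblock at the end, each member starts with an ordinary filesystem at the usual offset, which is why grub's "find /boot/grub/stage1" can succeed on any of the mirror halves.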
* Re: In this partition scheme, grub does not find md information? 2008-01-30 11:03 ` David Greaves 2008-01-30 11:44 ` Moshe Yudkowsky @ 2008-02-04 16:49 ` John Stoffel 2008-02-04 17:26 ` Michael Tokarev 1 sibling, 1 reply; 60+ messages in thread From: John Stoffel @ 2008-02-04 16:49 UTC (permalink / raw) To: David Greaves Cc: Moshe Yudkowsky, Neil Brown, Peter Rabbitson, linux-raid, Michael Tokarev David> On 26 Oct 2007, Neil Brown wrote: >> On Thursday October 25, david@dgreaves.com wrote: >>> I also suspect that a *lot* of people will assume that the highest superblock >>> version is the best and should be used for new installs etc. >> >> Grumble... why can't people expect what I want them to expect? David> Moshe Yudkowsky wrote: >> I expect it's because I used 1.2 superblocks (why >> not use the latest, I said, foolishly...) and therefore the RAID10 -- David> Aha - an 'in the wild' example of why we should deprecate '0.9 David> 1.0 1.1, 1.2' and rename the superblocks to data-version + David> on-disk-location :) As the person who started this entire thread ages ago about the *poor* naming convention used for RAID superblocks, I have to agree. I'd much rather see 1.near, 1.far, 1.both or something like that added in. Heck, we don't have to remove the support for the old 1.0, 1.1, 1.2 names either, just make the default be something more user-friendly. C'mon, how many of you are programmed to believe that 1.2 is better than 1.0? But when they're not different, just different placements, then it's confusing. John ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-02-04 16:49 ` In this partition scheme, grub does not find md information? John Stoffel @ 2008-02-04 17:26 ` Michael Tokarev 0 siblings, 0 replies; 60+ messages in thread From: Michael Tokarev @ 2008-02-04 17:26 UTC (permalink / raw) To: John Stoffel; +Cc: David Greaves, Moshe Yudkowsky, Peter Rabbitson, linux-raid John Stoffel wrote: [] > C'mon, how many of you are programmed to believe that 1.2 is better > than 1.0? But when they're not different, just just different > placements, then it's confusing. Speaking of "more is better" thing... There were quite a few bugs fixed in recent months wrt version 1 superblocks - both in kernel and in mdadm. While 0.90 format is stable for a very long time, and unless you're hitting its limits (namely, max 26 drives in an array, no "homehost" field), there's nothing which makes v1 superblocks better than 0.90 ones. In my view, "better" = stable first, faster/easier/whatever second. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:47 ` Peter Rabbitson 2008-01-29 15:13 ` Michael Tokarev 2008-01-29 15:57 ` In this partition scheme, grub does not find md information? Moshe Yudkowsky @ 2008-01-30 11:03 ` David Greaves 2 siblings, 0 replies; 60+ messages in thread From: David Greaves @ 2008-01-30 11:03 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Michael Tokarev, Moshe Yudkowsky, linux-raid, keld Peter Rabbitson wrote: > I guess I will sit down tonight and craft some patches to the existing > md* man pages. Some things are indeed left unsaid. If you want to be more verbose than a man page allows then there's always the wiki/FAQ... http://linux-raid.osdl.org/ Keld Jørn Simonsen wrote: > Is there an official web page for mdadm? > And maybe the raid faq could be updated? That *is* the linux-raid FAQ brought up to date (with the consent of the original authors) Of course being a wiki means it is now a shared, community responsibility - and to all present and future readers: that means you too ;) David - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:07 ` Michael Tokarev 2008-01-29 14:47 ` Peter Rabbitson @ 2008-01-29 14:48 ` Keld Jørn Simonsen 2008-01-29 16:00 ` Moshe Yudkowsky 1 sibling, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 14:48 UTC (permalink / raw) To: Michael Tokarev; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid On Tue, Jan 29, 2008 at 05:07:27PM +0300, Michael Tokarev wrote: > Peter Rabbitson wrote: > > Moshe Yudkowsky wrote: > >> > > > It is exactly what the names implies - a new kind of RAID :) The setup > > you describe is not RAID10 it is RAID1+0. > > Raid10 IS RAID1+0 ;) > It's just that linux raid10 driver can utilize more.. interesting ways > to lay out the data. My understanding is that raid10 is different from RAID1+0. Traditional RAID1+0 is composed of two RAID1's combined into one RAID0. It takes 4 drives to make it work. Linux raid10 only takes 2 drives to work. Traditional RAID1+0 only has one way of laying out the blocks. raid10 has a number of ways to do layout, namely the near, far and offset ways, layout=n2, f2, o2 respectively. Traditional RAID1+0 can only do striping over half of the disks involved, while raid10 can do striping on all disks in the far and offset layouts. I looked around on the net for documentation of this. The first hits (on Google) for mdadm did not have descriptions of raid10. Wikipedia describes raid 10 as a synonym for raid1+0. I think there is too much confusion around the raid10 term, and also that the marvelous linux raid10 layouts are a little-known secret beyond, perhaps, the circles of this linux-raid list. We should tell others more about the wonders of raid10. And I would like a good reference describing how raid10,o2 works and why bigger chunks work. Best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
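For anyone unsure what they actually created, the level and layout can be read back from a running array; a small sketch with /dev/md0 as a placeholder:

  mdadm --detail /dev/md0 | egrep 'Level|Layout|Chunk'
  #   Raid Level : raid10
  #       Layout : near=2, far=1
  #   Chunk Size : 64K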
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:48 ` Keld Jørn Simonsen @ 2008-01-29 16:00 ` Moshe Yudkowsky 2008-01-29 16:25 ` Peter Rabbitson 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 16:00 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: linux-raid Keld Jørn Simonsen wrote: > raid10 have a number of ways to do layout, namely the near, far and > offset ways, layout=n2, f2, o2 respectively. The default layout, according to --detail, is "near=2, far=1." If I understand what's been written so far on the topic, that's automatically incompatible with 1+0. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:00 ` Moshe Yudkowsky @ 2008-01-29 16:25 ` Peter Rabbitson 0 siblings, 0 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-29 16:25 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Keld Jørn Simonsen, linux-raid Moshe Yudkowsky wrote: > Keld Jørn Simonsen wrote: > >> raid10 have a number of ways to do layout, namely the near, far and >> offset ways, layout=n2, f2, o2 respectively. > > The default layout, according to --detail, is "near=2, far=1." If I > understand what's been written so far on the topic, that's automatically > incompatible with 1+0. > Unfortunately you are interpreting this wrong as well. far=1 is just a way of saying 'no copies of type far'. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 11:02 ` Moshe Yudkowsky 2008-01-29 11:14 ` Peter Rabbitson @ 2008-01-29 14:04 ` Keld Jørn Simonsen 1 sibling, 0 replies; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 14:04 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Neil Brown, linux-raid On Tue, Jan 29, 2008 at 05:02:57AM -0600, Moshe Yudkowsky wrote: > Neil, thanks for writing. A couple of follow-up questions to you and the > group: > > If the answers above don't lead to a resolution, I can create two RAID1 > pairs and join them using LVM. I would take a hit by using LVM to tie > the pairs intead of RAID0, I suppose, but I would avoid the performance > hit of multiple md drives on a single physical drive, and I could even > run a hot spare through a sparing group. Any comments on the performance > hit -- is raid1L a really bad idea for some reason? You can of course construct a traditional raid-1+0 in Linux as you describe here, but this is different from linux raid10 (with its different layout possibilities). And setting up grub/lilo on two disks of a raid1 /boot seems to be the right way for a reasonably secure system. best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
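The traditional nesting Keld refers to can also be built with md alone, instead of joining the mirror pairs with LVM; a sketch with placeholder device names:

  # two raid1 pairs ...
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc2 /dev/sdd2
  # ... striped together into a classic raid1+0
  mdadm --create /dev/md3 --level=0 --raid-devices=2 /dev/md1 /dev/md2

Whether this, LVM on top of the pairs, or a single md raid10 is preferable is exactly the trade-off discussed in this thread.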
end of thread, other threads:[~2008-02-04 17:26 UTC | newest] Thread overview: 60+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-01-29 4:44 In this partition scheme, grub does not find md information? Moshe Yudkowsky 2008-01-29 5:08 ` Neil Brown 2008-01-29 11:02 ` Moshe Yudkowsky 2008-01-29 11:14 ` Peter Rabbitson 2008-01-29 11:29 ` Moshe Yudkowsky 2008-01-29 14:09 ` Michael Tokarev 2008-01-29 14:07 ` Michael Tokarev 2008-01-29 14:47 ` Peter Rabbitson 2008-01-29 15:13 ` Michael Tokarev 2008-01-29 15:41 ` Peter Rabbitson 2008-01-29 16:51 ` Michael Tokarev 2008-01-29 17:51 ` Keld Jørn Simonsen 2008-01-29 16:16 ` Moshe Yudkowsky 2008-01-29 16:34 ` Peter Rabbitson 2008-01-29 19:34 ` Moshe Yudkowsky 2008-01-29 20:21 ` Keld Jørn Simonsen 2008-01-29 22:14 ` Moshe Yudkowsky 2008-01-29 23:45 ` Bill Davidsen 2008-01-30 0:13 ` Moshe Yudkowsky 2008-01-30 22:36 ` Bill Davidsen 2008-01-30 0:17 ` Keld Jørn Simonsen 2008-01-29 23:44 ` Bill Davidsen 2008-01-30 0:22 ` Keld Jørn Simonsen 2008-01-30 0:26 ` Peter Rabbitson 2008-01-30 22:39 ` Bill Davidsen 2008-01-30 0:32 ` Moshe Yudkowsky 2008-01-30 0:53 ` Keld Jørn Simonsen 2008-01-30 1:00 ` Moshe Yudkowsky 2008-01-31 14:40 ` Bill Davidsen 2008-01-30 13:11 ` Michael Tokarev 2008-01-30 14:10 ` Moshe Yudkowsky 2008-01-30 14:41 ` Michael Tokarev 2008-01-31 14:59 ` Bill Davidsen 2008-02-02 20:17 ` Bill Davidsen 2008-01-30 12:01 ` Peter Rabbitson 2008-01-29 16:42 ` Michael Tokarev 2008-01-29 16:26 ` Keld Jørn Simonsen 2008-01-29 16:46 ` Michael Tokarev 2008-01-29 18:01 ` Keld Jørn Simonsen 2008-01-30 13:37 ` Michael Tokarev 2008-01-30 14:47 ` Peter Rabbitson 2008-01-30 15:21 ` Keld Jørn Simonsen 2008-01-30 15:35 ` Peter Rabbitson 2008-01-30 15:46 ` Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?) Moshe Yudkowsky 2008-01-30 15:56 ` Tim Southerwood 2008-01-29 15:57 ` In this partition scheme, grub does not find md information? Moshe Yudkowsky 2008-01-29 16:37 ` Keld Jørn Simonsen 2008-01-29 16:57 ` Michael Tokarev 2008-01-30 11:03 ` David Greaves 2008-01-30 11:44 ` Moshe Yudkowsky 2008-01-30 12:00 ` WRONG INFO (was Re: In this partition scheme, grub does not find md information?) Peter Rabbitson 2008-01-30 12:41 ` David Greaves 2008-01-30 13:39 ` Michael Tokarev 2008-02-04 16:49 ` In this partition scheme, grub does not find md information? John Stoffel 2008-02-04 17:26 ` Michael Tokarev 2008-01-30 11:03 ` David Greaves 2008-01-29 14:48 ` Keld Jørn Simonsen 2008-01-29 16:00 ` Moshe Yudkowsky 2008-01-29 16:25 ` Peter Rabbitson 2008-01-29 14:04 ` Keld Jørn Simonsen