* In this partition scheme, grub does not find md information? @ 2008-01-29 4:44 Moshe Yudkowsky 2008-01-29 5:08 ` Neil Brown 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 4:44 UTC (permalink / raw) To: linux-raid I'm finding a problem that isn't covered by the usual FAQs and online recipes. Attempted setup: RAID 10 array with 4 disks. Because Debian doesn't include RAID10 in its installation disks, I created a Debian installation on the first partition of sda, in /dev/sda1. Eventually I'll probably convert it to swap, but in the meantime that 4G has a complete 2.6.18 install (Debian stable). I created a RAID 10 array of four partitions, /dev/md/all, out of /dev/sd[abcd]2. Using fdisk/cfdisk, I created the partition /dev/md/all1 (500 MB) for /boot, and the partition /dev/md/all2 with all remaining space in one large partition (about 850 GB). That larger partition contains /, /usr, /home, etc., each as a separate LVM volume. I copied usr, var, etc. (but not proc or sys, of course) files over to the raid array, mounted that array, did a chroot to its root, and started grub. I admit that I'm no grub expert, but it's clear that grub cannot "find" any of the information in /dev/md/all1. For example, grub> find /boot/grub/this_is_raid can't find a file that exists only on the raid array. Grub only searches /dev/sda1, not /dev/md/all1. Perhaps I'm mistaken but I thought it was possible to boot from /dev/md/all1. I've tried other attacks but without success. For example, also while in chroot, grub-install /dev/md/all2 does not work. (Nor does it work with the --root=/boot option.) I also tried modifications to menu.lst, adding root=/dev/md/all1 to the kernel command line, but the RAID array's version of menu.lst is never detected. What I do see is grub> find /boot/grub/stage1 (hd0,0) which indicates (as far as I can tell) that it's found the information written on /dev/sda1 and nothing in /dev/md/all1. Am I trying to do something that's basically impossible? -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 4:44 In this partition scheme, grub does not find md information? Moshe Yudkowsky @ 2008-01-29 5:08 ` Neil Brown 2008-01-29 11:02 ` Moshe Yudkowsky 0 siblings, 1 reply; 60+ messages in thread From: Neil Brown @ 2008-01-29 5:08 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: linux-raid On Monday January 28, moshe@pobox.com wrote: > > Perhaps I'm mistaken but I thought it was possible to boot from > /dev/md/all1. It is my understanding that grub cannot boot from RAID. You can boot from raid1 by the expedient of booting from one of the halves. A common approach is to make a small raid1 which contains /boot and boot from that. Then use the rest of your devices for raid10 or raid5 or whatever. > > Am I trying to do something that's basically impossible? I believe so. NeilBrown ^ permalink raw reply [flat|nested] 60+ messages in thread
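A minimal sketch of the layout Neil describes, assuming four disks that each carry a small first partition for /boot and a large second partition for everything else; the device names, sizes and metadata version below are illustrative, not taken from the thread:

  # /boot: a RAID1 across the small partitions, with the superblock at the end
  # of each member so the bootloader can read any member as a plain filesystem
  mdadm --create /dev/md0 --level=1 --metadata=0.90 --raid-devices=4 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

  # everything else: a RAID10 across the large partitions
  mdadm --create /dev/md1 --level=10 --raid-devices=4 \
        /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2

  mkfs.ext3 /dev/md0      # filesystem for /boot
  pvcreate  /dev/md1      # optional: LVM on top of the big array for /, /usr, /home, ...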
* Re: In this partition scheme, grub does not find md information? 2008-01-29 5:08 ` Neil Brown @ 2008-01-29 11:02 ` Moshe Yudkowsky 2008-01-29 11:14 ` Peter Rabbitson 2008-01-29 14:04 ` Keld Jørn Simonsen 0 siblings, 2 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 11:02 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid Neil, thanks for writing. A couple of follow-up questions to you and the group: Neil Brown wrote: > On Monday January 28, moshe@pobox.com wrote: >> Perhaps I'm mistaken but I thought it was possible to boot from >> /dev/md/all1. > > It is my understanding that grub cannot boot from RAID. Ah. Well, even though LILO seems to be less classy and in current disfavor, can I boot RAID10/RAID5 from LILO? > You can boot from raid1 by the expedient of booting from one of the > halves. One of the puzzling things about this is that I conceive of RAID10 as two RAID1 pairs, with RAID0 on top to join them into a large drive. However, when I use --level=10 to create my md drive, I cannot find out which two pairs are the RAID1's: the --detail doesn't give that information. Re-reading the md(4) man page, I think I'm badly mistaken about RAID10. Furthermore, since grub cannot find the /boot on the md drive, I deduce that RAID10 isn't what the 'net descriptions say it is. > A common approach is to make a small raid1 which contains /boot and > boot from that. Then use the rest of your devices for raid10 or raid5 > or whatever. Ah. My understanding from a previous question to this group was that using one partition of the drive for RAID1 and the other for RAID5 would (a) create inefficiencies in read/write cycles as the two different md drives maintained conflicting internal tables of the overall physical drive state and (b) would create problems if one or the other failed. Under the alternative solution (booting from half of a raid1), since I'm booting from just one of the halves of the raid1, I would have to set up grub on both halves. If one physical drive fails, grub would fail over to the next device. (My original question was prompted by my theory that multiple RAID5s, built out of different partitions, would be faster than a single large drive -- more threads to perform calculations during writes to different parts of the physical drives.) >> Am I trying to do something that's basically impossible? > > I believe so. If the answers above don't lead to a resolution, I can create two RAID1 pairs and join them using LVM. I would take a hit by using LVM to tie the pairs instead of RAID0, I suppose, but I would avoid the performance hit of multiple md drives on a single physical drive, and I could even run a hot spare through a sparing group. Any comments on the performance hit -- is this RAID1-plus-LVM arrangement a really bad idea for some reason? -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "It's a sobering thought, for example, to realize that by the time he was my age, Mozart had been dead for two years." -- Tom Lehrer ^ permalink raw reply [flat|nested] 60+ messages in thread
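One common way to do what Moshe describes -- putting grub on every member of the /boot mirror so the BIOS can boot from whichever drive survives -- is a grub-legacy session along these lines; the disk names are illustrative, and the trick is mapping each disk to (hd0) in turn so every copy is installed as though its disk were the first BIOS drive:

  grub> device (hd0) /dev/sda
  grub> root (hd0,0)
  grub> setup (hd0)
  grub> device (hd0) /dev/sdb      # repeat for each remaining member disk
  grub> root (hd0,0)
  grub> setup (hd0)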
* Re: In this partition scheme, grub does not find md information? 2008-01-29 11:02 ` Moshe Yudkowsky @ 2008-01-29 11:14 ` Peter Rabbitson 2008-01-29 11:29 ` Moshe Yudkowsky 2008-01-29 14:07 ` Michael Tokarev 1 sibling, 2 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-29 11:14 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: linux-raid Moshe Yudkowsky wrote: > > One of the puzzling things about this is that I conceive of RAID10 as > two RAID1 pairs, with RAID0 on top to join them into a large drive. > However, when I use --level=10 to create my md drive, I cannot find out > which two pairs are the RAID1's: the --detail doesn't give that > information. Re-reading the md(4) man page, I think I'm badly mistaken > about RAID10. > > Furthermore, since grub cannot find the /boot on the md drive, I deduce > that RAID10 isn't what the 'net descriptions say it is. > It is exactly what the name implies - a new kind of RAID :) The setup you describe is not RAID10, it is RAID1+0. As far as how linux RAID10 works - here is an excellent article: http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10 Peter ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 11:14 ` Peter Rabbitson @ 2008-01-29 11:29 ` Moshe Yudkowsky 2008-01-29 14:09 ` Michael Tokarev 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 11:29 UTC (permalink / raw) To: Peter Rabbitson; +Cc: linux-raid Peter Rabbitson wrote: > It is exactly what the name implies - a new kind of RAID :) The setup > you describe is not RAID10, it is RAID1+0. As far as how linux RAID10 > works - here is an excellent article: > http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10 Thanks. Let's just say that the md(4) man page was finally penetrating my brain, but the Wikipedia article helped a great deal. I had thought md's RAID10 was more "standard." -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "Rumor is information distilled so finely that it can filter through anything." -- Terry Pratchett, _Feet of Clay_ ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 11:29 ` Moshe Yudkowsky @ 2008-01-29 14:09 ` Michael Tokarev 0 siblings, 0 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 14:09 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, linux-raid Moshe Yudkowsky wrote: > Peter Rabbitson wrote: > >> It is exactly what the name implies - a new kind of RAID :) The setup >> you describe is not RAID10, it is RAID1+0. As far as how linux RAID10 >> works - here is an excellent article: >> http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10 > > Thanks. Let's just say that the md(4) man page was finally penetrating > my brain, but the Wikipedia article helped a great deal. I had thought > md's RAID10 was more "standard." It is exactly "standard" - when you create it with default settings and with an even number of drives (2, 4, 6, 8, ...), it will be exactly "standard" raid10 (or raid1+0, whatever) as described in various places on the net. But if you use an odd number of drives, or if you pass some fancy --layout option, it will look different. Still not suitable for lilo or grub, at least their current versions. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 11:14 ` Peter Rabbitson 2008-01-29 11:29 ` Moshe Yudkowsky @ 2008-01-29 14:07 ` Michael Tokarev 2008-01-29 14:47 ` Peter Rabbitson 2008-01-29 14:48 ` Keld Jørn Simonsen 1 sibling, 2 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 14:07 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Moshe Yudkowsky, linux-raid Peter Rabbitson wrote: > Moshe Yudkowsky wrote: >> >> One of the puzzling things about this is that I conceive of RAID10 as >> two RAID1 pairs, with RAID0 on top to join them into a large drive. >> However, when I use --level=10 to create my md drive, I cannot find >> out which two pairs are the RAID1's: the --detail doesn't give that >> information. Re-reading the md(4) man page, I think I'm badly mistaken >> about RAID10. >> >> Furthermore, since grub cannot find the /boot on the md drive, I >> deduce that RAID10 isn't what the 'net descriptions say it is. In fact, everything matches. For lilo to work, it basically needs a whole filesystem on the same physical drive. That's exactly the case with raid1 (and only raid1). With raid10, half of the filesystem is on one mirror, and another half is on another mirror. Like this:

  filesystem          raid0 blocks
  blocks            DiskA      DiskB
     0                0
     1                           1
     2                2
     3                           3
     4                4
     5                           5
    ..
  (this is what     (this is the
   LILO expects)     actual layout)

(The difference between raid10 and raid0 is that each of DiskA and DiskB is in fact composed of two identical devices.) If your kernel is located in filesystem blocks 2 and 3, for example, lilo has to read BOTH halves, but it is not smart enough to figure it out - it can only read everything from a single drive. > It is exactly what the name implies - a new kind of RAID :) The setup > you describe is not RAID10, it is RAID1+0. Raid10 IS RAID1+0 ;) It's just that the linux raid10 driver can utilize more.. interesting ways to lay out the data. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:07 ` Michael Tokarev @ 2008-01-29 14:47 ` Peter Rabbitson 2008-01-29 15:13 ` Michael Tokarev ` (2 more replies) 2008-01-29 14:48 ` Keld Jørn Simonsen 1 sibling, 3 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-29 14:47 UTC (permalink / raw) To: Michael Tokarev; +Cc: Moshe Yudkowsky, linux-raid Michael Tokarev wrote: > Raid10 IS RAID1+0 ;) > It's just that the linux raid10 driver can utilize more.. interesting ways > to lay out the data. This is misleading, and adds to the confusion existing even before linux raid10. When you say raid10 in the hardware raid world, what do you mean? Stripes of mirrors? Mirrors of stripes? Some proprietary extension? What Neil did was generalize the concept of N drives - M copies, and called it 10 because it could exactly mimic the layout of conventional 1+0 [*]. However thinking about md level 10 in terms of RAID 1+0 is wrong. Two examples (there are many more): * mdadm -C -l 10 -n 3 -p f2 /dev/md10 /dev/sda1 /dev/sdb1 /dev/sdc1 Odd number of drives, no parity calculation overhead, yet the setup can still survive the loss of a single drive * mdadm -C -l 10 -n 2 -p f2 /dev/md10 /dev/sda1 /dev/sdb1 This seems useless at first, as it effectively creates a RAID1 setup, without preserving the FS format on disk. However md10 has read balancing code, so one could get single-threaded sustained reads at twice the speed possible with md1 in the current implementation. I guess I will sit down tonight and craft some patches to the existing md* man pages. Some things are indeed left unsaid. Peter [*] The layout is the same but the functionality is different. If you have 1+0 on 4 drives, you can survive a loss of 2 drives as long as they are part of different mirrors. mdadm -C -l 10 -n 4 -p n2 <drives> however will _NOT_ survive a loss of 2 drives. ^ permalink raw reply [flat|nested] 60+ messages in thread
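For reference, an existing array's layout can be checked directly rather than inferred; this is a standard mdadm query, with the md device name purely illustrative:

  mdadm --detail /dev/md10 | grep -i layout    # reports the near/far/offset copy counts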
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:47 ` Peter Rabbitson @ 2008-01-29 15:13 ` Michael Tokarev 2008-01-29 15:41 ` Peter Rabbitson ` (2 more replies) 2008-01-29 15:57 ` In this partition scheme, grub does not find md information? Moshe Yudkowsky 2008-01-30 11:03 ` David Greaves 2 siblings, 3 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 15:13 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Moshe Yudkowsky, linux-raid Peter Rabbitson wrote: > Michael Tokarev wrote: > > Raid10 IS RAID1+0 ;) >> It's just that the linux raid10 driver can utilize more.. interesting ways >> to lay out the data. > > This is misleading, and adds to the confusion existing even before linux > raid10. When you say raid10 in the hardware raid world, what do you > mean? Stripes of mirrors? Mirrors of stripes? Some proprietary extension? Mirrors of stripes make no sense. > What Neil did was generalize the concept of N drives - M copies, and > called it 10 because it could exactly mimic the layout of conventional > 1+0 [*]. However thinking about md level 10 in terms of RAID 1+0 is > wrong. Two examples (there are many more): > > * mdadm -C -l 10 -n 3 -p f2 /dev/md10 /dev/sda1 /dev/sdb1 /dev/sdc1 ^^^^ ^^^^^ Those are "interesting ways" > Odd number of drives, no parity calculation overhead, yet the setup can > still survive the loss of a single drive > > * mdadm -C -l 10 -n 2 -p f2 /dev/md10 /dev/sda1 /dev/sdb1 ^^^^^ And this one too. There are more-or-less standard raid LEVELS, including raid10 (which is the same as raid1+0, or a stripe on top of mirrors - note it does not mean 4 drives, you can use 6 - stripe over 3 mirrors each of 2 components; or the reverse - stripe over 2 mirrors of 3 components each etc). Vendors often add their own extensions, sometimes calling them by the original level's name, and sometimes giving them new names, especially in marketing speak. Linux raid10 MODULE (which implements that standard raid10 LEVEL in full) adds some quite.. unusual extensions to that standard raid10 LEVEL. The resulting layout is also called raid10 in linux (ie, not giving new names), but it's not that raid10 (which is again the same as raid1+0) as commonly known in various literature and on the internet. Yet the raid10 module fully implements the STANDARD raid10 LEVEL. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:13 ` Michael Tokarev @ 2008-01-29 15:41 ` Peter Rabbitson 2008-01-29 16:51 ` Michael Tokarev 2008-01-29 16:16 ` Moshe Yudkowsky 2008-01-29 16:26 ` Keld Jørn Simonsen 2 siblings, 1 reply; 60+ messages in thread From: Peter Rabbitson @ 2008-01-29 15:41 UTC (permalink / raw) To: Michael Tokarev; +Cc: Moshe Yudkowsky, linux-raid Michael Tokarev wrote: > Linux raid10 MODULE (which implements that standard raid10 > LEVEL in full) adds some quite.. unusual extensions to that > standard raid10 LEVEL. The resulting layout is also called > raid10 in linux (ie, not giving new names), but it's not that > raid10 (which is again the same as raid1+0) as commonly known > in various literature and on the internet. Yet raid10 module > fully implements STANDARD raid10 LEVEL. I will let Neil speak about what he meant by RAID10: whether it is raid10 + weird extensions, or a generalization of drive/stripe layouts. However if you want to be so anal about names and specifications: md raid 10 is not a _full_ 1+0 implementation. Consider the textbook scenario with 4 drives: (A mirroring B) striped with (C mirroring D) When only drives A and C are present, md raid 10 with near offset will not start, whereas "standard" RAID 1+0 is expected to keep clunking away. Peter ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:41 ` Peter Rabbitson @ 2008-01-29 16:51 ` Michael Tokarev 2008-01-29 17:51 ` Keld Jørn Simonsen 0 siblings, 1 reply; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 16:51 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Moshe Yudkowsky, linux-raid Peter Rabbitson wrote: [] > However if you want to be so anal about names and specifications: md > raid 10 is not a _full_ 1+0 implementation. Consider the textbook > scenario with 4 drives: > > (A mirroring B) striped with (C mirroring D) > > When only drives A and C are present, md raid 10 with near offset will > not start, whereas "standard" RAID 1+0 is expected to keep clunking away. Ugh. Yes. Offset is a linux extension. But md raid 10 with the default n2 (without offset) configuration will behave exactly as in the "classic" docs. Again: the Linux md raid10 module implements standard raid10 as known in all widely used docs. And IN ADDITION, it can do OTHER FORMS, which differ from the "classic" variant. Much like a hardware raid card from a brand vendor implements its own variations of standard raid levels. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:51 ` Michael Tokarev @ 2008-01-29 17:51 ` Keld Jørn Simonsen 0 siblings, 0 replies; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 17:51 UTC (permalink / raw) To: Michael Tokarev; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid On Tue, Jan 29, 2008 at 07:51:07PM +0300, Michael Tokarev wrote: > Peter Rabbitson wrote: > [] > > However if you want to be so anal about names and specifications: md > > raid 10 is not a _full_ 1+0 implementation. Consider the textbook > > scenario with 4 drives: > > > > (A mirroring B) striped with (C mirroring D) > > > > When only drives A and C are present, md raid 10 with near offset will > > not start, whereas "standard" RAID 1+0 is expected to keep clunking away. > > Ugh. Yes. Offset is a linux extension. > > But md raid 10 with the default n2 (without offset) configuration will behave > exactly as in the "classic" docs. I would like to understand this fully. What Peter described for mdraid10: "md raid 10 with near offset" I believe is vanilla raid10 without any options (or near=2, far=1). Will that not start if we are unlucky enough to have 2 drives fail, but lucky enough that the two remaining drives actually hold all the data? Same question for a raid10,f2 array. I think it would be easy to investigate, when the number of drives is even, whether all data is present, and then happily run an array with some failing disks. Say for a 4-drive raid10,f2, disks A and D fail; then all data should be present on drives B and C, given that A and C have the even chunks and B and D have the odd chunks. Likewise for a 6-drive array, etc., for all multiples of 2, with f2. best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
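One low-risk way to investigate exactly this question is to build a throwaway array out of loop devices rather than real disks and see whether it assembles with two members missing; everything below (file names, sizes, device names) is illustrative:

  # four small backing files and a test raid10,f2 across them
  for i in 0 1 2 3; do dd if=/dev/zero of=/tmp/d$i bs=1M count=100; losetup /dev/loop$i /tmp/d$i; done
  mdadm --create /dev/md9 --level=10 --layout=f2 --raid-devices=4 \
        /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

  # stop it, then try to assemble with only two of the four members present
  mdadm --stop /dev/md9
  mdadm --assemble --run /dev/md9 /dev/loop0 /dev/loop2
  cat /proc/mdstat    # shows whether the array came up degraded or refused to start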
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:13 ` Michael Tokarev 2008-01-29 15:41 ` Peter Rabbitson @ 2008-01-29 16:16 ` Moshe Yudkowsky 2008-01-29 16:34 ` Peter Rabbitson 2008-01-29 16:42 ` Michael Tokarev 2008-01-29 16:26 ` Keld Jørn Simonsen 2 siblings, 2 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 16:16 UTC (permalink / raw) To: Michael Tokarev; +Cc: Peter Rabbitson, linux-raid Michael Tokarev wrote: > There are more-or-less standard raid LEVELS, including > raid10 (which is the same as raid1+0, or a stripe on top > of mirrors - note it does not mean 4 drives, you can > use 6 - stripe over 3 mirrors each of 2 components; or > the reverse - stripe over 2 mirrors of 3 components each > etc). Here's a baseline question: if I create a RAID10 array using default settings, what do I get? I thought I was getting RAID1+0; am I really? My superblocks, by the way, are marked version 01; my metadata in mdadm.conf asked for 1.2. I wonder what I really got. The real question in my mind now is why grub can't find the info, and whether it's because of 1.2 superblocks or because of sub-partitioning of components. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "You may not be interested in war, but war is interested in you." -- Leon Trotsky ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:16 ` Moshe Yudkowsky @ 2008-01-29 16:34 ` Peter Rabbitson 2008-01-29 19:34 ` Moshe Yudkowsky 2008-01-30 12:01 ` Peter Rabbitson 1 sibling, 2 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-29 16:34 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Michael Tokarev, linux-raid Moshe Yudkowsky wrote: > Here's a baseline question: if I create a RAID10 array using default > settings, what do I get? I thought I was getting RAID1+0; am I really? Maybe you are, depending on your settings, but this is beside the point. No matter what 1+0 you have (linux, classic, or otherwise) you can not boot from it, as there is no way to see the underlying filesystem without the RAID layer. With the current state of affairs (available mainstream bootloaders) the rule is: Block devices containing the kernel/initrd image _must_ be either: * a regular block device (/sda1, /hda, /fd0, etc.) * or a linux RAID 1 with the superblock at the end of the device (0.9 or 1.0) > My superblocks, by the way, are marked version 01; my metadata in > mdadm.conf asked for 1.2. I wonder what I really got. This is how you find the actual raid version: mdadm -D /dev/md[X] | grep Version This will return a string of the form XX.YY.ZZ. Your superblock version is XX.YY. ^ permalink raw reply [flat|nested] 60+ messages in thread
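Two standard mdadm queries make the metadata question concrete; the device names here are only examples:

  mdadm --detail  /dev/md0  | grep -i version    # superblock format of the assembled array
  mdadm --examine /dev/sda1 | grep -i version    # superblock format stored on one member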
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:34 ` Peter Rabbitson @ 2008-01-29 19:34 ` Moshe Yudkowsky 2008-01-29 20:21 ` Keld Jørn Simonsen ` (2 more replies) 0 siblings, 3 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 19:34 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Michael Tokarev, linux-raid I'd like to thank everyone who wrote in with comments and explanations. And in particular it's nice to see that I'm not the only one who's confused. I'm going to convert back to the RAID 1 setup I had before for /boot, 2 hot and 2 spare across four drives. No, that's wrong: 4 hot makes the most sense. And given that RAID 10 doesn't seem to confer (for me, as far as I can tell) advantages in speed or reliability -- or the ability to mount just one surviving disk of a mirrored pair -- over RAID 5, I think I'll convert back to RAID 5, put in a hot spare, and do regular backups (as always). Oh, and use reiserfs with data=journal. Comments back: Peter Rabbitson wrote: > Maybe you are, depending on your settings, but this is beside the point. > No matter what 1+0 you have (linux, classic, or otherwise) you can not > boot from it, as there is no way to see the underlying filesystem > without the RAID layer. Sir, thank you for this unequivocal comment. This comment clears up all my confusion. I had a wrong mental model of how file system maps work. > With the current state of affairs (available mainstream bootloaders) the > rule is: > Block devices containing the kernel/initrd image _must_ be either: > * a regular block device (/sda1, /hda, /fd0, etc.) > * or a linux RAID 1 with the superblock at the end of the device > (0.9 or 1.0) Thanks even more: 1.0 it is. > This is how you find the actual raid version: > > mdadm -D /dev/md[X] | grep Version > > This will return a string of the form XX.YY.ZZ. Your superblock version > is XX.YY. Ah hah! Mr. Tokarev wrote: > By the way, on all our systems I use small (256Mb for small-software systems, > sometimes 512M, but 1G should be sufficient) partition for a root filesystem > (/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all... > ... doing [it] > this way, you always have all the tools necessary to repair a damaged system > even in case your raid didn't start, or you forgot where your root disk is > etc etc. An excellent idea. I was going to put just /boot on the RAID 1, but there's no reason why I can't add a bit more room and put them all there. (Because I was having so much fun on the install, I'm using 4GB that I was going to use for swap space to mount base install and I'm working from there to build the RAID. Same idea.) Hmmm... I wonder if this more expansive /bin, /sbin, and /lib causes hits on the RAID1 drive which ultimately degrade overall performance? /lib is hit only at boot time to load the kernel, I'll guess, but /bin includes such common tools as bash and grep. > Also, placing /dev on a tmpfs helps alot to minimize number of writes > necessary for root fs. Another interesting idea. I'm not familiar with using tmpfs (no need, until now); but I wonder how you create the devices you need when you're doing a rescue. Again, my thanks to everyone who responded and clarified. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "Practically perfect people never permit sentiment to muddle their thinking." -- Mary Poppins ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 19:34 ` Moshe Yudkowsky @ 2008-01-29 20:21 ` Keld Jørn Simonsen 2008-01-29 22:14 ` Moshe Yudkowsky 0 siblings, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 20:21 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, Michael Tokarev, linux-raid On Tue, Jan 29, 2008 at 01:34:37PM -0600, Moshe Yudkowsky wrote: > > I'm going to convert back to the RAID 1 setup I had before for /boot, 2 > hot and 2 spare across four drives. No, that's wrong: 4 hot makes the > most sense. > > And given that RAID 10 doesn't seem to confer (for me, as far as I can > tell) advantages in speed or reliability -- or the ability to mount just > one surviving disk of a mirrored pair -- over RAID 5, I think I'll > convert back to RAID 5, put in a hot spare, and do regular backups (as > always). Oh, and use reiserfs with data=journal. Hmm, my idea was to use a 4-disk raid10,f2 for the /root, or an o2 layout. I think it would offer quite some speed advantage over raid5. At least, on a 4-disk raid5 I got only about 130 MB/s random performance, while the raid10 gave 180-200 MB/s. Also sequential read was significantly faster on raid10. I do think I can get about 320 MB/s on the raid10,f2, but I need to have a bigger power supply to support my disks before I can go on testing. The key here is bigger readahead. I only got 150 MB/s for raid5 sequential reads. I think the sequential read could be significant in the boot time, and then for the single user running on the system, namely the system administrator (=me), even under reasonable load. I would be interested if you would experiment with this wrt boot time, for example the difference between /root on a raid5, raid10,f2 and raid10,o2. > Comments back: > > Mr. Tokarev wrote: > > >By the way, on all our systems I use small (256Mb for small-software > >systems, > >sometimes 512M, but 1G should be sufficient) partition for a root > >filesystem > >(/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all... > >... doing [it] > >this way, you always have all the tools necessary to repair a damaged > >system > >even in case your raid didn't start, or you forgot where your root disk is > >etc etc. > > An excellent idea. I was going to put just /boot on the RAID 1, but > there's no reason why I can't add a bit more room and put them all > there. (Because I was having so much fun on the install, I'm using 4GB > that I was going to use for swap space to mount base install and I'm > working from there to build the RAID. Same idea.) If you put more than /boot on the raid1, then you will not get the added performance of raid10 for all your system utilities. I am not sure about redundancy, but a raid1 and a raid10 should be equally vulnerable to a 1 disk failure. If you use a 4 disk raid1 for /root, then of course you can survive 3 disk crashes. I am not sure that 4 disks in a raid1 for /root give added performance, as grub only sees the /root raid1 as a normal disk, but maybe some kind of remounting makes it get its raid behaviour. > >Also, placing /dev on a tmpfs helps alot to minimize number of writes > >necessary for root fs. I thought of using the noatime mount option for /root. best regards Keld ^ permalink raw reply [flat|nested] 60+ messages in thread
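For the noatime idea, the change is a single mount option in /etc/fstab; the device and filesystem type below are only an example, not taken from anyone's actual setup:

  /dev/md1   /   xfs   defaults,noatime   0   1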
* Re: In this partition scheme, grub does not find md information? 2008-01-29 20:21 ` Keld Jørn Simonsen @ 2008-01-29 22:14 ` Moshe Yudkowsky 2008-01-29 23:45 ` Bill Davidsen 2008-01-30 0:17 ` Keld Jørn Simonsen 0 siblings, 2 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 22:14 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: linux-raid Keld Jørn Simonsen wrote: Based on your reports of better performance on RAID10 -- which are more significant than I'd expected -- I'll just go with RAID10. The only question now is if LVM is worth the performance hit or not. > I would be interested if you would experiment with this wrt boot time, > for example the difference between /root on a raid5, raid10,f2 and raid10,o2. According to man md(4), the o2 is likely to offer the best combination of read and write performance. Why would you consider f2 instead? I'm unlikely to do any testing beyond running bonnie++ or something similar once it's installed. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 22:14 ` Moshe Yudkowsky @ 2008-01-29 23:45 ` Bill Davidsen 2008-01-30 0:13 ` Moshe Yudkowsky 1 sibling, 1 reply; 60+ messages in thread From: Bill Davidsen @ 2008-01-29 23:45 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Keld Jørn Simonsen, linux-raid Moshe Yudkowsky wrote: > Keld Jørn Simonsen wrote: > > Based on your reports of better performance on RAID10 -- which are > more significant than I'd expected -- I'll just go with RAID10. The > only question now is if LVM is worth the performance hit or not. > >> I would be interested if you would experiment with this wrt boot time, >> for example the difference between /root on a raid5, raid10,f2 and >> raid10,o2. > > According to man md(4), the o2 is likely to offer the best combination > of read and write performance. Why would you consider f2 instead? > f2 is faster for read, most systems spend more time reading than writing. > I'm unlikely to do any testing beyond running bonnie++ or something > similar once it's installed. > > -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 23:45 ` Bill Davidsen @ 2008-01-30 0:13 ` Moshe Yudkowsky 2008-01-30 22:36 ` Bill Davidsen 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 0:13 UTC (permalink / raw) To: Bill Davidsen; +Cc: Keld Jørn Simonsen, linux-raid Bill Davidsen wrote: >> According to man md(4), the o2 is likely to offer the best combination >> of read and write performance. Why would you consider f2 instead? >> > f2 is faster for read, most systems spend more time reading than writing. According to md(4), offset "should give similar read characteristics to 'far' if a suitably large chunk size is used, but without as much seeking for writes." Is the man page not correct, conditionally true, or simply not understood by me (most likely case)? I wonder what "suitably large" is... -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "The seconds marched past, transversing that mysterious boundary that separates the future from the past." -- Jack Vance, "The Face" ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:13 ` Moshe Yudkowsky @ 2008-01-30 22:36 ` Bill Davidsen 0 siblings, 0 replies; 60+ messages in thread From: Bill Davidsen @ 2008-01-30 22:36 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Keld Jørn Simonsen, linux-raid Moshe Yudkowsky wrote: > Bill Davidsen wrote: > >>> According to man md(4), the o2 is likely to offer the best >>> combination of read and write performance. Why would you consider f2 >>> instead? >>> >> f2 is faster for read, most systems spend more time reading than >> writing. > > According to md(4), offset "should give similar read characteristics > to 'far' if a suitably large chunk size is used, but without as much > seeking for writes." > > Is the man page not correct, conditionally true, or simply not > understood by me (most likely case)? > > I wonder what "suitably large" is... > My personal experience is that as chunk gets larger random write gets slower, sequential gets faster. I don't have numbers any more, but 20-30% is sort of the limit of what I saw for any chunk size I consider reasonable. f2 is faster for sequential reading, tune your system to annoy you least. ;-) -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark ^ permalink raw reply [flat|nested] 60+ messages in thread
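To experiment with the "suitably large chunk size" question raised above, the chunk size can be set explicitly at creation time; the 256 KiB value and the device names below are purely illustrative:

  mdadm --create /dev/md2 --level=10 --layout=f2 --chunk=256 \
        --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2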
* Re: In this partition scheme, grub does not find md information? 2008-01-29 22:14 ` Moshe Yudkowsky 2008-01-29 23:45 ` Bill Davidsen @ 2008-01-30 0:17 ` Keld Jørn Simonsen 1 sibling, 0 replies; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-30 0:17 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: linux-raid On Tue, Jan 29, 2008 at 04:14:24PM -0600, Moshe Yudkowsky wrote: > Keld Jørn Simonsen wrote: > > Based on your reports of better performance on RAID10 -- which are more > significant than I'd expected -- I'll just go with RAID10. The only > question now is if LVM is worth the performance hit or not. Hmm, LVM for what purpose? For the root system, I think it is not an issue. Just have a large enough partition, it is not more than 10-20 GB anyway, which is around 1 % of the disk sizes that we talk about today with new disks in raids. > >I would be interested if you would experiment with this wrt boot time, > >for example the difference between /root on a raid5, raid10,f2 and > >raid10,o2. > > According to man md(4), the o2 is likely to offer the best combination > of read and write performance. Why would you consider f2 instead? I have no experience with o2, and little experience with f2. But I kind of designed f2. I have not fully grasped o2 yet. But my take is that for writes, this would be random writes, and that is almost the same for all layouts. However, when/if a disk is faulty, then f2 has considerably worse performance for sequential reads, approximating the performance of random reads, which in some cases is about half the speed of sequential reads. For sequential reads and random reads I think f2 would be faster than o2, due to the smaller average seek times, and use of the faster part of the disk. I am still wondering how o2 gets to do striping; I don't understand it given the layout schemes I have seen. F2 OTOH is designed for striping. I would like to see some figures, tho. My testing environment is, as said, not operational right now, but will be OK possibly later this week. > I'm unlikely to do any testing beyond running bonnie++ or something > similar once it's installed. I do some crude testing like reading 1000 files of 20 MB concurrently, and then just cat file >/dev/null of a 4 GB file. The RAM cache needs to be incapable of holding the files. Looking at boot times could also be interesting. I would like as little downtime as possible. But it depends on your purpose and thus pattern of use. Many systems tend to be read oriented, and for that I think f2 is the better alternative. best regards keld - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
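A minimal version of the crude sequential-read test Keld describes, assuming a test file much larger than RAM so the page cache cannot serve the reads; the path and size are illustrative and the commands must be run as root:

  sync && echo 3 > /proc/sys/vm/drop_caches     # drop cached data first
  time cat /mnt/test/bigfile > /dev/null        # sequential read of a ~4 GB file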
* Re: In this partition scheme, grub does not find md information? 2008-01-29 19:34 ` Moshe Yudkowsky 2008-01-29 20:21 ` Keld Jørn Simonsen @ 2008-01-29 23:44 ` Bill Davidsen 2008-01-30 0:22 ` Keld Jørn Simonsen 2008-01-30 13:11 ` Michael Tokarev 2 siblings, 1 reply; 60+ messages in thread From: Bill Davidsen @ 2008-01-29 23:44 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, Michael Tokarev, linux-raid Moshe Yudkowsky wrote: > I'd like to thank everyone who wrote in with comments and > explanations. And in particular it's nice to see that I'm not the only > one who's confused. > > I'm going to convert back to the RAID 1 setup I had before for /boot, > 2 hot and 2 spare across four drives. No, that's wrong: 4 hot makes > the most sense. > > And given that RAID 10 doesn't seem to confer (for me, as far as I can > tell) advantages in speed or reliability -- or the ability to mount > just one surviving disk of a mirrored pair -- over RAID 5, I think > I'll convert back to RAID 5, put in a hot spare, and do regular > backups (as always). Oh, and use reiserfs with data=journal. > Depending on near/far choices, raid10 should be faster than raid5, with far read should be quite a bit faster. You can't boot off raid10, and if you put your swap on it many recovery CDs won't use it. But for general use and swap on a normally booted system it is quite fast. > Comments back: > > Peter Rabbitson wrote: > >> Maybe you are, depending on your settings, but this is beside the >> point. No matter what 1+0 you have (linux, classic, or otherwise) you >> can not boot from it, as there is no way to see the underlying >> filesystem without the RAID layer. > > Sir, thank you for this unequivocal comment. This comment clears up > all my confusion. I had a wrong mental model of how file system maps > work. > >> With the current state of affairs (available mainstream bootloaders) >> the rule is: >> Block devices containing the kernel/initrd image _must_ be either: >> * a regular block device (/sda1, /hda, /fd0, etc.) >> * or a linux RAID 1 with the superblock at the end of the device >> (0.9 or 1.0) > > Thanks even more: 1.0 it is. > >> This is how you find the actual raid version: >> >> mdadm -D /dev/md[X] | grep Version >> >> This will return a string of the form XX.YY.ZZ. Your superblock >> version is XX.YY. > > Ah hah! > > Mr. Tokarev wrote: > >> By the way, on all our systems I use small (256Mb for small-software >> systems, >> sometimes 512M, but 1G should be sufficient) partition for a root >> filesystem >> (/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all... >> ... doing [it] >> this way, you always have all the tools necessary to repair a damaged >> system >> even in case your raid didn't start, or you forgot where your root >> disk is >> etc etc. > > An excellent idea. I was going to put just /boot on the RAID 1, but > there's no reason why I can't add a bit more room and put them all > there. (Because I was having so much fun on the install, I'm using 4GB > that I was going to use for swap space to mount base install and I'm > working from there to build the RAID. Same idea.) > > Hmmm... I wonder if this more expansive /bin, /sbin, and /lib causes > hits on the RAID1 drive which ultimately degrade overall performance? > /lib is hit only at boot time to load the kernel, I'll guess, but /bin > includes such common tools as bash and grep. > >> Also, placing /dev on a tmpfs helps alot to minimize number of writes >> necessary for root fs. > > Another interesting idea.
I'm not familiar with using tmpfs (no need, > until now); but I wonder how you create the devices you need when > you're doing a rescue. > > Again, my thanks to everyone who responded and clarified. > -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 23:44 ` Bill Davidsen @ 2008-01-30 0:22 ` Keld Jørn Simonsen 2008-01-30 0:26 ` Peter Rabbitson 2008-01-30 0:32 ` Moshe Yudkowsky 0 siblings, 2 replies; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-30 0:22 UTC (permalink / raw) To: Bill Davidsen Cc: Moshe Yudkowsky, Peter Rabbitson, Michael Tokarev, linux-raid On Tue, Jan 29, 2008 at 06:44:20PM -0500, Bill Davidsen wrote: > Depending on near/far choices, raid10 should be faster than raid5, with > far read should be quite a bit faster. You can't boot off raid10, and if > you put your swap on it many recovery CDs won't use it. But for general > use and swap on a normally booted system it is quite fast. Hmm, why would you put swap on a raid10? I would in a production environment always put it on separate swap partitions, possibly a number, given that a number of drives are available. best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:22 ` Keld Jørn Simonsen @ 2008-01-30 0:26 ` Peter Rabbitson 2008-01-30 22:39 ` Bill Davidsen 2008-01-30 0:32 ` Moshe Yudkowsky 1 sibling, 1 reply; 60+ messages in thread From: Peter Rabbitson @ 2008-01-30 0:26 UTC (permalink / raw) To: Keld Jørn Simonsen Cc: Bill Davidsen, Moshe Yudkowsky, Michael Tokarev, linux-raid Keld Jørn Simonsen wrote: > On Tue, Jan 29, 2008 at 06:44:20PM -0500, Bill Davidsen wrote: > >> Depending on near/far choices, raid10 should be faster than raid5, with >> far read should be quite a bit faster. You can't boot off raid10, and if >> you put your swap on it many recovery CDs won't use it. But for general >> use and swap on a normally booted system it is quite fast. > > Hmm, why would you put swap on a raid10? I would in a production > environment always put it on separate swap partitions, possibly a number, > given that a number of drives are available. > Because you want some redundancy for the swap as well. A swap partition/file becoming inaccessible is equivalent to yanking out a stick of memory out of your motherboard. Peter - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:26 ` Peter Rabbitson @ 2008-01-30 22:39 ` Bill Davidsen 0 siblings, 0 replies; 60+ messages in thread From: Bill Davidsen @ 2008-01-30 22:39 UTC (permalink / raw) To: Peter Rabbitson Cc: Keld Jørn Simonsen, Moshe Yudkowsky, Michael Tokarev, linux-raid Peter Rabbitson wrote: > Keld Jørn Simonsen wrote: >> On Tue, Jan 29, 2008 at 06:44:20PM -0500, Bill Davidsen wrote: >> >>> Depending on near/far choices, raid10 should be faster than raid5, >>> with far read should be quite a bit faster. You can't boot off >>> raid10, and if you put your swap on it many recovery CDs won't use >>> it. But for general use and swap on a normally booted system it is >>> quite fast. >> >> Hmm, why would you put swap on a raid10? I would in a production >> environment always put it on separate swap partitions, possibly a >> number, >> given that a number of drives are available. >> > > Because you want some redundancy for the swap as well. A swap > partition/file becoming inaccessible is equivalent to yanking out a > stick of memory out of your motherboard. I can't say it better. Losing a swap area will make the system fail one way or another, in my systems typically expressed as a crash of varying severity. I use raid10 because it is the fastest reliable level I've found. -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:22 ` Keld Jørn Simonsen 2008-01-30 0:26 ` Peter Rabbitson @ 2008-01-30 0:32 ` Moshe Yudkowsky 2008-01-30 0:53 ` Keld Jørn Simonsen 1 sibling, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 0:32 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: linux-raid > Hmm, why would you put swap on a raid10? I would in a production > environment always put it on separate swap partitions, possibly a number, > given that a number of drives are available. I put swap onto non-RAID, separate partitions on all 4 disks. In a production server, however, I'd use swap on RAID in order to prevent server downtime if a disk fails -- a suddenly bad swap can easily (will absolutely?) cause the server to crash (even though you can boot the server up again afterwards on the surviving swap partitions). -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "She will have fun who knows when to work and when not to work." -- Segami ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:32 ` Moshe Yudkowsky @ 2008-01-30 0:53 ` Keld Jørn Simonsen 2008-01-30 1:00 ` Moshe Yudkowsky 0 siblings, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-30 0:53 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: linux-raid On Tue, Jan 29, 2008 at 06:32:54PM -0600, Moshe Yudkowsky wrote: > > >Hmm, why would you put swap on a raid10? I would in a production > >environment always put it on separate swap partitions, possibly a number, > >given that a number of drives are available. > > In a production server, however, I'd use swap on RAID in order to > prevent server downtime if a disk fails -- a suddenly bad swap can > easily (will absolutely?) cause the server to crash (even though you can > boot the server up again afterwards on the surviving swap partitions). I see. Which file system type would be good for this? I normally use XFS but maybe another FS is better, given that swap is used very randomly (read/write). Will a bad swap crash the system? best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 0:53 ` Keld Jørn Simonsen @ 2008-01-30 1:00 ` Moshe Yudkowsky 2008-01-31 14:40 ` Bill Davidsen 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 1:00 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: linux-raid Keld Jørn Simonsen wrote: > On Tue, Jan 29, 2008 at 06:32:54PM -0600, Moshe Yudkowsky wrote: >>> Hmm, why would you put swap on a raid10? I would in a production >>> environment always put it on separate swap partitions, possibly a number, >>> given that a number of drives are available. >> In a production server, however, I'd use swap on RAID in order to >> prevent server downtime if a disk fails -- a suddenly bad swap can >> easily (will absolutely?) cause the server to crash (even though you can >> boot the server up again afterwards on the surviving swap partitions). > > I see. Which file system type would be good for this? > I normally use XFS but maybe another FS is better, given that swap is used > very randomly (read/write). > > Will a bad swap crash the system? Well, Peter says it will, and that's good enough for me. :-) As for which file system: I would use fdisk to partition the md disk and then use mkswap on the partition to make it into a swap partition. It's a naive approach but I suspect it's almost certainly the correct one. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "There are more ways to skin a cat than nuking it from orbit -- but it's the only way to be sure." -- Eliezer Yudkowsky - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 1:00 ` Moshe Yudkowsky @ 2008-01-31 14:40 ` Bill Davidsen 0 siblings, 0 replies; 60+ messages in thread From: Bill Davidsen @ 2008-01-31 14:40 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Keld Jørn Simonsen, linux-raid Moshe Yudkowsky wrote: > Keld Jørn Simonsen wrote: >> On Tue, Jan 29, 2008 at 06:32:54PM -0600, Moshe Yudkowsky wrote: >>>> Hmm, why would you put swap on a raid10? I would in a production >>>> environment always put it on separate swap partitions, possibly a >>>> number, >>>> given that a number of drives are available. >>> In a production server, however, I'd use swap on RAID in order to >>> prevent server downtime if a disk fails -- a suddenly bad swap can >>> easily (will absolutely?) cause the server to crash (even though you >>> can boot the server up again afterwards on the surviving swap >>> partitions). >> >> I see. Which file system type would be good for this? >> I normally use XFS but maybe another FS is better, given that swap is used >> very randomly (read/write). >> >> Will a bad swap crash the system? > > Well, Peter says it will, and that's good enough for me. :-) > I've done unplanned research into this: it will crash the system, and if you're unlucky some part of what's needed for a graceful crash will be swapped out :-( > As for which file system: I would use fdisk to partition the md disk > and then use mkswap on the partition to make it into a swap partition. > It's a naive approach but I suspect it's almost certainly the correct > one. > I generally dedicate a partition of each drive to swap, but the type is "raid array." Then I create a raid10 on that set of partitions and mkswap on the md device. While raid10 is fast and reliable, raid[56] have similar reliability and more usable space from any given configuration. -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
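A sketch of the arrangement Bill describes, with one swap-sized partition per drive combined into a raid10 and swap made on the md device; every device name and number here is illustrative:

  mdadm --create /dev/md3 --level=10 --raid-devices=4 \
        /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
  mkswap /dev/md3
  swapon /dev/md3
  # plus an /etc/fstab entry so it survives a reboot:
  #   /dev/md3   none   swap   sw   0   0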
* Re: In this partition scheme, grub does not find md information? 2008-01-29 19:34 ` Moshe Yudkowsky 2008-01-29 20:21 ` Keld Jørn Simonsen 2008-01-29 23:44 ` Bill Davidsen @ 2008-01-30 13:11 ` Michael Tokarev 2008-01-30 14:10 ` Moshe Yudkowsky 2 siblings, 1 reply; 60+ messages in thread From: Michael Tokarev @ 2008-01-30 13:11 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, linux-raid Moshe Yudkowsky wrote: [] > Mr. Tokarev wrote: > >> By the way, on all our systems I use small (256Mb for small-software systems, >> sometimes 512M, but 1G should be sufficient) partition for a root filesystem >> (/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all... >> ... doing [it] >> this way, you always have all the tools necessary to repair a damaged system >> even in case your raid didn't start, or you forgot where your root disk is >> etc etc. > > An excellent idea. I was going to put just /boot on the RAID 1, but > there's no reason why I can't add a bit more room and put them all > there. (Because I was having so much fun on the install, I'm using 4GB > that I was going to use for swap space to mount base install and I'm > working from there to build the RAID. Same idea.) > > Hmmm... I wonder if this more expansive /bin, /sbin, and /lib causes > hits on the RAID1 drive which ultimately degrade overall performance? > /lib is hit only at boot time to load the kernel, I'll guess, but /bin > includes such common tools as bash and grep. You don't care about the speed of your root filesystem. Note there are two speeds - write and read. You only write to root (including /bin and /lib and so on) during software (re)install and during some configuration work (writing /etc/passwd and the like). The first is very infrequent, and both need only a few writes, -- so write speed isn't important. Read speed is also not that important, because most commonly used stuff from there will be cached anyway (like libc.so, bash and grep), and again, reading such tiny stuff - it doesn't matter if it's "fast" raid or a slow one. What you do care about is the speed of the devices where your large, commonly accessed/modified files reside - such as video files, especially when you want streaming video. And even here, unless you have a special requirement for speed, you will not notice any difference between "slow" and "fast" raid levels. For typical filesystem usage, raid5 works well for both reads and (cached, delayed) writes. It's workloads like databases where raid5 performs badly. What you do care about is your data integrity. It's not really interesting to reinstall a system or lose your data if something goes wrong, and it's best to have recovery tools as easily available as possible. Plus, amount of space you need. >> Also, placing /dev on a tmpfs helps alot to minimize number of writes >> necessary for root fs. > > Another interesting idea. I'm not familiar with using tmpfs (no need, > until now); but I wonder how you create the devices you need when you're > doing a rescue. When you start udev, your /dev will be on tmpfs. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 13:11 ` Michael Tokarev @ 2008-01-30 14:10 ` Moshe Yudkowsky 2008-01-30 14:41 ` Michael Tokarev 2008-01-31 14:59 ` Bill Davidsen 0 siblings, 2 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 14:10 UTC (permalink / raw) To: Michael Tokarev; +Cc: linux-raid Michael Tokarev wrote: > You only write to root (including /bin and /lib and so on) during > software (re)install and during some configuration work (writing > /etc/passwd and the like). The first is very infrequent, and both > need only a few writes, -- so write speed isn't important. Thanks, but I didn't make myself clear. The performance problem I'm concerned about was having different md drives accessing different partitions. For example, I can partition the drives as follows: /dev/sd[abcd]1 -- RAID1, /boot /dev/sd[abcd]2 -- RAID5, the rest of the file system I originally had asked, way back when, if having different md drives on different partitions of the *same* disk was a problem for performance -- or if, for some reason (e.g., threading) it was actually smarter to do it that way. The answer I received was from Iustin Pop, who said: Iustin Pop wrote: > md code works better if it's only one array per physical drive, > because it keeps statistics per array (like last accessed sector, > etc.) and if you combine two arrays on the same drive these > statistics are not exactly true anymore So if I use /boot on its own drive and it's only accessed at startup, the /boot will only be accessed that one time and afterwards won't cause problems for the drive statistics. However, if I put /boot, /bin, and /sbin on this RAID1 drive, it will always be accessed and it might create a performance issue. To return to that performance question, since I have to create at least 2 md drives using different partitions, I wonder if it's smarter to create multiple md drives for better performance. /dev/sd[abcd]1 -- RAID1, the /boot, /dev, /bin, /sbin /dev/sd[abcd]2 -- RAID5, most of the rest of the file system /dev/sd[abcd]3 -- RAID10 o2, a drive that does a lot of downloading (writes) > For typical filesystem usage, raid5 works well for both reads > and (cached, delayed) writes. It's workloads like databases > where raid5 performs badly. Ah, very interesting. Is this true even for (dare I say it?) bittorrent downloads? > What you do care about is your data integrity. It's not really > interesting to reinstall a system or lose your data if > something goes wrong, and it's best to have recovery tools as > easily available as possible. Plus, amount of space you need. Sure, I understand. And backing up in case someone steals your server. But did you have something specific in mind when you wrote this? Don't all these configurations (RAID5 vs. RAID10) have equal recovery tools? Or were you referring to the file system? Reiserfs and XFS both seem to have decent recovery tools. LVM is a little tempting because it allows for snapshots, but on the other hand I wonder if I'd find it useful. >>> Also, placing /dev on a tmpfs helps alot to minimize number of writes >>> necessary for root fs. >> Another interesting idea. I'm not familiar with using tmpfs (no need, >> until now); but I wonder how you create the devices you need when you're >> doing a rescue. > > When you start udev, your /dev will be on tmpfs. Sure, that's what mount shows me right now -- using a standard Debian install. What did you suggest I change?
-- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "Many that live deserve death. And some that die deserve life. Can you give it to them? Then do not be too eager to deal out death in judgement. For even the wise cannot see all ends." -- Gandalf (J.R.R. Tolkien) ^ permalink raw reply [flat|nested] 60+ messages in thread
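For readers reconstructing this at home, the three-array split proposed above would look roughly like the following mdadm invocations. This is only a sketch: the partition names, the four-drive count and the o2 layout for the download area come from the message above, but everything else (array numbers, superblock version, chunk sizes, filesystems) is left at defaults and is an assumption, not something specified in the thread.

  # small raid1 across all four drives for /boot, /dev, /bin, /sbin
  mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sd[abcd]1
  # raid5 across the second partitions for most of the filesystem
  mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sd[abcd]2
  # raid10 with the "offset" layout for the write-heavy download area
  mdadm --create /dev/md2 --level=10 --layout=o2 --raid-devices=4 /dev/sd[abcd]3

Note that the /boot array would still need a superblock format grub can cope with; that caveat is worked out later in the thread.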
* Re: In this partition scheme, grub does not find md information? 2008-01-30 14:10 ` Moshe Yudkowsky @ 2008-01-30 14:41 ` Michael Tokarev 2008-01-31 14:59 ` Bill Davidsen 0 siblings, 2 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-30 14:41 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: linux-raid Moshe Yudkowsky wrote: > Michael Tokarev wrote: > >> You only write to root (including /bin and /lib and so on) during >> software (re)install and during some configuration work (writing >> /etc/password and the like). First is very infrequent, and both >> needs only a few writes, -- so write speed isn't important. > > Thanks, but I didn't make myself clear. The preformance problem I'm > concerned about was having different md drives accessing different > partitions. > > For example, I can partition the drives as follows: > > /dev/sd[abcd]1 -- RAID1, /boot > > /dev/sd[abcd]2 -- RAID5, the rest of the file system > > I originally had asked, way back when, if having different md drives on > different partitions of the *same* disk was a problem for perfomance -- > or if, for some reason (e.g., threading) it was actually smarter to do > it that way. The answer I received was from Iustin Pop, who said : > > Iustin Pop wrote: >> md code works better if it's only one array per physical drive, >> because it keeps statistics per array (like last accessed sector, >> etc.) and if you combine two arrays on the same drive these >> statistics are not exactly true anymore > > So if I use /boot on its own drive and it's only accessed at startup, > the /boot will only be accessed that one time and afterwards won't cause > problems for the drive statistics. However, if I use put /boot, /bin, > and /sbin on this RAID1 drive, it will always be accessed and it might > create a performance issue. To be fair, I didn't notice any measurable difference in real-life usage - be it a single (possibly further partitioned) large raid array or several separate arrays on different partitions - at least when there are two components: the "core system" (root fs) and the rest. Sure, theoretically it should be different, but in practice it doesn't seem to make much of a difference. >> For typical filesystem usage, raid5 works good for both reads >> and (cached, delayed) writes. It's workloads like databases >> where raid5 performs badly. > > Ah, very interesting. Is this true even for (dare I say it?) bittorrent > downloads? I don't see why not. Bittorrent (and the like) writes quite intelligently, doing a lot of buffering of its own. It writes SLOWLY. And it allows the filesystem to cache and optimize writes. >> What you do care about is your data integrity. It's not really >> interesting to reinstall a system or lose your data in case if >> something goes wrong, and it's best to have recovery tools as >> easily available as possible. Plus, amount of space you need. > > Sure, I understand. And backing up in case someone steals your server. > But did you have something specific in mind when you wrote this? Don't > all these configurations (RAID5 vs. RAID10) have equal recovery tools? Well, I mean that if you have all the basic tools available on the system even without raid (i.e., root fs on raid1 without any fancy stuff), you have a better chance of recovery if it is ever necessary. Yes, reconstructing raid10 is a bit easier than raid5, when we are talking about MANUAL reconstruction. 
But that is something usually not done anyway, because of the complexity, because it is easy to throw away the data by mistake, and because mdadm mostly handles recovery on its own (there are cases where I know how to manually reconstruct the array data when mdadm can't help me - for example, a raid1 with two half-failed drives, i.e. the first half of driveA and the second half of driveB still work - mdadm won't let me recover from such a situation even though I know that all my data is there). So basically there's no difference in "recoverability" of raid5 vs raid10. >>>> Also, placing /dev on a tmpfs helps alot to minimize number of writes >>>> necessary for root fs. >>> Another interesting idea. I'm not familiar with using tmpfs (no need, >>> until now); but I wonder how you create the devices you need when you're >>> doing a rescue. >> >> When you start udev, your /dev will be on tmpfs. > > Sure, that's what mount shows me right now -- using a standard Debian > install. What did you suggest I change? I didn't suggest any change. I just pointed out that /dev on a tmpfs reduces writes to the root filesystem (as does mounting with -o noatime or -o nodiratime). With udev, /dev is already on a tmpfs. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
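As an illustration of the write-reducing mount option mentioned above, a hypothetical /etc/fstab entry might look like this (the device name and filesystem type are assumptions, not taken from the thread):

  # root filesystem without access-time updates
  /dev/md0  /  ext3  defaults,noatime,nodiratime  0  1

The same effect can be tried on a running system with something like "mount -o remount,noatime /"; with udev, /dev already lives on a tmpfs, so nothing extra is needed for that part.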
* Re: In this partition scheme, grub does not find md information? 2008-01-30 14:10 ` Moshe Yudkowsky 2008-01-30 14:41 ` Michael Tokarev @ 2008-01-31 14:59 ` Bill Davidsen 2008-02-02 20:17 ` Bill Davidsen 1 sibling, 1 reply; 60+ messages in thread From: Bill Davidsen @ 2008-01-31 14:59 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Michael Tokarev, linux-raid Moshe Yudkowsky wrote: > Michael Tokarev wrote: > >> You only write to root (including /bin and /lib and so on) during >> software (re)install and during some configuration work (writing >> /etc/password and the like). First is very infrequent, and both >> needs only a few writes, -- so write speed isn't important. > > Thanks, but I didn't make myself clear. The preformance problem I'm > concerned about was having different md drives accessing different > partitions. > > For example, I can partition the drives as follows: > > /dev/sd[abcd]1 -- RAID1, /boot > > /dev/sd[abcd]2 -- RAID5, the rest of the file system > > I originally had asked, way back when, if having different md drives > on different partitions of the *same* disk was a problem for > perfomance -- or if, for some reason (e.g., threading) it was > actually smarter to do it that way. The answer I received was from > Iustin Pop, who said : > > Iustin Pop wrote: >> md code works better if it's only one array per physical drive, >> because it keeps statistics per array (like last accessed sector, >> etc.) and if you combine two arrays on the same drive these >> statistics are not exactly true anymore > > So if I use /boot on its own drive and it's only accessed at startup, > the /boot will only be accessed that one time and afterwards won't > cause problems for the drive statistics. However, if I use put /boot, > /bin, and /sbin on this RAID1 drive, it will always be accessed and it > might create a performance issue. > I always put /boot on a separate partition, just to run raid1 which I don't use elsewhere. > To return to that peformance question, since I have to create at least > 2 md drives using different partitions, I wonder if it's smarter to > create multiple md drives for better performance. > > /dev/sd[abcd]1 -- RAID1, the /boot, /dev, /bin/, /sbin > > /dev/sd[abcd]2 -- RAID5, most of the rest of the file system > > /dev/sd[abcd]3 -- RAID10 o2, a drive that does a lot of downloading > (writes) > I think the speed of downloads is so far below the capacity of an array that you won't notice, and hopefully you will use things you download more than once, so you still get more reads than writes. >> For typical filesystem usage, raid5 works good for both reads >> and (cached, delayed) writes. It's workloads like databases >> where raid5 performs badly. > > Ah, very interesting. Is this true even for (dare I say it?) > bittorrent downloads? > What do you have for bandwidth? Probably not more than a T3 (145Mbit) which will max out at ~15MB/s, far below the write performance of a single drive, much less an array (even raid5). >> What you do care about is your data integrity. It's not really >> interesting to reinstall a system or lose your data in case if >> something goes wrong, and it's best to have recovery tools as >> easily available as possible. Plus, amount of space you need. > > Sure, I understand. And backing up in case someone steals your server. > But did you have something specific in mind when you wrote this? Don't > all these configurations (RAID5 vs. RAID10) have equal recovery tools? > > Or were you referring to the file system? 
Reiserfs and XFS both seem > to have decent recovery tools. LVM is a little tempting because it > allows for snapshots, but on the other hand I wonder if I'd find it > useful. > If you are worried about performance, perhaps some reading of comments on LVM would be in order. I personally view it as a trade-off of performance for flexibility. > >>>> Also, placing /dev on a tmpfs helps alot to minimize number of writes >>>> necessary for root fs. >>> Another interesting idea. I'm not familiar with using tmpfs (no need, >>> until now); but I wonder how you create the devices you need when >>> you're >>> doing a rescue. >> >> When you start udev, your /dev will be on tmpfs. > > Sure, that's what mount shows me right now -- using a standard Debian > install. What did you suggest I change? > > -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-31 14:59 ` Bill Davidsen @ 2008-02-02 20:17 ` Bill Davidsen 0 siblings, 0 replies; 60+ messages in thread From: Bill Davidsen @ 2008-02-02 20:17 UTC (permalink / raw) To: linux-raid; +Cc: Moshe Yudkowsky, Michael Tokarev Bill Davidsen wrote: > Moshe Yudkowsky wrote: >> Michael Tokarev wrote: >> > >> To return to that peformance question, since I have to create at >> least 2 md drives using different partitions, I wonder if it's >> smarter to create multiple md drives for better performance. >> >> /dev/sd[abcd]1 -- RAID1, the /boot, /dev, /bin/, /sbin >> >> /dev/sd[abcd]2 -- RAID5, most of the rest of the file system >> >> /dev/sd[abcd]3 -- RAID10 o2, a drive that does a lot of downloading >> (writes) >> > I think the speed of downloads is so far below the capacity of an > array that you won't notice, and hopefully you will use things you > download more than once, so you still get more reads than writes. > >>> For typical filesystem usage, raid5 works good for both reads >>> and (cached, delayed) writes. It's workloads like databases >>> where raid5 performs badly. >> >> Ah, very interesting. Is this true even for (dare I say it?) >> bittorrent downloads? >> > What do you have for bandwidth? Probably not more than a T3 (145Mbit) > which will max out at ~15MB/s, far below the write performance of a > single drive, much less an array (even raid5). It has been pointed out that I have a double typo there, I meant OC3 not T3, and 155Mbit. Still, the most someone is likely to have, even in a large company. Still not a large chance of being faster than the disk in raid-10 mode. -- Bill Davidsen <davidsen@tmr.com> "Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:34 ` Peter Rabbitson 2008-01-29 19:34 ` Moshe Yudkowsky @ 2008-01-30 12:01 ` Peter Rabbitson 1 sibling, 0 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-30 12:01 UTC (permalink / raw) Cc: linux-raid Peter Rabbitson wrote: > Moshe Yudkowsky wrote: >> Here's a baseline question: if I create a RAID10 array using default >> settings, what do I get? I thought I was getting RAID1+0; am I really? > > Maybe you are, depending on your settings, but this is beyond the point. > No matter what 1+0 you have (linux, classic, or otherwise) you can not > boot from it, as there is no way to see the underlying filesystem > without the RAID layer. > > With the current state of affairs (available mainstream bootloaders) the > rule is: > Block devices containing the kernel/initrd image _must_ be either: > * a regular block device (/sda1, /hda, /fd0, etc.) > * or a linux RAID 1 with the superblock at the end of the device > (0.9 or 1.2) > > If any poor soul finds this in the mailing list archives, the above should read: ... * or a linux RAID 1 with the superblock at the end of the device (either version 0.9 or _1.0_) .... ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:16 ` Moshe Yudkowsky 2008-01-29 16:34 ` Peter Rabbitson @ 2008-01-29 16:42 ` Michael Tokarev 1 sibling, 0 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 16:42 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, linux-raid Moshe Yudkowsky wrote: > Michael Tokarev wrote: > >> There are more-or-less standard raid LEVELS, including >> raid10 (which is the same as raid1+0, or a stripe on top >> of mirrors - note it does not mean 4 drives, you can >> use 6 - stripe over 3 mirrors each of 2 components; or >> the reverse - stripe over 2 mirrors of 3 components each >> etc). > > Here's a baseline question: if I create a RAID10 array using default > settings, what do I get? I thought I was getting RAID1+0; am I really? ..default settings AND an even (4, 6, 8, 10, ...) number of drives. It will be "standard" raid10, or raid1+0 which is the same: as many stripes of mirrored (2-copy) data as fit the number of disks. With an odd number of disks it will obviously be something else, not a "standard" raid10. > My superblocks, by the way, are marked version 01; my metadata in > mdadm.conf asked for 1.2. I wonder what I really got. The real question Ugh. Another source of confusion. In superblock version 1.2, "1" stands for the format, and "2" stands for the placement. So it's really format version 1. From mdadm(8): 1, 1.0, 1.1, 1.2 Use the new version-1 format superblock. This has few restrictions. The different sub-versions store the superblock at different locations on the device, either at the end (for 1.0), at the start (for 1.1) or 4K from the start (for 1.2). > in my mind now is why grub can't find the info, and either it's because > of 1.2 superblocks or because of sub-partitioning of components. As has been said numerous times in this thread, grub can't be used with anything but raid1 to start with (the same is true for lilo). Raid10 (or raid1+0, which is the same) - be it standard or linux extension format - is NOT raid1. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
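A quick way to check which superblock format and placement an existing array member actually carries, and to ask for a specific one at creation time - a sketch only, with placeholder device names:

  # inspect a member device
  mdadm --examine /dev/sda2 | grep -i version
  # request an explicit format/placement when creating an array
  mdadm --create /dev/md0 --metadata=1.0 --level=1 --raid-devices=2 /dev/sd[ab]1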
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:13 ` Michael Tokarev 2008-01-29 15:41 ` Peter Rabbitson 2008-01-29 16:16 ` Moshe Yudkowsky @ 2008-01-29 16:26 ` Keld Jørn Simonsen 2008-01-29 16:46 ` Michael Tokarev 2 siblings, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 16:26 UTC (permalink / raw) To: Michael Tokarev; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid On Tue, Jan 29, 2008 at 06:13:41PM +0300, Michael Tokarev wrote: > > Linux raid10 MODULE (which implements that standard raid10 > LEVEL in full) adds some quite.. unusual extensions to that > standard raid10 LEVEL. The resulting layout is also called > raid10 in linux (ie, not giving new names), but it's not that > raid10 (which is again the same as raid1+0) as commonly known > in various literature and on the internet. Yet raid10 module > fully implements STANDARD raid10 LEVEL. My understanding is that you can have a linux raid10 of only 2 drives, while the standard RAID 1+0 requires 4 drives, so this is a huge difference. I am not sure what vanilla linux raid10 (near=2, far=1) has of properties. I think it can run with only 1 disk, but I think it does not have striping capabilities. It would be nice to have more info on this, eg in the man page. Is there an official web page for mdadm? And maybe the raid faq could be updated? best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:26 ` Keld Jørn Simonsen @ 2008-01-29 16:46 ` Michael Tokarev 2008-01-29 18:01 ` Keld Jørn Simonsen 0 siblings, 1 reply; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 16:46 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid Keld Jørn Simonsen wrote: > On Tue, Jan 29, 2008 at 06:13:41PM +0300, Michael Tokarev wrote: >> Linux raid10 MODULE (which implements that standard raid10 >> LEVEL in full) adds some quite.. unusual extensions to that >> standard raid10 LEVEL. The resulting layout is also called >> raid10 in linux (ie, not giving new names), but it's not that >> raid10 (which is again the same as raid1+0) as commonly known >> in various literature and on the internet. Yet raid10 module >> fully implements STANDARD raid10 LEVEL. > > My understanding is that you can have a linux raid10 of only 2 > drives, while the standard RAID 1+0 requires 4 drives, so this is a huge > difference. Ugh. 2-drive raid10 is effectively just a raid1. I.e, mirroring without any striping. (Or, backwards, striping without mirroring). So to say, raid1 is just one particular configuration of raid10 - with only one mirror. Pretty much like with raid5 of 2 disks - it's the same as raid1. > I am not sure what vanilla linux raid10 (near=2, far=1) > has of properties. I think it can run with only 1 disk, but I think it number of copies should be <= number of disks, so no. > does not have striping capabilities. It would be nice to have more > info on this, eg in the man page. It's all in there really. See md(4). Maybe it's not that verbose, but it's not a user's guide (as in: a large book), after all. /mjt - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:46 ` Michael Tokarev @ 2008-01-29 18:01 ` Keld Jørn Simonsen 2008-01-30 13:37 ` Michael Tokarev 0 siblings, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 18:01 UTC (permalink / raw) To: Michael Tokarev; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid On Tue, Jan 29, 2008 at 07:46:58PM +0300, Michael Tokarev wrote: > Keld Jørn Simonsen wrote: > > On Tue, Jan 29, 2008 at 06:13:41PM +0300, Michael Tokarev wrote: > >> Linux raid10 MODULE (which implements that standard raid10 > >> LEVEL in full) adds some quite.. unusual extensions to that > >> standard raid10 LEVEL. The resulting layout is also called > >> raid10 in linux (ie, not giving new names), but it's not that > >> raid10 (which is again the same as raid1+0) as commonly known > >> in various literature and on the internet. Yet raid10 module > >> fully implements STANDARD raid10 LEVEL. > > > > My understanding is that you can have a linux raid10 of only 2 > > drives, while the standard RAID 1+0 requires 4 drives, so this is a huge > > difference. > > Ugh. 2-drive raid10 is effectively just a raid1. I.e, mirroring > without any striping. (Or, backwards, striping without mirroring). OK, uhm, well, I did not understand "(Or, backwards, striping without mirroring)." I don't think a 2-drive vanilla raid10 will do striping. Please explain. > Pretty much like with raid5 of 2 disks - it's the same as raid1. I think in raid5 of 2 disks, half of the chunks are parity chunks which are evenly distributed over the two disks, and the parity chunk is the XOR of the data chunk. But maybe I am wrong. Also the behaviour of such a raid5 is different from a raid1, as the parity chunk is not used as data. > > > I am not sure what vanilla linux raid10 (near=2, far=1) > > has of properties. I think it can run with only 1 disk, but I think it > > number of copies should be <= number of disks, so no. I have a clear understanding that in a vanilla linux raid10 (near=2, far=1) you can run with one failing disk, that is, with only one working disk. Am I wrong? > > does not have striping capabilities. It would be nice to have more > > info on this, eg in the man page. > > It's all in there really. See md(4). Maybe it's not that > verbose, but it's not a user's guide (as in: a large book), > after all. Some man pages have examples. Or info could be written in the FAQ or on Wikipedia. Best regards keld - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 18:01 ` Keld Jørn Simonsen @ 2008-01-30 13:37 ` Michael Tokarev 2008-01-30 14:47 ` Peter Rabbitson 0 siblings, 1 reply; 60+ messages in thread From: Michael Tokarev @ 2008-01-30 13:37 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid Keld Jørn Simonsen wrote: [] >> Ugh. 2-drive raid10 is effectively just a raid1. I.e, mirroring >> without any striping. (Or, backwards, striping without mirroring). > > uhm, well, I did not understand: "(Or, backwards, striping without > mirroring)." I don't think a 2 drive vanilla raid10 will do striping. Please explain. I was referring to raid0+1 here - a mirror of stripes. Which makes no sense on its own, but when we create such a thing on only 2 drives, it becomes just raid0... "Backwards" as in raid1+0 vs raid0+1. This is just to show that various raid levels, in corner cases, tend to "transform" from one into another. >> Pretty much like with raid5 of 2 disks - it's the same as raid1. > > I think in raid5 of 2 disks, half of the chunks are parity chynks which > are evenly distributed over the two disks, and the parity chunk is the > XOR of the data chunk. But maybe I am wrong. Also the behaviour of suce > a raid5 is different from a raid1 as the parity chunk is not used as > data. With N-disk raid5, parity in a row is calculated by XORing together data from all the rest of the disks (N-1), i.e., P = D1 ^ ... ^ D(N-1). In the case of 2-disk raid5 (also a corner case), the above formula becomes just P = D1. So the parity block in each row contains exactly the same data as the data block, effectively turning the whole thing into a raid1 of two disks. Sure, in raid5 the parity blocks are called just that - parity - but in reality that parity is THE SAME as the data (again, in the case of a 2-disk raid5). >>> I am not sure what vanilla linux raid10 (near=2, far=1) >>> has of properties. I think it can run with only 1 disk, but I think it >> number of copies should be <= number of disks, so no. > > I have a clear understanding that in a vanilla linux raid10 (near=2, far=1) > you can run with one failing disk, that is with only one working disk. > Am I wrong? In fact, with all sorts of raid10, it's not only the number of drives that can fail that matters, but also WHICH drives fail. In classic raid10: DiskA DiskB DiskC DiskD 0 0 1 1 2 2 3 3 4 4 5 5 .... (where the numbers are the data blocks), you can run with only 2 working disks (i.e., 2 failed), but only if they are from different pairs. You can't have A and B failed and C and D working, for example - you'll lose half the data and thus the filesystem. You can have A and C failed however, or A and D, or B&C, or B&D. You see - in the above example, all numbers (data blocks) should be present at least once (after you pull a drive or two or more). If some numbers don't appear at all, your raid array is dead. Now write out the layout you want to use like the above, try "removing" some drives, and see if you still have all the numbers. For example, with 3-disk linux raid10: A B C 0 0 1 1 2 2 3 3 4 4 5 5 .... We can't pull 2 drives anymore here. E.g., pulling A&B removes 0 and 3. Pulling B&C removes 2 and 5. A&C = 1 and 4. With 5-drive linux raid10: A B C D E 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 ... A&B can't be removed - 0, 5. A&C CAN be removed, as can A&D. But not A&E - losing 2 and 7. And so on. 
6-disk raid10 with 3 copies of each (near=3 with linux): A B C D E F 0 0 0 1 1 1 2 2 2 3 3 3 It can run as long as from each triple (ABC and DEF), at least one disk is here. Ie, you can lose up to 4 drives, as far as the condition is true. But if you lose only 3 - A&B&C or D&E&F - it can't work anymore. The same goes for raid5 and raid6, but they're symmetric -- any single (raid5) or double (raid6) disk failure is Ok. The principle is this: raid5: P = D1^D2^D3^...^D(N-1) so, you either have all Di (nothing to reconstruct), or you have all but one Di AND P - in this case, missing Dm can be recalculated as Dm = P^D1^...^D(m-1)^D(m+1)^...^D(N-1) (ie, a XOR of all the remaining blocks including parity). (exactly the same applies to raid4, because each row in raid4 is identical to that of raid5, the difference is that parity disk is different in each row in raid5, while in raid4 it stays the same). I wont write the formula for raid6 as it's somewhat more complicated, but the effect is the same - any data block can be reconstructed from any N-2 drives. /mjt - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
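A throwaway script in the spirit of the tables above can help when working out which drive combinations an array survives. This sketch only prints the block placement for the plain near=2, far=1 layout and is not derived from the md code; cross out a column (a failed drive) and check that every block number still appears somewhere, exactly as described above.

  #!/bin/sh
  # print the first rows of a near=2 raid10 layout for a given drive count
  drives=${1:-5}; rows=${2:-5}
  for row in $(seq 0 $((rows - 1))); do
      line=""
      for col in $(seq 0 $((drives - 1))); do
          # each data block is stored twice, on adjacent column positions
          line="$line $(( (row * drives + col) / 2 ))"
      done
      echo "row $row:$line"
  done

Run as "sh layout.sh 5" it reproduces the 5-drive table above.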
* Re: In this partition scheme, grub does not find md information? 2008-01-30 13:37 ` Michael Tokarev @ 2008-01-30 14:47 ` Peter Rabbitson 2008-01-30 15:21 ` Keld Jørn Simonsen 0 siblings, 1 reply; 60+ messages in thread From: Peter Rabbitson @ 2008-01-30 14:47 UTC (permalink / raw) To: Michael Tokarev; +Cc: Keld Jørn Simonsen, Moshe Yudkowsky, linux-raid Michael Tokarev wrote: > With 5-drive linux raid10: > > A B C D E > 0 0 1 1 2 > 2 3 3 4 4 > 5 5 6 6 7 > 7 8 8 9 9 > 10 10 11 11 12 > ... > > A&B can't be removed - 0, 5. A&C CAN be removed, as > are A&D. But not A&E - losing 2 and 7. And so on. I stand corrected by Michael, this is indeed the case with the current state of md raid 10. Either my observations were incorrect when I made them a year and a half ago, or some fixes went into the kernel since then. In any way - linux md10 does behave exactly as a classic raid 1+0 when created with -n D -p nS where D and S are both even and D = 2S. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 14:47 ` Peter Rabbitson @ 2008-01-30 15:21 ` Keld Jørn Simonsen 2008-01-30 15:35 ` Peter Rabbitson 0 siblings, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-30 15:21 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Michael Tokarev, Moshe Yudkowsky, linux-raid On Wed, Jan 30, 2008 at 03:47:30PM +0100, Peter Rabbitson wrote: > Michael Tokarev wrote: > > >With 5-drive linux raid10: > > > > A B C D E > > 0 0 1 1 2 > > 2 3 3 4 4 > > 5 5 6 6 7 > > 7 8 8 9 9 > > 10 10 11 11 12 > > ... > > > >A&B can't be removed - 0, 5. A&C CAN be removed, as > >are A&D. But not A&E - losing 2 and 7. And so on. I see. Does the kernel code allow this? And mdadm? And can B+E be removed safely, and C+E and B+D? best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 15:21 ` Keld Jørn Simonsen @ 2008-01-30 15:35 ` Peter Rabbitson 2008-01-30 15:46 ` Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?) Moshe Yudkowsky 0 siblings, 1 reply; 60+ messages in thread From: Peter Rabbitson @ 2008-01-30 15:35 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: Michael Tokarev, Moshe Yudkowsky, linux-raid Keld Jørn Simonsen wrote: > On Wed, Jan 30, 2008 at 03:47:30PM +0100, Peter Rabbitson wrote: >> Michael Tokarev wrote: >> >>> With 5-drive linux raid10: >>> >>> A B C D E >>> 0 0 1 1 2 >>> 2 3 3 4 4 >>> 5 5 6 6 7 >>> 7 8 8 9 9 >>> 10 10 11 11 12 >>> ... >>> >>> A&B can't be removed - 0, 5. A&C CAN be removed, as >>> are A&D. But not A&E - losing 2 and 7. And so on. > > I see. Does the kernel code allow this? And mdadm? > > And can B+E be removed safely, and C+E and B+D? > It seems like it. I just created the above raid configuration with 5 loop devices. Everything behaved just like Michael described. When the wrong drives disappeared - I started getting IO errors. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?) 2008-01-30 15:35 ` Peter Rabbitson @ 2008-01-30 15:46 ` Moshe Yudkowsky 2008-01-30 15:56 ` Tim Southerwood 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 15:46 UTC (permalink / raw) To: Peter Rabbitson; +Cc: linux-raid Peter Rabbitson wrote: > It seems like it. I just created the above raid configuration with 5 > loop devices. Everything behaved just like Michael described. When the > wrong drives disappeared - I started getting IO errors. My mind boggles. I know how to mount an ISO as a loop device onto the file system, but if you'd be so kind, can you give a super-brief description on how to get a loop device to look like an actual partition that can be made into a RAID array? I can see this software-only solution as being quite interesting for testing in general. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "I'm very well aquainted/with the seven deadly sins/ I keep a busy schedule/ to try to fit them in." -- Warren Zevon, "Mr. Bad Example" ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?) 2008-01-30 15:46 ` Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?) Moshe Yudkowsky @ 2008-01-30 15:56 ` Tim Southerwood 0 siblings, 0 replies; 60+ messages in thread From: Tim Southerwood @ 2008-01-30 15:56 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, linux-raid Moshe Yudkowsky wrote: > My mind boggles. I know how to mount an ISO as a loop device onto the > file system, but if you'd be so kind, can you give a super-brief > description on how to get a loop device to look like an actual partition > that can be made into a RAID array? I can see this software-only > solution as being quite interesting for testing in general. > I tried this a while back, IIRC the procedure was: 1) Make some empty files of the required length each. 2) Use losetup to mount each one onto a loop device (loop0-3 say). 3) Use /dev/loop[0-3] as component devices to mdadm as you would use any other device or partition. It is not necessary to partition the loop devices, use them whole. HTH Tim ^ permalink raw reply [flat|nested] 60+ messages in thread
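Spelled out, the procedure Tim describes might look like this for a five-device test array. Sizes, file names and the md device number are arbitrary choices for illustration; run it as root, and only on a machine where scribbling on /dev/loop0-4 and /dev/md9 is safe.

  for i in 0 1 2 3 4; do
      dd if=/dev/zero of=/tmp/md-test-$i.img bs=1M count=0 seek=100   # 100 MB sparse file
      losetup /dev/loop$i /tmp/md-test-$i.img
  done
  mdadm --create /dev/md9 --level=10 --layout=n2 --raid-devices=5 /dev/loop[0-4]
  cat /proc/mdstat
  # ... fail/remove loop devices, observe the behaviour, then tear it down:
  mdadm --stop /dev/md9
  for i in 0 1 2 3 4; do losetup -d /dev/loop$i; done
  rm -f /tmp/md-test-*.img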
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:47 ` Peter Rabbitson 2008-01-29 15:13 ` Michael Tokarev @ 2008-01-29 15:57 ` Moshe Yudkowsky 2008-01-29 16:37 ` Keld Jørn Simonsen 2008-01-30 11:03 ` David Greaves 2008-01-30 11:03 ` David Greaves 2 siblings, 2 replies; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 15:57 UTC (permalink / raw) To: Peter Rabbitson; +Cc: linux-raid Peter Rabbitson wrote: > [*] The layout is the same but the functionality is different. If you > have 1+0 on 4 drives, you can survive a loss of 2 drives as long as they > are part of different mirrors. mdadm -C -l 10 -n 4 -o n2 <drives> > however will _NOT_ survive a loss of 2 drives. In my 4 drive system, I'm clearly not getting 1+0's ability to use grub out of the RAID10. I expect it's because I used 1.2 superblocks (why not use the latest, I said, foolishly...) and therefore the RAID10 -- with even number of drives -- can't be read by grub. If you'd patch that information into the man pages that'd be very useful indeed. Thanks for your attention to this! -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "no user serviceable parts below this line" -- From a Perl program by mengwong@pobox.com ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:57 ` In this partition scheme, grub does not find md information? Moshe Yudkowsky @ 2008-01-29 16:37 ` Keld Jørn Simonsen 2008-01-29 16:57 ` Michael Tokarev 1 sibling, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 16:37 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Peter Rabbitson, linux-raid On Tue, Jan 29, 2008 at 09:57:48AM -0600, Moshe Yudkowsky wrote: > > In my 4 drive system, I'm clearly not getting 1+0's ability to use grub > out of the RAID10. I expect it's because I used 1.2 superblocks (why > not use the latest, I said, foolishly...) and therefore the RAID10 -- > with even number of drives -- can't be read by grub. If you'd patch that > information into the man pages that'd be very useful indeed. If you have 4 drives, I think the right thing is to use a raid1 with 4 drives for your /boot partition. Then you can survive three disks crashing! If you want the extra performance, I think you should not worry too much about the kernel and initrd load time - which of course is not striped across the disks, but some performance improvement can be expected. Then you can have the rest of the system on a raid10,f2 with 4 disks. best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:37 ` Keld Jørn Simonsen @ 2008-01-29 16:57 ` Michael Tokarev 0 siblings, 0 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-29 16:57 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: Moshe Yudkowsky, Peter Rabbitson, linux-raid Keld Jørn Simonsen wrote: > On Tue, Jan 29, 2008 at 09:57:48AM -0600, Moshe Yudkowsky wrote: >> In my 4 drive system, I'm clearly not getting 1+0's ability to use grub >> out of the RAID10. I expect it's because I used 1.2 superblocks (why >> not use the latest, I said, foolishly...) and therefore the RAID10 -- >> with even number of drives -- can't be read by grub. If you'd patch that >> information into the man pages that'd be very useful indeed. > > If you have 4 drives, I think the right thing is to use a raid1 with 4 > drives, for your /boot partition. Then yo can survive that 3 disks > crash! By the way, on all our systems I use small (256Mb for small-software systems, sometimes 512M, but 1G should be sufficient) partition for a root filesystem (/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all (usually identical) drives - be it 4 or 6 or more of them. Root filesystem does not change often, or at least it's write speed isn't that important. But doing this way, you always have all the tools necessary to repair a damaged system even in case your raid didn't start, or you forgot where your root disk is etc etc. But in this setup, /usr, /home, /var and so on should be separate partitions. Also, placing /dev on a tmpfs helps alot to minimize number of writes necessary for root fs. /mjt - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 15:57 ` In this partition scheme, grub does not find md information? Moshe Yudkowsky 2008-01-29 16:37 ` Keld Jørn Simonsen @ 2008-01-30 11:03 ` David Greaves 2008-01-30 11:44 ` Moshe Yudkowsky 2008-02-04 16:49 ` In this partition scheme, grub does not find md information? John Stoffel 1 sibling, 2 replies; 60+ messages in thread From: David Greaves @ 2008-01-30 11:03 UTC (permalink / raw) To: Moshe Yudkowsky, Neil Brown; +Cc: Peter Rabbitson, linux-raid, Michael Tokarev On 26 Oct 2007, Neil Brown wrote: >On Thursday October 25, david@dgreaves.com wrote: >> I also suspect that a *lot* of people will assume that the highest superblock >> version is the best and should be used for new installs etc. > > Grumble... why can't people expect what I want them to expect? Moshe Yudkowsky wrote: > I expect it's because I used 1.2 superblocks (why > not use the latest, I said, foolishly...) and therefore the RAID10 -- Aha - an 'in the wild' example of why we should deprecate '0.9 1.0 1.1, 1.2' and rename the superblocks to data-version + on-disk-location :) David ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-30 11:03 ` David Greaves @ 2008-01-30 11:44 ` Moshe Yudkowsky 2008-01-30 12:00 ` WRONG INFO (was Re: In this partition scheme, grub does not find md information?) Peter Rabbitson 2008-02-04 16:49 ` In this partition scheme, grub does not find md information? John Stoffel 1 sibling, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-30 11:44 UTC (permalink / raw) To: David Greaves; +Cc: linux-raid David Greaves wrote: > Moshe Yudkowsky wrote: >> I expect it's because I used 1.2 superblocks (why >> not use the latest, I said, foolishly...) and therefore the RAID10 -- > > Aha - an 'in the wild' example of why we should deprecate '0.9 1.0 1.1, 1.2' and > rename the superblocks to data-version + on-disk-location :) Even if renamed, I'd still need a Clue as to why to prefer one scheme over the other. For example, I've now learned that if I want to set up a RAID1 /boot, it must actually be 1.2 or grub won't be able to read it. (I would therefore argue that if the new version ever becomes default, then the default sub-version ought to be 1.2.) As to the wiki: I am not certain I found the Wiki you're referring to; I did find others, and none had the ringing clarity of Peter's definitive "RAID10 won't work for /boot." The process I'm going through -- cloning an old amd-k7 server into a new amd64 server -- is something I will document, and this particular grub issue is one of the things I intend to mention. So, where is this Wiki of which you speak? -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe "A kind word will go a long way, but a kind word and a gun will go even further." -- Al Capone ^ permalink raw reply [flat|nested] 60+ messages in thread
* WRONG INFO (was Re: In this partition scheme, grub does not find md information?) 2008-01-30 11:44 ` Moshe Yudkowsky @ 2008-01-30 12:00 ` Peter Rabbitson 2008-01-30 12:41 ` David Greaves 2008-01-30 13:39 ` Michael Tokarev 0 siblings, 2 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-30 12:00 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: David Greaves, linux-raid Moshe Yudkowsky wrote: > over the other. For example, I've now learned that if I want to set up a > RAID1 /boot, it must actually be 1.2 or grub won't be able to read it. > (I would therefore argue that if the new version ever becomes default, > then the default sub-version ought to be 1.2.) In the discussion yesterday I myself made a serious typo, that should not spread. The only superblock version that will work with current GRUB is 1.0 _not_ 1.2. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: WRONG INFO (was Re: In this partition scheme, grub does not find md information?) 2008-01-30 12:00 ` WRONG INFO (was Re: In this partition scheme, grub does not find md information?) Peter Rabbitson @ 2008-01-30 12:41 ` David Greaves 2008-01-30 13:39 ` Michael Tokarev 1 sibling, 0 replies; 60+ messages in thread From: David Greaves @ 2008-01-30 12:41 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Moshe Yudkowsky, linux-raid Peter Rabbitson wrote: > Moshe Yudkowsky wrote: >> over the other. For example, I've now learned that if I want to set up >> a RAID1 /boot, it must actually be 1.2 or grub won't be able to read >> it. (I would therefore argue that if the new version ever becomes >> default, then the default sub-version ought to be 1.2.) > > In the discussion yesterday I myself made a serious typo, that should > not spread. The only superblock version that will work with current GRUB > is 1.0 _not_ 1.2. Ah, the joys of consolidated and yet editable documentation - like a wiki.... David ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: WRONG INFO (was Re: In this partition scheme, grub does not find md information?) 2008-01-30 12:00 ` WRONG INFO (was Re: In this partition scheme, grub does not find md information?) Peter Rabbitson 2008-01-30 12:41 ` David Greaves @ 2008-01-30 13:39 ` Michael Tokarev 1 sibling, 0 replies; 60+ messages in thread From: Michael Tokarev @ 2008-01-30 13:39 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Moshe Yudkowsky, David Greaves, linux-raid Peter Rabbitson wrote: > Moshe Yudkowsky wrote: >> over the other. For example, I've now learned that if I want to set up >> a RAID1 /boot, it must actually be 1.2 or grub won't be able to read >> it. (I would therefore argue that if the new version ever becomes >> default, then the default sub-version ought to be 1.2.) > > In the discussion yesterday I myself made a serious typo, that should > not spread. The only superblock version that will work with current GRUB > is 1.0 _not_ 1.2. Ghrrm. 1.0, or 0.9. 0.9 is still the default with mdadm. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
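Putting the corrections together: a /boot raid1 that current grub can read needs its superblock at the end of the component devices, i.e. metadata 0.90 or 1.0. A sketch only, with the partition names and filesystem as assumptions:

  mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-devices=4 /dev/sd[abcd]1
  mkfs.ext3 /dev/md0

With the superblock at the end, each member starts with an ordinary filesystem at the usual offset, which is why grub's "find /boot/grub/stage1" can succeed on any of the mirror halves.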
* Re: In this partition scheme, grub does not find md information? 2008-01-30 11:03 ` David Greaves 2008-01-30 11:44 ` Moshe Yudkowsky @ 2008-02-04 16:49 ` John Stoffel 2008-02-04 17:26 ` Michael Tokarev 1 sibling, 1 reply; 60+ messages in thread From: John Stoffel @ 2008-02-04 16:49 UTC (permalink / raw) To: David Greaves Cc: Moshe Yudkowsky, Neil Brown, Peter Rabbitson, linux-raid, Michael Tokarev David> On 26 Oct 2007, Neil Brown wrote: >> On Thursday October 25, david@dgreaves.com wrote: >>> I also suspect that a *lot* of people will assume that the highest superblock >>> version is the best and should be used for new installs etc. >> >> Grumble... why can't people expect what I want them to expect? David> Moshe Yudkowsky wrote: >> I expect it's because I used 1.2 superblocks (why >> not use the latest, I said, foolishly...) and therefore the RAID10 -- David> Aha - an 'in the wild' example of why we should deprecate '0.9 David> 1.0 1.1, 1.2' and rename the superblocks to data-version + David> on-disk-location :) As the person who started this entire thread ages ago about the *poor* naming convention used for RAID superblocks, I have to agree. I'd much rather see 1.near, 1.far, 1.both or something like that added in. Heck, we don't have to remove the support for the old 1.0, 1.1, 1.2 names either, just make the default be something more user-friendly. C'mon, how many of you are programmed to believe that 1.2 is better than 1.0? But when they're not different, just different placements, then it's confusing. John ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-02-04 16:49 ` In this partition scheme, grub does not find md information? John Stoffel @ 2008-02-04 17:26 ` Michael Tokarev 0 siblings, 0 replies; 60+ messages in thread From: Michael Tokarev @ 2008-02-04 17:26 UTC (permalink / raw) To: John Stoffel; +Cc: David Greaves, Moshe Yudkowsky, Peter Rabbitson, linux-raid John Stoffel wrote: [] > C'mon, how many of you are programmed to believe that 1.2 is better > than 1.0? But when they're not different, just just different > placements, then it's confusing. Speaking of "more is better" thing... There were quite a few bugs fixed in recent months wrt version 1 superblocks - both in kernel and in mdadm. While 0.90 format is stable for a very long time, and unless you're hitting its limits (namely, max 26 drives in an array, no "homehost" field), there's nothing which makes v1 superblocks better than 0.90 ones. In my view, "better" = stable first, faster/easier/whatever second. /mjt ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:47 ` Peter Rabbitson 2008-01-29 15:13 ` Michael Tokarev 2008-01-29 15:57 ` In this partition scheme, grub does not find md information? Moshe Yudkowsky @ 2008-01-30 11:03 ` David Greaves 2 siblings, 0 replies; 60+ messages in thread From: David Greaves @ 2008-01-30 11:03 UTC (permalink / raw) To: Peter Rabbitson; +Cc: Michael Tokarev, Moshe Yudkowsky, linux-raid, keld Peter Rabbitson wrote: > I guess I will sit down tonight and craft some patches to the existing > md* man pages. Some things are indeed left unsaid. If you want to be more verbose than a man page allows then there's always the wiki/FAQ... http://linux-raid.osdl.org/ Keld Jørn Simonsen wrote: > Is there an official web page for mdadm? > And maybe the raid faq could be updated? That *is* the linux-raid FAQ brought up to date (with the consent of the original authors) Of course being a wiki means it is now a shared, community responsibility - and to all present and future readers: that means you too ;) David - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:07 ` Michael Tokarev 2008-01-29 14:47 ` Peter Rabbitson @ 2008-01-29 14:48 ` Keld Jørn Simonsen 2008-01-29 16:00 ` Moshe Yudkowsky 1 sibling, 1 reply; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 14:48 UTC (permalink / raw) To: Michael Tokarev; +Cc: Peter Rabbitson, Moshe Yudkowsky, linux-raid On Tue, Jan 29, 2008 at 05:07:27PM +0300, Michael Tokarev wrote: > Peter Rabbitson wrote: > > Moshe Yudkowsky wrote: > >> > > > It is exactly what the names implies - a new kind of RAID :) The setup > > you describe is not RAID10 it is RAID1+0. > > Raid10 IS RAID1+0 ;) > It's just that linux raid10 driver can utilize more.. interesting ways > to lay out the data. My understanding is that raid10 is different from RAID1+0. Traditional RAID1+0 is composed of two RAID1's combined into one RAID0. It takes 4 drives to make it work. Linux raid10 only takes 2 drives to work. Traditional RAID1+0 only has one way of laying out the blocks. raid10 has a number of ways to do layout, namely the near, far and offset ways, layout=n2, f2, o2 respectively. Traditional RAID1+0 can only do striping over half of the disks involved, while raid10 can do striping on all disks in the far and offset layouts. I looked around on the net for documentation of this. The first hits (on Google) for mdadm did not have descriptions of raid10. Wikipedia describes raid 10 as a synonym for raid1+0. I think there is too much confusion around the raid10 term, and also that the marvelous linux raid10 layouts are a little-known secret beyond, perhaps, the circles of this linux-raid list. We should tell others more about the wonders of raid10. And I would like a good reference describing how raid10,o2 works and why bigger chunks work. Best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
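For anyone unsure what they actually created, the level and layout can be read back from a running array; a small sketch with /dev/md0 as a placeholder:

  mdadm --detail /dev/md0 | egrep 'Level|Layout|Chunk'
  #   Raid Level : raid10
  #       Layout : near=2, far=1
  #   Chunk Size : 64K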
* Re: In this partition scheme, grub does not find md information? 2008-01-29 14:48 ` Keld Jørn Simonsen @ 2008-01-29 16:00 ` Moshe Yudkowsky 2008-01-29 16:25 ` Peter Rabbitson 0 siblings, 1 reply; 60+ messages in thread From: Moshe Yudkowsky @ 2008-01-29 16:00 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: linux-raid Keld Jørn Simonsen wrote: > raid10 have a number of ways to do layout, namely the near, far and > offset ways, layout=n2, f2, o2 respectively. The default layout, according to --detail, is "near=2, far=1." If I understand what's been written so far on the topic, that's automatically incompatible with 1+0. -- Moshe Yudkowsky * moshe@pobox.com * www.pobox.com/~moshe - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 16:00 ` Moshe Yudkowsky @ 2008-01-29 16:25 ` Peter Rabbitson 0 siblings, 0 replies; 60+ messages in thread From: Peter Rabbitson @ 2008-01-29 16:25 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Keld Jørn Simonsen, linux-raid Moshe Yudkowsky wrote: > Keld Jørn Simonsen wrote: > >> raid10 have a number of ways to do layout, namely the near, far and >> offset ways, layout=n2, f2, o2 respectively. > > The default layout, according to --detail, is "near=2, far=1." If I > understand what's been written so far on the topic, that's automatically > incompatible with 1+0. > Unfortunately you are interpreting this wrong as well. far=1 is just a way of saying 'no copies of type far'. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: In this partition scheme, grub does not find md information? 2008-01-29 11:02 ` Moshe Yudkowsky 2008-01-29 11:14 ` Peter Rabbitson @ 2008-01-29 14:04 ` Keld Jørn Simonsen 1 sibling, 0 replies; 60+ messages in thread From: Keld Jørn Simonsen @ 2008-01-29 14:04 UTC (permalink / raw) To: Moshe Yudkowsky; +Cc: Neil Brown, linux-raid On Tue, Jan 29, 2008 at 05:02:57AM -0600, Moshe Yudkowsky wrote: > Neil, thanks for writing. A couple of follow-up questions to you and the > group: > > If the answers above don't lead to a resolution, I can create two RAID1 > pairs and join them using LVM. I would take a hit by using LVM to tie > the pairs intead of RAID0, I suppose, but I would avoid the performance > hit of multiple md drives on a single physical drive, and I could even > run a hot spare through a sparing group. Any comments on the performance > hit -- is raid1L a really bad idea for some reason? You can of course construct a traditional raid-1+0 in Linux as you describe here, but this is different from linux raid10 (with its different layout possibilities). And setting up grub/lilo on two disks of a raid1 /boot seems to be the right way for a reasonably secure system. best regards keld ^ permalink raw reply [flat|nested] 60+ messages in thread
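The traditional nesting Keld refers to can also be built with md alone, instead of joining the mirror pairs with LVM; a sketch with placeholder device names:

  # two raid1 pairs ...
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc2 /dev/sdd2
  # ... striped together into a classic raid1+0
  mdadm --create /dev/md3 --level=0 --raid-devices=2 /dev/md1 /dev/md2

Whether this, LVM on top of the pairs, or a single md raid10 is preferable is exactly the trade-off discussed in this thread.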
end of thread, other threads:[~2008-02-04 17:26 UTC | newest] Thread overview: 60+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-01-29 4:44 In this partition scheme, grub does not find md information? Moshe Yudkowsky 2008-01-29 5:08 ` Neil Brown 2008-01-29 11:02 ` Moshe Yudkowsky 2008-01-29 11:14 ` Peter Rabbitson 2008-01-29 11:29 ` Moshe Yudkowsky 2008-01-29 14:09 ` Michael Tokarev 2008-01-29 14:07 ` Michael Tokarev 2008-01-29 14:47 ` Peter Rabbitson 2008-01-29 15:13 ` Michael Tokarev 2008-01-29 15:41 ` Peter Rabbitson 2008-01-29 16:51 ` Michael Tokarev 2008-01-29 17:51 ` Keld Jørn Simonsen 2008-01-29 16:16 ` Moshe Yudkowsky 2008-01-29 16:34 ` Peter Rabbitson 2008-01-29 19:34 ` Moshe Yudkowsky 2008-01-29 20:21 ` Keld Jørn Simonsen 2008-01-29 22:14 ` Moshe Yudkowsky 2008-01-29 23:45 ` Bill Davidsen 2008-01-30 0:13 ` Moshe Yudkowsky 2008-01-30 22:36 ` Bill Davidsen 2008-01-30 0:17 ` Keld Jørn Simonsen 2008-01-29 23:44 ` Bill Davidsen 2008-01-30 0:22 ` Keld Jørn Simonsen 2008-01-30 0:26 ` Peter Rabbitson 2008-01-30 22:39 ` Bill Davidsen 2008-01-30 0:32 ` Moshe Yudkowsky 2008-01-30 0:53 ` Keld Jørn Simonsen 2008-01-30 1:00 ` Moshe Yudkowsky 2008-01-31 14:40 ` Bill Davidsen 2008-01-30 13:11 ` Michael Tokarev 2008-01-30 14:10 ` Moshe Yudkowsky 2008-01-30 14:41 ` Michael Tokarev 2008-01-31 14:59 ` Bill Davidsen 2008-02-02 20:17 ` Bill Davidsen 2008-01-30 12:01 ` Peter Rabbitson 2008-01-29 16:42 ` Michael Tokarev 2008-01-29 16:26 ` Keld Jørn Simonsen 2008-01-29 16:46 ` Michael Tokarev 2008-01-29 18:01 ` Keld Jørn Simonsen 2008-01-30 13:37 ` Michael Tokarev 2008-01-30 14:47 ` Peter Rabbitson 2008-01-30 15:21 ` Keld Jørn Simonsen 2008-01-30 15:35 ` Peter Rabbitson 2008-01-30 15:46 ` Loop devices to RAID? (was Re: In this partition scheme, grub does not find md information?) Moshe Yudkowsky 2008-01-30 15:56 ` Tim Southerwood 2008-01-29 15:57 ` In this partition scheme, grub does not find md information? Moshe Yudkowsky 2008-01-29 16:37 ` Keld Jørn Simonsen 2008-01-29 16:57 ` Michael Tokarev 2008-01-30 11:03 ` David Greaves 2008-01-30 11:44 ` Moshe Yudkowsky 2008-01-30 12:00 ` WRONG INFO (was Re: In this partition scheme, grub does not find md information?) Peter Rabbitson 2008-01-30 12:41 ` David Greaves 2008-01-30 13:39 ` Michael Tokarev 2008-02-04 16:49 ` In this partition scheme, grub does not find md information? John Stoffel 2008-02-04 17:26 ` Michael Tokarev 2008-01-30 11:03 ` David Greaves 2008-01-29 14:48 ` Keld Jørn Simonsen 2008-01-29 16:00 ` Moshe Yudkowsky 2008-01-29 16:25 ` Peter Rabbitson 2008-01-29 14:04 ` Keld Jørn Simonsen