* "Missing" RAID devices
@ 2013-05-21 12:51 Jim Santos
  2013-05-21 15:31 ` Phil Turmel
  2013-05-21 16:23 ` Doug Ledford
  0 siblings, 2 replies; 23+ messages in thread

From: Jim Santos @ 2013-05-21 12:51 UTC (permalink / raw)
To: linux-raid

Hi,

I recently upgraded my 2-disk RAID-1 array from 1.5TB to 2TB disks.
When I started I had 10 MD devices. Since the last partition was
small, I removed the filesystems and deleted the associated RAID
device. Then I created two new devices and split the extra 500 GB
between them. Everything was good until I rebooted. Now two of the
raid devices are 'gone'. Here is what mdadm shows at the moment:

santos@bender:/etc/mdadm$ sudo mdadm --examine --scan
ARRAY /dev/md120 UUID=3fa7ec2e:7a19093c:34d348bd:5f81ddcb
ARRAY /dev/md119 UUID=302f5a8a:f963c76e:53433e04:4146b04d
ARRAY /dev/md125 UUID=a0535363:3799280a:1b1dd2b9:44680fa2
ARRAY /dev/md124 UUID=54742635:24ba2035:bcddc93e:73fc2c19
ARRAY /dev/md123 UUID=56613b81:36bc9bcd:fe538cc8:51817c6b
ARRAY /dev/md126 UUID=c5d4e3e5:ab520847:f5483f4e:4a7373f8
ARRAY /dev/md127 UUID=3983523a:740e09fa:84f2985e:c6521efa
ARRAY /dev/md121 UUID=35bd0dff:1fa423f5:f8fb6389:ecaefea8
ARRAY /dev/md122 UUID=ebbd009c:33fcc44c:8907793b:1d800ee1
ARRAY /dev/md/11 metadata=1.2 UUID=32bde3b1:b0475886:3ce4bba7:3bd12900 name=bender:11
ARRAY /dev/md/12 metadata=1.2 UUID=0257dbbd:42d8d666:2173709f:4dd0a1a6 name=bender:12

The last two devices are the new ones.

fstab:
UUID="57382674-3289-4cc8-a2d0-a57d44ea1458" /home       ext3 defaults 2
UUID="478cca47-6f26-4550-9813-223d0d65b851" /mnt/part06 ext3 defaults 2
UUID="58c85e32-07d9-445d-b971-34d96c03a765" /mnt/part07 ext3 defaults 2
UUID="5d7d89cd-c04d-4916-bb9a-a28256d928a9" /mnt/part08 ext3 defaults 2
UUID="1da60cba-5f1d-4a8f-925a-6e3d80611b8d" /mnt/part09 ext3 defaults 2
UUID="241199c9-4b6a-49d6-a270-1db07691f99c" /mnt/part10 ext3 defaults 2   <-- Failed to mount
UUID="af7dc965-261b-4995-84ca-0ebb0c014efb" /mnt/part11 ext3 defaults 2   <-- Failed to mount
UUID="6a896a4e-aee0-41a8-8a29-2cb021f4149c" /mnt/part12 ext3 defaults 2
UUID="96379e3c-78f6-4cb4-984a-85be455263c3" /mnt/part13 ext3 defaults 2
UUID="2f8c7084-8610-4f11-97b8-5bc25ad9a7df" /mnt/part14 ext4 defaults 2
UUID="015fe797-e79c-4dd0-bbef-4c6c59287262" /mnt/part15 ext4 defaults 2

mount:
/dev/md120 on /home       type ext3 (rw)
/dev/md119 on /mnt/part06 type ext3 (rw)
/dev/md125 on /mnt/part07 type ext3 (rw)
/dev/md124 on /mnt/part08 type ext3 (rw)
/dev/md123 on /mnt/part09 type ext3 (rw)
/dev/md121 on /mnt/part12 type ext3 (rw)
/dev/md122 on /mnt/part13 type ext3 (rw)
/dev/md126 on /mnt/part14 type ext4 (rw)   <--- New
/dev/md127 on /mnt/part15 type ext4 (rw)   <--- New

What seems to have happened is that /dev/md/11 and /dev/md/12 are now
named /dev/md126 and /dev/md127.
If I examine a couple of the partitions that should be part of
/dev/md126 and /dev/md127, I see this:

santos@bender:/etc/mdadm$ sudo mdadm --examine /dev/sda7
/dev/sda7:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : c5d4e3e5:ab520847:f5483f4e:4a7373f8
  Creation Time : Fri Oct 30 12:30:04 2009
     Raid Level : raid1
  Used Dev Size : 157292288 (150.01 GiB 161.07 GB)
     Array Size : 157292288 (150.01 GiB 161.07 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126

    Update Time : Mon May 20 15:00:19 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : fff3c4a9 - correct
         Events : 7003

      Number   Major   Minor   RaidDevice State
this     0       8        7        0      active sync   /dev/sda7
   0     0       8        7        0      active sync   /dev/sda7
   1     1       8       23        1      active sync   /dev/sdb7

santos@bender:/etc/mdadm$ sudo mdadm --examine /dev/sda8
/dev/sda8:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 3983523a:740e09fa:84f2985e:c6521efa
  Creation Time : Fri Oct 30 12:30:21 2009
     Raid Level : raid1
  Used Dev Size : 157292288 (150.01 GiB 161.07 GB)
     Array Size : 157292288 (150.01 GiB 161.07 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 127

    Update Time : Mon May 20 15:00:19 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 47e70f24 - correct
         Events : 1665

      Number   Major   Minor   RaidDevice State
this     0       8        8        0      active sync   /dev/sda8
   0     0       8        8        0      active sync   /dev/sda8
   1     1       8       24        1      active sync   /dev/sdb8

Notice the Preferred Minor numbers. It looks like the two new RAID
devices took over the devices with the highest minor numbers. I don't
know what I did to get into this situation, but could really use some
help getting out of it.

TIA

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-21 12:51 "Missing" RAID devices Jim Santos @ 2013-05-21 15:31 ` Phil Turmel 2013-05-21 22:22 ` Jim Santos 2013-05-21 16:23 ` Doug Ledford 1 sibling, 1 reply; 23+ messages in thread From: Phil Turmel @ 2013-05-21 15:31 UTC (permalink / raw) To: Jim Santos; +Cc: linux-raid Hi Jim, On 05/21/2013 08:51 AM, Jim Santos wrote: > Hi, > I recently upgraded my 2 disk RAID-1 array from 1.5TB to 2TB disks. > When I started I had 10 MD devices. Since the last partition was > small, I removed the filesystems and deleted the associated RAID > device. The I created two new devices and split the extra 500 GB > between them. Everything was good until I rebooted. Now two of the > raid devices are 'gone'. [snip /] > Notice the Preferred Minor numbers. It looks like the two new RAID > devices took over the devices with the highest minor numbers. Preferred minors are only used when assembling with kernel internal auto-assembly (deprecated), which only works on meta-data v0.90, and only if an initramfs is not present. Boot-time assembly is otherwise governed by the copy of mdadm.conf in your initramfs. You appear to have failed to update your initramfs. This is complicated by your failure to avoid mdadm's "fallback" minor numbers that are used when an array is assembled without an entry in mdadm.conf. > I don't know what I did to get into this situation, but could really > use some help getting out of it. mdadm is called by modern initramfs boot scripts to assemble raid devices as they are encountered. If the given device is not a member of an array listed in mdadm.conf, mdadm picks the next unused minor number starting at 127 and counting down. Mdadm must have found the members of your new arrays before it found members of the arrays your old mdadm.conf listed for md127 and md126. Any time you update your mdadm.conf in your root filesystem, you must remember to regenerate your initramfs so mdadm has the correct information at boot time (before root is mounted). To minimize future confusion, I recommend you renumber all of your arrays in mdadm.conf starting with minor number 1. Then update your initramfs. Then reboot. Your fstab uses uuids (wisely), so you don't need any particular minor numbers. So you can and should avoid the minor numbers mdadm will use as defaults. HTH, Phil ^ permalink raw reply [flat|nested] 23+ messages in thread
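As a concrete sketch of the repair Phil describes: the commands below assume a Debian/Ubuntu-style setup like the poster's (/etc/mdadm/mdadm.conf, update-initramfs); other distros rebuild the initramfs with dracut or mkinitcpio instead, and the low minor numbers are just illustrative.

    # Regenerate ARRAY lines from the running arrays, then edit the
    # /dev/mdN names down to low minor numbers (md1, md2, ...):
    sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
    sudoedit /etc/mdadm/mdadm.conf     # drop stale ARRAY lines, renumber

    # Copy the updated mdadm.conf into the initramfs used at boot:
    sudo update-initramfs -u

    # Reboot, then confirm the assembled names:
    cat /proc/mdstat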
* Re: "Missing" RAID devices
  2013-05-21 15:31 ` Phil Turmel
@ 2013-05-21 22:22 ` Jim Santos
  2013-05-22  0:02 ` Phil Turmel
  0 siblings, 1 reply; 23+ messages in thread

From: Jim Santos @ 2013-05-21 22:22 UTC (permalink / raw)
To: linux-raid

Hi,

Thanks for pointing out the initramfs problem. I guess I should have
figured that out myself, since I've had to update initramfs in the
past, but it just totally slipped my mind. And the strange device
numbering just threw me completely off track.

As far as how the devices got numbered that way in the first place, I
really don't know. I assembled them and that is how it came out.
Since I was initially doing this to learn about SW RAID, I'm sure that
I made a rookie mistake or two along the way.

The reason that there are so many filesystems is that I wanted to try
to minimize any loss if one of them got corrupted. Maybe it isn't the
best way to do it, but it made sense to me at the time. I am more
than open to suggestions.

When I started doing this to better understand SW RAID, I wanted to
make things as simple as possible, so I didn't use LVM. That, and it
didn't seem like I would gain much by using it. All I need is simple
RAID1 devices; I never planned on changing the layout other than maybe
increasing the size of the disks. Maybe that flies in the face of
'best practices', since you can't be sure what your future needs will
be. How would you suggest I set things up if I did use LVs?

/boot and / are on a separate disk on RAID1 devices with 1.x
superblocks. At the moment, they are the only things that aren't
giving me a problem :-)

Many thanks,

Jim

On Tue, May 21, 2013 at 11:31 AM, Phil Turmel <philip@turmel.org> wrote:
> Hi Jim,
>
> On 05/21/2013 08:51 AM, Jim Santos wrote:
>> Hi,
>> I recently upgraded my 2 disk RAID-1 array from 1.5TB to 2TB disks.
>> When I started I had 10 MD devices. Since the last partition was
>> small, I removed the filesystems and deleted the associated RAID
>> device. Then I created two new devices and split the extra 500 GB
>> between them. Everything was good until I rebooted. Now two of the
>> raid devices are 'gone'.
>
> [snip /]
>
>> Notice the Preferred Minor numbers. It looks like the two new RAID
>> devices took over the devices with the highest minor numbers.
>
> Preferred minors are only used when assembling with kernel internal
> auto-assembly (deprecated), which only works on meta-data v0.90, and
> only if an initramfs is not present. Boot-time assembly is otherwise
> governed by the copy of mdadm.conf in your initramfs.
>
> You appear to have failed to update your initramfs. This is complicated
> by your failure to avoid mdadm's "fallback" minor numbers that are used
> when an array is assembled without an entry in mdadm.conf.
>
>> I don't know what I did to get into this situation, but could really
>> use some help getting out of it.
>
> mdadm is called by modern initramfs boot scripts to assemble raid
> devices as they are encountered. If the given device is not a member of
> an array listed in mdadm.conf, mdadm picks the next unused minor number
> starting at 127 and counting down. Mdadm must have found the members of
> your new arrays before it found members of the arrays your old
> mdadm.conf listed for md127 and md126.
>
> Any time you update your mdadm.conf in your root filesystem, you must
> remember to regenerate your initramfs so mdadm has the correct
> information at boot time (before root is mounted).
>
> To minimize future confusion, I recommend you renumber all of your
> arrays in mdadm.conf starting with minor number 1. Then update your
> initramfs. Then reboot.
>
> Your fstab uses uuids (wisely), so you don't need any particular minor
> numbers. So you can and should avoid the minor numbers mdadm will use
> as defaults.
>
> HTH,
>
> Phil

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices
  2013-05-21 22:22 ` Jim Santos
@ 2013-05-22  0:02 ` Phil Turmel
  2013-05-22  0:16 ` Jim Santos
  2013-05-22 22:43 ` Stan Hoeppner
  0 siblings, 2 replies; 23+ messages in thread

From: Phil Turmel @ 2013-05-22 0:02 UTC (permalink / raw)
To: Jim Santos; +Cc: linux-raid

Hi Jim,

On 05/21/2013 06:22 PM, Jim Santos wrote:
> Hi,
>
> Thanks for pointing out the initramfs problem. I guess I should have
> figured that out myself, since I've had to update initramfs in the
> past, but it just totally slipped my mind. And the strange device
> numbering just threw me completely off track.

Does this mean you're back to running? Did you follow my instructions?

> As far as how the devices got numbered that way in the first place, I
> really don't know. I assembled them and that is how it came out.
> Since I was initially doing this to learn about SW RAID, I'm sure that
> I made a rookie mistake or two along the way.

No problem. You probably rebooted once between creating all your raids
and generating the mdadm.conf file. (Using mdadm -Es >>/etc/mdadm.conf)
The reboot would have caused initramfs assembly without instructions,
using available minors starting at 127. Then the --scan into mdadm.conf
would have "locked it in".

> The reason that there are so many filesystems is that I wanted to try
> to minimize any loss if one of them got corrupted. Maybe it isn't the
> best way to do it, but it made sense to me at the time. I am more
> than open to suggestions.
>
> When I started doing this to better understand SW RAID, I wanted to
> make things as simple as possible, so I didn't use LVM. That, and it
> didn't seem like I would gain much by using it. All I need is simple
> RAID1 devices; I never planned on changing the layout other than maybe
> increasing the size of the disks. Maybe that flies in the face of
> 'best practices', since you can't be sure what your future needs will
> be. How would you suggest I set things up if I did use LVs?

Simple is good. My preferred setup for light duty is two arrays spread
over all available disks. First is /dev/md1, a small (~500m) n-way
mirror with v1.0 metadata for use as /boot. The other, /dev/md2, uses
the balance of the disks in either raid10,far3 or raid6. If raid6, I
use a chunk size of 16k.

I put LVM on top of /dev/md2, with LVs for swap, /, /home, /tmp, and
/bulk. The latter is for photos, music, video, mythtv, et cetera. I
generally leave 10% of the volume group unallocated until I see how the
usage patterns go. LVM makes it easy to add space to existing LVs on
the run--even for the root filesystem. LVM also makes it possible to
move LVs from one array to another without downtime. This is
especially handy when you have a root filesystem inside a raid10. (MD
raid10 cannot be reshaped yet.)

Anyways, you asked my opinion. I don't run any heavy duty systems, so
look to others for those situations.

> /boot and / are on a separate disk on RAID1 devices with 1.x
> superblocks. At the moment, they are the only things that aren't
> giving me a problem :-)

I guess that means the answers to my first questions are no?

Phil

ps. The convention on kernel.org is to use reply-to-all, to trim
replies, and to either bottom-post or interleave. FWIW.

^ permalink raw reply	[flat|nested] 23+ messages in thread
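For readers wanting to try the layout Phil outlines, a minimal sketch with hypothetical device and volume names (three disks, the raid10,far3 variant; substitute --level=6 --chunk=16 for the raid6 variant). Note that the swap placement shown here is revisited later in the thread.

    mdadm --create /dev/md1 --level=1 --metadata=1.0 --raid-devices=3 \
          /dev/sda1 /dev/sdb1 /dev/sdc1       # small n-way mirror for /boot
    mdadm --create /dev/md2 --level=10 --layout=f3 --raid-devices=3 \
          /dev/sda2 /dev/sdb2 /dev/sdc2       # balance of the disks
    pvcreate /dev/md2
    vgcreate vg0 /dev/md2
    lvcreate -L 4G  -n swap vg0
    lvcreate -L 20G -n root vg0
    lvcreate -L 40G -n home vg0
    lvcreate -L 10G -n tmp  vg0
    # leave ~10% of vg0 unallocated for later growth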
* Re: "Missing" RAID devices 2013-05-22 0:02 ` Phil Turmel @ 2013-05-22 0:16 ` Jim Santos 2013-05-22 22:43 ` Stan Hoeppner 1 sibling, 0 replies; 23+ messages in thread From: Jim Santos @ 2013-05-22 0:16 UTC (permalink / raw) To: Phil Turmel; +Cc: linux-raid On Tue, May 21, 2013 at 8:02 PM, Phil Turmel <philip@turmel.org> wrote: > > Does this mean you're back to running? Did you follow my instructions? > I hadn't when I last posted, but by following your instructions I got everything working a few minutes ago. Thanks for giving me a description of how you like to lay things out. Jim ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-22 0:02 ` Phil Turmel 2013-05-22 0:16 ` Jim Santos @ 2013-05-22 22:43 ` Stan Hoeppner 2013-05-22 23:26 ` Phil Turmel 1 sibling, 1 reply; 23+ messages in thread From: Stan Hoeppner @ 2013-05-22 22:43 UTC (permalink / raw) To: Phil Turmel, Linux RAID Sorry for the dup Phil, hit the wrong reply button. On 5/21/2013 7:02 PM, Phil Turmel wrote: ... > ...First is /dev/md1, a small (~500m) n-way > ...as /boot. The other, /dev/md2, uses > ...raid10,far3 or raid6. > > I put LVM on top of /dev/md2, with LVs for swap, ... /tmp Swap and tmp atop an LV atop RAID6? The former will always RMW on page writes, the latter quite often will cause RMW. As you stated your performance requirements are modest. However, for the archives, putting swap on a parity array, let alone a double parity array, is not good practice. -- Stan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-22 22:43 ` Stan Hoeppner @ 2013-05-22 23:26 ` Phil Turmel 2013-05-23 5:59 ` Stan Hoeppner 2013-05-23 8:22 ` David Brown 0 siblings, 2 replies; 23+ messages in thread From: Phil Turmel @ 2013-05-22 23:26 UTC (permalink / raw) To: stan; +Cc: Linux RAID On 05/22/2013 06:43 PM, Stan Hoeppner wrote: > Sorry for the dup Phil, hit the wrong reply button. No worries. > On 5/21/2013 7:02 PM, Phil Turmel wrote: > ... >> ...First is /dev/md1, a small (~500m) n-way >> ...as /boot. The other, /dev/md2, uses >> ...raid10,far3 or raid6. >> >> I put LVM on top of /dev/md2, with LVs for swap, ... /tmp > > Swap and tmp atop an LV atop RAID6? The former will always RMW on page > writes, the latter quite often will cause RMW. As you stated your > performance requirements are modest. However, for the archives, putting > swap on a parity array, let alone a double parity array, is not good > practice. Ah, good point. Hasn't hurt me yet, but it would if I pushed anything hard. I'll have to revise my baseline to always have a small raid10,f3 to go with the raid6. Meanwhile, I'm applying some of the general ideas I've seen from you: I've acquired a pair of Crucial M4 SSDs for my new home media server to keep small files and databases away from the bulk storage. Not in service yet, but I'm very pleased so far. I'm pretty sure the new kit is way overkill for a media server... :-) Thanks, Phil ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-22 23:26 ` Phil Turmel @ 2013-05-23 5:59 ` Stan Hoeppner 2013-05-23 8:30 ` keld 2013-05-23 8:22 ` David Brown 1 sibling, 1 reply; 23+ messages in thread From: Stan Hoeppner @ 2013-05-23 5:59 UTC (permalink / raw) To: Phil Turmel; +Cc: Linux RAID On 5/22/2013 6:26 PM, Phil Turmel wrote: > On 05/22/2013 06:43 PM, Stan Hoeppner wrote: >> Sorry for the dup Phil, hit the wrong reply button. > > No worries. > >> On 5/21/2013 7:02 PM, Phil Turmel wrote: >> ... >>> ...First is /dev/md1, a small (~500m) n-way >>> ...as /boot. The other, /dev/md2, uses >>> ...raid10,far3 or raid6. >>> >>> I put LVM on top of /dev/md2, with LVs for swap, ... /tmp >> >> Swap and tmp atop an LV atop RAID6? The former will always RMW on page >> writes, the latter quite often will cause RMW. As you stated your >> performance requirements are modest. However, for the archives, putting >> swap on a parity array, let alone a double parity array, is not good >> practice. > > Ah, good point. Hasn't hurt me yet, but it would if I pushed anything > hard. I'll have to revise my baseline to always have a small raid10,f3 > to go with the raid6. Yeah, the kicker here is that swap on a parity array seems to work fine, right up until the moment it doesn't. And that's when the kernel goes into heavy swapping due to any number of causes. When that happens, you're heavily into RMW, disk heads are bang'n, latency goes through the roof. If any programs are trying to access files on the parity array, say a mildly busy IMAP, FTP, etc, server, everything grinds to a halt. With your particular setup, instead you might use n additional partitions, one each across the physical disks that comprise your n-way RAID1. You would configure the partition type of each as (82) Linux swap, and add them all to fstab with equal priority. The kernel will interleave the 4KB swap page writes evenly across all of these partitions, yielding swap performance similar to an n-way RAID0 stripe. The downside to this setup is the kernel probably crashes if you lose one of these disks and thus the swap partition on it. So you could simply make another md/RAID1 of these n partitions if n is an odd number of spindles. Or n/2 RAID1 arrays if n is even. Then put one swap partition on each RAID1 device and do swap interleaving across the RAID1 pairs as described above in the non RAID case. The reason for this last configuration is simple-- more swap throughput for the same number of physical writes. With a 4 drive RAID1 and a single swap partition atop, each 4KB page write to swap generates a 4KB write to each of the 4 disks, 16KB total. If you create two RAID1s and put a swap partition on each and interleave them, each 4KB page write to swap generates only two 4KB writes, 8KB total. Here for each 16KB written you're moving two pages to swap instead of one. Thus your swap bandwidth is doubled. But you still have redundancy and crash avoidance if one disk fails. You may be tempted to use md/RAID10 of some layout to optimize for writes, but you'd gain nothing, and you'd lose some performance due to overhead. The partitions you'll be using in this case are so small that they easily fit in a single physical disk track, thus no head movement is required to seek between sectors, only rotation of the platter. Another advantage to this hybrid approach is less disk space consumed. If you need 8GB of swap, a 4-way RAID1 swap partition requires 32GB of disk space, 8GB per disk. With the n/2 RAID1 approach and 4 disks it requires half that, 16GB. 
With the no redundancy interleaved approach it requires 1/4th, only 2GB per disk, 8GB total. With today's mechanical disk capacities this isn't a concern. But if using SSDs it can be. > Meanwhile, I'm applying some of the general ideas I've seen from you: > I've acquired a pair of Crucial M4 SSDs for my new home media server to > keep small files and databases away from the bulk storage. Not in > service yet, but I'm very pleased so far. If the two are competing for seeks thus slowing everything down, moving the random access stuff to SSD should help. > I'm pretty sure the new kit is way overkill for a media server... :-) Not so many years ago folks would have said the same about 4TB mech drives. ;) -- Stan ^ permalink raw reply [flat|nested] 23+ messages in thread
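A sketch of the two swap arrangements Stan describes, for a hypothetical four-disk box with a small swap partition (sdX5) on each drive; equal pri= values in fstab make the kernel interleave page writes across all the swap areas.

    # Plain interleaved swap, no redundancy (fstab entries):
    /dev/sda5   none   swap   sw,pri=5   0   0
    /dev/sdb5   none   swap   sw,pri=5   0   0
    /dev/sdc5   none   swap   sw,pri=5   0   0
    /dev/sdd5   none   swap   sw,pri=5   0   0

    # Or the n/2 RAID1 variant, which survives a single disk failure:
    mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/sda5 /dev/sdb5
    mdadm --create /dev/md11 --level=1 --raid-devices=2 /dev/sdc5 /dev/sdd5
    mkswap /dev/md10 && mkswap /dev/md11
    # fstab:
    /dev/md10   none   swap   sw,pri=5   0   0
    /dev/md11   none   swap   sw,pri=5   0   0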
* Re: "Missing" RAID devices 2013-05-23 5:59 ` Stan Hoeppner @ 2013-05-23 8:30 ` keld 2013-05-24 3:45 ` Stan Hoeppner 0 siblings, 1 reply; 23+ messages in thread From: keld @ 2013-05-23 8:30 UTC (permalink / raw) To: Stan Hoeppner; +Cc: Phil Turmel, Linux RAID On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote: > On 5/22/2013 6:26 PM, Phil Turmel wrote: > > On 05/22/2013 06:43 PM, Stan Hoeppner wrote: > >> Sorry for the dup Phil, hit the wrong reply button. > > > > No worries. > > > >> On 5/21/2013 7:02 PM, Phil Turmel wrote: > >> ... > >>> ...First is /dev/md1, a small (~500m) n-way > >>> ...as /boot. The other, /dev/md2, uses > >>> ...raid10,far3 or raid6. > >>> > >>> I put LVM on top of /dev/md2, with LVs for swap, ... /tmp > >> > >> Swap and tmp atop an LV atop RAID6? The former will always RMW on page > >> writes, the latter quite often will cause RMW. As you stated your > >> performance requirements are modest. However, for the archives, putting > >> swap on a parity array, let alone a double parity array, is not good > >> practice. > > > > Ah, good point. Hasn't hurt me yet, but it would if I pushed anything > > hard. I'll have to revise my baseline to always have a small raid10,f3 > > to go with the raid6. > > Yeah, the kicker here is that swap on a parity array seems to work fine, > right up until the moment it doesn't. And that's when the kernel goes > into heavy swapping due to any number of causes. When that happens, > you're heavily into RMW, disk heads are bang'n, latency goes through the > roof. If any programs are trying to access files on the parity array, > say a mildly busy IMAP, FTP, etc, server, everything grinds to a halt. > > With your particular setup, instead you might use n additional > partitions, one each across the physical disks that comprise your n-way > RAID1. You would configure the partition type of each as (82) Linux > swap, and add them all to fstab with equal priority. The kernel will > interleave the 4KB swap page writes evenly across all of these > partitions, yielding swap performance similar to an n-way RAID0 stripe. > > The downside to this setup is the kernel probably crashes if you lose > one of these disks and thus the swap partition on it. So you could > simply make another md/RAID1 of these n partitions if n is an odd number > of spindles. Or n/2 RAID1 arrays if n is even. Then put one swap > partition on each RAID1 device and do swap interleaving across the RAID1 > pairs as described above in the non RAID case. > > The reason for this last configuration is simple-- more swap throughput > for the same number of physical writes. With a 4 drive RAID1 and a > single swap partition atop, each 4KB page write to swap generates a 4KB > write to each of the 4 disks, 16KB total. If you create two RAID1s and > put a swap partition on each and interleave them, each 4KB page write to > swap generates only two 4KB writes, 8KB total. Here for each 16KB > written you're moving two pages to swap instead of one. Thus your swap > bandwidth is doubled. But you still have redundancy and crash avoidance > if one disk fails. You may be tempted to use md/RAID10 of some layout > to optimize for writes, but you'd gain nothing, and you'd lose some > performance due to overhead. The partitions you'll be using in this > case are so small that they easily fit in a single physical disk track, > thus no head movement is required to seek between sectors, only rotation > of the platter. > > Another advantage to this hybrid approach is less disk space consumed. 
> If you need 8GB of swap, a 4-way RAID1 swap partition requires 32GB of > disk space, 8GB per disk. With the n/2 RAID1 approach and 4 disks it > requires half that, 16GB. With the no redundancy interleaved approach > it requires 1/4th, only 2GB per disk, 8GB total. With today's > mechanical disk capacities this isn't a concern. But if using SSDs it > can be. > > > Meanwhile, I'm applying some of the general ideas I've seen from you: > > I've acquired a pair of Crucial M4 SSDs for my new home media server to > > keep small files and databases away from the bulk storage. Not in > > service yet, but I'm very pleased so far. > > If the two are competing for seeks thus slowing everything down, moving > the random access stuff to SSD should help. > > > I'm pretty sure the new kit is way overkill for a media server... :-) > > Not so many years ago folks would have said the same about 4TB mech > drives. ;) I think a raid10,far3 is a good choice for swap, then you will enjoy RAID0-like reading speed. and good write speed (compared to raid6), and a chance of live surviving if just one drive keeps functioning. best regards keld ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-23 8:30 ` keld @ 2013-05-24 3:45 ` Stan Hoeppner 2013-05-24 6:32 ` keld 0 siblings, 1 reply; 23+ messages in thread From: Stan Hoeppner @ 2013-05-24 3:45 UTC (permalink / raw) To: keld; +Cc: Phil Turmel, Linux RAID On 5/23/2013 3:30 AM, keld@keldix.com wrote: > On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote: >> You may be tempted to use md/RAID10 of some layout >> to optimize for writes, but you'd gain nothing, and you'd lose some >> performance due to overhead. The partitions you'll be using in this >> case are so small that they easily fit in a single physical disk track, >> thus no head movement is required to seek between sectors, only rotation >> of the platter. ... > I think a raid10,far3 is a good choice for swap, then you will enjoy > RAID0-like reading speed. and good write speed (compared to raid6), > and a chance of live surviving if just one drive keeps functioning. As I mention above, none of the md/RAID10 layouts will yield any added performance benefit for swap partitions. And I state the reason why. If you think about this for a moment you should reach the same conclusion. -- Stan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices
  2013-05-24 3:45 ` Stan Hoeppner
@ 2013-05-24 6:32 ` keld
  2013-05-24 7:37 ` Stan Hoeppner
  2013-05-24 9:23 ` David Brown
  0 siblings, 2 replies; 23+ messages in thread

From: keld @ 2013-05-24 6:32 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: Phil Turmel, Linux RAID

On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote:
> On 5/23/2013 3:30 AM, keld@keldix.com wrote:
> > On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:
>
> >> You may be tempted to use md/RAID10 of some layout
> >> to optimize for writes, but you'd gain nothing, and you'd lose some
> >> performance due to overhead. The partitions you'll be using in this
> >> case are so small that they easily fit in a single physical disk track,
> >> thus no head movement is required to seek between sectors, only rotation
> >> of the platter.
> ...
> > I think a raid10,far3 is a good choice for swap, then you will enjoy
> > RAID0-like reading speed. and good write speed (compared to raid6),
> > and a chance of live surviving if just one drive keeps functioning.
>
> As I mention above, none of the md/RAID10 layouts will yield any added
> performance benefit for swap partitions. And I state the reason why.
> If you think about this for a moment you should reach the same conclusion.

I think it is you who are not fully acquainted with Linux MD. Linux MD
RAID10,far3 offers improved performance in single read, which is an
advantage for swap, when you are swapping in. Think about it and try it
out for yourself. Especially if we are talking 3 drives (far3), but also
when you are talking more drives and only 2 copies. You don't get raid0
read performance in Linux on a combination of raid1 and raid0.

best regards
keld

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-24 6:32 ` keld @ 2013-05-24 7:37 ` Stan Hoeppner 2013-05-24 17:15 ` keld 2013-05-24 9:23 ` David Brown 1 sibling, 1 reply; 23+ messages in thread From: Stan Hoeppner @ 2013-05-24 7:37 UTC (permalink / raw) To: keld; +Cc: Phil Turmel, Linux RAID On 5/24/2013 1:32 AM, keld@keldix.com wrote: > On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote: >> On 5/23/2013 3:30 AM, keld@keldix.com wrote: >>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote: >> >>>> You may be tempted to use md/RAID10 of some layout >>>> to optimize for writes, but you'd gain nothing, and you'd lose some >>>> performance due to overhead. The partitions you'll be using in this >>>> case are so small that they easily fit in a single physical disk track, >>>> thus no head movement is required to seek between sectors, only rotation >>>> of the platter. >> ... >>> I think a raid10,far3 is a good choice for swap, then you will enjoy >>> RAID0-like reading speed. and good write speed (compared to raid6), >>> and a chance of live surviving if just one drive keeps functioning. >> >> As I mention above, none of the md/RAID10 layouts will yield any added >> performance benefit for swap partitions. And I state the reason why. >> If you think about this for a moment you should reach the same conclusion. > > I think it is you who are not fully aquainted with Linux MD. Linux > MD RAID10,far3 offers improved performance in single read, On most of today's systems, read performance is largely irrelevant WRT swap performance. However write performance is critical. None of the md/RAID10 layouts are going to increase write throughput over RAID1 pairs. And all the mirrored RAIDs will be 2x slower than interleaved swap across direct disk partitions. -- Stan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-24 7:37 ` Stan Hoeppner @ 2013-05-24 17:15 ` keld 2013-05-24 19:05 ` Stan Hoeppner 0 siblings, 1 reply; 23+ messages in thread From: keld @ 2013-05-24 17:15 UTC (permalink / raw) To: Stan Hoeppner; +Cc: Phil Turmel, Linux RAID On Fri, May 24, 2013 at 02:37:01AM -0500, Stan Hoeppner wrote: > On 5/24/2013 1:32 AM, keld@keldix.com wrote: > > On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote: > >> On 5/23/2013 3:30 AM, keld@keldix.com wrote: > >>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote: > >> > >>>> You may be tempted to use md/RAID10 of some layout > >>>> to optimize for writes, but you'd gain nothing, and you'd lose some > >>>> performance due to overhead. The partitions you'll be using in this > >>>> case are so small that they easily fit in a single physical disk track, > >>>> thus no head movement is required to seek between sectors, only rotation > >>>> of the platter. > >> ... > >>> I think a raid10,far3 is a good choice for swap, then you will enjoy > >>> RAID0-like reading speed. and good write speed (compared to raid6), > >>> and a chance of live surviving if just one drive keeps functioning. > >> > >> As I mention above, none of the md/RAID10 layouts will yield any added > >> performance benefit for swap partitions. And I state the reason why. > >> If you think about this for a moment you should reach the same conclusion. > > > > I think it is you who are not fully aquainted with Linux MD. Linux > > MD RAID10,far3 offers improved performance in single read, > > On most of today's systems, read performance is largely irrelevant WRT > swap performance. However write performance is critical. None of the > md/RAID10 layouts are going to increase write throughput over RAID1 > pairs. And all the mirrored RAIDs will be 2x slower than interleaved > swap across direct disk partitions. In my experience read performance from swap is critical, at least on single user systems. Eg swapping in firefox or libreoffice may take quite some time and there raid10,far helps by almost halfing the time for the swapping in. writes are not important, as long as you are not trashing. In general halfing the swapping in with raid10,far is nice for a process, but for small processes it is not noticable for a laptop user or a server user, say http or ftp. best regards keld ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-24 17:15 ` keld @ 2013-05-24 19:05 ` Stan Hoeppner 2013-05-24 19:22 ` keld 0 siblings, 1 reply; 23+ messages in thread From: Stan Hoeppner @ 2013-05-24 19:05 UTC (permalink / raw) To: keld; +Cc: Phil Turmel, Linux RAID On 5/24/2013 12:15 PM, keld@keldix.com wrote: > On Fri, May 24, 2013 at 02:37:01AM -0500, Stan Hoeppner wrote: >> On 5/24/2013 1:32 AM, keld@keldix.com wrote: >>> On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote: >>>> On 5/23/2013 3:30 AM, keld@keldix.com wrote: >>>>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote: >>>> >>>>>> You may be tempted to use md/RAID10 of some layout >>>>>> to optimize for writes, but you'd gain nothing, and you'd lose some >>>>>> performance due to overhead. The partitions you'll be using in this >>>>>> case are so small that they easily fit in a single physical disk track, >>>>>> thus no head movement is required to seek between sectors, only rotation >>>>>> of the platter. >>>> ... >>>>> I think a raid10,far3 is a good choice for swap, then you will enjoy >>>>> RAID0-like reading speed. and good write speed (compared to raid6), >>>>> and a chance of live surviving if just one drive keeps functioning. >>>> >>>> As I mention above, none of the md/RAID10 layouts will yield any added >>>> performance benefit for swap partitions. And I state the reason why. >>>> If you think about this for a moment you should reach the same conclusion. >>> >>> I think it is you who are not fully aquainted with Linux MD. Linux >>> MD RAID10,far3 offers improved performance in single read, >> >> On most of today's systems, read performance is largely irrelevant WRT >> swap performance. However write performance is critical. None of the >> md/RAID10 layouts are going to increase write throughput over RAID1 >> pairs. And all the mirrored RAIDs will be 2x slower than interleaved >> swap across direct disk partitions. > > In my experience read performance from swap is critical, at least > on single user systems. Eg swapping in firefox or libreoffice > may take quite some time and there raid10,far helps by almost halfing > the time for the swapping in. writes are not important, as long as you are not trashing. If a single user system has multiple drives configured in RAID10 and productivity applications are being swapped, then the user should be smacked in the head. 2GB DIMMs are $10. Any hard drive is $50+ but usually much more. This is not a valid argument. > In general halfing the swapping in with raid10,far is nice for a process, but > for small processes it is not noticable for a laptop user or a > server user, say http or ftp. Neither is this. Laptop users don't run RAID10. And server swap performance is all about page write, not read, as I previously stated. -- Stan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-24 19:05 ` Stan Hoeppner @ 2013-05-24 19:22 ` keld 2013-05-25 1:42 ` Stan Hoeppner 0 siblings, 1 reply; 23+ messages in thread From: keld @ 2013-05-24 19:22 UTC (permalink / raw) To: Stan Hoeppner; +Cc: Phil Turmel, Linux RAID On Fri, May 24, 2013 at 02:05:44PM -0500, Stan Hoeppner wrote: > On 5/24/2013 12:15 PM, keld@keldix.com wrote: > > On Fri, May 24, 2013 at 02:37:01AM -0500, Stan Hoeppner wrote: > >> On 5/24/2013 1:32 AM, keld@keldix.com wrote: > >>> On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote: > >>>> On 5/23/2013 3:30 AM, keld@keldix.com wrote: > >>>>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote: > >>>> > >>>>>> You may be tempted to use md/RAID10 of some layout > >>>>>> to optimize for writes, but you'd gain nothing, and you'd lose some > >>>>>> performance due to overhead. The partitions you'll be using in this > >>>>>> case are so small that they easily fit in a single physical disk track, > >>>>>> thus no head movement is required to seek between sectors, only rotation > >>>>>> of the platter. > >>>> ... > >>>>> I think a raid10,far3 is a good choice for swap, then you will enjoy > >>>>> RAID0-like reading speed. and good write speed (compared to raid6), > >>>>> and a chance of live surviving if just one drive keeps functioning. > >>>> > >>>> As I mention above, none of the md/RAID10 layouts will yield any added > >>>> performance benefit for swap partitions. And I state the reason why. > >>>> If you think about this for a moment you should reach the same conclusion. > >>> > >>> I think it is you who are not fully aquainted with Linux MD. Linux > >>> MD RAID10,far3 offers improved performance in single read, > >> > >> On most of today's systems, read performance is largely irrelevant WRT > >> swap performance. However write performance is critical. None of the > >> md/RAID10 layouts are going to increase write throughput over RAID1 > >> pairs. And all the mirrored RAIDs will be 2x slower than interleaved > >> swap across direct disk partitions. > > > > In my experience read performance from swap is critical, at least > > on single user systems. Eg swapping in firefox or libreoffice > > may take quite some time and there raid10,far helps by almost halfing > > the time for the swapping in. writes are not important, as long as you are not trashing. > > If a single user system has multiple drives configured in RAID10 and > productivity applications are being swapped, then the user should be > smacked in the head. 2GB DIMMs are $10. Any hard drive is $50+ but > usually much more. > > This is not a valid argument. And the cost to select, buy and install RAM is much more than USD 10. And some systems dont have room for more RAM, etc. > > In general halfing the swapping in with raid10,far is nice for a process, but > > for small processes it is not noticable for a laptop user or a > > server user, say http or ftp. > > Neither is this. Laptop users don't run RAID10. And server swap > performance is all about page write, not read, as I previously stated. Some laptop users and desktop users run raid10. I think all laptop and desktop users should have at least 2 disks and run mirrorred raid on them. Both for security and for performance. Server performance for properly configured servers will benefit from snappy swap read performance. Write performance of swap would normally not be noticable - not a bottleneck. best regards keld ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-24 19:22 ` keld @ 2013-05-25 1:42 ` Stan Hoeppner 0 siblings, 0 replies; 23+ messages in thread From: Stan Hoeppner @ 2013-05-25 1:42 UTC (permalink / raw) To: keld; +Cc: Phil Turmel, Linux RAID On 5/24/2013 2:22 PM, keld@keldix.com wrote: > On Fri, May 24, 2013 at 02:05:44PM -0500, Stan Hoeppner wrote: >> On 5/24/2013 12:15 PM, keld@keldix.com wrote: >>> On Fri, May 24, 2013 at 02:37:01AM -0500, Stan Hoeppner wrote: >>>> On 5/24/2013 1:32 AM, keld@keldix.com wrote: >>>>> On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote: >>>>>> On 5/23/2013 3:30 AM, keld@keldix.com wrote: >>>>>>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote: >>>>>> >>>>>>>> You may be tempted to use md/RAID10 of some layout >>>>>>>> to optimize for writes, but you'd gain nothing, and you'd lose some >>>>>>>> performance due to overhead. The partitions you'll be using in this >>>>>>>> case are so small that they easily fit in a single physical disk track, >>>>>>>> thus no head movement is required to seek between sectors, only rotation >>>>>>>> of the platter. >>>>>> ... >>>>>>> I think a raid10,far3 is a good choice for swap, then you will enjoy >>>>>>> RAID0-like reading speed. and good write speed (compared to raid6), >>>>>>> and a chance of live surviving if just one drive keeps functioning. >>>>>> >>>>>> As I mention above, none of the md/RAID10 layouts will yield any added >>>>>> performance benefit for swap partitions. And I state the reason why. >>>>>> If you think about this for a moment you should reach the same conclusion. >>>>> >>>>> I think it is you who are not fully aquainted with Linux MD. Linux >>>>> MD RAID10,far3 offers improved performance in single read, >>>> >>>> On most of today's systems, read performance is largely irrelevant WRT >>>> swap performance. However write performance is critical. None of the >>>> md/RAID10 layouts are going to increase write throughput over RAID1 >>>> pairs. And all the mirrored RAIDs will be 2x slower than interleaved >>>> swap across direct disk partitions. >>> >>> In my experience read performance from swap is critical, at least >>> on single user systems. Eg swapping in firefox or libreoffice >>> may take quite some time and there raid10,far helps by almost halfing >>> the time for the swapping in. writes are not important, as long as you are not trashing. >> >> If a single user system has multiple drives configured in RAID10 and >> productivity applications are being swapped, then the user should be >> smacked in the head. 2GB DIMMs are $10. Any hard drive is $50+ but >> usually much more. >> >> This is not a valid argument. > > And the cost to select, buy and install RAM is much more than USD 10. > And some systems dont have room for more RAM, etc. Yet another invalid, nonsensical argument... >>> In general halfing the swapping in with raid10,far is nice for a process, but >>> for small processes it is not noticable for a laptop user or a >>> server user, say http or ftp. >> >> Neither is this. Laptop users don't run RAID10. And server swap >> performance is all about page write, not read, as I previously stated. > > Some laptop users and desktop users run raid10. And you might find a polar bear or two living in the tropics. > I think all laptop > and desktop users should have at least 2 disks and run mirrorred raid on them. > Both for security and for performance. The World According to Keld. Most laptops can only house one drive. If they held two or more their on battery time would be useless. 
Most desktops are sold with only one drive. Yes, in a perfect world everyone would be redundant. > Server performance for properly configured servers will benefit from > snappy swap read performance. Write performance of swap would normally > not be noticable - not a bottleneck. Properly configured servers don't swap. When they need to swap it's typically because something has gone wrong. When this happens they need to free pages as quickly as possible. Once the problem no longer exists, the speed with which pages are brought back from swap isn't critical. Please put down the shovel. Your arguments keep digging you into a deeper hole. I'm not sure if you'll ever be able to climb out. -- Stan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-24 6:32 ` keld 2013-05-24 7:37 ` Stan Hoeppner @ 2013-05-24 9:23 ` David Brown 2013-05-24 18:03 ` keld 1 sibling, 1 reply; 23+ messages in thread From: David Brown @ 2013-05-24 9:23 UTC (permalink / raw) To: keld; +Cc: Stan Hoeppner, Phil Turmel, Linux RAID On 24/05/13 08:32, keld@keldix.com wrote: > On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote: >> On 5/23/2013 3:30 AM, keld@keldix.com wrote: >>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote: >> >>>> You may be tempted to use md/RAID10 of some layout >>>> to optimize for writes, but you'd gain nothing, and you'd lose some >>>> performance due to overhead. The partitions you'll be using in this >>>> case are so small that they easily fit in a single physical disk track, >>>> thus no head movement is required to seek between sectors, only rotation >>>> of the platter. >> ... >>> I think a raid10,far3 is a good choice for swap, then you will enjoy >>> RAID0-like reading speed. and good write speed (compared to raid6), >>> and a chance of live surviving if just one drive keeps functioning. >> >> As I mention above, none of the md/RAID10 layouts will yield any added >> performance benefit for swap partitions. And I state the reason why. >> If you think about this for a moment you should reach the same conclusion. > > I think it is you who are not fully aquainted with Linux MD. Linux > MD RAID10,far3 offers improved performance in single read, which is an > advantage for swap, when you are swapping in. Thinkk about it and try it out for yourself. > Especially if we are talking 3 drives (far3), but also when you are > talking more drives and only 2 copies. You don't get raid0 read performance in Linux > on a combination of raid1 and raid0. > I think you are getting a number of things wrong here. For general usage, especially on a two disk system, raid10,f2 is very often an excellent choice of setup - it gives you protection (two copies of everything) and fastreads (you get striped read performance, and always from the faster outer half of the disk). You pay a higher write latency compared to plain raid1, but with typical usage figures of 5 reads per write, that's fine. And normally you don't have to wait for writes to finish anyway. But swap is different in many ways. First, the read/write ratio for swap is much closer to 1 - it can even be lower than 1. (Things like startup code for programs can get pushed to swap and never read again, as can leaked memory from buggy programs.) Secondly, write latency is a big factor - data is pushed to swap to free up memory for other usage, and that has to wait until the write is complete. Thirdly, the kernel will handle striping of multiple swap partitions automatically. And it will do it in a way that is optimal for swap usage, rather than the chunk sizes used by a striped raid system. (More often, the kernel wants parallel access to different parts of swap, rather than single large reads or writes.) One thing that seems to be slightly confused here in this thread is the mixup between the number of mirror copies and the number of drives in raid10 setups. With md raid, you can have as many mirrors as you like over as many drives as you like, though you need at least as many partitions as mirrors (and it seldom makes sense to have more mirrors than drives). For example, if you have 3 disks, you can use "far3" layout to get three copies of your data - one copy on each disk. But you can also use "far2", and get two copies of your data. 
See <http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10> for some pictures. With plain raid1, if you use 3 drives you get three copies. It seems unlikely to me that you would need the "safe against two disk failure" protection of 3-way mirrors on swap, but it is possible. Back to swap. If you don't need protection for your swap (swap should not often be in use, and a dead disk will lead to crashes on swapped-out processes but should not cause more problems than that), put a small partition on each disk, and add them all to swap. The kernel will handle striping of the swap partitions. There is nothing you can do to make it faster. When you want protection, raid1 is your best choice. Make small partitions on each disk, then pair them up as a number of raid1 pairs, and add each of these as swap. Your system will survive any disk failure, or multiple failures as long as they are from different pairs. Again, there is nothing you can do to make it faster. The important factor here is to minimise write latency. You do that by keeping the layers as simple as possible - raid1 is simpler and faster than raid10 on two disks. With small partitions, head movement and the bandwidth differences between inner and outer tracks makes no difference, so "far" layout is no benefit. Theoretically, a set of raid10,f2 pairs rather than raid1 pairs would allow faster reading of large chunks of swap - assuming, of course, that the rest of the system supports such large I/O bandwidth. But such large streaming reads do not often happen with swap - more commonly, the kernel will jump around in its accesses. Large reads that use all spindles are good for the throughput for large streamed reads, but they also block all disks and increase the latency for random accesses which are the common case for swap. I'm a great fan of raid10,f2 - I think it is an optimal choice for many uses, and shows a power and flexibility of Linux's md system that is well above what you can get with hardware raid (or software raid on other OS's). But for swap, you want raid1. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-24 9:23 ` David Brown @ 2013-05-24 18:03 ` keld 0 siblings, 0 replies; 23+ messages in thread From: keld @ 2013-05-24 18:03 UTC (permalink / raw) To: David Brown; +Cc: Stan Hoeppner, Phil Turmel, Linux RAID On Fri, May 24, 2013 at 11:23:30AM +0200, David Brown wrote: > On 24/05/13 08:32, keld@keldix.com wrote: > > On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote: > >> On 5/23/2013 3:30 AM, keld@keldix.com wrote: > >>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote: > >> > >>>> You may be tempted to use md/RAID10 of some layout > >>>> to optimize for writes, but you'd gain nothing, and you'd lose some > >>>> performance due to overhead. The partitions you'll be using in this > >>>> case are so small that they easily fit in a single physical disk track, > >>>> thus no head movement is required to seek between sectors, only rotation > >>>> of the platter. > >> ... > >>> I think a raid10,far3 is a good choice for swap, then you will enjoy > >>> RAID0-like reading speed. and good write speed (compared to raid6), > >>> and a chance of live surviving if just one drive keeps functioning. > >> > >> As I mention above, none of the md/RAID10 layouts will yield any added > >> performance benefit for swap partitions. And I state the reason why. > >> If you think about this for a moment you should reach the same conclusion. > > > > I think it is you who are not fully aquainted with Linux MD. Linux > > MD RAID10,far3 offers improved performance in single read, which is an > > advantage for swap, when you are swapping in. Thinkk about it and try it out for yourself. > > Especially if we are talking 3 drives (far3), but also when you are > > talking more drives and only 2 copies. You don't get raid0 read performance in Linux > > on a combination of raid1 and raid0. > > > > I think you are getting a number of things wrong here. For general > usage, especially on a two disk system, raid10,f2 is very often an > excellent choice of setup - it gives you protection (two copies of > everything) and fastreads (you get striped read performance, and always > from the faster outer half of the disk). You pay a higher write latency > compared to plain raid1, but with typical usage figures of 5 reads per > write, that's fine. And normally you don't have to wait for writes to > finish anyway. > > But swap is different in many ways. > > First, the read/write ratio for swap is much closer to 1 - it can even > be lower than 1. (Things like startup code for programs can get pushed > to swap and never read again, as can leaked memory from buggy programs.) > > Secondly, write latency is a big factor - data is pushed to swap to free > up memory for other usage, and that has to wait until the write is complete. Agreed > Thirdly, the kernel will handle striping of multiple swap partitions > automatically. And it will do it in a way that is optimal for swap > usage, rather than the chunk sizes used by a striped raid system. (More > often, the kernel wants parallel access to different parts of swap, > rather than single large reads or writes.) Yes, the kernel will handle striping, but not mirrored, if you do not employ raid.. > > One thing that seems to be slightly confused here in this thread is the > mixup between the number of mirror copies and the number of drives in > raid10 setups. 
With md raid, you can have as many mirrors as you like > over as many drives as you like, though you need at least as many > partitions as mirrors (and it seldom makes sense to have more mirrors > than drives). For example, if you have 3 disks, you can use "far3" > layout to get three copies of your data - one copy on each disk. But > you can also use "far2", and get two copies of your data. See > <http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10> > for some pictures. > > With plain raid1, if you use 3 drives you get three copies. > > It seems unlikely to me that you would need the "safe against two disk > failure" protection of 3-way mirrors on swap, but it is possible. yes, it is possible, and why not do it, swap is mostly a very small part of the total disk space, so that seems to be very cheap, and then also giving identical disk layout in many situations. > > > Back to swap. > > If you don't need protection for your swap (swap should not often be in > use, and a dead disk will lead to crashes on swapped-out processes but > should not cause more problems than that), put a small partition on each > disk, and add them all to swap. The kernel will handle striping of the > swap partitions. There is nothing you can do to make it faster. I think it is serious that a process, or a number of processes fail because of failing disks. And it does not cost much disk space to prevent against this. It does cost double/triple write IO, but that is probably worth it too. I do think having a uniform disk space with raid0 reading property does speed up the reading. The kernel cannot evenly spread IO over the disks, as the chunks it needs to read may be different in size. raid10,far automatically does this even spread. And if you need mirrored raid, then no other mirrored raid types give you raid0 read speed. > When you want protection, raid1 is your best choice. Make small > partitions on each disk, then pair them up as a number of raid1 pairs, > and add each of these as swap. Your system will survive any disk > failure, or multiple failures as long as they are from different pairs. > Again, there is nothing you can do to make it faster. Raid1 is only half as fast as raid10,far for single reads.. > > The important factor here is to minimise write latency. You do that by > keeping the layers as simple as possible - raid1 is simpler and faster > than raid10 on two disks. With small partitions, head movement and the > bandwidth differences between inner and outer tracks makes no > difference, so "far" layout is no benefit. The IO scheduling thakes care of latency problems, grouping the right tracks together for the write tasks for the far layout. yes for far layout and small partitions like swap, the difference between the speed of inner and outer tracks are insignificant. > Theoretically, a set of raid10,f2 pairs rather than raid1 pairs would > allow faster reading of large chunks of swap - assuming, of course, that > the rest of the system supports such large I/O bandwidth. But such > large streaming reads do not often happen with swap - more commonly, the > kernel will jump around in its accesses. Large reads that use all > spindles are good for the throughput for large streamed reads, but they > also block all disks and increase the latency for random accesses which > are the common case for swap. 
I have examples of large swaps like firefox and flash > > I'm a great fan of raid10,f2 - I think it is an optimal choice for many > uses, and shows a power and flexibility of Linux's md system that is > well above what you can get with hardware raid (or software raid on > other OS's). But for swap, you want raid1. raid1 and raid10,f2 are about the same for sequential write, which is what is used for swap write io. Single read speed is far better for the far layout. So why choose the slower raid1? https://raid.wiki.kernel.org/index.php/Performance best regards keld ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-22 23:26 ` Phil Turmel 2013-05-23 5:59 ` Stan Hoeppner @ 2013-05-23 8:22 ` David Brown 1 sibling, 0 replies; 23+ messages in thread From: David Brown @ 2013-05-23 8:22 UTC (permalink / raw) To: Phil Turmel; +Cc: stan, Linux RAID On 23/05/13 01:26, Phil Turmel wrote: > On 05/22/2013 06:43 PM, Stan Hoeppner wrote: >> Sorry for the dup Phil, hit the wrong reply button. > > No worries. > >> On 5/21/2013 7:02 PM, Phil Turmel wrote: >> ... >>> ...First is /dev/md1, a small (~500m) n-way >>> ...as /boot. The other, /dev/md2, uses >>> ...raid10,far3 or raid6. >>> >>> I put LVM on top of /dev/md2, with LVs for swap, ... /tmp >> >> Swap and tmp atop an LV atop RAID6? The former will always RMW on page >> writes, the latter quite often will cause RMW. As you stated your >> performance requirements are modest. However, for the archives, putting >> swap on a parity array, let alone a double parity array, is not good >> practice. > > Ah, good point. Hasn't hurt me yet, but it would if I pushed anything > hard. I'll have to revise my baseline to always have a small raid10,f3 > to go with the raid6. > Always use raid1 (or raid10) for your swap - that is, assuming you want it on raid at all. Raid is all about uptime - look at what is likely to go wrong, what the consequences of that problem are, and the cost of protecting against it. If your swap is seldom used (as is normally the case), and you can live with the consequences of swap failing (i.e., any process using swap will die - but everything else, including data on raid, will be fine), then don't put swap on raid. If it is cheaper to buy more ram than extra spindles for swap on raid, then buy more ram and avoid swap. But if you still feel you need swap on raid, then using raid1 styles. Stan has given you all the details. Also consider putting /tmp on tmpfs. Then you don't need to worry about raid here, and if it overflows to disk then the extra data goes out into swap. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-21 12:51 "Missing" RAID devices Jim Santos 2013-05-21 15:31 ` Phil Turmel @ 2013-05-21 16:23 ` Doug Ledford 2013-05-21 17:03 ` Drew 1 sibling, 1 reply; 23+ messages in thread From: Doug Ledford @ 2013-05-21 16:23 UTC (permalink / raw) To: Jim Santos; +Cc: linux-raid On 05/21/2013 08:51 AM, Jim Santos wrote: > santos@bender:/etc/mdadm$ sudo mdadm --examine /dev/sda7 > > /dev/sda7: > Magic : a92b4efc > Version : 0.90.00 ^^^^^^^ There's your problem. Seriously. > Preferred Minor : 126 So, how did you ever get version 0.90 superblocks that count from 127 backwards? Mdadm doesn't do that. In fact, mdadm relies on you *not* doing that. Here's the deal. When you use version 0.90 superblocks, the number is taken from the superminor field, usually starting at 0 and counting up, and the device file is then /dev/md<number>. With version 1.x superblocks, we care about the name of the device, not the number, and the name is taken from the name field of the superblock, and we create the device as /dev/md/<name>. However, when this support was added, the kernel didn't support named elements (aka, you couldn't have a md/root device in the kernel namespace, it needed to be md<number>), so the /dev/md/<name> file is actually a symlink to a /dev/md<number> file, and we would allocate from 127 and count backwards so that they would be as unlikely as possible to conflict with numbered names from version 0.90 superblocks. You are running into that impossible conflict. I would remake all of your version 0.90 raid arrays as version 1.0 raid arrays (the superblock should sit in the exact same space and the arrays should be the same size, but I can't say that for certain because newer mdadm might reserve more space between the end of the filesystem and the superblock than older mdadm did, so a test would be necessary first), and in the process I would give them all names, and then I would totally eliminate all references to /dev/md<number> in your system setup and stick with just /dev/md/<name> for everything, and then I would remake your initrd and be done with it all. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-21 16:23 ` Doug Ledford @ 2013-05-21 17:03 ` Drew [not found] ` <519BDC8C.1040202@hardwarefreak.com> 0 siblings, 1 reply; 23+ messages in thread From: Drew @ 2013-05-21 17:03 UTC (permalink / raw) To: Jim Santos; +Cc: linux-raid Hi Jim, The other question I'd ask is why do you have 10 raid1 arrays on those two disks? Given you have an initramfs, at most you should have separate partitions (raid'd) for /boot & root. Everything else should be broken down using LVM. Way more flexible to move things around in future as required. -- Drew "Nothing in life is to be feared. It is only to be understood." --Marie Curie "This started out as a hobby and spun horribly out of control." -Unknown ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <519BDC8C.1040202@hardwarefreak.com>]
* Re: "Missing" RAID devices [not found] ` <519BDC8C.1040202@hardwarefreak.com> @ 2013-05-21 21:02 ` Drew 2013-05-21 22:06 ` Stan Hoeppner 0 siblings, 1 reply; 23+ messages in thread From: Drew @ 2013-05-21 21:02 UTC (permalink / raw) To: Stan Hoeppner; +Cc: Jim Santos, Linux RAID Mailing List On Tue, May 21, 2013 at 1:43 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote: > On 5/21/2013 12:03 PM, Drew wrote: >> Hi Jim, >> >> The other question I'd ask is why do you have 10 raid1 arrays on those >> two disks? > > No joke. That setup is ridiculous. RAID exists to guard against a > drive failure, not as a substitute for volume management. > >> Given you have an initramfs, at most you should have separate >> partitions (raid'd) for /boot & root. Everything else should be broken >> down using LVM. Way more flexible to move things around in future as >> required. > > LVM isn't even required. Using partitions (atop MD) or a single large > filesystem (XFS) with quotas works just as well. Agreed. For simple setups, a single boot & root is just fine. I'd assumed the OP's reasons for using multiple partitions was valid, so keeping those partitions over top a single raid array meant LVM was the best choice. -- Drew "Nothing in life is to be feared. It is only to be understood." --Marie Curie "This started out as a hobby and spun horribly out of control." -Unknown ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: "Missing" RAID devices 2013-05-21 21:02 ` Drew @ 2013-05-21 22:06 ` Stan Hoeppner 0 siblings, 0 replies; 23+ messages in thread From: Stan Hoeppner @ 2013-05-21 22:06 UTC (permalink / raw) To: Drew; +Cc: Jim Santos, Linux RAID Mailing List On 5/21/2013 4:02 PM, Drew wrote: > On Tue, May 21, 2013 at 1:43 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote: >> On 5/21/2013 12:03 PM, Drew wrote: >>> Hi Jim, >>> >>> The other question I'd ask is why do you have 10 raid1 arrays on those >>> two disks? >> >> No joke. That setup is ridiculous. RAID exists to guard against a >> drive failure, not as a substitute for volume management. >> >>> Given you have an initramfs, at most you should have separate >>> partitions (raid'd) for /boot & root. Everything else should be broken >>> down using LVM. Way more flexible to move things around in future as >>> required. >> >> LVM isn't even required. Using partitions (atop MD) or a single large >> filesystem (XFS) with quotas works just as well. > > Agreed. For simple setups, a single boot & root is just fine. > > I'd assumed the OP's reasons for using multiple partitions was valid, > so keeping those partitions over top a single raid array meant LVM was > the best choice. We don't have enough information yet to make such a determination. Multiple LVM devices may most closely mimic his current setup, but that doesn't mean it's the best choice. It doesn't mean it's not either. We simply haven't been informed why he was using 10 md/RAID1 devices. My gut instinct says it's simply a lack of education, not a special requirement. The principal reason for such a setup is to prevent runaway processes from filling the storage. Thus /var which normally contains the logs and mail spool is often put on a separate partition. This problem can also be addressed using filesystem quotas. There is more than one way to skin a cat, as they say. If he's using these 10 partitions simply for organization purposes, then there's no need for 10 LVM devices nor FS quotas on a single FS, but simply a good directory hierarchy. -- Stan ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2013-05-25 1:42 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-21 12:51 "Missing" RAID devices Jim Santos
2013-05-21 15:31 ` Phil Turmel
2013-05-21 22:22 ` Jim Santos
2013-05-22 0:02 ` Phil Turmel
2013-05-22 0:16 ` Jim Santos
2013-05-22 22:43 ` Stan Hoeppner
2013-05-22 23:26 ` Phil Turmel
2013-05-23 5:59 ` Stan Hoeppner
2013-05-23 8:30 ` keld
2013-05-24 3:45 ` Stan Hoeppner
2013-05-24 6:32 ` keld
2013-05-24 7:37 ` Stan Hoeppner
2013-05-24 17:15 ` keld
2013-05-24 19:05 ` Stan Hoeppner
2013-05-24 19:22 ` keld
2013-05-25 1:42 ` Stan Hoeppner
2013-05-24 9:23 ` David Brown
2013-05-24 18:03 ` keld
2013-05-23 8:22 ` David Brown
2013-05-21 16:23 ` Doug Ledford
2013-05-21 17:03 ` Drew
[not found] ` <519BDC8C.1040202@hardwarefreak.com>
2013-05-21 21:02 ` Drew
2013-05-21 22:06 ` Stan Hoeppner