* Large single raid and XFS or two small ones and EXT3? @ 2006-06-22 19:11 Chris Allen 2006-06-22 19:16 ` Gordon Henderson 2006-06-23 8:59 ` PFC 0 siblings, 2 replies; 47+ messages in thread From: Chris Allen @ 2006-06-22 19:11 UTC (permalink / raw) To: linux-raid Dear All, I have a Linux storage server containing 16x750GB drives - so 12TB raw space. If I make them into a single RAID5 array, then it appears my only choice for a filesystem is XFS - as EXT3 won't really handle partitions over 8TB. Alternatively, I could split each drive into 2 partitions and have 2 RAID5 arrays, then put an EXT3 on each one. Can anybody advise the pros and cons of these two approaches with regard to stability, reliability and performance? The store is to be used for files which will have an even split of: 33% approx 2MB in size 33% approx 50KB in size 33% approx 2KB in size Also: - I am running a 2.6.15-1 stock FC5 kernel. Would there be any RAID benefits in me upgrading to the latest 2.6.16 kernel? (don't want to do this unless there is very good reason to) - I am running mdadm 2.3.1. Would there be any benefits for me in upgrading to mdadm v2.5? - I have read good things about bitmaps. Are these production ready? Any advice/caveats? Many thanks for reading, Chris Allen. ^ permalink raw reply [flat|nested] 47+ messages in thread
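[Illustration, not part of the thread: a rough sketch of the single-array option and the write-intent bitmap Chris asks about. Device names, partition layout and chunk size are hypothetical, and it assumes an mdadm 2.x that supports --bitmap.]
  # create the 16-drive RAID5 with an internal write-intent bitmap
  mdadm --create /dev/md0 --level=5 --raid-devices=16 --chunk=64 --bitmap=internal /dev/sd[b-q]1
  # or add a bitmap to an array that already exists
  mdadm --grow /dev/md0 --bitmap=internal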
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-22 19:11 Large single raid and XFS or two small ones and EXT3? Chris Allen @ 2006-06-22 19:16 ` Gordon Henderson 2006-06-22 19:23 ` H. Peter Anvin 2006-06-22 20:00 ` Chris Allen 2006-06-23 8:59 ` PFC 1 sibling, 2 replies; 47+ messages in thread From: Gordon Henderson @ 2006-06-22 19:16 UTC (permalink / raw) To: Chris Allen; +Cc: linux-raid On Thu, 22 Jun 2006, Chris Allen wrote: > Dear All, > > I have a Linux storage server containing 16x750GB drives - so 12TB raw > space. Just one thing - Do you want to use RAID-5 or RAID-6 ? I just ask, as with that many drives (and that much data!) the possibilities of a 2nd drive failure is increasing, and personally, wherever I can, I take the hit these days, and have used RAID-6 for some time... drives are cheap, even the 750GB behemoths! > If I make them into a single RAID5 array, then it appears my only > choice for a filesystem is XFS - as EXT3 won't really handle partitions > over 8TB. I can't help with this though - I didn't realise ext3 had such a limitation though! Gordon ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-22 19:16 ` Gordon Henderson @ 2006-06-22 19:23 ` H. Peter Anvin 2006-06-22 19:58 ` Chris Allen 2006-06-22 20:00 ` Chris Allen 1 sibling, 1 reply; 47+ messages in thread From: H. Peter Anvin @ 2006-06-22 19:23 UTC (permalink / raw) To: Gordon Henderson; +Cc: Chris Allen, linux-raid Gordon Henderson wrote: > On Thu, 22 Jun 2006, Chris Allen wrote: > >> Dear All, >> >> I have a Linux storage server containing 16x750GB drives - so 12TB raw >> space. > > Just one thing - Do you want to use RAID-5 or RAID-6 ? > > I just ask, as with that many drives (and that much data!) the > possibilities of a 2nd drive failure is increasing, and personally, > wherever I can, I take the hit these days, and have used RAID-6 for > some time... drives are cheap, even the 750GB behemoths! > >> If I make them into a single RAID5 array, then it appears my only >> choice for a filesystem is XFS - as EXT3 won't really handle partitions >> over 8TB. > > I can't help with this though - I didn't realise ext3 had such a > limitation though! > 16 TB (2^32 blocks) should be the right number. -hpa ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-22 19:23 ` H. Peter Anvin @ 2006-06-22 19:58 ` Chris Allen 0 siblings, 0 replies; 47+ messages in thread From: Chris Allen @ 2006-06-22 19:58 UTC (permalink / raw) To: linux-raid H. Peter Anvin wrote: > Gordon Henderson wrote: >> On Thu, 22 Jun 2006, Chris Allen wrote: >> >>> Dear All, >>> >>> I have a Linux storage server containing 16x750GB drives - so 12TB raw >>> space. >> >> Just one thing - Do you want to use RAID-5 or RAID-6 ? >> >> I just ask, as with that many drives (and that much data!) the >> possibilities of a 2nd drive failure is increasing, and personally, >> wherever I can, I take the hit these days, and have used RAID-6 for >> some time... drives are cheap, even the 750GB behemoths! >> >>> If I make them into a single RAID5 array, then it appears my only >>> choice for a filesystem is XFS - as EXT3 won't really handle >>> partitions >>> over 8TB. >> >> I can't help with this though - I didn't realise ext3 had such a >> limitation though! >> > > 16 TB (2^32 blocks) should be the right number. > It should be, but mkfs.ext3 won't let me create a filesystem bigger than 8TB. It appears that the only way round this is through kernel patches, and, as this is a production machine, I'd rather stick to mainstream releases and go for one of the above solutions. ^ permalink raw reply [flat|nested] 47+ messages in thread
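[Illustration, not part of the thread: the split Chris describes - one partition from each drive per array, keeping each filesystem under the 8TB mkfs.ext3 ceiling he hit - might look roughly like this. Device names are hypothetical, with each 750GB drive split into two partitions.]
  mdadm --create /dev/md0 --level=5 --raid-devices=16 /dev/sd[b-q]1
  mdadm --create /dev/md1 --level=5 --raid-devices=16 /dev/sd[b-q]2
  mkfs.ext3 /dev/md0    # each array is roughly 15 x 375GB = ~5.6TB usable
  mkfs.ext3 /dev/md1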
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-22 19:16 ` Gordon Henderson 2006-06-22 19:23 ` H. Peter Anvin @ 2006-06-22 20:00 ` Chris Allen 1 sibling, 0 replies; 47+ messages in thread From: Chris Allen @ 2006-06-22 20:00 UTC (permalink / raw) To: Gordon Henderson; +Cc: linux-raid Gordon Henderson wrote: > On Thu, 22 Jun 2006, Chris Allen wrote: > > >> Dear All, >> >> I have a Linux storage server containing 16x750GB drives - so 12TB raw >> space. >> > > Just one thing - Do you want to use RAID-5 or RAID-6 ? > > I just ask, as with that many drives (and that much data!) the > possibilities of a 2nd drive failure is increasing, and personally, > wherever I can, I take the hit these days, and have used RAID-6 for > some time... drives are cheap, even the 750GB behemoths! > > Each of these boxes has an equivalent mirror box - so I'm happy with using raid5 for the time being. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-22 19:11 Large single raid and XFS or two small ones and EXT3? Chris Allen 2006-06-22 19:16 ` Gordon Henderson @ 2006-06-23 8:59 ` PFC 2006-06-23 9:26 ` Francois Barre 2006-06-23 19:48 ` Large single raid and XFS or two small ones and EXT3? Nix 1 sibling, 2 replies; 47+ messages in thread From: PFC @ 2006-06-23 8:59 UTC (permalink / raw) To: Chris Allen, linux-raid - XFS is faster and fragments less, but make sure you have a good UPS - ReiserFS 3.6 is mature and fast, too, you might consider it - ext3 is slow if you have many files in one directory, but has more mature tools (resize, recovery etc) I'd go with XFS or Reiser. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 8:59 ` PFC @ 2006-06-23 9:26 ` Francois Barre 2006-06-23 12:50 ` Chris Allen 2006-06-23 19:48 ` Large single raid and XFS or two small ones and EXT3? Nix 1 sibling, 1 reply; 47+ messages in thread From: Francois Barre @ 2006-06-23 9:26 UTC (permalink / raw) To: linux-raid; +Cc: Chris Allen, PFC 2006/6/23, PFC <lists@peufeu.com>: > > - XFS is faster and fragments less, but make sure you have a good UPS Why a good UPS ? XFS has a good strong journal, I never had an issue with it yet... And believe me, I did have some dirty things happening here... > - ReiserFS 3.6 is mature and fast, too, you might consider it > - ext3 is slow if you have many files in one directory, but has more > mature tools (resize, recovery etc) XFS tools are kind of mature also. Online grow, dump, ... > > I'd go with XFS or Reiser. I'd go with XFS. But I may be kind of fanatic... ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 9:26 ` Francois Barre @ 2006-06-23 12:50 ` Chris Allen 2006-06-23 13:14 ` Gordon Henderson ` (3 more replies) 0 siblings, 4 replies; 47+ messages in thread From: Chris Allen @ 2006-06-23 12:50 UTC (permalink / raw) To: Francois Barre; +Cc: linux-raid Francois Barre wrote: > 2006/6/23, PFC <lists@peufeu.com>: >> >> - XFS is faster and fragments less, but make sure you have a >> good UPS > Why a good UPS ? XFS has a good strong journal, I never had an issue > with it yet... And believe me, I did have some dirty things happening > here... > >> - ReiserFS 3.6 is mature and fast, too, you might consider it >> - ext3 is slow if you have many files in one directory, but >> has more >> mature tools (resize, recovery etc) > XFS tools are kind of mature also. Online grow, dump, ... > >> >> I'd go with XFS or Reiser. > I'd go with XFS. But I may be kind of fanatic... Strange that whatever the filesystem you get equal numbers of people saying that they have never lost a single byte to those who have had horrible corruption and would never touch it again. We stopped using XFS about a year ago because we were getting kernel stack space panics under heavy load over NFS. It looks like the time has come to give it another try. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 12:50 ` Chris Allen @ 2006-06-23 13:14 ` Gordon Henderson 2006-06-23 13:30 ` Francois Barre ` (2 subsequent siblings) 3 siblings, 0 replies; 47+ messages in thread From: Gordon Henderson @ 2006-06-23 13:14 UTC (permalink / raw) To: Chris Allen; +Cc: linux-raid On Fri, 23 Jun 2006, Chris Allen wrote: > Strange that whatever the filesystem you get equal numbers of people > saying that > they have never lost a single byte to those who have had horrible > corruption and > would never touch it again. We stopped using XFS about a year ago because we > were getting kernel stack space panics under heavy load over NFS. It > looks like > the time has come to give it another try. I had a bad experience with XFS a year or so ago, and after getting told to RTFM from the XFS users list, after I'd already RTFMd, I gave up on it. (and them) However, I've just decided to give it a go again (for the single reason that it's faster at deleting large swathes of files than ext3, which this server might have to do from time to time), and so-far so good. Looking back, what I think I really was having problems with at the time was 2 issues; one was that I was using LVM too, and it really wasn't production ready, and the other was that the default kernel stack size was 4KB at the time - which was what was causing me problems under heavy NFS load... I'm trying it now on a 3.5TB RAID-6 server now with a relatively light NFS (and Samba) load, but will be rolling it out on an identical server soon which is expected to have a relatively high load, so heres hoping... Gordon ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 12:50 ` Chris Allen 2006-06-23 13:14 ` Gordon Henderson @ 2006-06-23 13:30 ` Francois Barre 2006-06-23 14:46 ` Martin Schröder 2006-06-23 14:01 ` Al Boldi 2006-06-27 12:05 ` Large single raid... - XFS over NFS woes Dexter Filmore 3 siblings, 1 reply; 47+ messages in thread From: Francois Barre @ 2006-06-23 13:30 UTC (permalink / raw) To: linux-raid; +Cc: Chris Allen > Strange that whatever the filesystem you get equal numbers of people > saying that > they have never lost a single byte to those who have had horrible > corruption and > would never touch it again. [...] Losing data is worse than losing anything else. You can buy yourself another hard drive, you can buy yourself another CPU, but you can't buy back the data you lost... And as far as I know, real life does not implement the "Undo" button. So, as a matter of fact, I started to think that choosing a FS is much more a matter of personal belief than any kind of scientific, statistical, or even empirical benchmarking. Something like a new kind of religion... For example, back in reiser3.6's first steps in life, I experienced a handful of oopses, and fuzzy things that made my box think it was running Redmond stuff... So I neglected Reiser. Then Reiser4 concepts came to my ears several years later, and I thought that, well, you know, Hans Reiser has great ideas and promising theories, let's have a closer look at it. So I came back to testing reiser3.6. Which just worked flawlessly. And you know what? I never had time to play with Reiser4 yet. So I finally chose XFS for all my more-than-2G partitions, with regard to the thread I started back in January: "Linux MD raid5 and reiser4... Any experience ?". Anyway, I'm torn between two points of view regarding the fs experience in Linux - maybe an FS cannot be generic, and cannot cover all usage scenarios. Some are good for doing some stuff, some are better for others... And you'll have to choose with regard to your own usage forecasts. - or maybe there's too much choice out there: whenever a big problem arises, it's easier to switch filesystems than to go bug hunting... At least that's the way I reacted a couple of times. And because data loss is such a sensitive topic, when trust is broken, you just want to change all the stuff around, and start hating what you were fond of a minute ago... ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 13:30 ` Francois Barre @ 2006-06-23 14:46 ` Martin Schröder 2006-06-23 14:59 ` Francois Barre ` (2 more replies) 0 siblings, 3 replies; 47+ messages in thread From: Martin Schröder @ 2006-06-23 14:46 UTC (permalink / raw) To: linux-raid 2006/6/23, Francois Barre <francois.barre@gmail.com>: > Loosing data is worse than loosing anything else. You can buy you That's why RAID is no excuse for backups. Best Martin ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 14:46 ` Martin Schröder @ 2006-06-23 14:59 ` Francois Barre 0 siblings, 0 replies; 47+ messages in thread From: Francois Barre @ 2006-06-23 14:59 UTC (permalink / raw) To: linux-raid; +Cc: Martin Schröder > That's why RAID is no excuse for backups. > Of course yes, but... (I work in the car industry.) RAID is your active (if not pro-active) safety system, like a car's ESP; if something goes wrong, it gracefully and automagically re-aligns to the *safe way*. Whereas a backup is your airbag. It's always too late when you use it. And I've never seen anyone trying to recover something from a backup without praying... So, one day or another, I'll develop the strongest backup technology ever, using marble-based disks and a redundant cluster of Egyptian scribes. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 14:46 ` Martin Schröder 2006-06-23 14:59 ` Francois Barre @ 2006-06-23 15:13 ` Bill Davidsen 2006-06-23 15:34 ` Francois Barre 2006-06-23 15:17 ` Chris Allen 2 siblings, 1 reply; 47+ messages in thread From: Bill Davidsen @ 2006-06-23 15:13 UTC (permalink / raw) To: Martin Schröder; +Cc: linux-raid Martin Schröder wrote: > 2006/6/23, Francois Barre <francois.barre@gmail.com>: > >> Loosing data is worse than loosing anything else. You can buy you > > > That's why RAID is no excuse for backups. The problem is that there is no cost effective backup available. When a tape was the same size as a disk and 10% the cost, backups were practical. Today anything larger than hobby size disk is just not easy to back up. Anything large enough to be useful is expensive, small media or something you can't take off-site and lock in a vault aren't backups so much as copies, which may protect against some problems, but which provide little to no protection against site disasters. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 15:13 ` Bill Davidsen @ 2006-06-23 15:34 ` Francois Barre 2006-06-23 19:49 ` Nix 2006-06-24 5:19 ` Neil Brown 0 siblings, 2 replies; 47+ messages in thread From: Francois Barre @ 2006-06-23 15:34 UTC (permalink / raw) To: linux-raid > The problem is that there is no cost effective backup available. One-liner questions : - How does Google make backups ? - Aren't tapes dead yet ? - What about a NUMA principle applied to storage ? ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 15:34 ` Francois Barre @ 2006-06-23 19:49 ` Nix 2006-06-24 5:19 ` Neil Brown 1 sibling, 0 replies; 47+ messages in thread From: Nix @ 2006-06-23 19:49 UTC (permalink / raw) To: Francois Barre; +Cc: linux-raid On 23 Jun 2006, Francois Barre uttered the following: >> The problem is that there is no cost effective backup available. > > One-liner questions : > - How does Google make backups ? Replication across huge numbers of cheap machines on a massively distributed filesystem. -- `NB: Anyone suggesting that we should say "Tibibytes" instead of Terabytes there will be hunted down and brutally slain. That is all.' --- Matthew Wilcox ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 15:34 ` Francois Barre 2006-06-23 19:49 ` Nix @ 2006-06-24 5:19 ` Neil Brown 2006-06-24 7:59 ` Adam Talbot 2006-06-24 12:40 ` Justin Piszcz 1 sibling, 2 replies; 47+ messages in thread From: Neil Brown @ 2006-06-24 5:19 UTC (permalink / raw) To: Francois Barre; +Cc: linux-raid On Friday June 23, francois.barre@gmail.com wrote: > > The problem is that there is no cost effective backup available. > > One-liner questions : > - How does Google make backups ? No, Google ARE the backups :-) > - Aren't tapes dead yet ? LTO-3 does 300Gig, and LTO-4 is planned. They may not cope with tera-byte arrays in one hit, but they still have real value. > - What about a NUMA principle applied to storage ? You mean an Hierarchical Storage Manager? Yep, they exist. I'm sure SGI, EMC and assorted other TLAs could sell you one. NeilBrown ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-24 5:19 ` Neil Brown @ 2006-06-24 7:59 ` Adam Talbot 2006-06-24 9:34 ` David Greaves 2006-06-25 23:57 ` Bill Davidsen 1 sibling, 2 replies; 47+ messages in thread From: Adam Talbot @ 2006-06-24 7:59 UTC (permalink / raw) To: Neil Brown; +Cc: Francois Barre, linux-raid OK, this topic I really need to get in on. I have spent the last few weeks benchmarking my new 1.2TB, 6 disk, RAID6 array. I wanted real numbers, not "This FS is faster because..." I have moved over 100TB of data on my new array running the benchmark testing. I have yet to have any major problems with ReiserFS, EXT2/3, JFS, or XFS. I have done extensive testing on all, including just trying to break the file system with billions of 1k files, or a 1TB file. I was able to cause some problems with EXT3 and ReiserFS with the 1KB and 1TB tests, respectively, but both were fixed with a fsck. My basic test is to move all data from my old server to my new server (whitequeen2) and clock the transfer time. Whitequeen2 has very little storage. The NAS's 1.2TB of storage is attached via iSCSI and a crossover cable to the back of whitequeen2. The data is 100GB of users' files (1KB~2MB), 50GB of MP3's (1MB~5MB) and the rest is movies and system backups 600MB~2GB. Here is a copy of my current data sheet, including specs on the servers and copy times; my numbers are not perfect, but they should give you a clue about speeds... XFS wins. The computer: whitequeen2 AMD Athlon64 3200 (2.0GHz) 1GB Corsair DDR 400 (2X 512MB's running in dual DDR mode) Foxconn 6150K8MA-8EKRS motherboard Off brand case/power supply 2X os disks, software raid array, RAID 1, Maxtor 51369U3, FW DA620CQ0 Intel pro/1000 NIC CentOS 4.3 X86_64 2.6.9 Main app server, Apache, Samba, NFS, NIS The computer: nas AMD Athlon64 3000 (1.8GHz) 256MB Corsair DDR 400 (2X 128MB's running in dual DDR mode) Foxconn 6150K8MA-8EKRS motherboard Off brand case/power supply and drive cages 2X os disks, software raid array, RAID 1, Maxtor 51369U3, FW DA620CQ0 6X software raid array, RAID 6, Maxtor 7V300F0, FW VA111900 Gentoo linux. X86_64 2.6.16-gentoo-r9 System built very light, only built as an iSCSI based NAS. NFS mount from whitequeen (old server) goes to /mnt/tmp Target iSCSI to NAS, or when running on local NAS, is /data Raw dump /dev/null (Speed mark, how fast is the old whitequeen, Read test) Config=APP+NFS-->/dev/null [root@whitequeen2 tmp]# time tar cf - . | cat - > /dev/null real 216m30.621s user 1m24.222s sys 15m20.031s 3.6 hours @ 105371M/hour or 1756M/min or *29.27M/sec* XFS Config=APP+NFS-->NAS+iSCSI RAID6 64K chunk [root@whitequeen2 tmp]# time tar cf - . | (cd /data ; tar xf - ) real 323m9.990s user 1m28.556s sys 31m6.405s /dev/sdb1 1.1T 371G 748G 34% /data 5.399 hours @ 70,260M/hour or 1171M/min or 19.52M/sec Pass 2 of XFS (are my numbers repeatable? Yes) real 320m11.615s user 1m26.997s sys 31m11.987s XFS (Direct NFS connection, no app server, max "real world" speed of my array?) Config=NAS+NFS RAID6 64K chunk nas tmp # time tar cf - . | (cd /data ; tar xf - ) real 241m8.698s user 1m2.760s sys 25m9.770s /dev/md/0 1.1T 371G 748G 34% /data 4.417 hours @ 85,880M/hour or 1431M/min or *23.86M/sec* EXT3 Config=APP+NFS-->NAS+iSCSI RAID6 64K chunk [root@whitequeen2 tmp]# time tar cf - . | (cd /data ; tar xf - ) real 371m29.802s user 1m28.492s sys 46m48.947s /dev/sdb1 1.1T 371G 674G 36% /data 6.192 hours @ 61,262M/hour or 1021M/min or 17.02M/sec EXT2 Config=APP+NFS-->NAS+iSCSI RAID6 64K chunk [root@whitequeen2 tmp]# time tar cf - . | ( cd /data/ ; tar xf - ) real 401m48.702s user 1m25.599s sys 30m22.620s /dev/sdb1 1.1T 371G 674G 36% /data 6.692 hours @ 56,684M/hour or 945M/min or 15.75M/sec JFS Config=APP+NFS-->NAS+iSCSI RAID6 64K chunk [root@whitequeen2 tmp]# time tar cf - . | (cd /data ; tar xf - ) real 337m52.125s user 1m26.526s sys 32m33.983s /dev/sdb1 1.1T 371G 748G 34% /data 5.625 hours @ 67,438M/hour or 1124M/min or 18.73M/sec ReiserFS Config=APP+NFS-->NAS+iSCSI RAID6 64K chunk [root@whitequeen2 tmp]# time tar cf - . | (cd /data ; tar xf - ) real 334m33.615s user 1m31.098s sys 48m41.193s /dev/sdb1 1.1T 371G 748G 34% /data 5.572 hours @ 68,078M/hour or 1135M/min or 18.91M/sec Word count [root@whitequeen2 tmp]# ls | wc 66612 301527 5237755 Actual size = 379,336M ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-24 7:59 ` Adam Talbot @ 2006-06-24 9:34 ` David Greaves 2006-06-24 22:52 ` Adam Talbot 2006-06-25 23:57 ` Bill Davidsen 1 sibling, 1 reply; 47+ messages in thread From: David Greaves @ 2006-06-24 9:34 UTC (permalink / raw) To: Adam Talbot; +Cc: Neil Brown, Francois Barre, linux-raid Adam Talbot wrote: > OK, this topic I relay need to get in on. > I have spent the last few week bench marking my new 1.2TB, 6 disk, RAID6 > array. Very interesting. Thanks. Did you get around to any 'tuning'. Things like raid chunk size, external logs for xfs, blockdev readahead on the underlying devices and the raid device? David ^ permalink raw reply [flat|nested] 47+ messages in thread
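[Illustration, not part of the thread: roughly what the tunings David lists look like in practice. Device names, sizes and values below are placeholders, not settings used by anyone in this thread.]
  mdadm --create /dev/md0 --level=6 --raid-devices=6 --chunk=128 /dev/sd[b-g]1   # raid chunk size is chosen at array creation
  mkfs.xfs -l logdev=/dev/sdh1,size=64m /dev/md0                                 # external XFS log on a separate device
  mount -o logdev=/dev/sdh1 /dev/md0 /data                                       # the external log must also be given at mount time
  blockdev --setra 8192 /dev/md0                                                 # readahead on the raid device
  blockdev --setra 1024 /dev/sdb                                                 # ...and on each underlying disk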
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-24 9:34 ` David Greaves @ 2006-06-24 22:52 ` Adam Talbot 2006-06-25 13:06 ` Joshua Baker-LePain 2006-06-25 14:51 ` Large single raid and XFS or two small ones and EXT3? Adam Talbot 0 siblings, 2 replies; 47+ messages in thread From: Adam Talbot @ 2006-06-24 22:52 UTC (permalink / raw) To: David Greaves; +Cc: Neil Brown, Francois Barre, linux-raid Trying to test for tuning with different chunks. Just finished the 16K chunk and am about 20% done with the 32K test. Here are the numbers on the 16K chunk; will send 32, 96, 128, 192 and 256 as I get them. But keep in mind each one of these tests takes about 4~6 hours, so it is a slow process... I have settled on XFS as the file system type; it seems to be able to beat anything else out there. -Adam XFS Config=NAS+NFS RAID6 16K chunk nas tmp # time tar cf - . | (cd /data ; tar xf - ) real 252m40.143s user 1m4.720s sys 25m6.270s /dev/md/0 1.1T 371G 748G 34% /data 4.207 hours @ 90,167M/hour or 1502M/min or 25.05M/sec David Greaves wrote: > Adam Talbot wrote: > >> OK, this topic I relay need to get in on. >> I have spent the last few week bench marking my new 1.2TB, 6 disk, RAID6 >> array. >> > Very interesting. Thanks. > > Did you get around to any 'tuning'. > Things like raid chunk size, external logs for xfs, blockdev readahead > on the underlying devices and the raid device? > > David > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-24 22:52 ` Adam Talbot @ 2006-06-25 13:06 ` Joshua Baker-LePain 2006-06-28 3:45 ` I need a PCI V2.1 4 port SATA card Guy 2006-06-25 14:51 ` Large single raid and XFS or two small ones and EXT3? Adam Talbot 1 sibling, 1 reply; 47+ messages in thread From: Joshua Baker-LePain @ 2006-06-25 13:06 UTC (permalink / raw) To: Adam Talbot; +Cc: linux-raid On Sat, 24 Jun 2006 at 3:52pm, Adam Talbot wrote > nas tmp # time tar cf - . | (cd /data ; tar xf - ) A (bit) cleaner way to accomplish the same thing: tar cf - --totals . | tar xC /data -f - -- Joshua Baker-LePain Department of Biomedical Engineering Duke University ^ permalink raw reply [flat|nested] 47+ messages in thread
* I need a PCI V2.1 4 port SATA card 2006-06-25 13:06 ` Joshua Baker-LePain @ 2006-06-28 3:45 ` Guy 2006-06-28 4:29 ` Brad Campbell 0 siblings, 1 reply; 47+ messages in thread From: Guy @ 2006-06-28 3:45 UTC (permalink / raw) To: linux-raid Hello group, I am upgrading my disks from old 18 Gig SCSI disks to 300 Gig SATA disks. I need a good SATA controller. My system is old and has PCI V 2.1. I need a 4 port card, or 2 2 port cards. My system has multi PCI buses, so 2 cards may give me better performance, but I don't need it. I will be using software RAID. Can anyone recommend a card that is supported by the current kernel? I know this is the wrong group, sorry. But I know this is a very good place to ask! I did search the archives but don't seem to have the correct keywords to find what I want. Btw, I plan to buy 3 or 4 Seagate ST3320620AS disks. Barracuda 7200.10 SATA 320G. Thanks, Guy ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: I need a PCI V2.1 4 port SATA card 2006-06-28 3:45 ` I need a PCI V2.1 4 port SATA card Guy @ 2006-06-28 4:29 ` Brad Campbell 2006-06-28 10:20 ` Justin Piszcz ` (2 more replies) 0 siblings, 3 replies; 47+ messages in thread From: Brad Campbell @ 2006-06-28 4:29 UTC (permalink / raw) To: Guy; +Cc: linux-raid Guy wrote: > Hello group, > > I am upgrading my disks from old 18 Gig SCSI disks to 300 Gig SATA > disks. I need a good SATA controller. My system is old and has PCI V 2.1. > I need a 4 port card, or 2 2 port cards. My system has multi PCI buses, so > 2 cards may give me better performance, but I don't need it. I will be > using software RAID. Can anyone recommend a card that is supported by the > current kernel? I'm using Promise SATA150TX4 cards here in old PCI based systems. They work great and have been rock solid for well in excess of a year 24/7 hard use. I have 3 in one box and 4 in another. I'm actually looking at building another 15 disk server now and was hoping to move to something quicker using _almost_ commodity hardware. My current 15 drive RAID-6 server is built around a KT600 board with an AMD Sempron processor and 4 SATA150TX4 cards. It does the job but it's not the fastest thing around (takes about 10 hours to do a check of the array or about 15 to do a rebuild). I'd love to do something similar with PCI-E or PCI-X and make it go faster (the PCI bus bandwidth is the killer), however I've not seen many affordable PCI-E multi-port cards that are supported yet and PCI-X seems to mean moving to "server" class mainboards and the other expenses that come along with that. Brad -- "Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams ^ permalink raw reply [flat|nested] 47+ messages in thread
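[Aside, not part of the thread: the "check of the array" Brad mentions is typically driven as sketched below on kernels that expose md's sync_action interface; the values are examples only, not Brad's settings.]
  echo check > /sys/block/md0/md/sync_action         # start a background consistency check
  cat /proc/mdstat                                   # watch progress
  echo 50000 > /proc/sys/dev/raid/speed_limit_min    # optionally raise the resync/check floor, in KB/s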
* Re: I need a PCI V2.1 4 port SATA card 2006-06-28 4:29 ` Brad Campbell @ 2006-06-28 10:20 ` Justin Piszcz 2006-06-28 11:55 ` Christian Pernegger 2006-06-28 12:12 ` Petr Vyskocil 2 siblings, 0 replies; 47+ messages in thread From: Justin Piszcz @ 2006-06-28 10:20 UTC (permalink / raw) To: Brad Campbell; +Cc: Guy, linux-raid On Wed, 28 Jun 2006, Brad Campbell wrote: > Guy wrote: >> Hello group, >> >> I am upgrading my disks from old 18 Gig SCSI disks to 300 Gig SATA >> disks. I need a good SATA controller. My system is old and has PCI V 2.1. >> I need a 4 port card, or 2 2 port cards. My system has multi PCI buses, so >> 2 cards may give me better performance, but I don't need it. I will be >> using software RAID. Can anyone recommend a card that is supported by the >> current kernel? > > I'm using Promise SATA150TX4 cards here in old PCI based systems. They work > great and have been rock solid for well in excess of a year 24/7 hard use. I > have 3 in one box and 4 in another. > > I'm actually looking at building another 15 disk server now and was hoping to > move to something quicker using _almost_ commodity hardware. > > My current 15 drive RAID-6 server is built around a KT600 board with an AMD > Sempron processor and 4 SATA150TX4 cards. It does the job but it's not the > fastest thing around (takes about 10 hours to do a check of the array or > about 15 to do a rebuild). > > I'd love to do something similar with PCI-E or PCI-X and make it go faster > (the PCI bus bandwidth is the killer), however I've not seen many affordable > PCI-E multi-port cards that are supported yet and PCI-X seems to mean moving > to "server" class mainboards and the other expenses that come along with > that. > > Brad > -- > "Human beings, who are almost unique in having the ability > to learn from the experience of others, are also remarkable > for their apparent disinclination to do so." -- Douglas Adams > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > That is the problem, the only 4 port cards are PCI and not PCI-e and thus limit your speed and bw, the only alternative I see is an Areca card if you want speed.. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: I need a PCI V2.1 4 port SATA card 2006-06-28 4:29 ` Brad Campbell 2006-06-28 10:20 ` Justin Piszcz @ 2006-06-28 11:55 ` Christian Pernegger 2006-06-28 11:59 ` Gordon Henderson 2006-06-28 19:38 ` Justin Piszcz 2006-06-28 12:12 ` Petr Vyskocil 2 siblings, 2 replies; 47+ messages in thread From: Christian Pernegger @ 2006-06-28 11:55 UTC (permalink / raw) To: Brad Campbell; +Cc: linux-raid > My current 15 drive RAID-6 server is built around a KT600 board with an AMD Sempron > processor and 4 SATA150TX4 cards. It does the job but it's not the fastest thing around > (takes about 10 hours to do a check of the array or about 15 to do a rebuild). What kind of enclosure do you have this in? I also subscribe to the "almost commodity hardware" philosophy, however I've not been able to find a case that comfortably takes even 8 drives. (The Stacker is an absolute nightmare ...) Even most rackable cases stop at 6 3.5" drive bays -- either that or they are dedicated storage racks with integrated hw RAID and fiber SCSI interconnect --> definitely not commodity. Thanks, C. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: I need a PCI V2.1 4 port SATA card 2006-06-28 11:55 ` Christian Pernegger @ 2006-06-28 11:59 ` Gordon Henderson 2006-06-29 18:45 ` Bill Davidsen 2006-06-28 19:38 ` Justin Piszcz 1 sibling, 1 reply; 47+ messages in thread From: Gordon Henderson @ 2006-06-28 11:59 UTC (permalink / raw) To: Christian Pernegger; +Cc: linux-raid On Wed, 28 Jun 2006, Christian Pernegger wrote: > I also subscribe to the "almost commodity hardware" philosophy, > however I've not been able to find a case that comfortably takes even > 8 drives. (The Stacker is an absolute nightmare ...) Even most > rackable cases stop at 6 3.5" drive bays -- either that or they are > dedicated storage racks with integrated hw RAID and fiber SCSI > interconnect --> definitely not commodity. I've used these: http://www.acme-technology.co.uk/acm338.htm (8 drives in a 3U case), and their variants eg: http://www.acme-technology.co.uk/acm312.htm (12 disks in a 3U case) for several years with good results. Not the cheapest on the block though, but never had any real issues with them. Gordon ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: I need a PCI V2.1 4 port SATA card 2006-06-28 11:59 ` Gordon Henderson @ 2006-06-29 18:45 ` Bill Davidsen 0 siblings, 0 replies; 47+ messages in thread From: Bill Davidsen @ 2006-06-29 18:45 UTC (permalink / raw) To: Gordon Henderson; +Cc: Christian Pernegger, linux-raid Gordon Henderson wrote: >On Wed, 28 Jun 2006, Christian Pernegger wrote: > > > >>I also subscribe to the "almost commodity hardware" philosophy, >>however I've not been able to find a case that comfortably takes even >>8 drives. (The Stacker is an absolute nightmare ...) Even most >>rackable cases stop at 6 3.5" drive bays -- either that or they are >>dedicated storage racks with integrated hw RAID and fiber SCSI >>interconnect --> definitely not commodity. >> >> > >I've used these: > > http://www.acme-technology.co.uk/acm338.htm > >(8 drives in a 3U case), and their variants > >eg: > > http://www.acme-technology.co.uk/acm312.htm > >(12 disks in a 3U case) > > Interesting ad, with a masonic emblem, and a picture of a white case with a note saying it's only available in black. Of course the hardware may be perfectly fine, but I wouldn't count on color. >for several years with good results. Not the cheapest on the block though, >but never had any real issues with them. > -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: I need a PCI V2.1 4 port SATA card 2006-06-28 11:55 ` Christian Pernegger 2006-06-28 11:59 ` Gordon Henderson @ 2006-06-28 19:38 ` Justin Piszcz 1 sibling, 0 replies; 47+ messages in thread From: Justin Piszcz @ 2006-06-28 19:38 UTC (permalink / raw) To: Christian Pernegger; +Cc: Brad Campbell, linux-raid On Wed, 28 Jun 2006, Christian Pernegger wrote: >> My current 15 drive RAID-6 server is built around a KT600 board with an AMD >> Sempron >> processor and 4 SATA150TX4 cards. It does the job but it's not the fastest >> thing around >> (takes about 10 hours to do a check of the array or about 15 to do a >> rebuild). > > What kind of enclosure do you have this in? > > I also subscribe to the "almost commodity hardware" philosophy, > however I've not been able to find a case that comfortably takes even > 8 drives. (The Stacker is an absolute nightmare ...) Even most > rackable cases stop at 6 3.5" drive bays -- either that or they are > dedicated storage racks with integrated hw RAID and fiber SCSI > interconnect --> definitely not commodity. > > Thanks, > > C. > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > For the case, there a number of cases (Lian Li) that fit 20 drives with easy, check here: http://www.newegg.com/Product/Product.asp?item=N82E16811112062 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: I need a PCI V2.1 4 port SATA card 2006-06-28 4:29 ` Brad Campbell 2006-06-28 10:20 ` Justin Piszcz 2006-06-28 11:55 ` Christian Pernegger @ 2006-06-28 12:12 ` Petr Vyskocil 2 siblings, 0 replies; 47+ messages in thread From: Petr Vyskocil @ 2006-06-28 12:12 UTC (permalink / raw) To: linux-raid Brad Campbell wrote: > I'd love to do something similar with PCI-E or PCI-X and make it go > faster (the PCI bus bandwidth is the killer), however I've not seen > many affordable PCI-E multi-port cards that are supported yet and > PCI-X seems to mean moving to "server" class mainboards and the other > expenses that come along with that. Recently I was looking for a budget solution to exactly this problem, and the best I found was to use 2-port SiI 3132 based PCI-E 1x card combined with 1:5 SATA Splitter based on SiI 3726 (e.g. http://fwdepot.com/thestore/product_info.php/products_id/1245). Unfortunately I didn't find anyone selling the splitter here in Czechia, so I went with 4-port SiI PCI card, which is performing well and stable, but of course quite slow. Some test I googled up at that time suggested that this combo can get about 220MB/s bandwidth through in real life (test was on Win32 though), so at today's drive speeds you can connect ~4-5 drives to one PCI-E without bus bandwidth becoming the limiting factor. Anyway, for really budget machines I can recommend the PCI SiI 3124 based cards, the driver in kernel is working rock-stable for me. Only grudge is that driver doesn't sense if you disconnect a drive from SATA connector, i.e. when you do that, computer will freeze trying to write to disconnected drive. After ~3 minutes it times out and md kicks the drive out of the array, though. If someone has any experience to share about SiI 3132+3726 under linux, I'll be happy to hear about it. According to http://linux-ata.org/software-status.html#pmp it should work, question is how stable it is, since it is recent development. Petr ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-24 22:52 ` Adam Talbot 2006-06-25 13:06 ` Joshua Baker-LePain @ 2006-06-25 14:51 ` Adam Talbot 2006-06-25 20:35 ` Chris Allen 1 sibling, 1 reply; 47+ messages in thread From: Adam Talbot @ 2006-06-25 14:51 UTC (permalink / raw) To: Adam Talbot; +Cc: David Greaves, Neil Brown, Francois Barre, linux-raid ACK! At one point some one stated that they were having problems with XFS crashing under high NFS loads... Did it look something like this? -Adam Starting XFS recovery on filesystem: md0 (logdev: internal) Filesystem "md0": XFS internal error xlog_valid_rec_header(1) at line 3478 of file fs/xfs/xfs_log_recover.c. Caller 0xffffffff802114fc Call Trace: <ffffffff80211437>{xlog_valid_rec_header+231} <ffffffff802114fc>{xlog_do_recovery_pass+172} <ffffffff8020f0c8>{xlog_find_tail+2344} <ffffffff802217e1>{kmem_alloc+97} <ffffffff80211bb0>{xlog_recover+192} <ffffffff8020c564>{xfs_log_mount+1380} <ffffffff80213968>{xfs_mountfs+2712} <ffffffff8016aa3a>{set_blocksize+138} <ffffffff80224d1d>{xfs_setsize_buftarg_flags+61} <ffffffff802192b4>{xfs_mount+2724} <ffffffff8022ae00>{linvfs_fill_super+0} <ffffffff8022aeb8>{linvfs_fill_super+184} <ffffffff8024a62e>{strlcpy+78} <ffffffff80169db2>{sget+722} <ffffffff8016a460>{set_bdev_super+0} <ffffffff8022ae00>{linvfs_fill_super+0} <ffffffff8022ae00>{linvfs_fill_super+0} <ffffffff8016a5bc>{get_sb_bdev+268} <ffffffff8016a84b>{do_kern_mount+107} <ffffffff8017eed3>{do_mount+1603} <ffffffff8011a2f9>{do_page_fault+1033} <ffffffff80145f66>{find_get_pages+22} <ffffffff8014d57a>{invalidate_mapping_pages+202} <ffffffff80149f99>{__alloc_pages+89} <ffffffff8014a234>{__get_free_pages+52} <ffffffff8017f257>{sys_mount+151} <ffffffff8010a996>{system_call+126} XFS: log mount/recovery failed: error 990 XFS: log mount failed Adam Talbot wrote: > Trying to test for tuning with different chunk's. Just finished 16K > chunk and am about 20% done with the 32K test. Here are the numbers on > 16K chunk, will send 32, 96,128,192 and 256 as I get them. But keep in > mind each one of these tests take about 4~6 hours, so it is a slow > process... I have settled for XFS as the file system type, it seems to > be able to beat any thing else out there. > -Adam > > XFS > Config=NAS+NFS > RAID6 16K chunk > nas tmp # time tar cf - . | (cd /data ; tar xf - ) > real 252m40.143s > user 1m4.720s > sys 25m6.270s > /dev/md/0 1.1T 371G 748G 34% /data > 4.207 hours @ 90,167M/hour or 1502M/min or 25.05M/sec > > > > > David Greaves wrote: > >> Adam Talbot wrote: >> >> >>> OK, this topic I relay need to get in on. >>> I have spent the last few week bench marking my new 1.2TB, 6 disk, RAID6 >>> array. >>> >>> >> Very interesting. Thanks. >> >> Did you get around to any 'tuning'. >> Things like raid chunk size, external logs for xfs, blockdev readahead >> on the underlying devices and the raid device? >> >> David >> - >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-25 14:51 ` Large single raid and XFS or two small ones and EXT3? Adam Talbot @ 2006-06-25 20:35 ` Chris Allen 0 siblings, 0 replies; 47+ messages in thread From: Chris Allen @ 2006-06-25 20:35 UTC (permalink / raw) To: Adam Talbot; +Cc: linux-raid Adam Talbot wrote: > ACK! > At one point some one stated that they were having problems with XFS > crashing under high NFS loads... Did it look something like this? > -Adam > > > nope, it looked like the trace below - and I could make it happen consistently by thrashing xfs. Not even sure it was over NFS - this could well have been a local test. ---------------------- do_IRQ: stack overflow: 304 Unable to handle kernel paging request at virtual address a554b923 printing eip: c011b202 *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: nfsd(U) lockd(U) md5(U) ipv6(U) autofs4(U) sunrpc(U) xfs(U) exportfs(U) video(U) button(U) battery(U) ac(U) uhci_hcd(U) ehci_hcd(U) i2c_i801(U) i2c_core(U) shpchp(U) e1000(U) floppy(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) raid5(U) xor(U) dm_mod(U) ata_piix(U) libata(U) aar81xx(U) sd_mod(U) scsi_mod(U) CPU: 10 EIP: 0060:[<c011b202>] Tainted: P VLI EFLAGS: 00010086 (2.6.11-2.6.11) EIP is at activate_task+0x34/0x9b eax: e514b703 ebx: 00000000 ecx: 028f8800 edx: c0400200 esi: 028f8800 edi: 000f4352 ebp: f545d02c esp: f545d018 ds: 007b es: 007b ss: 0068 Process (pid: 947105536, threadinfo=f545c000 task=f5a27000) Stack: badc0ded c3630160 f7ae4a80 c0400200 f7ae4a80 c3630160 f545d074 c011b785 00000000 c0220f39 00000001 00000086 00000000 00000001 00000003 f7ae4a80 00000082 00000001 0000000a 00000000 c02219da f7d7cf60 c035d914 00000000 Call Trace: [<c011b785>] try_to_wake_up+0x24a/0x2aa [<c0220f39>] scrup+0xcf/0xd9 [<c02219da>] set_cursor+0x4f/0x60 [<c01348b0>] autoremove_wake_function+0x15/0x37 [<c011d197>] __wake_up_common+0x39/0x59 [<c011d1e9>] __wake_up+0x32/0x43 [<c0121e2c>] release_console_sem+0xad/0xb5 [<c0121c48>] vprintk+0x1e7/0x29e [<c0121a5d>] printk+0x1b/0x1f [<c010664b>] do_IRQ+0x7f/0x86 [<c0104a3e>] common_interrupt+0x1a/0x20 [<c024b5fa>] cfq_may_queue+0x0/0xcd [<c02425e4>] get_request+0xf2/0x2b7 [<c02430cc>] __make_request+0xbe/0x472 [<c024375b>] generic_make_request+0x91/0x234 [<f881be38>] compute_blocknr+0xe5/0x16e [raid5] [<c013489b>] autoremove_wake_function+0x0/0x37 [<f881d0c2>] handle_stripe+0x736/0x109e [raid5] [<f881b45a>] get_active_stripe+0x1fb/0x36c [raid5] [<f881deed>] make_request+0x2e1/0x30d [raid5] [<c013489b>] autoremove_wake_function+0x0/0x37 [<c024375b>] generic_make_request+0x91/0x234 [<c03054e1>] schedule+0x431/0xc5e [<c024a3f4>] cfq_sort_rr_list+0x9b/0xe6 [<c0148c27>] buffered_rmqueue+0xc4/0x1fb [<c013489b>] autoremove_wake_function+0x0/0x37 [<c0243944>] submit_bio+0x46/0xcc [<c0147aae>] mempool_alloc+0x6f/0x108 [<c013489b>] autoremove_wake_function+0x0/0x37 [<c0166696>] bio_add_page+0x26/0x2c [<f9419fe7>] _pagebuf_ioapply+0x175/0x2e3 [xfs] [<f941a185>] pagebuf_iorequest+0x30/0x133 [xfs] [<f9419643>] xfs_buf_get_flags+0xe8/0x147 [xfs] [<f9419d45>] pagebuf_iostart+0x76/0x82 [xfs] [<f9419707>] xfs_buf_read_flags+0x65/0x89 [xfs] [<f940c105>] xfs_trans_read_buf+0x122/0x334 [xfs] [<f93d9dc2>] xfs_btree_read_bufs+0x7d/0x97 [xfs] [<f93c0d7a>] xfs_alloc_lookup+0x326/0x47b [xfs] [<f93bc96b>] xfs_alloc_fixup_trees+0x14f/0x320 [xfs] [<f93d99d9>] xfs_btree_init_cursor+0x1d/0x17f [xfs] [<f93bdc38>] xfs_alloc_ag_vextent_size+0x377/0x456 [xfs] [<f93bcbdb>] xfs_alloc_read_agfl+0x9f/0xb9 [xfs] [<f93bccf5>] 
xfs_alloc_ag_vextent+0x100/0x102 [xfs] [<f93be929>] xfs_alloc_fix_freelist+0x2ca/0x478 [xfs] [<f93bf087>] xfs_alloc_vextent+0x182/0x570 [xfs] [<f93cdff3>] xfs_bmap_alloc+0x111e/0x18e9 [xfs] [<c013489b>] autoremove_wake_function+0x0/0x37 [<c024375b>] generic_make_request+0x91/0x234 [<f891eb40>] EdmaReqQueueInsert+0x70/0x80 [aar81xx] [<c011cf79>] scheduler_tick+0x236/0x40f [<c011cf79>] scheduler_tick+0x236/0x40f [<f93d833e>] xfs_bmbt_get_state+0x13/0x1c [xfs] [<f93cfebf>] xfs_bmap_do_search_extents+0xc3/0x476 [xfs] [<f93d1b9f>] xfs_bmapi+0x72a/0x1670 [xfs] [<f93d833e>] xfs_bmbt_get_state+0x13/0x1c [xfs] [<f93ffdf7>] xlog_grant_log_space+0x329/0x350 [xfs] [<f93fb3d0>] xfs_iomap_write_allocate+0x2d1/0x572 [xfs] [<c0243944>] submit_bio+0x46/0xcc [<c0147aae>] mempool_alloc+0x6f/0x108 [<f93fa368>] xfs_iomap+0x3ef/0x50c [xfs] [<f94173fd>] xfs_map_blocks+0x39/0x71 [xfs] [<f94183b3>] xfs_page_state_convert+0x4b9/0x6ab [xfs] [<f9418b1d>] linvfs_writepage+0x57/0xd5 [xfs] [<c014e71d>] pageout+0x84/0x101 [<c014ea1b>] shrink_list+0x281/0x454 [<c014db1b>] __pagevec_lru_add+0xac/0xbb [<c014ed82>] shrink_cache+0xe7/0x26c [<c014f33f>] shrink_zone+0x76/0xbb [<c014f3e5>] shrink_caches+0x61/0x6f [<c014f4b8>] try_to_free_pages+0xc5/0x18d [<c0148fbb>] __alloc_pages+0x1cc/0x407 [<c014674a>] generic_file_buffered_write+0x148/0x60c [<c0180ee8>] __mark_inode_dirty+0x28/0x199 [<f941f444>] xfs_write+0xa36/0xd03 [xfs] [<f941b89d>] linvfs_write+0xe9/0x102 [xfs] [<c013489b>] autoremove_wake_function+0x0/0x37 [<c014294d>] audit_syscall_entry+0x10b/0x15e [<f941b7b4>] linvfs_write+0x0/0x102 [xfs] [<c0161a27>] vfs_write+0x9e/0x110 [<c0161b44>] sys_write+0x41/0x6a [<c0104009>] syscall_call+0x7/0xb Code: 89 45 f0 89 55 ec 89 cb e8 24 57 ff ff 89 c6 89 d7 85 db 75 27 ba 00 02 40 c0 b8 00 f0 ff ff 21 e0 8b 40 10 8b 04 85 20 50 40 c0 <2b> 74 02 20 1b 7c 02 24 8b 45 ec 03 70 20 13 78 24 89 f2 89 f9 hr_ioreq_timedout: (0,5,0) opcode 0x28: Enter ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-24 7:59 ` Adam Talbot 2006-06-24 9:34 ` David Greaves @ 2006-06-25 23:57 ` Bill Davidsen 2006-06-26 0:42 ` Adam Talbot 1 sibling, 1 reply; 47+ messages in thread From: Bill Davidsen @ 2006-06-25 23:57 UTC (permalink / raw) To: Adam Talbot; +Cc: Neil Brown, Francois Barre, linux-raid winspeareAdam Talbot wrote: >OK, this topic I relay need to get in on. >I have spent the last few week bench marking my new 1.2TB, 6 disk, RAID6 >array. I wanted real numbers, not "This FS is faster because..." I have >moved over 100TB of data on my new array running the bench mark >testing. I have yet to have any major problems with ReiserFS, EXT2/3, >JFS, or XFS. I have done extensive testing on all, including just >trying to break the file system with billions of 1k files, or a 1TB >file. Was able to cause some problems with EXT3 and RiserFS with the 1KB >and 1TB tests, respectively. but both were fixed with a fsck. My basic >test is to move all data from my old server to my new server >(whitequeen2) and clock the transfer time. Whitequeen2 has very little >storage. The NAS's 1.2TB of storage is attached via iSCSI and a cross >over cable to the back of whitequeen2. The data is 100GB of user's >files(1KB~2MB), 50GB of MP3's (1MB~5MB) and the rest is movies and >system backups 600MB~2GB. Here is a copy of my current data sheet, >including specs on the servers and copy times, my numbers are not >perfect, but they should give you a clue about speeds... XFS wins. > > In many (most?) cases I'm a lot more concerned about filesystem stability than performance. That is, I want the fastest <reliable> filesystem. With ext2 and ext3 I've run multiple multi-TB machines spread over four time zones, and not had a f/s problem updating ~1TB/day. >The computer: whitequeen2 >AMD Athlon64 3200 (2.0GHz) >1GB Corsair DDR 400 (2X 512MB's running in dual DDR mode) >Foxconn 6150K8MA-8EKRS motherboard >Off brand case/power supply >2X os disks, software raid array, RAID 1, Maxtor 51369U3, FW DA620CQ0 >Intel pro/1000 NIC >CentOS 4.3 X86_64 2.6.9 > Main app server, Apache, Samba, NFS, NIS > >The computer: nas >AMD Athlon64 3000 (1.8GHz) >256MB Corsair DDR 400 (2X 128MB's running in dual DDR mode) >Foxconn 6150K8MA-8EKRS motherboard >Off brand case/power supply and drive cages >2X os disks, software raid array, RAID 1, Maxtor 51369U3, FW DA620CQ0 >6X software raid array, RAID 6, Maxtor 7V300F0, FW VA111900 >Gentoo linux. X86_64 2.6.16-gentoo-r9 > System built very lite, only built as an iSCSI based NAS. > >EXT3 >Config=APP+NFS-->NAS+iSCSI >RAID6 64K chunk >[root@whitequeen2 tmp]# time tar cf - . | (cd /data ; tar xf - ) >real 371m29.802s >user 1m28.492s >sys 46m48.947s >/dev/sdb1 1.1T 371G 674G 36% /data >6.192 hours @ 61,262M/hour or 1021M/min or 17.02M/sec > > >EXT2 >Config=APP+NFS-->NAS+iSCSI >RAID6 64K chunk >[root@whitequeen2 tmp]# time tar cf - . | ( cd /data/ ; tar xf - ) >real 401m48.702s >user 1m25.599s >sys 30m22.620s >/dev/sdb1 1.1T 371G 674G 36% /data >6.692 hours @ 56,684M/hour or 945M/min or 15.75M/sec > Did you tune the extN filesystems to the stripe size of the raid? -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-25 23:57 ` Bill Davidsen @ 2006-06-26 0:42 ` Adam Talbot 2006-06-26 14:03 ` Bill Davidsen 0 siblings, 1 reply; 47+ messages in thread From: Adam Talbot @ 2006-06-26 0:42 UTC (permalink / raw) To: Bill Davidsen; +Cc: Neil Brown, Francois Barre, linux-raid Not exactly sure how to tune for stripe size. What would you advise? -Adam Bill Davidsen wrote: > winspeareAdam Talbot wrote: > >> OK, this topic I relay need to get in on. >> I have spent the last few week bench marking my new 1.2TB, 6 disk, RAID6 >> array. I wanted real numbers, not "This FS is faster because..." I have >> moved over 100TB of data on my new array running the bench mark >> testing. I have yet to have any major problems with ReiserFS, EXT2/3, >> JFS, or XFS. I have done extensive testing on all, including just >> trying to break the file system with billions of 1k files, or a 1TB >> file. Was able to cause some problems with EXT3 and RiserFS with the 1KB >> and 1TB tests, respectively. but both were fixed with a fsck. My basic >> test is to move all data from my old server to my new server >> (whitequeen2) and clock the transfer time. Whitequeen2 has very little >> storage. The NAS's 1.2TB of storage is attached via iSCSI and a cross >> over cable to the back of whitequeen2. The data is 100GB of user's >> files(1KB~2MB), 50GB of MP3's (1MB~5MB) and the rest is movies and >> system backups 600MB~2GB. Here is a copy of my current data sheet, >> including specs on the servers and copy times, my numbers are not >> perfect, but they should give you a clue about speeds... XFS wins. >> >> > > In many (most?) cases I'm a lot more concerned about filesystem > stability than performance. That is, I want the fastest <reliable> > filesystem. With ext2 and ext3 I've run multiple multi-TB machines > spread over four time zones, and not had a f/s problem updating ~1TB/day. > >> The computer: whitequeen2 >> AMD Athlon64 3200 (2.0GHz) >> 1GB Corsair DDR 400 (2X 512MB's running in dual DDR mode) >> Foxconn 6150K8MA-8EKRS motherboard >> Off brand case/power supply >> 2X os disks, software raid array, RAID 1, Maxtor 51369U3, FW DA620CQ0 >> Intel pro/1000 NIC >> CentOS 4.3 X86_64 2.6.9 >> Main app server, Apache, Samba, NFS, NIS >> >> The computer: nas >> AMD Athlon64 3000 (1.8GHz) >> 256MB Corsair DDR 400 (2X 128MB's running in dual DDR mode) >> Foxconn 6150K8MA-8EKRS motherboard >> Off brand case/power supply and drive cages >> 2X os disks, software raid array, RAID 1, Maxtor 51369U3, FW DA620CQ0 >> 6X software raid array, RAID 6, Maxtor 7V300F0, FW VA111900 >> Gentoo linux. X86_64 2.6.16-gentoo-r9 >> System built very lite, only built as an iSCSI based NAS. >> >> EXT3 >> Config=APP+NFS-->NAS+iSCSI >> RAID6 64K chunk >> [root@whitequeen2 tmp]# time tar cf - . | (cd /data ; tar xf - ) >> real 371m29.802s >> user 1m28.492s >> sys 46m48.947s >> /dev/sdb1 1.1T 371G 674G 36% /data >> 6.192 hours @ 61,262M/hour or 1021M/min or 17.02M/sec >> >> >> EXT2 >> Config=APP+NFS-->NAS+iSCSI >> RAID6 64K chunk >> [root@whitequeen2 tmp]# time tar cf - . | ( cd /data/ ; tar xf - ) >> real 401m48.702s >> user 1m25.599s >> sys 30m22.620s >> /dev/sdb1 1.1T 371G 674G 36% /data >> 6.692 hours @ 56,684M/hour or 945M/min or 15.75M/sec >> > Did you tune the extN filesystems to the stripe size of the raid? > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-26 0:42 ` Adam Talbot @ 2006-06-26 14:03 ` Bill Davidsen 0 siblings, 0 replies; 47+ messages in thread From: Bill Davidsen @ 2006-06-26 14:03 UTC (permalink / raw) To: Adam Talbot; +Cc: Neil Brown, Francois Barre, linux-raid Adam Talbot wrote: >Not exactly sure how to tune for stripe size. >What would you advise? >-Adam > > See the -R option of mke2fs. I don't have a number for the performance impact of this, but I bet someone else on the list will. Depending on what posts you read, reports range from "measurable" to "significant," without quantifying. Note, next month I will set up either a 2x750 RAID-1 or 4x250 RAID-5 array, and if I got RAID-5 I will have the chance to run some metrics before putting the hardware into production service. I'll report on the -R option if I have any data. > >Bill Davidsen wrote: > > >>winspeareAdam Talbot wrote: >> >> >> >>>OK, this topic I relay need to get in on. >>>I have spent the last few week bench marking my new 1.2TB, 6 disk, RAID6 >>>array. I wanted real numbers, not "This FS is faster because..." I have >>>moved over 100TB of data on my new array running the bench mark >>>testing. I have yet to have any major problems with ReiserFS, EXT2/3, >>>JFS, or XFS. I have done extensive testing on all, including just >>>trying to break the file system with billions of 1k files, or a 1TB >>>file. Was able to cause some problems with EXT3 and RiserFS with the 1KB >>>and 1TB tests, respectively. but both were fixed with a fsck. My basic >>>test is to move all data from my old server to my new server >>>(whitequeen2) and clock the transfer time. Whitequeen2 has very little >>>storage. The NAS's 1.2TB of storage is attached via iSCSI and a cross >>>over cable to the back of whitequeen2. The data is 100GB of user's >>>files(1KB~2MB), 50GB of MP3's (1MB~5MB) and the rest is movies and >>>system backups 600MB~2GB. Here is a copy of my current data sheet, >>>including specs on the servers and copy times, my numbers are not >>>perfect, but they should give you a clue about speeds... XFS wins. >>> >>> >>> >>> >>In many (most?) cases I'm a lot more concerned about filesystem >>stability than performance. That is, I want the fastest <reliable> >>filesystem. With ext2 and ext3 I've run multiple multi-TB machines >>spread over four time zones, and not had a f/s problem updating ~1TB/day. >> >> >> >>Did you tune the extN filesystems to the stripe size of the raid? >> >> >> -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 47+ messages in thread
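[Illustration, not part of the thread: what Bill's suggestion might look like for Adam's array, assuming the 64K raid chunk used in the tests and ext3's default 4K blocks, so stride = 64/4 = 16; the device name is a placeholder.]
  mke2fs -j -R stride=16 /dev/md0     # -j = ext3 journal; newer e2fsprogs spell this as -E stride=16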
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-24 5:19 ` Neil Brown 2006-06-24 7:59 ` Adam Talbot @ 2006-06-24 12:40 ` Justin Piszcz 2006-06-26 0:06 ` Bill Davidsen 1 sibling, 1 reply; 47+ messages in thread From: Justin Piszcz @ 2006-06-24 12:40 UTC (permalink / raw) To: Neil Brown; +Cc: Francois Barre, linux-raid On Sat, 24 Jun 2006, Neil Brown wrote: > On Friday June 23, francois.barre@gmail.com wrote: >>> The problem is that there is no cost effective backup available. >> >> One-liner questions : >> - How does Google make backups ? > > No, Google ARE the backups :-) > >> - Aren't tapes dead yet ? > > LTO-3 does 300Gig, and LTO-4 is planned. > They may not cope with tera-byte arrays in one hit, but they still > have real value. > >> - What about a NUMA principle applied to storage ? > > You mean an Hierarchical Storage Manager? Yep, they exist. I'm sure > SGI, EMC and assorted other TLAs could sell you one. > > NeilBrown > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > LTO3 is 400GB native and we've seen very good compression, so 800GB-1TB per tape. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-24 12:40 ` Justin Piszcz @ 2006-06-26 0:06 ` Bill Davidsen 2006-06-26 8:06 ` Justin Piszcz 0 siblings, 1 reply; 47+ messages in thread From: Bill Davidsen @ 2006-06-26 0:06 UTC (permalink / raw) To: Justin Piszcz; +Cc: Neil Brown, Francois Barre, linux-raid Justin Piszcz wrote: > > On Sat, 24 Jun 2006, Neil Brown wrote: > >> On Friday June 23, francois.barre@gmail.com wrote: >> >>>> The problem is that there is no cost effective backup available. >>> >>> >>> One-liner questions : >>> - How does Google make backups ? >> >> >> No, Google ARE the backups :-) >> >>> - Aren't tapes dead yet ? >> >> >> LTO-3 does 300Gig, and LTO-4 is planned. >> They may not cope with tera-byte arrays in one hit, but they still >> have real value. >> >>> - What about a NUMA principle applied to storage ? >> >> >> You mean an Hierarchical Storage Manager? Yep, they exist. I'm sure >> SGI, EMC and assorted other TLAs could sell you one. >> > > LTO3 is 400GB native and we've seen very good compression, so > 800GB-1TB per tape. The problem is in small business use, LTO3 is costly in the 1-10TB range, and takes a lot of media changes as well. A TB of RAID-5 is ~$500, and at that small size the cost of drives and media is disproportionally high. Using more drives is cost effective, but they are not good for long term off site storage, because they're large and fragile. No obvious solutions in that price and application range that I see. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-26 0:06 ` Bill Davidsen @ 2006-06-26 8:06 ` Justin Piszcz 0 siblings, 0 replies; 47+ messages in thread From: Justin Piszcz @ 2006-06-26 8:06 UTC (permalink / raw) To: Bill Davidsen; +Cc: Neil Brown, Francois Barre, linux-raid

On Sun, 25 Jun 2006, Bill Davidsen wrote:

> Justin Piszcz wrote:
>
>> On Sat, 24 Jun 2006, Neil Brown wrote:
>>
>>> On Friday June 23, francois.barre@gmail.com wrote:
>>>
>>>>> The problem is that there is no cost effective backup available.
>>>>
>>>> One-liner questions :
>>>> - How does Google make backups ?
>>>
>>> No, Google ARE the backups :-)
>>>
>>>> - Aren't tapes dead yet ?
>>>
>>> LTO-3 does 300Gig, and LTO-4 is planned.
>>> They may not cope with tera-byte arrays in one hit, but they still
>>> have real value.
>>>
>>>> - What about a NUMA principle applied to storage ?
>>>
>>> You mean an Hierarchical Storage Manager? Yep, they exist. I'm sure
>>> SGI, EMC and assorted other TLAs could sell you one.
>>>
>> LTO3 is 400GB native and we've seen very good compression, so 800GB-1TB per
>> tape.
>
> The problem is in small business use, LTO3 is costly in the 1-10TB range, and
> takes a lot of media changes as well. A TB of RAID-5 is ~$500, and at that
> small size the cost of drives and media is disproportionally high. Using more
> drives is cost effective, but they are not good for long term off site
> storage, because they're large and fragile.
>
> No obvious solutions in that price and application range that I see.
>
> -- 
> bill davidsen <davidsen@tmr.com>
> CTO TMR Associates, Inc
> Doing interesting things with small computers since 1979
>

In the 1-10TB range you are probably correct; as the numbers increase, however, many LTO2/LTO3 drives + robotics become justifiable.

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 14:46 ` Martin Schröder 2006-06-23 14:59 ` Francois Barre 2006-06-23 15:13 ` Bill Davidsen @ 2006-06-23 15:17 ` Chris Allen 2 siblings, 0 replies; 47+ messages in thread From: Chris Allen @ 2006-06-23 15:17 UTC (permalink / raw) To: Martin Schröder; +Cc: linux-raid

Martin Schröder wrote:
> 2006/6/23, Francois Barre <francois.barre@gmail.com>:
>> Losing data is worse than losing anything else. You can buy you
>
> That's why RAID is no excuse for backups.
>

We have 50TB of stored data now and maybe 250TB this time next year. We mirror the most recent 20TB to a secondary array and rely on the RAID for the rest. I can't think of a practical tape backup strategy given tape sizes at the moment...

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 47+ messages in thread
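As a concrete illustration of the rolling mirror described above, a minimal sketch (the host name, paths, and the idea that "most recent" maps to a single directory tree are assumptions for the example, not details from this thread):

    # push the newest data to the secondary array, removing files that
    # have disappeared from the primary copy
    rsync -aH --delete /store/recent/ backup-host:/mirror/recent/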
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 12:50 ` Chris Allen 2006-06-23 13:14 ` Gordon Henderson 2006-06-23 13:30 ` Francois Barre @ 2006-06-23 14:01 ` Al Boldi 2006-06-23 16:06 ` Andreas Dilger 2006-06-23 16:21 ` Russell Cattelan 2006-06-27 12:05 ` Large single raid... - XFS over NFS woes Dexter Filmore 3 siblings, 2 replies; 47+ messages in thread From: Al Boldi @ 2006-06-23 14:01 UTC (permalink / raw) To: linux-raid; +Cc: linux-fsdevel Chris Allen wrote: > Francois Barre wrote: > > 2006/6/23, PFC <lists@peufeu.com>: > >> - XFS is faster and fragments less, but make sure you have a > >> good UPS > > > > Why a good UPS ? XFS has a good strong journal, I never had an issue > > with it yet... And believe me, I did have some dirty things happening > > here... > > > >> - ReiserFS 3.6 is mature and fast, too, you might consider it > >> - ext3 is slow if you have many files in one directory, but > >> has more > >> mature tools (resize, recovery etc) > > > > XFS tools are kind of mature also. Online grow, dump, ... > > > >> I'd go with XFS or Reiser. > > > > I'd go with XFS. But I may be kind of fanatic... > > Strange that whatever the filesystem you get equal numbers of people > saying that they have never lost a single byte to those who have had > horrible corruption and would never touch it again. We stopped using XFS > about a year ago because we were getting kernel stack space panics under > heavy load over NFS. It looks like the time has come to give it another > try. If you are keen on data integrity then don't touch any fs w/o data=ordered. ext3 is still king wrt data=ordered, albeit slow. Now XFS is fast, but doesn't support data=ordered. It seems that their solution to the problem is to pass the burden onto hw by using barriers. Maybe XFS can get away with this. Maybe. Thanks! -- Al ^ permalink raw reply [flat|nested] 47+ messages in thread
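For readers trying the journaling modes being compared here, a small sketch of how they are selected at mount time (the device and mount point are made-up examples; data=ordered is already the ext3 default on stock kernels of this era):

    # ext3 with ordered data mode, plus noatime to cut needless write traffic
    mount -t ext3 -o data=ordered,noatime /dev/md0 /store
    # full data journaling instead, if you want file data in the journal as well:
    #   mount -t ext3 -o data=journal /dev/md0 /store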
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 14:01 ` Al Boldi @ 2006-06-23 16:06 ` Andreas Dilger 2006-06-23 16:41 ` Christian Pedaschus 2006-06-23 16:21 ` Russell Cattelan 1 sibling, 1 reply; 47+ messages in thread From: Andreas Dilger @ 2006-06-23 16:06 UTC (permalink / raw) To: Al Boldi; +Cc: linux-raid, linux-fsdevel On Jun 23, 2006 17:01 +0300, Al Boldi wrote: > Chris Allen wrote: > > Francois Barre wrote: > > > 2006/6/23, PFC <lists@peufeu.com>: > > >> - ext3 is slow if you have many files in one directory, but > > >> has more mature tools (resize, recovery etc) Please use "mke2fs -O dir_index" or "tune2fs -O dir_index" when testing ext3 performance for many-files-in-dir. This is now the default in e2fsprogs-1.39 and later. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. ^ permalink raw reply [flat|nested] 47+ messages in thread
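A short sketch of the two invocations Andreas mentions, plus the fsck pass that builds indexes for directories that already exist (the device name is an assumption for the example):

    # new filesystem with hashed directory indexes:
    mke2fs -j -O dir_index /dev/md0
    # existing (unmounted) filesystem: enable the feature, then index old directories:
    tune2fs -O dir_index /dev/md0
    e2fsck -fD /dev/md0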
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 16:06 ` Andreas Dilger @ 2006-06-23 16:41 ` Christian Pedaschus 2006-06-23 16:46 ` Christian Pedaschus 2006-06-23 19:53 ` Nix 0 siblings, 2 replies; 47+ messages in thread From: Christian Pedaschus @ 2006-06-23 16:41 UTC (permalink / raw) To: Andreas Dilger; +Cc: Al Boldi, linux-raid, linux-fsdevel

Andreas Dilger wrote:
>On Jun 23, 2006 17:01 +0300, Al Boldi wrote:
>>Chris Allen wrote:
>>>Francois Barre wrote:
>>>>2006/6/23, PFC <lists@peufeu.com>:
>>>>> - ext3 is slow if you have many files in one directory, but
>>>>>has more mature tools (resize, recovery etc)
>
>Please use "mke2fs -O dir_index" or "tune2fs -O dir_index" when testing
>ext3 performance for many-files-in-dir. This is now the default in
>e2fsprogs-1.39 and later.
>

for ext3 use (on unmounted disks):
tune2fs -O has_journal -o journal_data /dev/{disk}
tune2fs -O dir_index /dev/{disk}

if data is on the drive, you need to run an fsck afterwards and it uses a
good bit of ram, but it makes ext3 a good bit faster.

and my main point for using ext3 is still: "it's a very mature fs,
nobody will tell you such horrible stories about data loss with ext3
as with any other filesystem."
and there are undelete tools for ext3.

so if you're for data integrity (I guess you are, else you would not use
raid, or? ;) ), use ext3 and if you need the last single kb/s get a
faster drive or use lots of them with a good raid-combo and/or use a
separate disk for the journal (man 8 tune2fs)

my 0.5 cents,
greets chris

ps. but you know, filesystem choosage is not pure science, it's
half-religion :D

>Cheers, Andreas
>--
>Andreas Dilger
>Principal Software Engineer
>Cluster File Systems, Inc.
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 16:41 ` Christian Pedaschus @ 2006-06-23 16:46 ` Christian Pedaschus 0 siblings, 0 replies; 47+ messages in thread From: Christian Pedaschus @ 2006-06-23 16:46 UTC (permalink / raw) Cc: Andreas Dilger, Al Boldi, linux-raid, linux-fsdevel

Christian Pedaschus wrote:
>for ext3 use (on unmounted disks):
>tune2fs -O has_journal -o journal_data /dev/{disk}
>tune2fs -O dir_index /dev/{disk}
>
>if data is on the drive, you need to run an fsck afterwards and it uses a
>good bit of ram, but it makes ext3 a good bit faster.
>
>and my main point for using ext3 is still: "it's a very mature fs,
>nobody will tell you such horrible stories about data loss with ext3
>as with any other filesystem."
>and there are undelete tools for ext3.
>
>so if you're for data integrity (I guess you are, else you would not use
>raid, or? ;) ), use ext3 and if you need the last single kb/s get a
>faster drive or use lots of them with a good raid-combo and/or use a
>separate disk for the journal (man 8 tune2fs)
>
>my 0.5 cents,
>greets chris
>
>ps. but you know, filesystem choosage is not pure science, it's
>half-religion :D
>

Oops, should be:

tune2fs -O has_journal -o journal_data /dev/{partition}
tune2fs -O dir_index /dev/{partition}

;)

^ permalink raw reply [flat|nested] 47+ messages in thread
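After flipping features like this, a quick way to confirm what actually got enabled (the partition name here is just an example):

    tune2fs -l /dev/sdb1 | grep 'Filesystem features'     # expect has_journal, dir_index, ...
    tune2fs -l /dev/sdb1 | grep 'Default mount options'   # journal_data shows up here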
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 16:41 ` Christian Pedaschus 2006-06-23 16:46 ` Christian Pedaschus @ 2006-06-23 19:53 ` Nix 1 sibling, 0 replies; 47+ messages in thread From: Nix @ 2006-06-23 19:53 UTC (permalink / raw) To: Christian Pedaschus, linux-raid, linux-fsdevel

On 23 Jun 2006, Christian Pedaschus said:
> and my main point for using ext3 is still: "it's a very mature fs,
> nobody will tell you such horrible stories about data loss with ext3
> as with any other filesystem."

Actually I can, but it required bad RAM *and* a broken disk controller *and* an electrical storm *and* heavy disk loads (only read loads, but I didn't have noatime active so read implied write).

In my personal experience it has since weathered, with no problems at all, machines whose `only' fault was RAM so bad that md5sums of 512KB files wouldn't come out the same way twice (some file data got corrupted, unsurprisingly, but the metadata was fine).

Definitely an FS to be relied upon.

-- 
`NB: Anyone suggesting that we should say "Tibibytes" instead of
Terabytes there will be hunted down and brutally slain. That is all.'
   --- Matthew Wilcox

^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 14:01 ` Al Boldi 2006-06-23 16:06 ` Andreas Dilger @ 2006-06-23 16:21 ` Russell Cattelan 2006-06-23 18:19 ` Tom Vier 1 sibling, 1 reply; 47+ messages in thread From: Russell Cattelan @ 2006-06-23 16:21 UTC (permalink / raw) To: Al Boldi; +Cc: linux-raid, linux-fsdevel

Al Boldi wrote:
>Chris Allen wrote:
>>Francois Barre wrote:
>>>2006/6/23, PFC <lists@peufeu.com>:
>>>> - XFS is faster and fragments less, but make sure you have a
>>>>good UPS
>>>
>>>Why a good UPS ? XFS has a good strong journal, I never had an issue
>>>with it yet... And believe me, I did have some dirty things happening
>>>here...
>>>
>>>> - ReiserFS 3.6 is mature and fast, too, you might consider it
>>>> - ext3 is slow if you have many files in one directory, but
>>>>has more mature tools (resize, recovery etc)
>>>
>>>XFS tools are kind of mature also. Online grow, dump, ...
>>>
>>>> I'd go with XFS or Reiser.
>>>
>>>I'd go with XFS. But I may be kind of fanatic...
>>
>>Strange that whatever the filesystem you get equal numbers of people
>>saying that they have never lost a single byte to those who have had
>>horrible corruption and would never touch it again. We stopped using XFS
>>about a year ago because we were getting kernel stack space panics under
>>heavy load over NFS. It looks like the time has come to give it another
>>try.
>
>If you are keen on data integrity then don't touch any fs w/o data=ordered.
>
>ext3 is still king wrt data=ordered, albeit slow.
>
>Now XFS is fast, but doesn't support data=ordered. It seems that their
>solution to the problem is to pass the burden onto hw by using barriers.
>Maybe XFS can get away with this. Maybe.
>
>Thanks!
>
>--
>

When you refer to data=ordered, are you talking about ext3 user data journaling?

While user data journaling seems like a good idea, it is unclear what benefits it really provides. By writing all user data twice, the write performance of the file system is effectively halved. Granted, the log is in one area of the disk, so some performance advantage shows up due to less head seeking for those writes.

As far as metadata journaling goes, it is a fundamental requirement that the journal is synced to disk to a given point in order to release the pinned metadata, thus allowing the metadata to be synced to disk. The way most file systems guarantee consistency is either to sync all outstanding metadata changes to disk or to sync a record of what in-core changes have been made. In the XFS case, since it logs metadata deltas to the log, it can record more change operations in a smaller number of disk blocks; ext3, on the other hand, writes the entire metadata block to the log.

As far as barriers go, I assume you are referring to the ide write barriers? The need for barrier support in the file system is a result of cheap ide disks providing large write caches but not having enough reserve power to guarantee that the cache will be synced to disk in the event of a power failure. Originally, when xfs was written, the disks/raids used by SGI systems were pretty much exclusively enterprise-level devices that would guarantee the write caches would be flushed in the event of a power failure.

Note: ext3, xfs, and reiser all use write barriers now for ide disks.

^ permalink raw reply [flat|nested] 47+ messages in thread
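To make the write-cache point concrete, a small sketch of the usual knobs (device names are examples; whether you prefer barriers or a disabled cache depends on the hardware):

    # query / disable the drive's write-back cache with hdparm
    hdparm -W /dev/hda        # show the current write-caching setting
    hdparm -W0 /dev/hda       # turn write-back caching off
    # or keep the cache and ask the filesystem to issue barriers, e.g. ext3:
    mount -t ext3 -o barrier=1 /dev/md0 /store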
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 16:21 ` Russell Cattelan @ 2006-06-23 18:19 ` Tom Vier 0 siblings, 0 replies; 47+ messages in thread From: Tom Vier @ 2006-06-23 18:19 UTC (permalink / raw) To: Russell Cattelan; +Cc: Al Boldi, linux-raid, linux-fsdevel

On Fri, Jun 23, 2006 at 11:21:34AM -0500, Russell Cattelan wrote:
> When you refer to data=ordered, are you talking about ext3 user data
> journaling?

IIRC, data=ordered just writes new data out before updating block pointers, the file's length in its inode, and the block usage bitmap. That way you don't get junk or zeroed data at the tail of the file. However, I think to prevent data leaks (from deleted files), data=writeback requires a write to the journal, indicating what blocks are being added, so that on recovery they can be zeroed if the transaction wasn't completed.

> While user data journaling seems like a good idea, it is unclear what
> benefits it really provides.

Data gets committed sooner (until pressure or timeouts force the data to be written to its final spot - then you lose throughput and there's a net delay). I think for bursts of small file creation, data=journal is a win. I don't know how lazy ext3 is about writing the data to its final position. It probably does it when the commit timeout hits 0 or the journal is full.

> As far as barriers go, I assume you are referring to the ide write barriers?
>
> The need for barrier support in the file system is a result of cheap ide
> disks providing large write caches but not having enough reserve power to
> guarantee that the cache will be synced to disk in the event of a power
> failure.

It's needed on any drive (including scsi) that has writeback cache enabled. Most scsi drives (in my experience) come from the factory with the cache set to write-through, in case the fs/os doesn't use ordered tags, cache flushes, or force-unit-access writes.

> Note: ext3, xfs, and reiser all use write barriers now for ide disks.

What I've found very disappointing is that my raid1 doesn't support them!

Jun 22 10:53:49 zero kernel: Filesystem "md1": Disabling barriers, not supported by the underlying device

I'm not sure if it's the sata drives that don't support write barriers, or if it's just the md1 layer. I need to investigate that. I think reiserfs also complained that trying to enable write barriers fails on that md1 (I've been playing with various fs'es on it).

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE

^ permalink raw reply [flat|nested] 47+ messages in thread
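A small sketch of how this is typically chased down (the device names are examples; whether md passes barriers through varies with the kernel version and RAID level):

    # see whether the filesystem fell back because barriers were rejected
    dmesg | grep -i barrier
    # common fallback when the RAID layer won't pass barriers through:
    # turn off the write-back cache on each member disk
    hdparm -W0 /dev/sda
    hdparm -W0 /dev/sdb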
* Re: Large single raid... - XFS over NFS woes 2006-06-23 12:50 ` Chris Allen ` (2 preceding siblings ...) 2006-06-23 14:01 ` Al Boldi @ 2006-06-27 12:05 ` Dexter Filmore 3 siblings, 0 replies; 47+ messages in thread From: Dexter Filmore @ 2006-06-27 12:05 UTC (permalink / raw) To: Chris Allen, linux-raid

On Friday, 23 June 2006 14:50, you wrote:
> Strange that whatever the filesystem you get equal numbers of people
> saying that they have never lost a single byte to those who have had
> horrible corruption and would never touch it again. We stopped using XFS
> about a year ago because we were getting kernel stack space panics under
> heavy load over NFS. It looks like the time has come to give it another
> try.

I'd tread on XFS land cautiously - while I always favored XFS over Reiser (which had way too many issues in its stable releases for my liking), it has some drawbacks.

First, you cannot shrink it. LVM becomes kinda pointless.

But especially with NFS I ran into trouble myself. Copying large amounts of data sometimes stalls and has eventually locked up the machine. Plus, recently I had some weird filesystem corruption like /root getting lost or similar. Running 2.6.14.1 and NFS3.

If performance is not top priority, stick to ext3 and create 2 partitions or volume groups.

My 0.02$
Dex

P.S.: How about JFS? I don't know whether it can resize or how stable it is, but I can't remember hearing more or fewer ups and downs about it than about any other journaling fs.

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d--(+)@ s-:+ a- C++++ UL++ P+>++ L+++>++++ E-- W++ N o? K- w--(---) !O M+
V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ b++(+++) DI+++ D- G++
e* h>++ r* y?
------END GEEK CODE BLOCK------

http://www.stop1984.com
http://www.againsttcpa.com

^ permalink raw reply [flat|nested] 47+ messages in thread
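For completeness, the grow direction does work with XFS on LVM even though shrinking does not; a minimal sketch (the volume group, logical volume, and mount point names are made up for the example):

    # extend the logical volume, then grow XFS online on the mounted filesystem
    lvextend -L +100G /dev/vg0/store
    xfs_growfs /srv/store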
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 8:59 ` PFC 2006-06-23 9:26 ` Francois Barre @ 2006-06-23 19:48 ` Nix 2006-06-25 19:13 ` David Rees 1 sibling, 1 reply; 47+ messages in thread From: Nix @ 2006-06-23 19:48 UTC (permalink / raw) To: PFC; +Cc: Chris Allen, linux-raid On 23 Jun 2006, PFC suggested tentatively: > - ext3 is slow if you have many files in one directory, but has > more mature tools (resize, recovery etc) This is much less true if you turn on the dir_index feature. -- `NB: Anyone suggesting that we should say "Tibibytes" instead of Terabytes there will be hunted down and brutally slain. That is all.' --- Matthew Wilcox ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Large single raid and XFS or two small ones and EXT3? 2006-06-23 19:48 ` Large single raid and XFS or two small ones and EXT3? Nix @ 2006-06-25 19:13 ` David Rees 0 siblings, 0 replies; 47+ messages in thread From: David Rees @ 2006-06-25 19:13 UTC (permalink / raw) To: Nix; +Cc: PFC, Chris Allen, linux-raid On 6/23/06, Nix <nix@esperi.org.uk> wrote: > On 23 Jun 2006, PFC suggested tentatively: > > - ext3 is slow if you have many files in one directory, but has > > more mature tools (resize, recovery etc) > > This is much less true if you turn on the dir_index feature. However, even with dir_index, deleting large files is still much slower with ext2/3 than xfs or jfs. -Dave ^ permalink raw reply [flat|nested] 47+ messages in thread