* How to build a big file server
@ 2003-06-05 8:14 Heinz-Josef Claes
2003-06-05 8:25 ` Carl-Daniel Hailfinger
` (5 more replies)
0 siblings, 6 replies; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 8:14 UTC (permalink / raw)
To: reiserfs-list
Hi,
I didn't find information about my problem on namesys, so I'm asking
here.
I plan to build a big fileserver for the Deutscher Bundestag (German
Parliament), which is converting to Linux on the server side. In the
beginning it will be used in a testing environment, and later several of
them will go into production use.
Here are some questions:
- I plan to build a system with IDE 250GB drives. 7 of them for raid 5
and one hot spare. The OS will be on separate hardware raid 1 on smaller
disks. Does anybody have experience with IDE controllers for the big
disks? Is it better to use hardware raid or software raid
(performance)?
- At this time, we plan to use reiserfs. What are the advantages /
disadvantages of using the raid 1 for journaling for the (big) raid 5
(performance, recovery)? The system will be heavily used with
storebackup (www.sf.net/projects/storebackup), which means that lots of
hard links have to be created as fast as possible.
- BTW: I'm using a ZIP drive on the parallel port with reiserfs. Does it
make sense to put the journal for the ZIP drive on the hard disk of the
laptop?
Thanks for your time,
Heinz-Josef Claes
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 8:14 How to build a big file server Heinz-Josef Claes
@ 2003-06-05 8:25 ` Carl-Daniel Hailfinger
2003-06-05 8:55 ` Andreas Dilger
2003-06-05 8:33 ` Ragnar Kjørstad
` (4 subsequent siblings)
5 siblings, 1 reply; 42+ messages in thread
From: Carl-Daniel Hailfinger @ 2003-06-05 8:25 UTC (permalink / raw)
To: Heinz-Josef Claes; +Cc: ReiserFS List
Heinz-Josef Claes wrote:
>
> - I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> and one hot spare. The OS will be on separate hardware raid 1 on smaller
> disks. Does anybody have experience with IDE controllers for the big
> disks? Is it better to use hardware raid or software raid
> (performance)?
As long as you don't buy pseudo raid solutions like those from Promise,
you should be better off using hardware raid, especially if one of the
disks fails. Performance-wise, Linux software raid may be faster.
> - BTW: I'm using a ZIP drive on the parallel port with reiserfs. Does it
> make sense to put the journal for the ZIP drive on the hard disk of the
> laptop?
NO. If you ever plan to use a zip disk with external journal on another
PC, you are completely out of luck.
HTH,
Carl-Daniel
--
http://www.hailfinger.org/
* Re: How to build a big file server
2003-06-05 8:14 How to build a big file server Heinz-Josef Claes
2003-06-05 8:25 ` Carl-Daniel Hailfinger
@ 2003-06-05 8:33 ` Ragnar Kjørstad
2003-06-05 8:42 ` Heinz-Josef Claes
2003-06-05 8:46 ` Oleg Drokin
2003-06-05 9:45 ` Christophe Saout
` (3 subsequent siblings)
5 siblings, 2 replies; 42+ messages in thread
From: Ragnar Kjørstad @ 2003-06-05 8:33 UTC (permalink / raw)
To: Heinz-Josef Claes; +Cc: reiserfs-list
On Thu, Jun 05, 2003 at 10:14:13AM +0200, Heinz-Josef Claes wrote:
> Here are some questions:
>
> - I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> and one hot spare. The OS will be on separate hardware raid 1 on smaller
> disks. Does anybody have experience with IDE controllers for the big
> disks? Is it better to use hardware raid or software raid
> (performance)?
Some controllers will be limited by the 127 GB IDE limitation.
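As a rough check on that limit: it presumably refers to the 28-bit LBA addressing ceiling of older ATA controllers. The sketch below assumes 512-byte sectors and 28 address bits (both are assumptions, not stated in this thread) and shows that such a controller tops out well below a 250 GB drive. The result is about 128 GiB (roughly 137 GB in decimal units), which is in the same range as the figure quoted above. The helper name fits_lba28 is made up for illustration.

```python
SECTOR_BYTES = 512      # assumed classic ATA sector size
LBA28_BITS = 28         # assumed address width before 48-bit LBA

max_sectors = 2 ** LBA28_BITS
max_bytes = max_sectors * SECTOR_BYTES

limit_gb = max_bytes / 10**9    # decimal gigabytes, as drive vendors count
limit_gib = max_bytes / 2**30   # binary gibibytes


def fits_lba28(drive_gb: float) -> bool:
    """Return True if a drive of the given decimal-GB size is fully addressable."""
    return drive_gb * 10**9 <= max_bytes


print(f"limit: {limit_gb:.1f} GB = {limit_gib:.0f} GiB")
print("250 GB drive fully addressable:", fits_lba28(250))
```

So with the drives planned here, it is worth confirming that the controller supports 48-bit LBA before buying.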
For read-performance, software-raid will be just fine. (it could in
fact be better than some crappy controllers).
For write-performance, you need a hardware RAID-controller with
battery-backed writeback-cache. A hardware RAID-controller that
doesn't use writeback-cache is not likely to be faster than
software-RAID.
Writeback-cache is not very common on PCI/IDE-controllers, but more
common on SCSI/IDE and FC/IDE external RAIDs. External RAIDs also
have manageability advantages, but of course they are more expensive.
> - At this time, we plan to use reiserfs. What are the advantages /
> disadvantages of using the raid 1 for journaling for the (big) raid 5
> (performance, recovery)? The system will be heavily used with
> storebackup (www.sf.net/projects/storebackup), which means that lots of
> hard links have to be created as fast as possible.
Tricky question.
Without writeback-cache on your RAID5, lots of tiny synchronous writes
are surely going to kill performance.
The writes to the actual filesystem should be async, so they should be
less of a problem. And there is nothing you can do about it anyway :-/
My guess is that the writes to the journal _would_ be a problem, and
that writing them to a separate (RAID1)-device would help significantly.
I would be curious to see a benchmark on this though.
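To illustrate why tiny writes hurt on RAID 5 without a write-back cache, here is a minimal sketch, assuming the usual read-modify-write parity update (read old data, read old parity, write new data, write new parity). The function names and the 4-byte blocks are illustrative only; this is not how the Linux md driver or any controller firmware is actually written.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-sized byte blocks together to form the RAID-5 parity block."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def small_write(stripe, par, index, new_block):
    """Read-modify-write update of one data block.

    Returns (new_parity, io_count). The four I/Os are counted by hand to
    mirror what the disks would have to do for a single small write.
    """
    old_block = stripe[index]                    # I/O 1: read old data
    old_par = par                                # I/O 2: read old parity
    new_par = bytes(p ^ o ^ n for p, o, n in zip(old_par, old_block, new_block))
    stripe[index] = new_block                    # I/O 3: write new data
    return new_par, 4                            # I/O 4: write new parity


# Six data blocks plus one parity block, as in a 7-disk RAID 5.
stripe = [bytes([i] * 4) for i in range(1, 7)]
par = parity(stripe)
par, ios = small_write(stripe, par, 2, b"\xff" * 4)
print("I/Os for one small write:", ios)
print("parity still consistent:", par == parity(stripe))
```

A RAID 1 journal device avoids the two reads: each journal write is just one write per mirror half, which is the reasoning behind the guess above.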
--
Ragnar Kjørstad
Zet.no
* Re: How to build a big file server
2003-06-05 8:33 ` Ragnar Kjørstad
@ 2003-06-05 8:42 ` Heinz-Josef Claes
2003-06-05 8:45 ` Marc-Christian Petersen
2003-06-05 8:46 ` Oleg Drokin
1 sibling, 1 reply; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 8:42 UTC (permalink / raw)
To: Ragnar Kjørstad; +Cc: reiserfs-list
On Thu, 2003-06-05 at 10:33, Ragnar Kjørstad wrote:
> On Thu, Jun 05, 2003 at 10:14:13AM +0200, Heinz-Josef Claes wrote:
> > Here are some questions:
> >
> > - I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> > and one hot spare. The OS will be on separate hardware raid 1 on smaller
> > disks. Does anybody have experience with IDE controllers for the big
> > disks? Is it better to use hardware raid or software raid
> > (performance)?
>
> Some controllers will be limited by the 127 GB IDE limitation.
>
> For read-performance, software-raid will be just fine. (it could in
> fact be better than some crappy controllers).
>
> For write-performance, you need a hardware RAID-controller with
> battery-backed writeback-cache. A hardware RAID-controller that
> doesn't use writeback-cache is not likely to be faster than
> software-RAID.
>
We used a cheap 3Com IDE RAID controller (sorry, I don't know which one)
a year and a half ago (without writeback-cache, at another company).
It was terribly slow when writing with hardware RAID. With software
RAID, it was much faster - that's the reason for my question. Do you
know of a useful IDE RAID controller that runs with Linux? (For your
information: the machines are not mission critical; they only provide an
(additional) online backup for about 5000 users.)
> Writeback-cache is not very common on PCI/IDE-controllers, but more
> common on SCSI/IDE and FC/IDE external RAIDs. External RAIDs also
> have manageability advantages, but of course they are more expensive.
>
> > - At this time, we plan to use reiserfs. What are the advantages /
> > disadvantages of using the raid 1 for journaling for the (big) raid 5
> > (performance, recovery)? The system will be heavily used with
> > storebackup (www.sf.net/projects/storebackup), which means that lots of
> > hard links have to be created as fast as possible.
>
> Tricky question.
>
> Without writeback-cache on your RAID5, lots of tiny synchronous writes
> are surely going to kill performance.
>
> The writes to the actual filesystem should be async, so they should be
> less of a problem. And there is nothing you can do about it anyway :-/
>
> My guess is that the writes to the journal _would_ be a problem, and
> that writing them to a separate (RAID1)-device would help significantly.
>
> I would be curious to see a benchmark on this though.
* Re: How to build a big file server
2003-06-05 8:42 ` Heinz-Josef Claes
@ 2003-06-05 8:45 ` Marc-Christian Petersen
2003-06-05 9:30 ` Heinz-Josef Claes
` (2 more replies)
0 siblings, 3 replies; 42+ messages in thread
From: Marc-Christian Petersen @ 2003-06-05 8:45 UTC (permalink / raw)
To: Heinz-Josef Claes, Ragnar Kjørstad; +Cc: reiserfs-list
On Thursday 05 June 2003 10:42, Heinz-Josef Claes wrote:
Hi Heinz,
> We used a cheap 3Com IDE RAID controller (sorry, I don't know which one)
> a year and a half ago (without writeback-cache, at another company).
> It was terribly slow when writing with hardware RAID. With software
> RAID, it was much faster - that's the reason for my question. Do you
> know of a useful IDE RAID controller that runs with Linux? (For your
> information: the machines are not mission critical; they only provide an
> (additional) online backup for about 5000 users.)
3ware controller. Works perfectly.
--
ciao, Marc
* Re: How to build a big file server
2003-06-05 8:33 ` Ragnar Kjørstad
2003-06-05 8:42 ` Heinz-Josef Claes
@ 2003-06-05 8:46 ` Oleg Drokin
2003-06-05 8:50 ` Heinz-Josef Claes
1 sibling, 1 reply; 42+ messages in thread
From: Oleg Drokin @ 2003-06-05 8:46 UTC (permalink / raw)
To: Ragnar Kjørstad; +Cc: Heinz-Josef Claes, reiserfs-list
Hello!
On Thu, Jun 05, 2003 at 10:33:01AM +0200, Ragnar Kjørstad wrote:
> My guess is that the writes to the journal _would_ be a problem, and
> that writing them to a separate (RAID1)-device would help significantly.
And with journal on battery-backed RAM it would be even faster ;)
Bye,
Oleg
* Re: How to build a big file server
2003-06-05 8:46 ` Oleg Drokin
@ 2003-06-05 8:50 ` Heinz-Josef Claes
2003-06-05 9:04 ` Oleg Drokin
0 siblings, 1 reply; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 8:50 UTC (permalink / raw)
To: Oleg Drokin; +Cc: Ragnar Kjørstad, reiserfs-list
On Thu, 2003-06-05 at 10:46, Oleg Drokin wrote:
> Hello!
>
> On Thu, Jun 05, 2003 at 10:33:01AM +0200, Ragnar Kjørstad wrote:
>
> > My guess is that the writes to the journal _would_ be a problem, and
> > that writing them to a separate (RAID1)-device would help significantly.
>
> And with journal on battery-backed RAM it would be even faster ;)
The target is to have a cheap solution :/
Thanks,
Heinz-Josef
>
> Bye,
> Oleg
>
* Re: How to build a big file server
2003-06-05 8:25 ` Carl-Daniel Hailfinger
@ 2003-06-05 8:55 ` Andreas Dilger
2003-06-05 9:51 ` Hendrik Visage
0 siblings, 1 reply; 42+ messages in thread
From: Andreas Dilger @ 2003-06-05 8:55 UTC (permalink / raw)
To: Carl-Daniel Hailfinger; +Cc: Heinz-Josef Claes, ReiserFS List
On Jun 05, 2003 10:25 +0200, Carl-Daniel Hailfinger wrote:
> Heinz-Josef Claes wrote:
> > - I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> > and one hot spare. The OS will be on separate hardware raid 1 on smaller
> > disks. Does anybody have experience with IDE controllers for the big
> > disks? Is it better to use hardware raid oder software raid
> > (performance)?
>
> As long as you don't buy pseudo raid solutions like those from promise,
> you should be better off using hardware raid, especially if one of the
> disks fails. Performance-wise linux software raid may be faster.
The other problem I've heard of with hardware RAID is that if you do not
have another of exactly the same RAID controller, it may be impossible to
get your data back if the controller fails. If you are planning on making
many similar servers, that may not be a problem, but is always something
to be aware of with hardware RAID.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
* Re: How to build a big file server
2003-06-05 8:50 ` Heinz-Josef Claes
@ 2003-06-05 9:04 ` Oleg Drokin
2003-06-05 9:17 ` Heinz-Josef Claes
2003-06-05 12:06 ` Heinz-Josef Claes
0 siblings, 2 replies; 42+ messages in thread
From: Oleg Drokin @ 2003-06-05 9:04 UTC (permalink / raw)
To: Heinz-Josef Claes; +Cc: Ragnar Kjørstad, reiserfs-list
Hello!
On Thu, Jun 05, 2003 at 10:50:15AM +0200, Heinz-Josef Claes wrote:
> > > My guess is that the writes to the journal _would_ be a problem, and
> > > that writing them to a separate (RAID1)-device would help significantly.
> > And with journal on battery-backed RAM it would be even faster ;)
> The target is to have a cheap solution :/
The cheapest solution to get speed is to put journal and all data on ramdisk
(or even into ramfs).
This is not the safest one, though.
Bye,
Oleg
* Re: How to build a big file server
2003-06-05 9:04 ` Oleg Drokin
@ 2003-06-05 9:17 ` Heinz-Josef Claes
2003-06-05 10:29 ` Russell Coker
2003-06-05 13:38 ` Hans Reiser
2003-06-05 12:06 ` Heinz-Josef Claes
1 sibling, 2 replies; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 9:17 UTC (permalink / raw)
To: reiserfs-list
On Thu, 2003-06-05 at 11:04, Oleg Drokin wrote:
> Hello!
>
> On Thu, Jun 05, 2003 at 10:50:15AM +0200, Heinz-Josef Claes wrote:
> > > > My guess is that the writes to the journal _would_ be a problem, and
> > > > that writing them to a separate (RAID1)-device would help significantly.
> > > And with journal on battery-backed RAM it would be even faster ;)
> > The target is to have a cheap solution :/
>
> The cheapest solution to get speed is to put journal and all data on ramdisk
> (or even into ramfs).
> This is not the safest one, though.
>
Perhaps this is a good idea. I'll try to test the performance. Since the
servers are only used for an (additional) online backup (like
snapshots), there will not be really critical data on them. I'll see if
the performance win is worth the risk :-)
BTW: Some weeks ago there were questions about the performance of
NetApp. I think I will have the opportunity to repeat the tests with a
filer. I'll post the results to the list if anybody is interested.
> Bye,
> Oleg
>
* Re: How to build a big file server
2003-06-05 8:45 ` Marc-Christian Petersen
@ 2003-06-05 9:30 ` Heinz-Josef Claes
2003-06-05 11:27 ` Bill Rees
2003-06-10 13:28 ` myciel
2 siblings, 0 replies; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 9:30 UTC (permalink / raw)
To: Marc-Christian Petersen; +Cc: reiserfs-list
On Thu, 2003-06-05 at 10:45, Marc-Christian Petersen wrote:
> On Thursday 05 June 2003 10:42, Heinz-Josef Claes wrote:
>
> Hi Heinz,
>
> > We used a cheap 3Com IDE RAID controller (sorry, I don't know which one)
> > a year and a half ago (without writeback-cache, at another company).
> > It was terribly slow when writing with hardware RAID. With software
> > RAID, it was much faster - that's the reason for my question. Do you
> > know of a useful IDE RAID controller that runs with Linux? (For your
> > information: the machines are not mission critical; they only provide an
> > (additional) online backup for about 5000 users.)
> 3ware controller. Works perfectly.
Are you thinking of serial or parallel ATA? Unfortunately I have no
experience with serial ATA. Are these controllers as good and cheap as
3ware writes on their home page?
* Re: How to build a big file server
2003-06-05 8:14 How to build a big file server Heinz-Josef Claes
2003-06-05 8:25 ` Carl-Daniel Hailfinger
2003-06-05 8:33 ` Ragnar Kjørstad
@ 2003-06-05 9:45 ` Christophe Saout
2003-06-05 10:07 ` Soeren Sonnenburg
2003-06-05 9:59 ` Russell Coker
` (2 subsequent siblings)
5 siblings, 1 reply; 42+ messages in thread
From: Christophe Saout @ 2003-06-05 9:45 UTC (permalink / raw)
To: Heinz-Josef Claes; +Cc: reiserfs-list
Hi!
> I plan to build a big fileserver for Deutscher Bundestag (German
> Parliament) who is converting to linux on the server side. In the
> beginning it will be used in a testing environment and later multiple of
> them in production use.
Great. :-)
> - I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> and one hot spare. The OS will be on separate hardware raid 1 on smaller
> disks. Does anybody have experience with IDE controllers for the big
> disks? Is it better to use hardware raid or software raid
> (performance)?
I'm successfully running a bunch of servers with Promise
Ultra100/Ultra133 controllers and software raid on it (and LVM but
that's another story).
The only problem you'll probably run into is that the PCI slots get full
because you should only have one hard disk per cable to get the best
performance.
Also you should try to use an APIC to avoid trouble with interrupts
(I think it should work somewhat better, though I cannot prove this).
The only problem with that solution is that when one disk fails you'll
have to turn off the machine and exchange the disk, and after turning it
back on, fdisk it (I love the raid autodetection) and raidhotadd that disk
manually.
And I've never used software raid5, only striping over several raid1
pairs. The cpu has to calculate the checksums. But when doing heavy disk
IO the machines usually only use five percent of the CPU time or
something so I don't think that should be too much of a bottleneck if
it's a fast machine.
> - At this time, we plan to use reiserfs. What are the advantages /
> disadvantages of using the raid 1 for journaling for the (big) raid 5
> (performance, recovery)? The system will be heavily used with
> storebackup (www.sf.net/projects/storebackup), which means that lots of
> hard links have to be created as fast as possible.
I haven't tested separate journals yet, but reiserfs is very fast in
dealing with lots of files. You probably should just try it out.
I don't have enough hard disks here to play with them now, but on a raid
10 configuration here (striping over two raid1 pairs of 200 GB 7200 rpm
disks that can do about 50MB/sec each) a cp -al takes about one minute
over 80,000 files/directories.
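For anyone who wants to time the same kind of workload without cp, the following is a rough sketch of what a hard-link copy like cp -al does: recreate the directories and hard-link every file instead of copying its data. It assumes source and destination are on the same filesystem, handles only regular files and directories (symlinks, ownership, and permissions are ignored), and the function name link_tree is made up for this example.

```python
import os


def link_tree(src: str, dst: str) -> int:
    """Recreate the directory tree under src at dst, hard-linking every file.

    Returns the number of hard links created. src and dst must be on the
    same filesystem, because hard links cannot cross filesystems.
    """
    links = 0
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # No file data is copied; only a new directory entry is created.
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))
            links += 1
    return links
```

Because each file costs only a directory entry and a link-count update, the run time is dominated by metadata writes, which is exactly what the journal question in this thread is about.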
--
Christophe Saout <christophe@saout.de>
* Re: How to build a big file server
2003-06-05 8:55 ` Andreas Dilger
@ 2003-06-05 9:51 ` Hendrik Visage
0 siblings, 0 replies; 42+ messages in thread
From: Hendrik Visage @ 2003-06-05 9:51 UTC (permalink / raw)
To: Carl-Daniel Hailfinger, Heinz-Josef Claes, ReiserFS List
On Thu, Jun 05, 2003 at 02:55:54AM -0600, Andreas Dilger wrote:
>
> The other problem I've heard of with hardware RAID is that if you do not
> have another of exactly the same RAID controller, it may be impossible to
> get your data back if the controller fails. If you are planning on making
> many similar servers, that may not be a problem, but is always something
> to be aware of with hardware RAID.
True.
They write meta information on the disks saying this disk is part of this
mirror in this configuration. Then the higher end versions even have copies
in NVRAM *and* the disks, to check whether the disks have been moved etc.
Thus, beware in the "cheap" cases... though with the bigger OEMs (Dell/IBM/etc.),
which have support contracts etc., you would be able to get a replacement
part (if it's not too old ;^)
Hendrik
* Re: How to build a big file server
2003-06-05 8:14 How to build a big file server Heinz-Josef Claes
` (2 preceding siblings ...)
2003-06-05 9:45 ` Christophe Saout
@ 2003-06-05 9:59 ` Russell Coker
2003-06-05 10:13 ` Heinz-Josef Claes
2003-06-05 10:05 ` How to build a big file server Hans Reiser
2003-06-05 13:43 ` Sam Vilain
5 siblings, 1 reply; 42+ messages in thread
From: Russell Coker @ 2003-06-05 9:59 UTC (permalink / raw)
To: Heinz-Josef Claes, reiserfs-list
On Thu, 5 Jun 2003 18:14, Heinz-Josef Claes wrote:
> Here are some questions:
>
> - I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> and one hot spare. The OS will be on separate hardware raid 1 on smaller
> disks. Does anybody have experience with IDE controllers for the big
> disks? Is it better to use hardware raid or software raid
> (performance)?
The consensus of opinion on the large-ide-arrays mailing list is that 3ware
controllers are the best for hardware RAID on IDE.
Software RAID performs better in almost all areas, however for RAID-5 the
reliability of software RAID isn't as good as that of hardware RAID (NB only
hardware RAID with NV-RAM write-back cache does the right thing - non-caching
hardware RAID-5 is equal to software RAID-5 for reliability).
Why have the OS on a different device? On all the RAID installations I've
used which I regard as successful the OS has been on the same hardware.
> - At this time, we plan to use reiserfs. What are the advantages /
> disadvantages of using the raid 1 for journaling for the (big) raid 5
> (performance, recovery)? The system will be heavily used with
> storebackup (www.sf.net/projects/storebackup), which means that lots of
> hard links have to be created as fast as possible.
umem non-volatile RAM devices should perform best for journals. If you use
journalled-data in ReiserFS (need patches - it's not in the main kernels) and
put the journal on a umem device you should get some significant performance
increases - better than you get for having a separate hardware setup for the
OS. Also for 2.4.20 there are a few ReiserFS patches that you need for
decent performance (which come along with the data-journalling patch). These
patches have more than doubled the performance of some of my machines. I
would expect that the SUSE kernel would include them already.
Also if using a umem device you want to try and move your most volatile data
to it (EG a mail spool). A 1G umem device has space for a lot more than
ReiserFS journals.
The write-back cache in hardware RAID seems to be no more than about 128M.
This isn't much when you have sustained disk writes of 2M/s (and due to the
way RAID-5 works you need to cache at least twice as much data as you are
writing for small writes).
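A back-of-the-envelope version of that argument, using only the figures quoted above (128M cache, 2M/s sustained writes, a factor of two for small RAID-5 writes). The drain_mb_per_s parameter is an assumption added for illustration: it stands for how fast the disks empty the cache, which the thread does not quantify.

```python
CACHE_MB = 128        # cache size quoted above
WRITE_MB_PER_S = 2    # sustained small-write rate quoted above
RAID5_FACTOR = 2      # "at least twice as much data" for small writes


def seconds_until_full(cache_mb, write_mb_per_s, factor, drain_mb_per_s=0.0):
    """Seconds until the cache fills; inf if the disks drain it fast enough."""
    net = write_mb_per_s * factor - drain_mb_per_s
    return float("inf") if net <= 0 else cache_mb / net


# Worst case: nothing drains while the writes keep coming.
print(seconds_until_full(CACHE_MB, WRITE_MB_PER_S, RAID5_FACTOR))
```

With nothing draining, the cache absorbs about half a minute of such writes; after that, throughput is bounded by what the disks themselves can sustain.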
Finally, having more than one busy file system on the same physical device is
going to hurt performance. The OS partition won't be very busy (turn off
atime, set syslogd to allow write-back caching, and give your server a decent
amount of RAM and it can go for minutes without a single disk access). If
you want good performance then you don't want to be swapping at all, RAM is
cheap, much cheaper than fast hard drives.
I suggest that for good performance and a reasonable price you look at a 3ware
RAID controller with a single RAID-5 device comprised of 7 disks as you
suggested. Then you have a umem device for the journal of the data
filesystem and for a small filesystem for volatile data. That and plenty of
RAM should solve most of the performance problems.
Finally, there are some kernel bugs that choke performance on big machines.
2.4.20 running with 4G of RAM and serious IO load will usually waste large
amounts of CPU time on kswapd. The kernel patch that SUSE uses solves this,
also I think that 2.4.21-rc kernels may have code to address this.
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
* Re: How to build a big file server
2003-06-05 8:14 How to build a big file server Heinz-Josef Claes
` (3 preceding siblings ...)
2003-06-05 9:59 ` Russell Coker
@ 2003-06-05 10:05 ` Hans Reiser
2003-06-05 10:24 ` Heinz-Josef Claes
2003-06-05 13:43 ` Sam Vilain
5 siblings, 1 reply; 42+ messages in thread
From: Hans Reiser @ 2003-06-05 10:05 UTC (permalink / raw)
To: Heinz-Josef Claes; +Cc: reiserfs-list, Chris Mason
Heinz-Josef Claes wrote:
>Hi,
>
>I didn't find information about my problem on namesys, so I'm asking
>here.
>
>I plan to build a big fileserver for Deutscher Bundestag (German
>Parliament) who is converting to linux on the server side. In the
>beginning it will be used in a testing environment and later multiple of
>them in production use.
>
>Here are some questions:
>
>- I plan to build a system with IDE 250GB drives. 7 of them for raid 5
>and one hot spare. The OS will be on separate hardware raid 1 on smaller
>disks. Does anybody have experience with IDE controllers for the big
>disks? Is it better to use hardware raid or software raid
>(performance)?
>
>- At this time, we plan to use reiserfs. What are the advantages /
>disadvantages of using the raid 1 for journaling for the (big) raid 5
>(performance, recovery)? The system will be heavily used with
>storebackup (www.sf.net/projects/storebackup), which means that lots of
>hard links have to be created as fast as possible.
>
>- BTW: I'm using a ZIP drive on the parallel port with reiserfs. Does it
>make sense to put the journal for the ZIP drive on the hard disk of the
>laptop?
>
>Thanks for taking your time,
>Heinz-Josef Claes
Do you plan to run synchronous workloads involving lots of little
fsyncs? The RAID 5 will not perform well for that purpose; otherwise it
should work well even for the journal. Chris will correct me if I err.
--
Hans
* Re: How to build a big file server
2003-06-05 9:45 ` Christophe Saout
@ 2003-06-05 10:07 ` Soeren Sonnenburg
0 siblings, 0 replies; 42+ messages in thread
From: Soeren Sonnenburg @ 2003-06-05 10:07 UTC (permalink / raw)
To: reiserfs-list
On Thu, 2003-06-05 at 11:45, Christophe Saout wrote:
> > - I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> > and one hot spare. The OS will be on separate hardware raid 1 on smaller
> > disks. Does anybody have experience with IDE controllers for the big
> > disks? Is it better to use hardware raid or software raid
> > (performance)?
>
> I'm successfully running a bunch of servers with Promise
> Ultra100/Ultra133 controllers and software raid on it (and LVM but
> that's another story).
iiiieeeek! I don't know how you get them running stably but here two
ultra tx2 only caused trouble and were quickly thrown away.
so the message is KEEP AWAY FROM PROMISE CONTROLLERS.
> The only problem you'll probably run into is that the PCI slots get full
> because you should only have one hard disk per cable to get the best
> performance.
well 6 pci slots gives you 12 disks with el cheapo controllers (hpt37x
based ones etc...) so that is not the problem, BUT pci bus speed. I have
a software raid5 running here with only 5 disks and on heavy disk io
nothing else works on the PCI bus... (e.g. watching tv via pci-tv card
is impossible then).
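A quick sanity check on the bus argument, assuming a plain 32-bit/33 MHz PCI bus (the thread does not say which PCI variant is in that machine) and the roughly 50 MB/s per disk mentioned earlier in the thread. The function name bus_utilisation is hypothetical.

```python
PCI_BUS_BYTES = 4        # assumed 32-bit bus width
PCI_CLOCK_MHZ = 33.33    # assumed standard PCI clock

# Theoretical peak in MB/s; real-world throughput is lower.
pci_peak_mb_s = PCI_BUS_BYTES * PCI_CLOCK_MHZ


def bus_utilisation(disks, mb_per_s_each):
    """Fraction of theoretical PCI bandwidth needed if all disks stream at once."""
    return disks * mb_per_s_each / pci_peak_mb_s


print(f"PCI peak: {pci_peak_mb_s:.0f} MB/s")
print(f"5 disks at 50 MB/s need {bus_utilisation(5, 50):.2f}x the bus")
```

Under these assumptions five disks streaming at full speed would want more bandwidth than the whole shared bus offers, so it is plausible that other PCI devices starve during heavy array I/O.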
> The only problem with that solution is that when one disk fails you'll
> have to turn off the machine and exchange the disk, and after turning it
> on fdisk it (I love the raid autodetection) and raidhotadd that disk
> manually.
or use a spare and do it at night :-)
> I don't have enough hard disks here to play with them now, but on a raid
> 10 configuration here (striping over two raid1 pairs of 200 GB 7200 rpm
> disks that can do about 50MB/sec each) a cp -al takes about one minute
> over 80,000 files/directories.
the raid5 has about the same speed... reading/writing is 40-50M/s.
rebuilding a 5*80G raid takes about an hour.
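To put that rebuild figure in perspective for 250 GB drives, here is a small extrapolation. It assumes the rebuild time scales with the size of a single member disk and that the per-disk rebuild rate stays the same, neither of which is guaranteed on different hardware; the function names are illustrative.

```python
def rebuild_rate_mb_s(disk_gb, hours):
    """Average per-disk rebuild rate in MB/s (decimal units)."""
    return disk_gb * 1000 / (hours * 3600)


def rebuild_hours(disk_gb, rate_mb_s):
    """Hours needed to rebuild one member disk at the given rate."""
    return disk_gb * 1000 / rate_mb_s / 3600


rate = rebuild_rate_mb_s(80, 1.0)        # from "5*80G ... about an hour"
hours_250 = rebuild_hours(250, rate)     # same rate, 250 GB member disks
print(f"{rate:.1f} MB/s, about {hours_250:.1f} h for a 250 GB disk")
```

That is the window during which a second disk failure would lose the array, which is one more argument for the hot spare in the original plan.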
Another solution would be to take one of these transtec external raids, which
are IDE internally and SCSI externally. I have one (1.2TB) here and, although
it is a bit slower, it has been pretty reliable so far...
I've had no bad experience with reiserfs on x86 hardware so far and it
gets pretty intensively used...
Soeren.
* Re: How to build a big file server
2003-06-05 9:59 ` Russell Coker
@ 2003-06-05 10:13 ` Heinz-Josef Claes
2003-06-05 10:25 ` Russell Coker
0 siblings, 1 reply; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 10:13 UTC (permalink / raw)
To: Russell Coker; +Cc: reiserfs-list
On Thu, 2003-06-05 at 11:59, Russell Coker wrote:
> On Thu, 5 Jun 2003 18:14, Heinz-Josef Claes wrote:
> > Here are some questions:
> >
> > - I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> > and one hot spare. The OS will be on separate hardware raid 1 on smaller
> > disks. Does anybody have experience with IDE controllers for the big
> > disks? Is it better to use hardware raid or software raid
> > (performance)?
>
> The consensus of opinion on the large-ide-arrays mailing list is that 3ware
> controllers are the best for hardware RAID on IDE.
>
> Software RAID performs better in almost all areas, however for RAID-5 the
> reliability of software RAID isn't as good as that of hardware RAID (NB only
> hardware RAID with NV-RAM write-back cache does the right thing - non-caching
> hardware RAID-5 is equal to software RAID-5 for reliability).
>
> Why have the OS on a different device? On all the RAID installations I've
> used which I regard as successful the OS has been on the same hardware.
>
My question was also about hardware versus software raid. If I'm using
software raid, a hardware raid 1 for the OS is not too bad an idea.
The other point is that with the data (= raid 5 disks in my example)
separated from the OS, you are more flexible in case of a hardware
failure of the system. (I do not mean a disk failure.)
> > - At this time, we plan to use reiserfs. What are the advantages /
> > disadvantages of using the raid 1 for journaling for the (big) raid 5
> > (performance, recovery)? The system will be heavily used with
> > storebackup (www.sf.net/projects/storebackup), which means that lots of
> > hard links have to be created as fast as possible.
>
> umem non-volatile RAM devices should perform best for journals. If you use
> journalled-data in ReiserFS (need patches - it's not in the main kernels) and
> put the journal on a umem device you should get some significant performance
> increases - better than you get for having a separate hardware setup for the
> OS. Also for 2.4.20 there are a few ReiserFS patches that you need for
> decent performance (which come along with the data-journalling patch). These
> patches have more than doubled the performance of some of my machines. I
> would expect that the SUSE kernel would include them already.
>
I use SuSE 8.2. It's much faster than the SuSE kernel in 8.1, even on my
ZIP drive at the parallel port ;)
> Also if using a umem device you want to try and move your most volatile data
> to it (EG a mail spool). A 1G umem device has space for a lot more than
> ReiserFS journals.
>
> The write-back cache in hardware RAID seems to be no more than about 128M.
> This isn't much when you have sustained disk writes of 2M/s (and due to the
> way RAID-5 works you need to cache at least twice as much data as you are
> writing for small writes).
>
> Finally, having more than one busy file system on the same physical device is
> going to hurt performance. The OS partition won't be very busy (turn off
> atime, set syslogd to allow write-back caching, and give your server a decent
> amount of RAM and it can go for minutes without a single disk access). If
> you want good performance then you don't want to be swapping at all, RAM is
> cheap, much cheaper than fast hard drives.
>
That's true.
> I suggest that for good performance and a reasonable price you look at a 3ware
> RAID controller with a single RAID-5 device comprised of 7 disks as you
> suggested. Then you have a umem device for the journal of the data
> filesystem and for a small filesystem for volatile data. That and plenty of
> RAM should solve most of the performance problems.
>
Do you have experience with serial ATA from 3ware? Is it a good idea
not to use parallel ATA?
> Finally, there are some kernel bugs that choke performance on big machines.
> 2.4.20 running with 4G of RAM and serious IO load will usually waste large
> amounts of CPU time on kswapd. The kernel patch that SUSE uses solves this,
> also I think that 2.4.21-rc kernels may have code to address this.
Thanks for your response!
* Re: How to build a big file server
2003-06-05 10:05 ` How to build a big file server Hans Reiser
@ 2003-06-05 10:24 ` Heinz-Josef Claes
0 siblings, 0 replies; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 10:24 UTC (permalink / raw)
To: Hans Reiser; +Cc: reiserfs-list, Chris Mason
Am Don, 2003-06-05 um 12.05 schrieb Hans Reiser:
> Heinz-Josef Claes wrote:
>
> >Hi,
> >
> >I didn't find information about my problem on namesys, so I'll try asking
> >here.
> >
> >I plan to build a big fileserver for Deutscher Bundestag (German
> >Parliament) who is converting to linux on the server side. In the
> >beginning it will be used in a testing environment and later multiple of
> >them in production use.
> >
> >Here are some questions:
> >
> >- I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> >and one hot spare. The OS will be on separate hardware raid 1 on smaller
> >disks. Does anybody have experience with IDE controllers for the big
> >disks? Is it better to use hardware raid or software raid
> >(performance)?
> >
> >- At this time, we plan to use reiserfs. What are the advantages /
> >disadvantages of using the raid 1 for journaling for the (big) raid 5
> >(performance, recovery)? The system will be heavily used with
> >storebackup (www.sf.net/projects/storebackup) which means that lots of
> >hard links have to be created as fast as possible.
> >
> >- BTW: I'm using a ZIP drive on the parallel port with reiserfs. Does it
> >make sense to put the journal for the ZIP drive on the hard disk of the
> >laptop?
> >
> >Thanks for taking your time,
> >Heinz-Josef Claes
> >
> >
> >
> >
> >
> >
> Do you plan to do synchronous work loads involving lots of little
> fsyncs? The RAID 5 will not perform well for that purpose, otherwise it
> should work well even for the journal. Chris will correct me if I err.
The software copies or compresses new files (it gets them via NFS) to the
backup filesystem (reiserfs). For files whose content already exists in
the backup, only a hard link is created. In the source (Perl) it's simply
link <oldfile> <newfile>
chown something <newfile>
chmod something <newfile>
There is *no* explicit sync.
On my PC at home (800MHz, 40 GB "normal cheap IDE disk" (I cannot look
at it at the moment)) I get about 200 hard links per second if nothing
has changed (no copying or compressing). The speed only depends on the
speed of the harddisk. A friend of mine gets the same speed with a 2GHz
processor.
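The link/chown/chmod sequence described above can be sketched in Python for illustration. This is only a sketch: the real tool is written in Perl, and the names `link_unchanged` and `demo` and their parameters are hypothetical, not taken from storebackup.

```python
import os
import stat
import tempfile


def link_unchanged(old_path, new_path, mode=None, uid=-1, gid=-1):
    """Hard-link new_path to old_path, then optionally adjust owner and mode.

    Both names share one inode, so a chown/chmod through the new name is
    also visible through the old name.
    """
    os.link(old_path, new_path)        # adds a directory entry, copies no data
    if uid != -1 or gid != -1:
        os.chown(new_path, uid, gid)   # -1 leaves that id unchanged
    if mode is not None:
        os.chmod(new_path, mode)
    return os.stat(new_path)


def demo():
    """Create a file, link it, and report (same inode?, link count, mode)."""
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "old")
        new = os.path.join(d, "new")
        with open(old, "w") as f:
            f.write("unchanged content\n")
        st = link_unchanged(old, new, mode=0o640)
        same_inode = st.st_ino == os.stat(old).st_ino
        return same_inode, st.st_nlink, stat.S_IMODE(os.stat(old).st_mode)
```

Since nothing here calls sync or fsync, the rate of such operations depends on how fast the filesystem can commit metadata changes, which fits the observation that a faster CPU makes no difference.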
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 10:13 ` Heinz-Josef Claes
@ 2003-06-05 10:25 ` Russell Coker
2003-06-05 10:38 ` Heinz-Josef Claes
2003-06-05 13:48 ` Chris Mason
0 siblings, 2 replies; 42+ messages in thread
From: Russell Coker @ 2003-06-05 10:25 UTC (permalink / raw)
To: Heinz-Josef Claes; +Cc: reiserfs-list
On Thu, 5 Jun 2003 20:13, Heinz-Josef Claes wrote:
> My question also was about hardware or software raid. If I'm using
> software raid, a hardware raid 1 for the OS is not a too bad idea.
Booting from software RAID-1 is easy to set up. I've done it with both SCSI
and IDE software RAID.
But having a single hardware RAID-5 is even easier.
> The other point is, that with the data (= raid 5 disks in my example),
> separated from the OS, you are more flexible in case of a hardware
> failure of the system. (I do not mean a disk failure.)
How? If the disks are all working and the data is consistent then you
transfer them all to another system and things are fine. If the disks aren't
all working or the data is inconsistent then having the OS on separate disks
won't help much. Recovering the OS from backup is usually trivial, so the
fact that RAID-1 makes it easier to recover things does not provide much
benefit for the OS (IMHO).
I guess you could justify RAID-10 for the data store because of the ease of
recovery.
> > separate hardware setup for the OS. Also for 2.4.20 there are a few
> > ReiserFS patches that you need for decent performance (which come along
> > with the data-journalling patch). These patches have more than doubled
> > the performance of some of my machines. I would expect that the SUSE
> > kernel would include them already.
>
> I use SuSE 8.2. It's much faster than the SuSE kernel in 8.1, even on my
> ZIP drive at the parallel port ;)
Hopefully someone can advise on whether the SuSE 8.2 kernel has all the
patches you desire for best ReiserFS performance.
> > I suggest that for good performance and a reasonable price you look at a
> > 3ware RAID controller with a single RAID-5 device comprised of 7 disks as
> > you suggested. Then you have a umem device for the journal of the data
> > filesystem and for a small filesystem for volatile data. That and plenty
> > of RAM should solve most of the performance problems.
>
> Do you have experiences with serial ATA from 3ware? Is it a good idea
> not to use parallel ATA?
Serial ATA is still very new. Last time I checked the details of the 3ware
products they were using a bridge chip to convert the signals from the
parallel ATA controllers on their hardware to serial-ATA for the connectors,
also at that time the disk drives seemed to be all parallel-ATA internally
with a bridge chip. This means that you would not get any performance
benefit from S-ATA (but cabling is much easier).
At the moment I would not expect to see any benefit of S-ATA apart from ease
of cabling (which is a significant issue when you have 8 or more disks).
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 9:17 ` Heinz-Josef Claes
@ 2003-06-05 10:29 ` Russell Coker
2003-06-05 10:45 ` Heinz-Josef Claes
2003-06-05 13:38 ` Hans Reiser
1 sibling, 1 reply; 42+ messages in thread
From: Russell Coker @ 2003-06-05 10:29 UTC (permalink / raw)
To: Heinz-Josef Claes, reiserfs-list
On Thu, 5 Jun 2003 19:17, Heinz-Josef Claes wrote:
> Perhaps this is a good idea. I'll try to test the performance. Since the
> servers are only used for an (additional) online backup (like
> snapshots), there will not be really critical data on them. I'll see if
> the performance win is worth the risk :-)
Are you backing up file by file or saving backup volumes? IE Will you have a
file on the backup server for each file on a live machine, or will the backup
machine be filled with tar files?
If you are storing tar files or similar then a umem device will provide little
benefit. In which case you will probably want a machine with two PCI buses
and two hardware RAID cards and run software RAID-0 across them.
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 10:25 ` Russell Coker
@ 2003-06-05 10:38 ` Heinz-Josef Claes
2003-06-05 11:11 ` Russell Coker
2003-06-05 13:48 ` Chris Mason
1 sibling, 1 reply; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 10:38 UTC (permalink / raw)
To: Russell Coker; +Cc: reiserfs-list
Am Don, 2003-06-05 um 12.25 schrieb Russell Coker:
> On Thu, 5 Jun 2003 20:13, Heinz-Josef Claes wrote:
> > My question also was about hardware or software raid. If I'm using
> > software raid, a hardware raid 1 for the OS is not a too bad idea.
>
> Booting from software RAID-1 is easy to setup. I've done it with both SCSI
> and IDE software RAID.
>
Perhaps I'm not up to date, but as far as I know rebooting with a broken
boot disk needs a BIOS which can switch automatically to the other disk
of the RAID 1. If you have lots of machines (some of them standing at
other sites), hardware RAID is an advantage here.
> But having a single hardware RAID-5 is even easier.
>
> > The other point is, that with the data (= raid 5 disks in my example),
> > separated from the OS, you are more flexible in case of a hardware
> > failure of the system. (I do not mean a disk failure.)
>
> How? If the disks are all working and the data is consistent then you
> transfer them all to another system and things are fine. If the disks aren't
> all working or the data is inconsistent then having the OS on separate disks
> won't help much. Recovering the OS from backup is usually trivial, so the
> fact that RAID-1 makes it easier to recover things does not provide much
> benefit for the OS (IMHO).
>
One thing the administrators here (they are moving from Windows to
Linux!) want is to pull out a cable and plug it into a new machine if the
old one behaves strangely (e.g. the motherboard has odd problems). In such
a case it's *easy* for an administrator (who is not an expert) to try to
"repair" such a machine.
> I guess you could justify RAID-10 for the data store because of the ease of
> recovery.
>
You're right. But this is only a system for holding a copy of the original
data. All in all it will be about 5 to 10 TB, so the price is important.
(One of the benefits is *not* to have 160 tape drives out there.)
> > > separate hardware setup for the OS. Also for 2.4.20 there are a few
> > > ReiserFS patches that you need for decent performance (which come along
> > > with the data-journalling patch). These patches have more than doubled
> > > the performance of some of my machines. I would expect that the SUSE
> > > kernel would include them already.
> >
> > I use SuSE 8.2. It's much faster than the SuSE kernel in 8.1, even on my
> > ZIP drive at the parallel port ;)
>
> Hopefully someone can advise on whether the SuSE 8.2 kernel has all the
> patches you desire for best ReiserFS performance.
>
> > > I suggest that for good performance and a reasonable price you look at a
> > > 3ware RAID controller with a single RAID-5 device comprised of 7 disks as
> > > you suggested. Then you have a umem device for the journal of the data
> > > filesystem and for a small filesystem for volatile data. That and plenty
> > > of RAM should solve most of the performance problems.
> >
> > Do you have experiences with serial ATA from 3ware? Is it a good idea
> > not to use parallel ATA?
>
> Serial ATA is still very new. Last time I checked the details of the 3ware
> products they were using a bridge chip to convert the signals from the
> parallel ATA controllers on their hardware to serial-ATA for the connectors,
> also at that time the disk drives seemed to be all parallel-ATA internally
> with a bridge chip. This means that you would not get any performance
> benefit from S-ATA (but cabling is much easier).
>
> At the moment I would not expect to see any benefit of S-ATA apart from ease
> of cabling (which is a significant issue when you have 8 or more disks).
True. That's why I'm interested in experiences. Cabling is a big problem
with parallel ATA.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 10:29 ` Russell Coker
@ 2003-06-05 10:45 ` Heinz-Josef Claes
2003-06-05 16:47 ` Carl-Daniel Hailfinger
0 siblings, 1 reply; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 10:45 UTC (permalink / raw)
To: Russell Coker; +Cc: reiserfs-list
Am Don, 2003-06-05 um 12.29 schrieb Russell Coker:
> On Thu, 5 Jun 2003 19:17, Heinz-Josef Claes wrote:
> > Perhaps this is a good idea. I'll try to test the performance. Since the
> > servers are only used for an (additional) online backup (like
> > snapshots), there will not be really critical data on them. I'll see if
> > the performance win is worth the risk :-)
>
> Are you backing up file by file or saving backup volumes? IE Will you have a
> file on the backup server for each file on a live machine, or will the backup
> machine be filled with tar files?
>
> If you are storing tar files or similar then a umem device will provide little
> benefit. In which case you will probably want a machine with two PCI buses
> and two hardware RAID cards and run software RAID-0 across them.
It's file by file, because access to the backup is very easy then. So a
umem device should be a good idea. I have to check if it's too expensive
and if we really need it. I think we have more than 12 hours available
for making the backup.
From the debian web page:
http://packages.debian.org/testing/utils/storebackup.html
Package: storebackup 1.12.2-1
fancy compressing managing checksumming hard-linking cp -rua
Copies directory hierarchies recursively into another location, by date
(e.g. /home/ => /var/bkup/2002.12.13_04.27.56/). Permissions are
preserved, so users with access to the backup directory can recover
their files themselves.
File comparisons are done with MD5 checksums, so no changes go
unnoticed.
Hard-links unchanged backuped files to old versions and identical files
within the backuped tree. (identical means the same contents, not
depending on the same path/filename)
Compresses large files (that don't match exclusion patterns).
Manages backups and removes old ones.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 10:38 ` Heinz-Josef Claes
@ 2003-06-05 11:11 ` Russell Coker
0 siblings, 0 replies; 42+ messages in thread
From: Russell Coker @ 2003-06-05 11:11 UTC (permalink / raw)
To: Heinz-Josef Claes; +Cc: reiserfs-list
On Thu, 5 Jun 2003 20:38, Heinz-Josef Claes wrote:
> Am Don, 2003-06-05 um 12.25 schrieb Russell Coker:
> > On Thu, 5 Jun 2003 20:13, Heinz-Josef Claes wrote:
> > > My question also was about hardware or software raid. If I'm using
> > > software raid, a hardware raid 1 for the OS is not a too bad idea.
> >
> > Booting from software RAID-1 is easy to setup. I've done it with both
> > SCSI and IDE software RAID.
>
> Perhaps I'm not up to date, but as far as I know rebooting with a broken
> boot disk needs a bios which can switch automatically to the other disk
> of the raid 1. If you have lots of machines (and which are standing
> outside) this is an advantage.
That can be done, but it depends on the error. If the hard drive is totally
dead then many BIOSes will handle this even for IDE disks (and I expect every
SCSI system to handle it). If the hard drive has a bad sector in an area that
is read during the boot process then it probably won't work.
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: How to build a big file server
2003-06-05 8:45 ` Marc-Christian Petersen
2003-06-05 9:30 ` Heinz-Josef Claes
@ 2003-06-05 11:27 ` Bill Rees
2003-06-10 13:28 ` myciel
2 siblings, 0 replies; 42+ messages in thread
From: Bill Rees @ 2003-06-05 11:27 UTC (permalink / raw)
To: reiserfs-list
I'll second the 3ware recommendation.
-----Original Message-----
From: Marc-Christian Petersen [mailto:m.c.p@gmx.net]
Sent: Thursday, June 05, 2003 4:46 AM
To: Heinz-Josef Claes; Ragnar Kjørstad
Cc: reiserfs-list@namesys.com
Subject: Re: How to build a big file server
On Thursday 05 June 2003 10:42, Heinz-Josef Claes wrote:
Hi Heinz,
> We used a cheap 3Com IDE RAID controller (sorry, don't know which one)
> one and a half year ago (without writeback-cache, in another company).
> It was terribly slow when writing with hardware RAID. With software
> RAID, it was much faster - that's the reason for my question. Do you
> know a useful IDE RAID controller which runs with linux? (For your
> information: the machines are not mission critical, only for having an
> (additional) online backup for about 5000 users).
3ware Controller. Works perfect.
--
ciao, Marc
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 9:04 ` Oleg Drokin
2003-06-05 9:17 ` Heinz-Josef Claes
@ 2003-06-05 12:06 ` Heinz-Josef Claes
1 sibling, 0 replies; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 12:06 UTC (permalink / raw)
To: reiserfs-list
Am Don, 2003-06-05 um 11.04 schrieb Oleg Drokin:
> Hello!
>
> On Thu, Jun 05, 2003 at 10:50:15AM +0200, Heinz-Josef Claes wrote:
> > > > My guess is that the writes to the journal _would_ be a problem, and
> > > > that writing them to a separate (RAID1)-device would help significantly.
> > > And with journal on battery-backed RAM it would be even faster ;)
> > The target is to have a cheap solution :/
>
> The cheapest solution to get speed is to put journal and all data on ramdisk
> (or even into ramfs).
> This is not the safest one, though.
If I put the journal on a ramdisk, how do I recreate the journal after
rebooting? Does reiserfs need some special pattern or so?
> Bye,
> Oleg
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 9:17 ` Heinz-Josef Claes
2003-06-05 10:29 ` Russell Coker
@ 2003-06-05 13:38 ` Hans Reiser
1 sibling, 0 replies; 42+ messages in thread
From: Hans Reiser @ 2003-06-05 13:38 UTC (permalink / raw)
To: Heinz-Josef Claes; +Cc: reiserfs-list
Heinz-Josef Claes wrote:
>Am Don, 2003-06-05 um 11.04 schrieb Oleg Drokin:
>
>
>>Hello!
>>
>>On Thu, Jun 05, 2003 at 10:50:15AM +0200, Heinz-Josef Claes wrote:
>>
>>
>>>>>My guess is that the writes to the journal _would_ be a problem, and
>>>>>that writing them to a separate (RAID1)-device would help significantly.
>>>>>
>>>>>
>>>>And with journal on battery-backed RAM it would be even faster ;)
>>>>
>>>>
>>>The target is to have a cheap solution :/
>>>
>>>
>>The cheapest solution to get speed is to put journal and all data on ramdisk
>>(or even into ramfs).
>>This is not the safest one, though.
>>
>>
>>
>Perhaps this is a good idea. I'll try to test the performance. Since the
>servers are only used for an (additional) online backup (like
>snapshots), there will not be really critical data on them. I'll see if
>the performance win is worth the risk :-)
>
This sounds like a bad idea to me. If you want to gain performance,
either avoid RAID 5 or use a controller with a write-back cache. Frankly I
doubt you will be disk performance bound anyway.
>
>BTW: Some weeks ago there where questions about the performance of
>NetApp. I think I will have the possibility to repeat the tests with a
>filer. I'll post the results to the list, if somebody is interested.
>
>
>>Bye,
>> Oleg
>>
>>
>>
>
>
>
>
>
--
Hans
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 8:14 How to build a big file server Heinz-Josef Claes
` (4 preceding siblings ...)
2003-06-05 10:05 ` How to build a big file server Hans Reiser
@ 2003-06-05 13:43 ` Sam Vilain
2003-06-05 13:55 ` Heinz-Josef Claes
` (2 more replies)
5 siblings, 3 replies; 42+ messages in thread
From: Sam Vilain @ 2003-06-05 13:43 UTC (permalink / raw)
To: Heinz-Josef Claes; +Cc: reiserfs-list
On Thu, 05 Jun 2003 20:14, Heinz-Josef Claes wrote:
> - I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> and one hot spare. The OS will be on separate hardware raid 1 on smaller
> disks. Does anybody have experience with IDE controllers for the big
> disks? Is it better to use hardware raid or software raid
> (performance)?
Yes, if massive disk space at virtually no cost is what you're after,
it has to be IDE. But, you don't have to give up the benefits of a
serviceable data centre. Check out http://www.acme-technology.co.uk/
for some fairly snazzy hot swap IDE rack mount enclosures, cases and
more. There must be other vendors out there, too. If you've got that
many disks, you NEED hot swap!
As for the software RAID vs hardware RAID, my experience is that
software RAID 1 can deliver the same amount of disk space as hardware
RAID 5 for less total cost, factoring in the price of the RAID
controller. Of course, your storage density is worse for RAID 1, but
you could probably still fit your 10TB into a single rack, assuming
you can get 8 disks in a 3U chassis. RAID 1 in any form will *always*
outperform RAID 5, especially in the event of a failure. RAID 1
arrays hardly notice, RAID 5 arrays slow to a crawl.
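The space trade-off above can be checked with a little arithmetic. The sketch below assumes the 250GB drives from the original question, 8 bays per 3U chassis, a 10TB target in decimal units, and a 42U rack; the rack height and all function names are illustrative assumptions, and prices are left out because none were quoted.

```python
import math

DISK_GB = 250          # drive size from the original question
BAYS_PER_CHASSIS = 8   # "8 disks in a 3U chassis"
CHASSIS_U = 3
RACK_U = 42            # assumed full-height rack
TARGET_GB = 10_000     # 10 TB, decimal units


def usable_gb_raid1(disks):
    """Mirrored pairs: half of the raw capacity is usable."""
    return (disks // 2) * DISK_GB


def usable_gb_raid5(disks, hot_spares=0):
    """One disk's worth of parity; hot spares hold no data."""
    return (disks - hot_spares - 1) * DISK_GB


def chassis_needed(usable_per_chassis_gb):
    """Number of chassis required to reach the target capacity."""
    return math.ceil(TARGET_GB / usable_per_chassis_gb)


raid1_chassis = chassis_needed(usable_gb_raid1(BAYS_PER_CHASSIS))
raid5_chassis = chassis_needed(usable_gb_raid5(BAYS_PER_CHASSIS, hot_spares=1))
raid1_rack_units = raid1_chassis * CHASSIS_U
raid5_rack_units = raid5_chassis * CHASSIS_U
```

Under these assumptions RAID 1 needs 10 chassis (30U) and RAID 5 with one hot spare per chassis needs 7 (21U), so both fit into one rack, with RAID 1 using roughly 40% more drives.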
btw, I wouldn't necessarily be too worried about stacking too many
devices on a single chain. Benchmark all configurations heavily with
a workload close to the real workload of the device before buying
dozens of controllers, or settling on one plan recommended by some
self-professed `expert' trolling the reiserfs-list. Your bottleneck
may not be where you think.
I'd also strongly suggest getting a motherboard with 64 bit and/or
66MHz PCI bus. The Tyan ThunderK7 is pretty good - a dual capable
Athlon board that's stable as hell, and you can run the OpenBIOS
project on it - manage your servers with a serial terminal server (eg,
a PC with a serial breakout card) instead of a dumbass KVM switch.
Then, you'd have something that's almost as good as a commercial UNIX
platform, but not really. Personally, I'd get a quote for the arrays
running on Sparc hardware from http://www.anysystem.com/ (used Sun
parts peddlers) and offer that as a comparison.
Running LVM to allocate a large RAID volume works exceedingly well
from a system administration standpoint, especially with reiserfs.
Its online resizing support is second to none.
--
Sam Vilain, sam@vilain.net
Humanity is acquiring all the right technology for all the wrong
reasons.
-- R. Buckminster Fuller
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 10:25 ` Russell Coker
2003-06-05 10:38 ` Heinz-Josef Claes
@ 2003-06-05 13:48 ` Chris Mason
2003-06-14 11:11 ` data-logging for 2.4.21+ (was Re: How to build a big file server) Manuel Krause
1 sibling, 1 reply; 42+ messages in thread
From: Chris Mason @ 2003-06-05 13:48 UTC (permalink / raw)
To: Russell Coker; +Cc: Heinz-Josef Claes, reiserfs-list
On Thu, 2003-06-05 at 06:25, Russell Coker wrote:
> >
> > I use SuSE 8.2. It's much faster than the SuSE kernel in 8.1, even on my
> > ZIP drive at the parallel port ;)
>
> Hopefully someone can advise on whether the SuSE 8.2 kernel has all the
> patches you desire for best ReiserFS performance.
The suse kernels include the data logging code, along with the other
minor optimizations in my data logging dir.
The good news is that I've also tracked down a long standing
data=ordered performance bug, and I've got more data=ordered
optimizations to make fsync workloads better. Hopefully this morning I
can get it all to stop oopsing and I'll send out for broader testing.
-chris
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 13:43 ` Sam Vilain
@ 2003-06-05 13:55 ` Heinz-Josef Claes
2003-06-06 11:15 ` Vitezslav T. Se'm
2003-06-06 11:16 ` Vitezslav T. Se'm
2 siblings, 0 replies; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-05 13:55 UTC (permalink / raw)
To: Sam Vilain; +Cc: reiserfs-list
Am Don, 2003-06-05 um 15.43 schrieb Sam Vilain:
> On Thu, 05 Jun 2003 20:14, Heinz-Josef Claes wrote:
> > - I plan to build a system with IDE 250GB drives. 7 of them for raid 5
> > and one hot spare. The OS will be on separate hardware raid 1 on smaller
> > disks. Does anybody have experience with IDE controllers for the big
> > disks? Is it better to use hardware raid or software raid
> > (performance)?
>
> Yes, if massive disk space at virtually no cost is what you're after,
Yes, that's it. If this machine fails, it's a second-order problem.
As I wrote, it's only a big online pot for getting old data out of.
If it's down, you can wait until it's up again or use tapes.
> it has to be IDE. But, you don't have to give up the benefits of a
> servicable data centre. Check out http://www.acme-technology.co.uk/
> for some fairly snazzy hot swap IDE rack mount enclosures, cases and
> more. There must be other vendors out there, too. If you've got that
> many disks, you NEED hot swap!
>
> As for the software RAID vs hardware RAID, my experience is that
> software RAID 1 can deliver the same amount of disk space as hardware
> RAID 5 for less total cost, factoring in the price of the RAID
> controller. Of course, your storage density is worse for RAID 1, but
> you could probably still fit your 10TB into a single rack, assuming
> you can get 8 disks in a 3U chassis. RAID 1 in any form will *always*
> outperform RAID 5, especially in the event of a failure. RAID 1
> arrays hardly notice, RAID 5 arrays slow to a crawl.
>
> btw, I wouldn't necessarily be too worried about stacking too many
> devices on a single chain. Benchmark heavily all configurations with
> a workload and close to the real workload of the device before buying
> dozens of controllers, or settling on one plan recommended by some
> self-professed `expert' trolling the reiserfs-list. Your bottleneck
> may not be where you think.
>
> I'd also strongly suggest getting a motherboard with 64 bit and/or
> 66MHz PCI bus. The Tyan ThunderK7 is pretty good - a dual capable
> Athlon board that's stable as hell, and you can run the OpenBIOS
> project on it - manage your servers with a serial terminal server (eg,
> a PC with a serial breakout card) instead of a dumbass KVM switch.
>
> Then, you'd have something that's almost as good as a commerical UNIX
> platform, but not really. Personally, I'd get a quote for the arrays
> running on Sparc hardware from http://www.anysystem.com/ (used Sun
> parts peddlers) and offer that as a comparison.
>
> Running LVM to allocate a large RAID volume works exceedingly well
> from a system administration standpoint, especially with reiserfs.
> It's online resizing support is second to none.
Thanks a lot, I'll look at those places to get more information.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 10:45 ` Heinz-Josef Claes
@ 2003-06-05 16:47 ` Carl-Daniel Hailfinger
2003-06-05 17:01 ` Soeren Sonnenburg
2003-06-05 17:06 ` Ragnar Kjørstad
0 siblings, 2 replies; 42+ messages in thread
From: Carl-Daniel Hailfinger @ 2003-06-05 16:47 UTC (permalink / raw)
To: Heinz-Josef Claes; +Cc: Russell Coker, reiserfs-list
Heinz-Josef Claes wrote:
> From the debian web page:
>
> http://packages.debian.org/testing/utils/storebackup.html
>
> File comparisons are done with MD5 checksums, so no changes go
> unnoticed.
If you believe the last sentence, I have a bridge to sell.
To be more exact: MD5 is a 128 = 2^7 bit hash. Assuming a file length of 4kB
= 8*4096 = 2^15 bits, approximately 2^(2^15 - 2^7) = 2^32640, or about 10^9826,
different files have the same hash.
That's right: for a given MD5 hash, there are more different files with
4kB size sharing the same hash than the count of atoms in the whole
universe. If the files are larger, it gets worse.
md5sum(1) is not diff(1). Most of the time, it will suffice as el cheapo
replacement, but for backups it's definitely horrible. You don't store
your backup tapes in the microwave, do you?
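The counting argument can be checked numerically. This is a sketch that takes 4kB as 4096 bytes (8*4096 = 2^15 bits) and assumes hash values are spread evenly over all inputs; the function names are made up for illustration.

```python
import math

HASH_BITS = 128          # MD5 output size
FILE_BITS = 8 * 4096     # a 4 KB file is 2**15 bits


def preimage_exponent(file_bits=FILE_BITS, hash_bits=HASH_BITS):
    """log2 of the average number of files of this size per hash value."""
    return file_bits - hash_bits


def decimal_digits_of_power_of_two(exponent):
    """Number of decimal digits in 2**exponent, without building the number."""
    return math.floor(exponent * math.log10(2)) + 1


exp2 = preimage_exponent()
digits = decimal_digits_of_power_of_two(exp2)
```

Here `exp2` is 32640 and `digits` is 9826, so on average about 2^32640 different 4 KB files map to each MD5 value. That says how many colliding files exist, not how likely it is to meet one.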
Carl-Daniel
--
http://www.hailfinger.org/
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 16:47 ` Carl-Daniel Hailfinger
@ 2003-06-05 17:01 ` Soeren Sonnenburg
2003-06-05 17:06 ` Ragnar Kjørstad
1 sibling, 0 replies; 42+ messages in thread
From: Soeren Sonnenburg @ 2003-06-05 17:01 UTC (permalink / raw)
To: Carl-Daniel Hailfinger; +Cc: Heinz-Josef Claes, Russell Coker, reiserfs-list
On Thu, 2003-06-05 at 18:47, Carl-Daniel Hailfinger wrote:
> Heinz-Josef Claes wrote:
> >From the debian web page:
> >
> > http://packages.debian.org/testing/utils/storebackup.html
> >
> > File comparisons are done with MD5 checksums, so no changes go
> > unnoticed.
>
> If you believe the last sentence, I have a bridge to sell.
>
> To be more exact: MD5 is a 128 = 2^7 bit hash. Assuming a file length of 4kB
> = 8*4096 = 2^15 bits, approximately 2^(2^15 - 2^7) = 2^32640, or about 10^9826,
> different files have the same hash.
>
> That's right: for a given MD5 hash, there are more different files with
> 4kB size sharing the same hash than the count of atoms in the whole
> universe. If the files are larger, it gets worse.
>
> md5sum(1) is not diff(1). Most of the time, it will suffice as el cheapo
> replacement, but for backups it's definitely horrible. You don't store
> your backup tapes in the microwave, do you?
You forget one thing: how likely is it that a file with MD5SUM A turns
into a different file which has the same MD5SUM A? I would guess that
this kind of file corruption has a likelihood very close to zero.
S.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 16:47 ` Carl-Daniel Hailfinger
2003-06-05 17:01 ` Soeren Sonnenburg
@ 2003-06-05 17:06 ` Ragnar Kjørstad
2003-06-06 9:41 ` Heinz-Josef Claes
1 sibling, 1 reply; 42+ messages in thread
From: Ragnar Kjørstad @ 2003-06-05 17:06 UTC (permalink / raw)
To: Carl-Daniel Hailfinger; +Cc: Heinz-Josef Claes, Russell Coker, reiserfs-list
On Thu, Jun 05, 2003 at 06:47:32PM +0200, Carl-Daniel Hailfinger wrote:
> Heinz-Josef Claes wrote:
> >From the debian web page:
> >
> > http://packages.debian.org/testing/utils/storebackup.html
> >
> > File comparisons are done with MD5 checksums, so no changes go
> > unnoticed.
>
> If you believe the last sentence, I have a bridge to sell.
>
> To be more exact: MD5 is a 128 = 2^7 bit hash. Assuming a file length of 4kB
> = 8*4096 = 2^15 bits, approximately 2^(2^15 - 2^7) = 2^32640, or about 10^9826,
> different files have the same hash.
Or explained differently, every time you change a file there is a
1/(2^128) chance that the backup-system will not notice.
I would be willing to take the risk.
However, calculating a checksum of all your data is relatively slow -
I'm surprised it doesn't use timestamps. Or maybe it's optional?
(Of course in some situations timestamps can not be relied on, but if
your application is not one of those it should be much much faster)
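A minimal sketch of the timestamp idea, assuming a cheap size-plus-mtime check first and an MD5 comparison only when that check fails. The record layout (`size`, `mtime_ns`, `md5`) and the function names are hypothetical; this is not how storebackup is known to work.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def probably_unchanged(path, known_size, known_mtime_ns):
    """Cheap check: same size and same modification time as last run."""
    st = os.stat(path)
    return st.st_size == known_size and st.st_mtime_ns == known_mtime_ns


def needs_backup(path, known):
    """known: dict with 'size', 'mtime_ns' and 'md5' from the previous run."""
    if probably_unchanged(path, known["size"], known["mtime_ns"]):
        return False                      # skip reading the file at all
    return md5_of(path) != known["md5"]   # fall back to the checksum
```

The fast path only calls stat(), so unchanged files cost no data reads. It is only as trustworthy as the timestamps, which is the caveat already mentioned.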
--
Ragnar Kjørstad
Zet.no
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
[not found] <no.id>
@ 2003-06-05 23:49 ` The Amazing Dragon
0 siblings, 0 replies; 42+ messages in thread
From: The Amazing Dragon @ 2003-06-05 23:49 UTC (permalink / raw)
To: Carl-Daniel Hailfinger; +Cc: Heinz-Josef Claes, Russell Coker, reiserfs-list
> From: Carl-Daniel Hailfinger <c-d.hailfinger.kernel.2003@gmx.net>
> Heinz-Josef Claes wrote:
> >From the debian web page:
> >
> > http://packages.debian.org/testing/utils/storebackup.html
> >
> > File comparisons are done with MD5 checksums, so no changes go
> > unnoticed.
>
> If you believe the last sentence, I have a bridge to sell.
Where, how large, what price, what is the expected income?
> To be more exact: MD5 is a 128 = 2^7 bit hash. Assuming a file length of 4kB
> = 8*4096 = 2^15 bits, approximately 2^(2^15 - 2^7) = 2^32640, or about 10^9826,
> different files have the same hash.
>
> That's right: for a given MD5 hash, there are more different files with
> 4kB size sharing the same hash than the count of atoms in the whole
> universe. If the files are larger, it gets worse.
Yes, if you have every possible 4KB file in existence, then 4080 of those
bytes are redundant (4096 minus 16 bytes for the hash), so 2^(4080*8) of
all possible 4KB files would have the same MD5 hash. Very much larger
than the number of atoms in the universe (2^265, without dark matter),
and larger than the volume of the universe in cubic centimeters (2^280).
Think of how many unique 4KB files could exist though, 2^32768. That is
a large number. If we consider a super-huge installation, you might have
a petabyte of storage (2^50, considerably larger than any installation
I've ever heard of). Now if we figure an average file size of 1KB
(extraordinarily small), then this installation will have 2^40 separate
files. There is also the birthday paradox: once the number of files
reaches half the hash's bits (2^64 for MD5) there is roughly a fifty
percent chance of a collision, so MD5 effectively comes down to 64 bits
of use before collision becomes a significant danger. That still leaves
us 24 bits short of endangering MD5; your chances of winning a lottery
are very much better than the chance of even a single collision.
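The estimate above can be put into numbers with the usual birthday-bound approximation, p ~= 1 - exp(-n^2 / 2^(bits+1)). This is a sketch under the assumption that MD5 output behaves like a uniformly random 128-bit value for non-malicious files; the function name is invented for the example.

```python
import math


def collision_probability(num_files, digest_bits=128):
    """Birthday-bound approximation: p ~= 1 - exp(-n^2 / 2^(bits+1))."""
    x = num_files * num_files / 2.0 ** (digest_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-x)


petabyte = 2 ** 50
average_file = 2 ** 10                 # 1KB, the figure used above
num_files = petabyte // average_file   # 2**40 files
print(f"files: 2^{num_files.bit_length() - 1}")
print(f"chance of any accidental collision: {collision_probability(num_files):.3e}")
print(f"same formula at 2^64 files: {collision_probability(2 ** 64):.2f}")
```

At 2^64 files the approximation gives a bit under 40%, the same order as the "fifty percent" rule of thumb; at 2^40 files the chance is vanishingly small.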
Now, the size of MD5 is small enough that there is a significant
danger of /governments/ being able to find a couple pairs of files with
identical MD5 hashes. A much larger danger is that some attacks have
gotten through weakened versions of MD5. The danger of identical files
being found by accident is utterly insignificant.
This is also assuming MD5 is used for detecting identical files, more
likely it is used to detect changes. Also do you seriously worry about
users attempting to /prevent/ their files from being backed up?
> md5sum(1) is not diff(1). Most of the time, it will suffice as el cheapo
> replacement, but for backups it's definitely horrible. You don't store
> your backup tapes in the microwave, do you?
I'd worry more about the fireproof box your tapes are in failing in a
fire than about finding a single pair of colliding files though.
You do not know enough to disparage MD5.
--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\ ( | EHeM@cs.pdx.edu PGP 8881EF59 | ) /
\_ \ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
\___\_|_/82 04 A1 3C C7 B1 37 2A*E3 6E 84 DA 97 4C 40 E6\_|_/___/
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 17:06 ` Ragnar Kjørstad
@ 2003-06-06 9:41 ` Heinz-Josef Claes
0 siblings, 0 replies; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-06 9:41 UTC (permalink / raw)
To: Ragnar Kjørstad; +Cc: Carl-Daniel Hailfinger, Russell Coker, reiserfs-list
On Thu, 2003-06-05 at 19:06, Ragnar Kjørstad wrote:
> On Thu, Jun 05, 2003 at 06:47:32PM +0200, Carl-Daniel Hailfinger wrote:
> > Heinz-Josef Claes wrote:
> > > From the debian web page:
> > >
> > > http://packages.debian.org/testing/utils/storebackup.html
> > >
> > > File comparisons are done with MD5 checksums, so no changes go
> > > unnoticed.
> >
> > If you believe the last sentence, I have a bridge to sell.
> >
> > To be more exact: MD5 is a 128=2^7 bit hash. Assuming a file length of 4kB
> > = 2^8*4096=2^20 bits, approximately 2^(2^(20-7))= 2^8192= 10^2457
> > different files have the same hash.
>
-> It's an additional backup alongside a normal backup to tape. <-
(BTW: storebackup has existed for 2 years now, 1 year of that on
sourceforge, and I never heard of problems like this, so I think it will
not happen too often :-))
> Or explained differently, every time you change a file there is a
> 1/(2^128) chance that the backup-system will not notice.
>
> I would be willing to take the risk.
>
>
> However, calculating a checksum of all your data is relatively slow -
> I'm surprised it doesn't use timestamps. Or maybe it's optional?
>
> (Of course in some situations timestamps can not be relied on, but if
> your application is not one of those it should be much much faster)
>
The application works as follows:
The first backup is slow because all md5 sums have to be calculated
(that's fast) and the files have to be copied/compressed (that's slow).
The md5 sums are stored (along with other information) in a special file.
The following backups will
1. check whether the same file at the same place is unchanged (ctime,
mtime, size). If 'yes', it makes a link to the old one, takes the md5 sum
of the old file and stores that md5 sum in the special file of that backup.
2. If it cannot find the same file at the same place, or the file has
changed, it will calculate the md5 sum and look for it in the old
backup. If it finds the md5 sum, it will make the link, otherwise it
will copy/compress the file.
Calculating md5 sums, compression and copying is done in parallel to
reading directories and linking to improve performance. There are
several queues. It can also take advantage of multiprocessor systems,
because multiple compressions can happen in parallel.
The real algorithm is a little bit more complicated, among other things
because it can link between independent backups (eg. from different
machines, or you make a big backup with all files every week and a
faster, smaller one every day and share the inodes between them).
-> So it uses timestamps (and size); if they differ although the file
hasn't changed, there is only a little overhead. MD5 sums are stored in
Berkeley DBM files during the backup.
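The two steps described above can be sketched roughly as follows. This is not storebackup's actual code: the function names, the dictionary layout (`prev_meta`, `prev_by_md5`, `new_meta`) and the return strings are assumptions made for the illustration, plain dicts stand in for the DBM files, and compression, the queues and linking between independent backups are left out.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(src, rel, new_dir, prev_meta, prev_by_md5, new_meta):
    """Back up one file; return 'linked-unchanged', 'linked-by-md5' or 'copied'.

    prev_meta:   rel path -> (ctime, mtime, size, md5, stored_path), last backup
    prev_by_md5: md5 -> stored_path, last backup
    new_meta:    dict filled in for the backup being written
    """
    st = os.stat(src)
    stamp = (st.st_ctime_ns, st.st_mtime_ns, st.st_size)
    dst = os.path.join(new_dir, rel)
    os.makedirs(os.path.dirname(dst), exist_ok=True)

    old = prev_meta.get(rel)
    if old is not None and old[:3] == stamp:
        # Step 1: same place, same ctime/mtime/size. Reuse the stored md5
        # sum and hard link to the old copy without reading the file.
        md5, stored = old[3], old[4]
        os.link(stored, dst)
        action = "linked-unchanged"
    else:
        # Step 2: moved or changed. Hash the content and look it up.
        md5 = md5_of(src)
        stored = prev_by_md5.get(md5)
        if stored is not None:
            os.link(stored, dst)
            action = "linked-by-md5"
        else:
            shutil.copy2(src, dst)  # the real tool can also compress here
            action = "copied"
    new_meta[rel] = stamp + (md5, dst)
    return action
```

Only the second branch reads file contents, which is why an unchanged tree costs little more than a stat and a hard link per file.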
Regards,
--
Heinz-Josef Claes hjclaes@web.de
project: http://sourceforge.net/projects/storebackup
-> snapshot-like backup to another disk
Q: How does a Unix guru have sex?
A: gunzip && strip && touch && finger && mount && fsck && \
more && yes && fsck && fsck && fsck && umount && sleep
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 13:43 ` Sam Vilain
2003-06-05 13:55 ` Heinz-Josef Claes
@ 2003-06-06 11:15 ` Vitezslav T. Se'm
2003-06-06 15:15 ` Russell Coker
2003-06-06 11:16 ` Vitezslav T. Se'm
2 siblings, 1 reply; 42+ messages in thread
From: Vitezslav T. Se'm @ 2003-06-06 11:15 UTC (permalink / raw)
To: Sam Vilain; +Cc: Heinz-Josef Claes, reiserfs-list
Hi.
Sam Vilain wrote:
>On Thu, 05 Jun 2003 20:14, Heinz-Josef Claes wrote:
>
>
>>- I plan to build a system with IDE 250GB drives. 7 of them for raid 5
>>and one hot spare. The OS will be on separate hardware raid 1 on smaller
>>disks. Does anybody have experience with IDE controllers for the big
>>disks? Is it better to use hardware raid or software raid
>>(performance)?
>>
>>
>
>Yes, if massive disk space at virtually no cost is what you're after,
>it has to be IDE. But, you don't have to give up the benefits of a
>serviceable data centre. Check out http://www.acme-technology.co.uk/
>for some fairly snazzy hot swap IDE rack mount enclosures, cases and
>more. There must be other vendors out there, too. If you've got that
>many disks, you NEED hot swap!
>
>As for the software RAID vs hardware RAID, my experience is that
>software RAID 1 can deliver the same amount of disk space as hardware
>RAID 5 for less total cost, factoring in the price of the RAID
>controller. Of course, your storage density is worse for RAID 1, but
>you could probably still fit your 10TB into a single rack, assuming
>you can get 8 disks in a 3U chassis. RAID 1 in any form will *always*
>outperform RAID 5, especially in the event of a failure. RAID 1
>arrays hardly notice, RAID 5 arrays slow to a crawl.
>
>btw, I wouldn't necessarily be too worried about stacking too many
>devices on a single chain. Benchmark all configurations heavily with
>a workload close to the real workload of the device before buying
>dozens of controllers, or settling on one plan recommended by some
>self-professed `expert' trolling the reiserfs-list. Your bottleneck
>may not be where you think.
>
This is mostly not about performance but about reliability. When you have
2 disks on one ribbon cable, there is a serious problem when one of them
crashes, because in 99% of cases the other disk stops responding too. This
is IDE, not SCSI.
>I'd also strongly suggest getting a motherboard with 64 bit and/or
>66MHz PCI bus. The Tyan ThunderK7 is pretty good - a dual capable
>Athlon board that's stable as hell, and you can run the OpenBIOS
>project on it - manage your servers with a serial terminal server (eg,
>a PC with a serial breakout card) instead of a dumbass KVM switch.
>
>Then, you'd have something that's almost as good as a commercial UNIX
>platform, but not really. Personally, I'd get a quote for the arrays
>running on Sparc hardware from http://www.anysystem.com/ (used Sun
>parts peddlers) and offer that as a comparison.
>
>Running LVM to allocate a large RAID volume works exceedingly well
>from a system administration standpoint, especially with reiserfs.
>Its online resizing support is second to none
>
--
"I used to be an animal tamer... Mostly of the most dangerous ones.
- An animal tamer?
Yeah. It's the logical next step after having been a teacher... except
you don't have to worry about the parents."
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 13:43 ` Sam Vilain
2003-06-05 13:55 ` Heinz-Josef Claes
2003-06-06 11:15 ` Vitezslav T. Se'm
@ 2003-06-06 11:16 ` Vitezslav T. Se'm
2 siblings, 0 replies; 42+ messages in thread
From: Vitezslav T. Se'm @ 2003-06-06 11:16 UTC (permalink / raw)
To: Sam Vilain; +Cc: Heinz-Josef Claes, reiserfs-list
Hi.
Sam Vilain wrote:
>On Thu, 05 Jun 2003 20:14, Heinz-Josef Claes wrote:
>
>
>>- I plan to build a system with IDE 250GB drives. 7 of them for raid 5
>>and one hot spare. The OS will be on separate hardware raid 1 on smaller
>>disks. Does anybody have experience with IDE controllers for the big
>>disks? Is it better to use hardware raid or software raid
>>(performance)?
>>
>>
>
>Yes, if massive disk space at virtually no cost is what you're after,
>it has to be IDE. But, you don't have to give up the benefits of a
>serviceable data centre. Check out http://www.acme-technology.co.uk/
>for some fairly snazzy hot swap IDE rack mount enclosures, cases and
>more. There must be other vendors out there, too. If you've got that
>many disks, you NEED hot swap!
>
>As for the software RAID vs hardware RAID, my experience is that
>software RAID 1 can deliver the same amount of disk space as hardware
>RAID 5 for less total cost, factoring in the price of the RAID
>controller. Of course, your storage density is worse for RAID 1, but
>you could probably still fit your 10TB into a single rack, assuming
>you can get 8 disks in a 3U chassis. RAID 1 in any form will *always*
>outperform RAID 5, especially in the event of a failure. RAID 1
>arrays hardly notice, RAID 5 arrays slow to a crawl.
>
>btw, I wouldn't necessarily be too worried about stacking too many
>devices on a single chain. Benchmark all configurations heavily with
>a workload close to the real workload of the device before buying
>dozens of controllers, or settling on one plan recommended by some
>self-professed `expert' trolling the reiserfs-list. Your bottleneck
>may not be where you think.
>
This is mostly not about performance but about reliability. When you have
2 disks on one ribbon cable, there is a serious problem when one of them
crashes, because in 99% of cases the other disk stops responding too. This
is IDE, not SCSI.
Travis
>I'd also strongly suggest getting a motherboard with 64 bit and/or
>66MHz PCI bus. The Tyan ThunderK7 is pretty good - a dual capable
>Athlon board that's stable as hell, and you can run the OpenBIOS
>project on it - manage your servers with a serial terminal server (eg,
>a PC with a serial breakout card) instead of a dumbass KVM switch.
>
>Then, you'd have something that's almost as good as a commercial UNIX
>platform, but not really. Personally, I'd get a quote for the arrays
>running on Sparc hardware from http://www.anysystem.com/ (used Sun
>parts peddlers) and offer that as a comparison.
>
>Running LVM to allocate a large RAID volume works exceedingly well
>from a system administration standpoint, especially with reiserfs.
>Its online resizing support is second to none
>
--
"I used to be an animal tamer... Mostly of the most dangerous ones.
- An animal tamer?
Yeah. It's the logical next step after having been a teacher... except
you don't have to worry about the parents."
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-06 11:15 ` Vitezslav T. Se'm
@ 2003-06-06 15:15 ` Russell Coker
0 siblings, 0 replies; 42+ messages in thread
From: Russell Coker @ 2003-06-06 15:15 UTC (permalink / raw)
To: Vitezslav T. Se'm; +Cc: reiserfs-list
On Fri, 6 Jun 2003 21:15, Vitezslav T. Se'm wrote:
> This is mostly not about performance but about reliability. When you have
> 2 disks on one ribbon cable, there is a serious problem when one of them
> crashes, because in 99% of cases the other disk stops responding too. This
> is IDE, not SCSI,
The same problem occurs on SCSI.
If you buy a serious storage system from Sun such as an A5000 which is
correctly configured then you will expect to have multiple RAID-5's, where
for each RAID-5 you have no more than one disk on any given SCSI channel
(cable), and the hot-spare disks are on a separate channel too.
In any situation where you have a bus with multiple devices connected to it
then a single device can kill the entire bus if it breaks in the wrong way.
I've seen this happen in Ethernet hubs, SCSI, IDE, PCI, and ISA.
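The layout rule above (no RAID-5 set with more than one disk on the same channel) is easy to check mechanically. A minimal sketch, with made-up disk, channel and array names standing in for a real inventory:

```python
from collections import Counter


def shared_channels(arrays, channel_of):
    """Return {array: [channels carrying more than one of its disks]}.

    An empty result means a single dead channel costs each array
    at most one disk, which RAID-5 can survive.
    """
    problems = {}
    for array, disks in arrays.items():
        counts = Counter(channel_of[disk] for disk in disks)
        crowded = sorted(ch for ch, n in counts.items() if n > 1)
        if crowded:
            problems[array] = crowded
    return problems


# Hypothetical layout: six disks spread over three channels.
channel_of = {"d0": "ch0", "d1": "ch1", "d2": "ch2",
              "d3": "ch0", "d4": "ch1", "d5": "ch2"}
good = {"raid5-a": ["d0", "d1", "d2"], "raid5-b": ["d3", "d4", "d5"]}
bad = {"raid5-a": ["d0", "d3", "d1"], "raid5-b": ["d2", "d4", "d5"]}

print(shared_channels(good, channel_of))
print(shared_channels(bad, channel_of))
```

The same check applies to IDE if each cable is treated as a channel; hot spares would need their own entry to verify that they sit on a separate channel.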
As to which buses are most prone to such errors, all evidence that I am aware
of is anecdotal. For PCI you usually don't have much choice: machines with
hot-swap PCI and multiple independent PCI buses are very rare. ISA is history,
so it's not something we have to worry about. For SCSI and IDE you want to
be able to lose a cable without losing the system if you want reliability.
You just don't want Ethernet hubs.
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-05 8:45 ` Marc-Christian Petersen
2003-06-05 9:30 ` Heinz-Josef Claes
2003-06-05 11:27 ` Bill Rees
@ 2003-06-10 13:28 ` myciel
2003-06-10 13:36 ` Heinz-Josef Claes
2 siblings, 1 reply; 42+ messages in thread
From: myciel @ 2003-06-10 13:28 UTC (permalink / raw)
To: reiserfs-list
>
> 3ware Controller. Works perfect.
>
but not in raid 5...
rafal
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-10 13:28 ` myciel
@ 2003-06-10 13:36 ` Heinz-Josef Claes
2003-06-10 14:13 ` myciel
0 siblings, 1 reply; 42+ messages in thread
From: Heinz-Josef Claes @ 2003-06-10 13:36 UTC (permalink / raw)
To: myciel; +Cc: reiserfs-list
On Tue, 2003-06-10 at 15:28, myciel wrote:
> >
> > 3ware Controller. Works perfect.
> >
>
>
> but not in raid 5...
>
> rafal
What do you mean? Do you have any experience with it? What type of
controller do you use?
Regards,
HJC
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-10 13:36 ` Heinz-Josef Claes
@ 2003-06-10 14:13 ` myciel
2003-06-12 20:34 ` Lars O. Grobe
0 siblings, 1 reply; 42+ messages in thread
From: myciel @ 2003-06-10 14:13 UTC (permalink / raw)
To: reiserfs-list
Heinz-Josef Claes wrote:
> On Tue, 2003-06-10 at 15:28, myciel wrote:
>
>> >
>>
>>>3ware Controller. Works perfect.
>>>
>>
>>
>>but not in raid 5...
>>
> What do you mean? Do you have any experience with it? What type of
> controller do you use?
2 x 3ware 7500-8 ATA RAID with 160GB Maxtors.
What I need for my production is: read speed ~2MB/s, write speed ~1MB/s.
With one partition per controller, on each of them ~500k of mail
accounts using maildir exported via nfs to www frontends and
mailservers, it simply did not work.
After migration to raid 10, it works perfectly; during backup I'm able
to get 5MB/s reading (during busy hours).
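One common explanation for a result like this, not stated in the report above, is the small-write penalty of RAID 5: a maildir workload consists mostly of small files, and each small random write to RAID 5 costs about four disk I/Os against two for RAID 10. The sketch below only shows that rule-of-thumb arithmetic; the figure of 100 random I/Os per second per disk is an assumption chosen to make the numbers concrete, and controller caching is ignored.

```python
# Back-end disk I/Os needed per small random write (textbook rule of thumb).
WRITE_PENALTY = {
    "raid5": 4,    # read old data, read old parity, write data, write parity
    "raid10": 2,   # write the block to both halves of a mirror
}


def random_write_iops(level, disks, iops_per_disk):
    """Rough upper bound on small random writes per second for an array."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


for level in ("raid5", "raid10"):
    print(level, random_write_iops(level, disks=8, iops_per_disk=100))
```

With the same eight disks, the estimate for RAID 10 is twice that for RAID 5, at the cost of less usable space.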
rafal
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: How to build a big file server
2003-06-10 14:13 ` myciel
@ 2003-06-12 20:34 ` Lars O. Grobe
0 siblings, 0 replies; 42+ messages in thread
From: Lars O. Grobe @ 2003-06-12 20:34 UTC (permalink / raw)
To: reiserfs-list
> >>>3ware Controller. Works perfect.
> >>
> >>but not in raid 5...
> >>
> > What do you mean? Do you have any experience with it? What type of
> > controller do you use?
Hi,
just a short note, I have been using 3ware controllers (the old 6800s) on a
540GB backup server... after too much trouble, I switched to software-raid.
The controller only gives me 8 channels now, raid5 is done by software.
Better performance, no need of identical disks (almost impossible in
ide-world), no firmware-problems any more.
However, a serious server always has scsi (or fc) IMHO...
CU Lars.
^ permalink raw reply [flat|nested] 42+ messages in thread
* data-logging for 2.4.21+ (was Re: How to build a big file server)
2003-06-05 13:48 ` Chris Mason
@ 2003-06-14 11:11 ` Manuel Krause
0 siblings, 0 replies; 42+ messages in thread
From: Manuel Krause @ 2003-06-14 11:11 UTC (permalink / raw)
To: Chris Mason, Oleg Drokin; +Cc: reiserfs-list
On 06/05/2003 03:48 PM, Chris Mason wrote:
> On Thu, 2003-06-05 at 06:25, Russell Coker wrote:
>
>
>>>I use SuSE 8.2. It's much faster than the SuSE kernel in 8.1, even on my
>>>ZIP drive at the parallel port ;)
>>
>>Hopefully someone can advise on whether the SuSE 8.2 kernel has all the
>>patches you desire for best ReiserFS performance.
>
>
> The suse kernels include the data logging code, along with the other
> minor optimizations in my data logging dir.
>
> The good news is that I've also tracked down a long standing
> data=ordered performance bug, and I've got more data=ordered
> optimizations to make fsync workloads better. Hopefully this morning I
> can get it all to stop oopsing and I'll send out for broader testing.
>
> -chris
Hi, Chris, Oleg and all others!
Are these patches ready for "broader testing" now? I would appreciate it
as you mentioned the key words "performance" and "data=ordered" in one
sentence ... ;-))
Does the iget5_locked patch (Oleg?!) get updated or is it obsolete? It
didn't apply cleanly to 2.4.21-pre6, so I skipped it for that kernel.
I didn't try kernels > 2.4.21-pre6 so far -- my simple question is: Are
the new patches in the queue or have they been forgotten over time? Or
are there some compounds or merges in progress?!!
Many thanks for clarification,
best regards,
Manuel
^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2003-06-14 11:11 UTC | newest]
Thread overview: 42+ messages
2003-06-05 8:14 How to build a big file server Heinz-Josef Claes
2003-06-05 8:25 ` Carl-Daniel Hailfinger
2003-06-05 8:55 ` Andreas Dilger
2003-06-05 9:51 ` Hendrik Visage
2003-06-05 8:33 ` Ragnar Kjørstad
2003-06-05 8:42 ` Heinz-Josef Claes
2003-06-05 8:45 ` Marc-Christian Petersen
2003-06-05 9:30 ` Heinz-Josef Claes
2003-06-05 11:27 ` Bill Rees
2003-06-10 13:28 ` myciel
2003-06-10 13:36 ` Heinz-Josef Claes
2003-06-10 14:13 ` myciel
2003-06-12 20:34 ` Lars O. Grobe
2003-06-05 8:46 ` Oleg Drokin
2003-06-05 8:50 ` Heinz-Josef Claes
2003-06-05 9:04 ` Oleg Drokin
2003-06-05 9:17 ` Heinz-Josef Claes
2003-06-05 10:29 ` Russell Coker
2003-06-05 10:45 ` Heinz-Josef Claes
2003-06-05 16:47 ` Carl-Daniel Hailfinger
2003-06-05 17:01 ` Soeren Sonnenburg
2003-06-05 17:06 ` Ragnar Kjørstad
2003-06-06 9:41 ` Heinz-Josef Claes
2003-06-05 13:38 ` Hans Reiser
2003-06-05 12:06 ` Heinz-Josef Claes
2003-06-05 9:45 ` Christophe Saout
2003-06-05 10:07 ` Soeren Sonnenburg
2003-06-05 9:59 ` Russell Coker
2003-06-05 10:13 ` Heinz-Josef Claes
2003-06-05 10:25 ` Russell Coker
2003-06-05 10:38 ` Heinz-Josef Claes
2003-06-05 11:11 ` Russell Coker
2003-06-05 13:48 ` Chris Mason
2003-06-14 11:11 ` data-logging for 2.4.21+ (was Re: How to build a big file server) Manuel Krause
2003-06-05 10:05 ` How to build a big file server Hans Reiser
2003-06-05 10:24 ` Heinz-Josef Claes
2003-06-05 13:43 ` Sam Vilain
2003-06-05 13:55 ` Heinz-Josef Claes
2003-06-06 11:15 ` Vitezslav T. Se'm
2003-06-06 15:15 ` Russell Coker
2003-06-06 11:16 ` Vitezslav T. Se'm
[not found] <no.id>
2003-06-05 23:49 ` The Amazing Dragon