* recommendations for stripe/chunk size
From: Keld Jørn Simonsen @ 2008-02-05 18:24 UTC (permalink / raw)
To: linux-raid
Hi
I am looking at revising our howto. I see a number of places where a
chunk size of 32 kiB is recommended, and even recommendations on
maybe using sizes of 4 kiB.
My own take on that is that this really hurts performance.
Normal disks have a rotation speed of between 5400 (laptop), 7200 (IDE/SATA)
and 10000 (SCSI) revolutions per minute, giving a rotation time of 6 to
12 ms and an average rotational latency of half that, i.e. 3 to 6 ms. Then
you need to add head movement, which is something like 2 to 20 ms - for a
total access time of 5 to 26 ms, averaging around 13-17 ms.
In about 15 ms you can thus read something like 600 to 1200 kB on current
ATA/133 or SATA-II (300 MB/s) interfaces, given actual sustained transfer
rates of about 40 MB/s on ATA/133 and 80 MB/s on SATA-II. So to get some
bang for the buck and actually transfer some data, you should use something
like 256 or 512 kiB chunks. With a transfer rate of 50 MB/s and a chunk size
of 256 kiB, giving a time of about 20 ms per transaction, you should be able
to transfer about 12 MB/s with random reads - my actual figures are about
30 MB/s, possibly because of the elevator effect of the file system driver.
With a chunk size of 4 kiB you get a time of about 15 ms per transaction, or
66 transactions per second, for a transfer rate of about 250 kB/s. So
256 kiB vs 4 kiB chunks speed up the transfer by a factor of 50.
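To make the arithmetic concrete, here is a minimal back-of-envelope model
(the 15 ms access time and 50 MB/s media rate are just the assumed round
numbers from above, not measured values):

    # Effective random-read throughput as a function of chunk size.
    SEEK_MS = 15.0        # assumed average access time (seek + rotational latency)
    MEDIA_MB_S = 50.0     # assumed sustained media transfer rate

    def random_read_mb_s(chunk_kib):
        transfer_ms = chunk_kib / 1024.0 / MEDIA_MB_S * 1000.0
        per_io_ms = SEEK_MS + transfer_ms
        return (1000.0 / per_io_ms) * chunk_kib / 1024.0

    for chunk in (4, 32, 256, 512, 1024):
        print(f"{chunk:5d} kiB chunks -> ~{random_read_mb_s(chunk):5.1f} MB/s random reads")

With those assumptions it reproduces roughly the factor-of-50 difference
between 4 kiB and 256 kiB chunks.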
I actually think the kernel should operate with block sizes
like this and not with 4 kiB blocks. It is the readahead and the elevator
algorithms that save us from randomly reading 4 kiB at a time.
I also see that there are some memory constraints on this.
Having maybe 1000 processes reading, as for my mirror service,
256 kiB buffers would be acceptable, occupying 256 MB of RAM.
That is reasonable, and I could even tolerate 512 MB of RAM used.
But going to 1 MiB buffers would be overdoing it for my configuration.
What would be the recommended chunk size for today's equipment?
Best regards
Keld
* Re: recommendations for stripe/chunk size
From: Justin Piszcz @ 2008-02-05 19:19 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: linux-raid
On Tue, 5 Feb 2008, Keld Jørn Simonsen wrote:
> Hi
>
> I am looking at revising our howto. I see a number of places where a
> chunk size of 32 kiB is recommended, and even recommendations on
> maybe using sizes of 4 kiB.
>
> My own take on that is that this really hurts performance.
> Normal disks have a rotation speed of between 5400 (laptop), 7200 (IDE/SATA)
> and 10000 (SCSI) revolutions per minute, giving a rotation time of 6 to
> 12 ms and an average rotational latency of half that, i.e. 3 to 6 ms. Then
> you need to add head movement, which is something like 2 to 20 ms - for a
> total access time of 5 to 26 ms, averaging around 13-17 ms.
>
> In about 15 ms you can thus read something like 600 to 1200 kB on current
> ATA/133 or SATA-II (300 MB/s) interfaces, given actual sustained transfer
> rates of about 40 MB/s on ATA/133 and 80 MB/s on SATA-II. So to get some
> bang for the buck and actually transfer some data, you should use something
> like 256 or 512 kiB chunks. With a transfer rate of 50 MB/s and a chunk size
> of 256 kiB, giving a time of about 20 ms per transaction, you should be able
> to transfer about 12 MB/s with random reads - my actual figures are about
> 30 MB/s, possibly because of the elevator effect of the file system driver.
> With a chunk size of 4 kiB you get a time of about 15 ms per transaction, or
> 66 transactions per second, for a transfer rate of about 250 kB/s. So
> 256 kiB vs 4 kiB chunks speed up the transfer by a factor of 50.
>
> I actually think the kernel should operate with block sizes like this and
> not with 4 kiB blocks. It is the readahead and the elevator algorithms that
> save us from randomly reading 4 kiB at a time.
>
> I also see that there are some memory constraints on this. Having maybe
> 1000 processes reading, as for my mirror service, 256 kiB buffers would be
> acceptable, occupying 256 MB of RAM. That is reasonable, and I could even
> tolerate 512 MB of RAM used. But going to 1 MiB buffers would be overdoing
> it for my configuration.
>
> What would be the recommended chunk size for today's equipment?
>
> Best regards
> Keld
My benchmarks concluded that 256 KiB to 1024 KiB is optimal; too much
below or too much over that range results in degradation.
Justin.
* Re: recommendations for stripe/chunk size
From: Bill Davidsen @ 2008-02-06 19:22 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: linux-raid
Keld Jørn Simonsen wrote:
> Hi
>
> I am looking at revising our howto. I see a number of places where a
> chunk size of 32 kiB is recommended, and even recommendations on
> maybe using sizes of 4 kiB.
>
>
Depending on the RAID level, a write smaller than the chunk size causes
the chunk to be read, altered, and rewritten, vs. just written if the
write is a multiple of the chunk size. Many filesystems by default use a
4 KiB page size and issue 4 KiB writes. I believe this is the reasoning
behind the suggestion of small chunk sizes. Sequential vs. random access
and RAID level are important here; there's no one size that works best in
all cases.
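As a rough sketch of that reasoning for RAID-5 (the chunk size and disk
count below are hypothetical, and it only models the classic full-stripe
vs. read-modify-write cost - note Neil's point later in the thread that md
actually works at page rather than chunk granularity):

    # Simplified I/O count for a RAID-5 write: aligned full stripes need no
    # reads; anything smaller pays a read-modify-write (read old data + old
    # parity, write new data + new parity).  Illustration only.
    def raid5_write_ios(write_kib, chunk_kib=64, data_disks=3):
        stripe_kib = chunk_kib * data_disks
        if write_kib > 0 and write_kib % stripe_kib == 0:
            stripes = write_kib // stripe_kib
            return {"reads": 0, "writes": stripes * (data_disks + 1)}
        return {"reads": 2, "writes": 2}

    print(raid5_write_ios(192))   # one full 3 x 64 kiB stripe -> no reads
    print(raid5_write_ios(4))     # small write -> read-modify-write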
> My own take on that is that this really hurts performance.
> Normal disks have a rotation speed of between 5400 (laptop), 7200 (IDE/SATA)
> and 10000 (SCSI) revolutions per minute, giving a rotation time of 6 to
> 12 ms and an average rotational latency of half that, i.e. 3 to 6 ms. Then
> you need to add head movement, which is something like 2 to 20 ms - for a
> total access time of 5 to 26 ms, averaging around 13-17 ms.
>
>
Having a write that is not some multiple of the chunk size would seem to
require a read-alter-wait_for_disk_rotation-write, while for large sustained
sequential I/O using multiple drives helps transfer. For small random
I/O small chunks are good; I find little benefit to chunks over 256k or
maybe 1024k.
> In about 15 ms you can thus read something like 600 to 1200 kB on current
> ATA/133 or SATA-II (300 MB/s) interfaces, given actual sustained transfer
> rates of about 40 MB/s on ATA/133 and 80 MB/s on SATA-II. So to get some
> bang for the buck and actually transfer some data, you should use something
> like 256 or 512 kiB chunks. With a transfer rate of 50 MB/s and a chunk size
> of 256 kiB, giving a time of about 20 ms per transaction, you should be able
> to transfer about 12 MB/s with random reads - my actual figures are about
> 30 MB/s, possibly because of the elevator effect of the file system driver.
> With a chunk size of 4 kiB you get a time of about 15 ms per transaction, or
> 66 transactions per second, for a transfer rate of about 250 kB/s. So
> 256 kiB vs 4 kiB chunks speed up the transfer by a factor of 50.
>
>
If you actually see anything like this, your write caching and readahead
aren't doing what they should!
> I actually think the kernel should operate with block sizes like this and
> not with 4 kiB blocks. It is the readahead and the elevator algorithms that
> save us from randomly reading 4 kiB at a time.
>
>
Exactly, and nothing saves you from an R-A-RW cycle if the write is a partial chunk.
> I also see that there are some memory constraints on this. Having maybe
> 1000 processes reading, as for my mirror service, 256 kiB buffers would be
> acceptable, occupying 256 MB of RAM. That is reasonable, and I could even
> tolerate 512 MB of RAM used. But going to 1 MiB buffers would be overdoing
> it for my configuration.
>
> What would be the recommended chunk size for today's equipment?
>
>
I think usage is more important than hardware. My opinion only.
> Best regards
> Keld
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck
* Re: recommendations for stripe/chunk size
From: Wolfgang Denk @ 2008-02-06 20:25 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Keld Jørn Simonsen, linux-raid
In message <47AA08E7.5000801@tmr.com> you wrote:
>
> > I actually think the kernel should operate with block sizes like this and
> > not with 4 kiB blocks. It is the readahead and the elevator algorithms
> > that save us from randomly reading 4 kiB at a time.
> >
> >
> Exactly, and nothing saves you from an R-A-RW cycle if the write is a partial chunk.
Indeed kernel page size is an important factor in such optimizations.
But you have to keep in mind that this is mostly efficient for (very)
large strictly sequential I/O operations only - actual file system
traffic may be *very* different.
We implemented the option to select kernel page sizes of 4, 16, 64
and 256 kB for some PowerPC systems (440SPe, to be precise). A nice
graph of the effect can be found here:
https://www.amcc.com/MyAMCC/retrieveDocument/PowerPC/440SPe/RAIDinLinux_PB_0529a.pdf
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
"You got to learn three things. What's real, what's not real, and
what's the difference." - Terry Pratchett, _Witches Abroad_
* Re: recommendations for stripe/chunk size
From: Bill Davidsen @ 2008-02-06 22:37 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Keld Jørn Simonsen, linux-raid
Wolfgang Denk wrote:
> In message <47AA08E7.5000801@tmr.com> you wrote:
>
>>> I actually think the kernel should operate with block sizes like this and
>>> not with 4 kiB blocks. It is the readahead and the elevator algorithms
>>> that save us from randomly reading 4 kiB at a time.
>>>
>>>
>>>
>> Exactly, and nothing saves you from an R-A-RW cycle if the write is a partial chunk.
>>
>
> Indeed kernel page size is an important factor in such optimizations.
> But you have to keep in mind that this is mostly efficient for (very)
> large strictly sequential I/O operations only - actual file system
> traffic may be *very* different.
>
>
That was actually what I meant by page size: that of the file system
rather than of memory, i.e. the "block size" typically used for writes.
Or multiples thereof, obviously.
> We implemented the option to select kernel page sizes of 4, 16, 64
> and 256 kB for some PowerPC systems (440SPe, to be precise). A nice
> graph of the effect can be found here:
>
> https://www.amcc.com/MyAMCC/retrieveDocument/PowerPC/440SPe/RAIDinLinux_PB_0529a.pdf
>
>
I started reading it online and pulled down a copy to print - very neat
stuff. Thanks for the link.
> Best regards,
>
> Wolfgang Denk
>
>
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck
* Re: recommendations for stripe/chunk size
From: Keld Jørn Simonsen @ 2008-02-07 0:31 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Bill Davidsen, linux-raid
On Wed, Feb 06, 2008 at 09:25:36PM +0100, Wolfgang Denk wrote:
> In message <47AA08E7.5000801@tmr.com> you wrote:
> >
> > > I actually think the kernel should operate with block sizes like this
> > > and not with 4 kiB blocks. It is the readahead and the elevator
> > > algorithms that save us from randomly reading 4 kiB at a time.
> > >
> > >
> > Exactly, and nothing saves you from an R-A-RW cycle if the write is a partial chunk.
>
> Indeed kernel page size is an important factor in such optimizations.
> But you have to keep in mind that this is mostly efficient for (very)
> large strictly sequential I/O operations only - actual file system
> traffic may be *very* different.
>
> We implemented the option to select kernel page sizes of 4, 16, 64
> and 256 kB for some PowerPC systems (440SPe, to be precise). A nice
> graphics of the effect can be found here:
>
> https://www.amcc.com/MyAMCC/retrieveDocument/PowerPC/440SPe/RAIDinLinux_PB_0529a.pdf
Yes, that is also what I would expect, for sequential reads.
Random writes of small data blocks, kind of what is done in big
databases, should show another picture, as others have also described.
If you look at a single disk, would you get improved performance with
asynchronous IO?
I am a bit puzzled about my SATA-II performance: nominally I could get
300 MB/s on SATA-II, but I only get about 80 MB/s. Why is that?
I thought it was because of latency with synchronous reads.
I.e., when a chunk has been read, you need to complete the IO operation and
then issue a new one. In the meantime, while the CPU is doing these
calculations, the disk has spun a little, and to get the next data chunk
we need to wait for the disk to spin around so that the head is positioned
over the right place on the disk surface. Is that so? Or does the
controller take care of this, reading the rest of the not-yet-requested
track into a buffer, which can then be delivered next time? Modern disks
often have buffers of about 8 or 16 MB. I wonder why they don't have
bigger buffers.
Anyway, why does a SATA-II drive not deliver something like 300 MB/s?
best regards
keld
* Re: recommendations for stripe/chunk size
From: Neil Brown @ 2008-02-07 5:31 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Keld Jorn Simonsen, linux-raid
On Wednesday February 6, davidsen@tmr.com wrote:
> Keld Jørn Simonsen wrote:
> > Hi
> >
> > I am looking at revising our howto. I see a number of places where a
> > chunk size of 32 kiB is recommended, and even recommendations on
> > maybe using sizes of 4 kiB.
> >
> >
> Depending on the RAID level, a write smaller than the chunk size causes
> the chunk to be read, altered, and rewritten, vs. just written if the
> write is a multiple of the chunk size. Many filesystems by default use a
> 4 KiB page size and issue 4 KiB writes. I believe this is the reasoning
> behind the suggestion of small chunk sizes. Sequential vs. random access
> and RAID level are important here; there's no one size that works best in
> all cases.
Not in md/raid.
RAID4/5/6 will do a read-modify-write if you are writing less than one
*page*, but then they often do a read-modify-write anyway for parity
updates.
No level will ever read a whole chunk just because it is a chunk.
To answer the original question: The only way to be sure is to test
your hardware with your workload with different chunk sizes.
But I suspect that around 256K is good on current hardware.
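A minimal sketch of such a test might look like the following (Python
driving mdadm and dd; the device names are placeholders, the commands are
destructive to them, and the initial resync will skew results unless you
wait for it or create the array with --assume-clean):

    import subprocess, time

    DEVICES = ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"]   # placeholders - these get wiped!

    def test_chunk(chunk_kib):
        subprocess.run(["mdadm", "--create", "/dev/md0", "--run", "--level=5",
                        f"--chunk={chunk_kib}", f"--raid-devices={len(DEVICES)}"]
                       + DEVICES, check=True)
        # NB: the initial resync is still running at this point.
        start = time.monotonic()
        subprocess.run(["dd", "if=/dev/zero", "of=/dev/md0", "bs=1M",
                        "count=2048", "oflag=direct"], check=True)
        mb_s = 2048 / (time.monotonic() - start)
        subprocess.run(["mdadm", "--stop", "/dev/md0"], check=True)
        subprocess.run(["mdadm", "--zero-superblock"] + DEVICES, check=True)
        return mb_s

    for chunk in (32, 64, 128, 256, 512, 1024):
        print(f"chunk {chunk:4d}K: {test_chunk(chunk):.0f} MB/s sequential write")

A real test should of course use your actual workload (random reads,
small writes, etc.) rather than a single sequential dd.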
NeilBrown
* Re: recommendations for stripe/chunk size
From: Iustin Pop @ 2008-02-07 5:40 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: Wolfgang Denk, Bill Davidsen, linux-raid
On Thu, Feb 07, 2008 at 01:31:16AM +0100, Keld Jørn Simonsen wrote:
> Anyway, why does a SATA-II drive not deliver something like 300 MB/s?
Wait, are you talking about a *single* drive?
In that case, it seems you are confusing the interface speed (300MB/s)
with the mechanical read speed (80MB/s). If you are asking why a
single drive is limited to 80 MB/s, I guess it's a problem of mechanics.
Even with NCQ or big readahead settings, ~80-~100 MB/s is the highest
I've seen on 7200 RPM drives. And no, the drive does not wait for the CPU
to process the current data before reading the next data; drives have a
built-in read-ahead mechanism.
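For reference, a small sketch of how to measure that raw sequential rate
while bypassing the page cache with O_DIRECT (Linux-only; the device name
is a placeholder and reading a raw block device normally requires root):

    import mmap, os, time

    def sequential_mb_s(device="/dev/sda", total_mib=512, bufsize=1 << 20):
        buf = mmap.mmap(-1, bufsize)                 # page-aligned, as O_DIRECT requires
        fd = os.open(device, os.O_RDONLY | os.O_DIRECT)
        try:
            done = 0
            start = time.monotonic()
            while done < total_mib * (1 << 20):
                n = os.readv(fd, [buf])              # direct read into the aligned buffer
                if n == 0:
                    break
                done += n
            return done / (1 << 20) / (time.monotonic() - start)
        finally:
            os.close(fd)
            buf.close()

    print(f"~{sequential_mb_s():.0f} MB/s")          # roughly the media rate, not the bus rate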
Honestly, I have 10x as many problems with the low random I/O throughput
than with the (high, IMHO) sequential I/O speed.
regards,
iustin
* Re: recommendations for stripe/chunk size
From: Neil Brown @ 2008-02-07 5:46 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Bill Davidsen, Keld Jørn Simonsen, linux-raid
On Wednesday February 6, wd@denx.de wrote:
>
> We implemented the option to select kernel page sizes of 4, 16, 64
> and 256 kB for some PowerPC systems (440SPe, to be precise). A nice
> graph of the effect can be found here:
>
> https://www.amcc.com/MyAMCC/retrieveDocument/PowerPC/440SPe/RAIDinLinux_PB_0529a.pdf
Thanks for the link!
<quote>
The second improvement is to remove a memory copy that is internal to the MD driver. The MD
driver stages strip data ready to be written next to the I/O controller in a page size pre-
allocated buffer. It is possible to bypass this memory copy for sequential writes thereby saving
SDRAM access cycles.
</quote>
I sure hope you've checked that the filesystem never (ever) changes a
buffer while it is being written out. Otherwise the data written to
disk might be different from the data used in the parity calculation
:-)
And what are the "Second memcpy" and "First memcpy" in the graph?
I assume one is the memcpy mentioned above, but what is the other?
NeilBrown
* Re: recommendations for stripe/chunk size
From: Neil Brown @ 2008-02-07 5:51 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: Wolfgang Denk, Bill Davidsen, linux-raid
On Thursday February 7, keld@dkuug.dk wrote:
>
> Anyway, why does a SATA-II drive not deliver something like 300 MB/s?
Are you serious?
A high-end 15000 RPM enterprise-grade drive such as the Seagate
Cheetah® 15K.6 only delivers 164 MB/sec.
The SATA bus might be able to deliver 300 MB/s, but an individual drive
would be around 80 MB/s unless it is really expensive.
(or was that yesterday? I'm having trouble keeping up with the pace
of improvement :-)
NeilBrown
* Re: recommendations for stripe/chunk size
From: Wolfgang Denk @ 2008-02-07 8:49 UTC (permalink / raw)
To: Neil Brown; +Cc: Bill Davidsen, Keld Jørn Simonsen, linux-raid
Dear Neil,
in message <18346.39756.292908.58065@notabene.brown> you wrote:
>
> <quote>
> The second improvement is to remove a memory copy that is internal to the MD driver. The MD
> driver stages strip data ready to be written next to the I/O controller in a page size pre-
> allocated buffer. It is possible to bypass this memory copy for sequential writes thereby saving
> SDRAM access cycles.
> </quote>
>
> I sure hope you've checked that the filesystem never (ever) changes a
> buffer while it is being written out. Otherwise the data written to
> disk might be different from the data used in the parity calculation
> :-)
Sure. Note that the usage scenarios of this implementation are not only
(actually not even primarily) focused on using such a setup as a
normal RAID server - instead, processors like the 440SPe will likely
be used on RAID controller cards themselves - and data may come from
iSCSI or over one of the PCIe buses, but not from a normal file
system.
> And what are the "Second memcpy" and "First memcpy" in the graph?
> I assume one is the memcpy mentioned above, but what is the other?
Avoiding the 1st memcpy means skipping the system block-level caching,
i.e. trying to use the direct I/O capability (the "-dio" option to the
xdd tool which was used for these benchmarks).
Avoiding the 2nd memcpy is the optimization for large sequential writes
you quoted above.
Please keep in mind that these optimizations are probably not
directly useful for general-purpose use of a normal file system on
top of the RAID array; they have other goals: to provide benchmarks for
the special case of large synchronous I/O operations (as used by
RAID controller manufacturers to show off against their competitors), and
to provide a base for the firmware of such controllers.
Nevertheless, they clearly show where optimizations are possible,
assuming you understand your usage scenario exactly.
In real life, your optimization may require completely different
strategies - for example, on our main file server we see the following
distribution of file sizes:
Out of a sample of 14.2 million files,
65% are smaller than 4 kB
80% are smaller than 8 kB
90% are smaller than 16 kB
96% are smaller than 32 kB
98.4% are smaller than 64 kB
You don't want - for example - huge stripe sizes in such a system.
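For what it is worth, a quick sketch of how such a distribution can be
collected (the path below is only an example):

    import os

    def size_distribution(root, thresholds_kb=(4, 8, 16, 32, 64)):
        sizes = []
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                try:
                    sizes.append(os.path.getsize(os.path.join(dirpath, name)))
                except OSError:
                    pass                      # broken symlinks, races, permissions
        for kb in thresholds_kb:
            under = sum(1 for s in sizes if s < kb * 1024)
            print(f"{100.0 * under / len(sizes):5.1f}% are smaller than {kb} kB")

    size_distribution("/srv/files")           # example path only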
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Egotist: A person of low taste, more interested in himself than in
me. - Ambrose Bierce
* Re: recommendations for stripe/chunk size
From: Keld Jørn Simonsen @ 2008-02-07 9:58 UTC (permalink / raw)
To: Wolfgang Denk, Bill Davidsen, linux-raid
On Thu, Feb 07, 2008 at 06:40:12AM +0100, Iustin Pop wrote:
> On Thu, Feb 07, 2008 at 01:31:16AM +0100, Keld Jørn Simonsen wrote:
> > Anyway, why does a SATA-II drive not deliver something like 300 MB/s?
>
> Wait, are you talking about a *single* drive?
Yes, I was talking about a single drive.
> In that case, it seems you are confusing the interface speed (300MB/s)
> with the mechanical read speed (80MB/s).
I thought the 300 MB/s was the transfer rate between the disk and the
controller's buffer memory, but you indicate that this is the
speed between the controller's buffers and main RAM.
Like Neil, I am amazed by the speeds that we get on current hardware, but
still I would like to see if we could use the hardware better.
Asynchronous IO could be a way forward. I have written some mainframe
utilities where asynchronous IO was the key to performance,
so I thought that it could also come in handy in the Linux kernel.
If about 80 MB/s is the maximum we can get out of a current SATA-II
7200 rpm drive, then I think there is not much to be gained from
asynchronous IO.
> If you are asking why a
> single drive is limited to 80 MB/s, I guess it's a problem of mechanics.
> Even with NCQ or big readahead settings, ~80-~100 MB/s is the highest
> I've seen on 7200 RPM drives. And no, the drive does not wait for the CPU
> to process the current data before reading the next data; drives have a
> built-in read-ahead mechanism.
>
> Honestly, I have 10x as many problems with the low random I/O throughput
> than with the (high, IMHO) sequential I/O speed.
I agree that random IO is the main factor on most server installations.
But on workstations the sequential IO is also important, as the only
user is sometimes waiting for the computer to respond.
And then I think that booting can benefit from faster sequential IO.
And not to forget, I think it is fun to make my hardware run faster!
best regards
Keld