* New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.
From: Justin Piszcz @ 2007-12-30 23:04 UTC (permalink / raw)
To: xfs; +Cc: linux-raid, Alan Piszcz
Dave's original e-mail:
> # mkfs.xfs -f -l lazy-count=1,version=2,size=128m -i attr=2 -d agcount=4 <dev>
> # mount -o logbsize=256k <dev> <mtpt>
> And if you don't care about filesystem corruption on power loss:
> # mount -o logbsize=256k,nobarrier <dev> <mtpt>
> Those mkfs values (except for log size) will be the defaults in the next
> release of xfsprogs.
> Cheers,
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
---------
I used his mkfs.xfs options verbatim but with my own mount options:
noatime,nodiratime,logbufs=8,logbsize=26214
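For reference, the whole sequence looks roughly like this (/dev/md3 and /raid
are placeholders for the actual device and mount point, and logbsize is
written as Dave's 256k value):
# mkfs.xfs -f -l lazy-count=1,version=2,size=128m -i attr=2 -d agcount=4 /dev/md3
# mount -o noatime,nodiratime,logbufs=8,logbsize=256k /dev/md3 /raid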
Here are the results; each test is the average of three bonnie++ runs:
http://home.comcast.net/~jpiszcz/xfs1/result.html
Thanks Dave, this looks nice--the more optimizations the better!
-----------
I also find it rather peculiar that in some of my (other) benchmarks my
RAID 5 is just as fast as RAID 0 for extracting large uncompressed files:
RAID 5 (1024k CHUNK)
26.95user 6.72system 0:37.89elapsed 88%CPU (0avgtext+0avgdata
0maxresident)k0inputs+0outputs (6major+526minor)pagefaults 0swaps
Compare with RAID 0 for the same operation:
(As with RAID 5, it appears the sweet spot is around 256k-1024k, possibly up to 2048k.)
Why does mdadm still use 64k for the default chunk size?
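(A non-default chunk size has to be chosen when the array is created; a sketch
with placeholder devices and the 1024k chunk mentioned above, since mdadm's
--chunk option takes KiB:)
# mdadm --create /dev/md3 --level=5 --raid-devices=4 --chunk=1024 /dev/sd[bcde]1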
And another quick question: would there be any benefit to using (if it were
possible) a block size of > 4096 bytes with XFS? I assume only IA64 or a
similar arch can support it; x86_64 cannot, because the page size is 4096:
[ 8265.407137] XFS: only pagesize (4096) or less will currently work.
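(For reference, the XFS block size is chosen at mkfs time, e.g. the sketch
below with a placeholder device; the on-disk format allows it, but a kernel
with a 4k page size then refuses the mount, which is what the message above
reports.)
# mkfs.xfs -b size=8192 -f /dev/md3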
The speeds (RAID 0 extract times; chunk sizes are in k):
extract speed with 4 chunk:
27.30user 10.51system 0:55.87elapsed 67%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
27.39user 10.38system 0:56.98elapsed 66%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
27.31user 10.56system 0:57.70elapsed 65%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
extract speed with 8 chunk:
27.09user 9.27system 0:54.60elapsed 66%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+525minor)pagefaults 0swaps
27.23user 8.91system 0:54.38elapsed 66%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+527minor)pagefaults 0swaps
27.19user 8.98system 0:54.68elapsed 66%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
extract speed with 16 chunk:
27.12user 7.24system 0:51.12elapsed 67%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
27.13user 7.12system 0:50.58elapsed 67%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
27.11user 7.18system 0:50.56elapsed 67%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+527minor)pagefaults 0swaps
extract speed with 32 chunk:
27.15user 6.52system 0:48.06elapsed 70%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+527minor)pagefaults 0swaps
27.24user 6.38system 0:49.10elapsed 68%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
27.11user 6.46system 0:47.56elapsed 70%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
extract speed with 64 chunk:
27.15user 5.94system 0:45.13elapsed 73%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+525minor)pagefaults 0swaps
27.17user 5.94system 0:44.82elapsed 73%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+527minor)pagefaults 0swaps
27.02user 6.12system 0:44.61elapsed 74%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+525minor)pagefaults 0swaps
extract speed with 128 chunk:
26.98user 5.78system 0:40.48elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+525minor)pagefaults 0swaps
27.05user 5.73system 0:40.30elapsed 81%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
27.11user 5.68system 0:40.59elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
extract speed with 256 chunk:
27.10user 5.60system 0:36.47elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+525minor)pagefaults 0swaps
27.03user 5.67system 0:36.18elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
27.17user 5.50system 0:37.38elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
extract speed with 512 chunk:
27.06user 5.54system 0:36.58elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+524minor)pagefaults 0swaps
27.03user 5.59system 0:36.31elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+525minor)pagefaults 0swaps
27.06user 5.58system 0:36.42elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
extract speed with 1024 chunk:
26.92user 5.69system 0:36.51elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+525minor)pagefaults 0swaps
27.18user 5.43system 0:36.39elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
27.04user 5.60system 0:36.27elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
extract speed with 2048 chunk:
26.97user 5.63system 0:36.99elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+525minor)pagefaults 0swaps
26.98user 5.62system 0:36.90elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+527minor)pagefaults 0swaps
27.15user 5.44system 0:37.06elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
extract speed with 4096 chunk:
27.11user 5.54system 0:38.96elapsed 83%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
27.09user 5.55system 0:38.85elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+527minor)pagefaults 0swaps
27.12user 5.52system 0:38.80elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
extract speed with 8192 chunk:
27.04user 5.57system 0:43.54elapsed 74%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
27.15user 5.49system 0:43.52elapsed 75%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
27.11user 5.52system 0:43.66elapsed 74%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+528minor)pagefaults 0swaps
extract speed with 16384 chunk:
27.25user 5.45system 0:52.18elapsed 62%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+526minor)pagefaults 0swaps
27.18user 5.52system 0:52.54elapsed 62%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+527minor)pagefaults 0swaps
27.17user 5.50system 0:51.38elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6major+525minor)pagefaults 0swaps
Justin.
* Re: New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.
From: Raz @ 2007-12-30 23:33 UTC (permalink / raw)
To: Justin Piszcz; +Cc: xfs, linux-raid
What is nobarrier?
On 12/31/07, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Dave's original e-mail:
> [snip]
> > # mount -o logbsize=256k,nobarrier <dev> <mtpt>
> [snip]
--
Raz
* Re: New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.
From: Wolfgang Denk @ 2007-12-30 23:44 UTC (permalink / raw)
To: Raz; +Cc: Justin Piszcz, xfs, linux-raid
In message <5d96567b0712301533v3a7fcfd1pfce02563526a39bf@mail.gmail.com> you wrote:
> What is nobarrier?
...
> > > # mount -o logbsize=256k,nobarrier <dev> <mtpt>
See http://oss.sgi.com/projects/xfs/faq.html
Q: How can I address the problem with the write cache?
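In short: XFS uses write barriers to force the drive's volatile write cache to
disk at critical points; "nobarrier" switches that off, which is faster but
risks corruption on power loss unless the write cache is disabled or
battery-backed. Roughly (device and mount point below are placeholders):
# mount -o logbsize=256k /dev/md3 /raid              <- barriers on (the default)
# mount -o logbsize=256k,nobarrier /dev/md3 /raid    <- barriers off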
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
One of the advantages of being a captain is being able to ask for advice
without necessarily having to take it.
-- Kirk, "Dagger of the Mind", stardate 2715.2
* Re: New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.
From: Richard Scobie @ 2007-12-31 1:14 UTC (permalink / raw)
To: Linux RAID Mailing List
Justin Piszcz wrote:
> Why does mdadm still use 64k for the default chunk size?
Probably because this is the best balance for average file sizes, which
are smaller than you seem to be testing with?
Regards,
Richard
* Re: New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.
2007-12-31 1:14 ` Richard Scobie
@ 2007-12-31 15:05 ` Peter Grandi
2007-12-31 19:32 ` Richard Scobie
0 siblings, 1 reply; 9+ messages in thread
From: Peter Grandi @ 2007-12-31 15:05 UTC (permalink / raw)
To: Richard Scobie; +Cc: Linux RAID Mailing List
>> Why does mdadm still use 64k for the default chunk size?
> Probably because this is the best balance for average file
> sizes, which are smaller than you seem to be testing with?
Well "average file sizes" relate less to chunk sizes than access
patterns do. Single threaded sequential reads with smaller block
sizes tend to perform better with smaller chunk sizes, for example.
File sizes do influence a bit the access patterns seen by disks.
The goals of a chunk size choice are to spread a single
''logical'' (application or filesystem level) operation over as
many arms as possible while keeping rotational latency low, to
minimize arm movement, and to minimize arm contention among
different threads.
Thus the tradeoffs influencing chunk size are about the
sequential vs. random nature of reads or writes, how many blocks
are involved in a single ''logical'' operation, and how many
threads are accessing the array.
The goal here is not to optimize the speed of the array, but the
throughput and/or latency of the applications using it.
A good idea is to consider the two extremes: a chunk size of 1
sector and a chunk size of the whole disk (or perhaps more
interestingly 1/2 disk).
For example, consider a RAID0 of 4 disks ('a', 'b', 'c', 'd')
with each chunk size of 8 sectors.
To read the first 16 chunks or 128 sectors of the array these
sector read operations ['get(device,first,last)'] have to be
issued:
00-31: get(a,0,7) get(b,0,7) get(c,0,7) get(d,0,7)
32-63: get(a,8,15) get(b,8,15) get(c,8,15) get(d,8,15)
64-95: get(a,16,23) get(b,16,23) get(c,16,23) get(d,16,23)
96-127: get(a,24,31) get(b,24,31) get(c,24,31) get(d,24,31)
I have indented the lists to show the increasing offset into
each block device.
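A minimal sketch (not md's actual code) of the arithmetic behind that mapping,
assuming the same 4 disks and 8-sector chunks:
raid0_map() {                   # array sector -> (disk, sector on that disk)
    sector=$1; disks=4; chunk=8
    c=$(( sector / chunk ))     # which chunk of the array this sector lands in
    echo "disk $(( c % disks )), sector $(( c / disks * chunk + sector % chunk ))"
}
raid0_map 0; raid0_map 32; raid0_map 127    # -> disk 0/0, disk 0/8, disk 3/31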
Now, the big questions here are all about the intervals between
these operations, that is, how large the logical operations are and
how much they cluster in time and space.
For example, in the above sequence it matters whether clusters of
operations involve fewer than 32 sectors or not, and what the likely
interval is between clusters generated by different concurrent
applications (consider rotational latency and the likelihood of the
arm being moved between successive clusters).
So that space/time clustering depends more on how applications
process their data and how many applications concurrently access
the array, and whether they are reading or writing.
The latter point involves an exceedingly important asymmetry
that is often forgotten: an application read can only complete
when the last block is read, while a write can complete as soon
as it is issued. So the time clustering of sector reads depends on
how long ''logical'' reads are as well as how long is the
interval between them.
So an application that issues frequent small reads rather than
infrequent large ones may work best with a small chunk size.
This is not much related to the distribution of file sizes, unless it
influences the space/time clustering of application-issued
operations...
In general I prefer smaller chunk sizes to larger ones,
because the latter work well only in somewhat artificial cases
like simple-minded benchmarks.
In particular, if one uses parity-based arrays (not a good idea in
general...), small chunk sizes (as well as stripe
sizes) give a better chance of reducing the frequency of RMW.
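To put a rough number on that (assuming a hypothetical 4-disk RAID5): a write
avoids the read-modify-write cycle only when it covers a full stripe of data,
i.e. chunk size times (disks - 1), so larger chunks make RMW-free writes rarer:
for chunk_kb in 64 256 1024; do
    echo "chunk ${chunk_kb}k -> a full-stripe (RMW-free) write is $(( chunk_kb * (4 - 1) ))k of data"
done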
Counter to that preference, the Linux IO queueing subsystem (elevators
etc.) perhaps does not always take advantage of parallelizable
operations across disks as much as it could, and bandwidth
bottlenecks (e.g. the PCI bus) can limit what small chunks gain.
* Re: New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.
From: Bill Davidsen @ 2007-12-31 15:22 UTC (permalink / raw)
To: Justin Piszcz; +Cc: xfs, linux-raid, Alan Piszcz
Justin Piszcz wrote:
> [snip]
> Why does mdadm still use 64k for the default chunk size?
Write performance with small files, I would think. There is some
information in old posts, but I don't seem to find them as quickly as I
would like.
>
> And another quick question, would there be any benefit to use (if it
> were possible) a block size of > 4096 bytes with XFS (I assume only
> IA64/similar arch can support it), e.g. x86_64 cannot because the
> page_size is 4096.
>
> [ 8265.407137] XFS: only pagesize (4096) or less will currently work.
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck
* Re: New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.
From: Richard Scobie @ 2007-12-31 19:32 UTC (permalink / raw)
To: Linux RAID Mailing List
Peter Grandi wrote:
> In particular, if one uses parity-based arrays (not a good idea in
> general...), small chunk sizes (as well as stripe
> sizes) give a better chance of reducing the frequency of RMW.
Thanks for your thoughts - the above was my thinking when I posted.
Regards,
Richard
* Re: New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.
From: Changliang Chen @ 2008-01-04 5:35 UTC (permalink / raw)
To: Justin Piszcz; +Cc: xfs, linux-raid, Alan Piszcz
Hi Justin,
From your report, it looks like the p34-default behaves better; which
item makes you consider that the p34-dchinner looks nice?
--
Best Regards
* Re: New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.
From: Justin Piszcz @ 2008-01-04 9:07 UTC (permalink / raw)
To: Changliang Chen; +Cc: xfs, linux-raid, Alan Piszcz
On Fri, 4 Jan 2008, Changliang Chen wrote:
> Hi Justin,
> From your report, it looks like the p34-default behaves better; which
> item makes you consider that the p34-dchinner looks nice?
> --
> Best Regards
The re-write and the sequential input and output are faster with the dchinner settings.
Justin.