public inbox for linux-kernel@vger.kernel.org
* Minutes from 10/1 LSE Call
@ 2003-10-01 19:19 Hanna Linder
  2003-10-01 23:29 ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Hanna Linder @ 2003-10-01 19:19 UTC (permalink / raw)
  To: lse-tech, linux-kernel


	LSE Call Minutes from 10/1 

I. Sylvain Jeaugey and Simon Derr: CPUSETS, Controlling CPU placement.

http://marc.theaimsgroup.com/?l=lse-tech&m=106441942222186&w=2

Processes attach to cpusets. Changes were made to some system calls. A
new file, /proc/cpusets, shows which CPUs are in the sets; there is also
per-process info under /proc/pid/. Only the CPUs in the current cpuset
are available to the processes attached to the set. There are hooks into
fork and other system calls. Man pages for all of this are on the web site:
www.opensource.org/cpuset
----
Paul Jackson - Question about restriction to certain CPUs, e.g. CPUs 3 & 5
in the set. Does the process see the real numbers, or are they renumbered?

A - They are not renumbered in the /proc info, but for the sched_setaffinity
call they are renumbered starting at 0.

Q- So an application could become confused? 

A - No, the application doesn't need to know those details.

Paul - Sometimes applications do want to know what cpu they are on.
----
Hubertus Franke - Do you do any optimization based on CPU usage? Because it
is a virtualization, they have the power to assign any CPU at will.

A - Right now we just have a basic view of the machine.  There is some
awareness of NUMA.

Paul J- I would guess we would put the optimization at a higher layer
than the virtualization layer.

Hubertus-  I think a good idea would be to not have a set mapping but
let the kernel pick the best mapping based on load or some intelligent
entity. That could be an interesting extension to this.

---

Mike Raymond - Can threads within the set leave?

A - No.

Q - Can threads outside the cpuset join?

A - No, unless their cpusets overlap. Every thread is in a cpuset;
no thread can leave its cpuset.

Q - Is there some option, e.g. "strict", that tells whether or not some
other thread can steal resources?

A - Only threads within the cpuset can use the processor.
The parent is always a superset of the child.

Q - I can imagine we would want the CPUs for certain daemons
to be spun up very early.

It is more difficult to move memory than to reschedule a task.
But that is a bigger topic, not for this call.

====================
II. Mark Gross: Real-time application needs when the system is stressed.

http://marc.theaimsgroup.com/?l=lse-tech&m=106494685313136&w=2

Looking at the performance robustness of Linux as workloads get heavy.
Most of this work is being motivated by telco software vendors as they
try to use Linux. It works OK until the workload gets large. At least
that is what we keep hearing.

The things we are looking at are isolating the bottlenecks and creating
some microbenchmarks that expose them. We are also looking into
developing tools to identify these bottlenecks. I believe they are
coming from semaphore locks in the kernel. We would like a tool, such
as lockmeter, that could work with semaphore locks.

We got about 100 MB/sec using the bonnie benchmark for block IO writes;
for writes we hit a ceiling around 100-120 MB/sec. Stopped scaling
after about 3 spindles.

Tried to focus on the block IO part of it. Have not tried direct or
raw IO yet.  With block IO we got about 133 MB/sec doing a simple dd
to /dev/null from multiple spindles. This was on 2.6-test3.

Jesse Barnes recommends trying direct or raw IO; they saw some higher
numbers. Raw and O_DIRECT avoid the CPU more than block IO does.

Block IO goes through the buffer cache instead of the page cache, which
only comes out of lowmem, and that can really restrict you.

Badari Pulavarty said that is by design and there is no work going on
to change it to use the page cache. It is buffered IO; it is supposed
to go through the buffer cache now. There was some talk, but nobody
cared enough to change it. No one is saying to go through block IO
right now anyway.

Q - Is using direct IO an option for POSIX apps?

A - Not right now. There are a lot of apps.

It is great that direct IO kicks ass, but we should make sure
file IO kicks butt too.

O_DIRECT with large block sizes has low CPU utilization (3-5%).
However, with buffered IO we can easily get to 100% CPU utilization.
If you look at the profile, most of that is in the copy_to_user function.

Dave Howell - So these apps don't run very well on Linux right
now?

A- Yes.

Q - Did you see any spinlock contention with lockmeter?

A - Not really; there aren't many spinlocks with this app.

Mark Gross Q - Is there a kernprof for 2.6? Andrew Morton
has lockmeter in his tree but not kernprof.

John Hawkes A - No. It will require a different approach in 2.6 than
in 2.4. Could use some help on this.

Badari - Going back to the 100% utilization issue with block IO: that
is probably caused by the lowmem problem. It would be good to change
the app to not use block IO.

A - It is a whole suite of telco apps that use POSIX, so that is a
difficult option.

=======================

III. Steve Pratt: IO performance in mm tree vs mainline.

http://marc.theaimsgroup.com/?l=lse-tech&m=106495069918264&w=2

Got some interesting results comparing IO throughput on the mainline
tree vs. the mm tree, doing random and sequential reads with different
block sizes.

With large block sizes and random reads, the aio readahead speedup patch
was giving significant throughput improvements. However, with sequential
reads we saw a 10% overhead in CPU utilization with no throughput increase.
So there are some drawbacks to the readahead patch when doing both random
and sequential reads.

In mainline, once block size is over 32k our throughput actually drops off.
It levels off around 128k but at a greater CPU utilization.
Don't really understand why that is.

In the mm tree, throughput is maintained for all block sizes, but CPU
utilization keeps going up to sustain the same throughput.  Readprofile
shows the extra time is being spent in copy_to_user (in the mm tree).
Backing out the readahead patch reduces CPU by 10% for all block sizes
but still shows the spike, so that isn't the main problem.

Badari - Are you using the serveraid or ips driver? 

A - Yes. We are planning on moving to the qlogic driver but need to get
different disks and get it all set up. But I wouldn't expect that to
kick in until 128k, so the fact it starts at 32k makes me think that
isn't the issue. So 64k is less than the max throughput.

Q - What tool are you using for these measurements?

A - It is an internal tool called Raw Read that has been open sourced
and is available on IBM's developerWorks site under Linux Perf. Will
get the latest version out soon.

Mark Gross - We saw a similar performance degradation as we scaled
by spindles.

Steve - I can add spindles, no problem; we actually have 80 spindles
in a raid array that looks like 20 spindles.

---

minutes compiled by Hanna Linder 10/01/03



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Minutes from 10/1 LSE Call
  2003-10-01 19:19 Minutes from 10/1 LSE Call Hanna Linder
@ 2003-10-01 23:29 ` Andrew Morton
  2003-10-01 23:38   ` Larry McVoy
  2003-10-02 19:21   ` [Lse-tech] " Steven Pratt
  0 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2003-10-01 23:29 UTC (permalink / raw)
  To: Hanna Linder; +Cc: lse-tech, linux-kernel

Hanna Linder <hannal@us.ibm.com> wrote:
>
> In mainline, once block size is over 32k our throuput actually drops off.
> It levels off around 128k but at a greater cpu utilization.
> Dont really understand why that is. 

Probably thrashing the CPU's L1 cache.

> In mm tree, maintains throughput for all block size but the cpu utilzation
> keeps going up to do the same throughput.  Readprofile shows the extra time 
> is being spent in copy_to_user (in mm tree). Backing out readahead patch 
> reduces cpu by 10% for all block sizes but still shows the spike. So that
> isnt the main problem.

If you have a loop like:

	char *buf;

	for (lots) {
		read(fd, buf, size);
	}


the optimum value of `size' is small: as little as 8k.  Once `size' gets
close to half the size of the L1 cache you end up pushing the memory at
`buf' out of CPU cache all the time.



* Re: Minutes from 10/1 LSE Call
  2003-10-01 23:29 ` Andrew Morton
@ 2003-10-01 23:38   ` Larry McVoy
  2003-10-02  0:23     ` Jeff Garzik
  2003-10-02 19:21   ` [Lse-tech] " Steven Pratt
  1 sibling, 1 reply; 13+ messages in thread
From: Larry McVoy @ 2003-10-01 23:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hanna Linder, lse-tech, linux-kernel

On Wed, Oct 01, 2003 at 04:29:16PM -0700, Andrew Morton wrote:
> If you have a loop like:
> 
> 	char *buf;
> 
> 	for (lots) {
> 		read(fd, buf, size);
> 	}
> 
> the optimum value of `size' is small: as little as 8k.  Once `size' gets
> close to half the size of the L1 cache you end up pushing the memory at
> `buf' out of CPU cache all the time.

I've seen this too, not that Andrew needs me to back him up, but in many 
cases even 4k is big enough.  Linux has a very thin system call layer so
it is OK, good even, to use reasonable buffer sizes.
-- 
---
Larry McVoy              lm at bitmover.com          http://www.bitmover.com/lm


* Re: Minutes from 10/1 LSE Call
  2003-10-01 23:38   ` Larry McVoy
@ 2003-10-02  0:23     ` Jeff Garzik
  2003-10-02 18:56       ` insecure
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2003-10-02  0:23 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Andrew Morton, Hanna Linder, lse-tech, linux-kernel

Larry McVoy wrote:
> On Wed, Oct 01, 2003 at 04:29:16PM -0700, Andrew Morton wrote:
> 
>>If you have a loop like:
>>
>>	char *buf;
>>
>>	for (lots) {
>>		read(fd, buf, size);
>>	}
>>
>>the optimum value of `size' is small: as little as 8k.  Once `size' gets
>>close to half the size of the L1 cache you end up pushing the memory at
>>`buf' out of CPU cache all the time.
> 
> 
> I've seen this too, not that Andrew needs me to back him up, but in many 
> cases even 4k is big enough.  Linux has a very thin system call layer so
> it is OK, good even, to use reasonable buffer sizes.


Slight tangent, FWIW...   Back when I was working on my "race-free 
userland" project, I noticed that the fastest cp(1) implementation was 
GNU's:  read/write from a single, statically allocated, page-aligned 4K 
buffer.  I experimented with various buffer sizes, mmap-based copies, 
and even with sendfile(2) where both arguments were files. 
read(2)/write(2) of a single 4K buffer was always the fastest.

	Jeff





* Re: Minutes from 10/1 LSE Call
  2003-10-02  0:23     ` Jeff Garzik
@ 2003-10-02 18:56       ` insecure
  2003-10-02 19:10         ` Jeff Garzik
  0 siblings, 1 reply; 13+ messages in thread
From: insecure @ 2003-10-02 18:56 UTC (permalink / raw)
  To: Jeff Garzik, Larry McVoy
  Cc: Andrew Morton, Hanna Linder, lse-tech, linux-kernel

On Thursday 02 October 2003 03:23, Jeff Garzik wrote:
> Larry McVoy wrote:
> > On Wed, Oct 01, 2003 at 04:29:16PM -0700, Andrew Morton wrote:
> >>If you have a loop like:
> >>
> >>	char *buf;
> >>
> >>	for (lots) {
> >>		read(fd, buf, size);
> >>	}
> >>
> >>the optimum value of `size' is small: as little as 8k.  Once `size' gets
> >>close to half the size of the L1 cache you end up pushing the memory at
> >>`buf' out of CPU cache all the time.
> >
> > I've seen this too, not that Andrew needs me to back him up, but in many
> > cases even 4k is big enough.  Linux has a very thin system call layer so
> > it is OK, good even, to use reasonable buffer sizes.
>
> Slight tangent, FWIW...   Back when I was working on my "race-free
> userland" project, I noticed that the fastest cp(1) implementation was
> GNU's:  read/write from a single, statically allocated, page-aligned 4K
> buffer.  I experimented with various buffer sizes, mmap-based copies,
> and even with sendfile(2) where both arguments were files.
> read(2)/write(2) of a single 4K buffer was always the fastest.

That sounds reasonable, but today's RAM throughput is on the order
of 1 GB/s, not 100 MB/s. The 'out of L1' theory can't explain the
~100 MB/s ceiling, it seems.
--
vda


* Re: Minutes from 10/1 LSE Call
  2003-10-02 18:56       ` insecure
@ 2003-10-02 19:10         ` Jeff Garzik
  2003-10-02 22:38           ` insecure
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2003-10-02 19:10 UTC (permalink / raw)
  To: insecure; +Cc: Larry McVoy, Andrew Morton, Hanna Linder, lse-tech, linux-kernel

insecure wrote:
> That sounds reasonable, but today's RAM throughput is on the order
> of 1GB/s, not 100Mb/s. 'Out of L1' theory can't explain 100Mb/s ceiling
> it seems.


cp(1) data, at least, will never ever be in L1.  Copying data you need 
to look at the ends of the pipeline -- hard drive throughput, PCI bus 
bandwidth, FSB bandwidth, speed at which ext2/3 allocates blocks, and 
similar things are likely bottlenecks.

You'll never hit RAM bandwidth limits, unless your copies are extremely 
tiny, and entirely in L2 or pagecache.

	Jeff





* Re: [Lse-tech] Re: Minutes from 10/1 LSE Call
  2003-10-01 23:29 ` Andrew Morton
  2003-10-01 23:38   ` Larry McVoy
@ 2003-10-02 19:21   ` Steven Pratt
  2003-10-02 19:36     ` Andrew Morton
  1 sibling, 1 reply; 13+ messages in thread
From: Steven Pratt @ 2003-10-02 19:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hanna Linder, lse-tech, linux-kernel

Andrew Morton wrote:

>Hanna Linder <hannal@us.ibm.com> wrote:
>  
>
>>In mainline, once block size is over 32k our throuput actually drops off.
>>It levels off around 128k but at a greater cpu utilization.
>>Dont really understand why that is. 
>>    
>>
>
>Probably thrashing the CPU's L1 cache.
>  
>
>>In mm tree, maintains throughput for all block size but the cpu utilzation
>>keeps going up to do the same throughput.  Readprofile shows the extra time 
>>is being spent in copy_to_user (in mm tree). Backing out readahead patch 
>>reduces cpu by 10% for all block sizes but still shows the spike. So that
>>isnt the main problem.
>>    
>>
>
>If you have a loop like:
>
>	char *buf;
>
>	for (lots) {
>		read(fd, buf, size);
>	}
>
>
>the optimum value of `size' is small: as little as 8k.  Once `size' gets
>close to half the size of the L1 cache you end up pushing the memory at
>`buf' out of CPU cache all the time.
>
>  
>
Sure, but why do I only see this in the mm tree, and not the mainline
tree? Also, there are 160 threads doing this loop, so even at a 32k block
size the working set of 'buf's is over 5MB with only a 2MB L2 cache.

Steve




* Re: [Lse-tech] Re: Minutes from 10/1 LSE Call
  2003-10-02 19:21   ` [Lse-tech] " Steven Pratt
@ 2003-10-02 19:36     ` Andrew Morton
  2003-10-03 19:33       ` Steven Pratt
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2003-10-02 19:36 UTC (permalink / raw)
  To: Steven Pratt; +Cc: hannal, lse-tech, linux-kernel

Steven Pratt <slpratt@austin.ibm.com> wrote:
>
>  Sure, but why do I only see this is the mm tree, and not the mainline 
>  tree.

Please send a full description of how to reproduce it and I'll take a look.



* Re: Minutes from 10/1 LSE Call
  2003-10-02 19:10         ` Jeff Garzik
@ 2003-10-02 22:38           ` insecure
  2003-10-02 22:45             ` Hanna Linder
  2003-10-05  5:38             ` Andrew Morton
  0 siblings, 2 replies; 13+ messages in thread
From: insecure @ 2003-10-02 22:38 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Larry McVoy, Andrew Morton, Hanna Linder, lse-tech, linux-kernel

On Thursday 02 October 2003 22:10, Jeff Garzik wrote:
> insecure wrote:
> > That sounds reasonable, but today's RAM throughput is on the order
> > of 1GB/s, not 100Mb/s. 'Out of L1' theory can't explain 100Mb/s ceiling
> > it seems.
>
> cp(1) data, at least, will never ever be in L1.  Copying data you need
> to look at the ends of the pipeline -- hard drive throughput, PCI bus
> bandwidth, FSB bandwidth, speed at which ext2/3 allocates blocks, and
> similar things are likely bottlenecks.

Hmm.

On Wednesday 01 October 2003 22:19, Hanna Linder wrote:
> We got about 100 mb/sec using the bonie benchmark for block io writes,
> for writes we hit a ceiling around 100-120 mb/sec. Stopped scaling
> after about 3 spindles.
>
> Tried to focus on the block io part of it. Have not tried
> direct or raw io yet.  With block IO we got about 133 mb/sec
> doing a simple dd to dev/null from multiple spindles. This
> was on the 2.6 test 3.
> ....
> Odirect on large block sizes has low cpu utilization ( 3-5% ).
> However with buffered IO we can easily get to 100% cpu utilization.
> If you look at the profile most of that is in the copy_to_user function.

So:
* we hit a ceiling of ~133 MB/s, no matter how many disks
* CPU utilization is 100%, spent mostly in copy_to_user
* RAM bandwidth is >1 GB/s

These can't all be true at once.
At least one of these three statements must be false (IMHO).
-- 
vda


* Re: Minutes from 10/1 LSE Call
  2003-10-02 22:38           ` insecure
@ 2003-10-02 22:45             ` Hanna Linder
  2003-10-05  5:38             ` Andrew Morton
  1 sibling, 0 replies; 13+ messages in thread
From: Hanna Linder @ 2003-10-02 22:45 UTC (permalink / raw)
  To: insecure, Jeff Garzik
  Cc: Larry McVoy, Andrew Morton, Hanna Linder, lse-tech, linux-kernel


Ahh. I see the confusion. These are excerpts from two different speakers.
Notice the line of ===== separating speakers in the minutes. 
This part is from Steve Pratt's mm vs mainline comparisons:


> On Thursday 02 October 2003 22:10, Jeff Garzik wrote:
>> insecure wrote:
>> > That sounds reasonable, but today's RAM throughput is on the order
>> > of 1GB/s, not 100Mb/s. 'Out of L1' theory can't explain 100Mb/s ceiling
>> > it seems.
>> 
>> cp(1) data, at least, will never ever be in L1.  Copying data you need
>> to look at the ends of the pipeline -- hard drive throughput, PCI bus
>> bandwidth, FSB bandwidth, speed at which ext2/3 allocates blocks, and
>> similar things are likely bottlenecks.
> 
> Hmm.
> 

This part is from Mark Gross's Real Time Application issues discussion:


> On Wednesday 01 October 2003 22:19, Hanna Linder wrote:
>> We got about 100 mb/sec using the bonie benchmark for block io writes,
>> for writes we hit a ceiling around 100-120 mb/sec. Stopped scaling
>> after about 3 spindles.
>> 
>> Tried to focus on the block io part of it. Have not tried
>> direct or raw io yet.  With block IO we got about 133 mb/sec
>> doing a simple dd to dev/null from multiple spindles. This
>> was on the 2.6 test 3.
>> ....
>> Odirect on large block sizes has low cpu utilization ( 3-5% ).
>> However with buffered IO we can easily get to 100% cpu utilization.
>> If you look at the profile most of that is in the copy_to_user function.
> 
> So:
> * we hit a ceiling of ~133 Mb/s, no matter how many disks
> * CPU utilization is 100%, spent mostly in copy_to_user
> * RAM bandwidth is >1Gb/s


Hanna



* Re: [Lse-tech] Re: Minutes from 10/1 LSE Call
  2003-10-02 19:36     ` Andrew Morton
@ 2003-10-03 19:33       ` Steven Pratt
  2003-10-03 20:13         ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Pratt @ 2003-10-03 19:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hannal, lse-tech, linux-kernel



Andrew Morton wrote:

>Steven Pratt <slpratt@austin.ibm.com> wrote:
>  
>
>> Sure, but why do I only see this is the mm tree, and not the mainline 
>> tree.
>>    
>>
>
>Please send a full description of how to reproduce it and I'll take a look.
>
>  
>
Get the latest rawread from 
http://www-124.ibm.com/developerworks/opensource/linuxperf/rawread/rawread.html

mkfs the devices and mount them on /mnt/mntN, where N is an increasing
index.  Create a file 'foo' in each filesystem of size 1GB (for this
example).  Unmount and remount the partitions/devices to flush the cache.
Filesystems are also unmounted and re-mounted between each test run.

The following rawread commands will run the tests for block sizes
ranging from 1k-512k.  The "-d 1" parameter assumes that you mounted
starting at /mnt/mnt1, and "-m 2 -p 16" says to run 8 threads on each
of 2 devices, /mnt/mnt1 and /mnt/mnt2.

rawread -m 2 -p 16 -d 6 -n 20480 -f -c -t 0 -s 1024

rawread -m 2 -p 16 -d 6 -n 10240 -f -c -t 0 -s 2048

rawread -m 2 -p 16 -d 6 -n 5120 -f -c -t 0 -s 4096

rawread -m 2 -p 16 -d 6 -n 2560 -f -c -t 0 -s 8192

rawread -m 2 -p 16 -d 1 -n 1280 -f -c -t 0 -s 16384

rawread -m 2 -p 16 -d 1 -n 640 -f -c -t 0 -s 32768

rawread -m 2 -p 16 -d 1 -n 320 -f -c -t 0 -s 65536

rawread -m 2 -p 16 -d 1 -n 160 -f -c -t 0 -s 131072

rawread -m 2 -p 16 -d 1 -n 80 -f -c -t 0 -s 262144

rawread -m 2 -p 16 -d 1 -n 40 -f -c -t 0 -s 524288


2 devices is the smallest number I have been able to run which shows
this problem.  With only 1 device I did not see it.  My original tests
were done with 20 devices.  One thing of interest is that with only 2
devices the point at which CPU starts to increase again is at 128k
instead of at 32k, which I saw with 20 devices.  This would support your
theory that this is caused by cache misses with more/larger buffers.
I'm still not sure this accounts for all of the extra CPU usage, but I
am less worried about it.

But as long as I have your attention, there is one other thing about
these runs which bothers me, which is that the mm tree is doing horribly
on 1k and 2k block sizes.  It looks like readahead is not functioning
properly for these request sizes.

Here is a comparison for 2 devices between test6 and test6-mm1.  You can
see that the mm1 tree does great at larger block sizes, but poorly at
small ones.


Results:seqread-_vs_.seqread-

                                          tolerance = 0.00 + 3.00% of A
                test6         test6-mm1
 Blocksize      KBs/sec      KBs/sec    %diff         diff    tolerance
---------- ------------ ------------ -------- ------------ ------------
      1024        44083        22641   -48.64    -21442.00      1322.49  *
      2048        45276        26371   -41.76    -18905.00      1358.28  *
      4096        44024        45260     2.81      1236.00      1320.72
      8192        44519        50073    12.48      5554.00      1335.57  *
     16384        46869        51528     9.94      4659.00      1406.07  *
     32768        47900        52231     9.04      4331.00      1437.00  *
     65536        42803        52183    21.91      9380.00      1284.09  *
    131072        36525        49724    36.14     13199.00      1095.75  *
    262144        34628        46192    33.39     11564.00      1038.84  *
    524288        28997        48005    65.55     19008.00       869.91  *


Results:seqread-_vs_.seqread-
                                          tolerance = 0.50 + 3.00% of A
               test6         test6-mm1
 Blocksize         %CPU         %CPU    %diff         diff    tolerance
---------- ------------ ------------ -------- ------------ ------------
      1024        27.87        11.72   -57.95       -16.15         1.34  *
      2048        13.77         8.84   -35.80        -4.93         0.91  *
      4096            9         9.99    11.00         0.99         0.77  *
      8192         8.07         8.31     2.97         0.24         0.74
     16384          5.7         6.63    16.32         0.93         0.67  *
     32768         4.93         5.59    13.39         0.66         0.65  *
     65536         3.76          4.7    25.00         0.94         0.61  *
    131072         3.25         4.53    39.38         1.28         0.60  *
    262144         3.23         6.15    90.40         2.92         0.60  *
    524288         2.97         8.19   175.76         5.22         0.59  *

Steve



* Re: [Lse-tech] Re: Minutes from 10/1 LSE Call
  2003-10-03 19:33       ` Steven Pratt
@ 2003-10-03 20:13         ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2003-10-03 20:13 UTC (permalink / raw)
  To: Steven Pratt; +Cc: hannal, lse-tech, linux-kernel

Steven Pratt <slpratt@austin.ibm.com> wrote:
>
> Get the latest rawread from 
> http://www-124.ibm.com/developerworks/opensource/linuxperf/rawread/rawread.html

Sigh, I was afraid of that.  My previous outings with rawread have not been
happy.  Maybe your detailed descriptions will help.

> But as long as I have your attention, there is one other thing about 
> these runs which bothers me, which is that the mm tree is doing horribly 
> on 1k and 2k block sizes.  I looks like readahead is not functioning 
> properly for these requst sizes.

There are several problems with readahead under specific circumstances.  I
need to have a session with it.



* Re: Minutes from 10/1 LSE Call
  2003-10-02 22:38           ` insecure
  2003-10-02 22:45             ` Hanna Linder
@ 2003-10-05  5:38             ` Andrew Morton
  1 sibling, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2003-10-05  5:38 UTC (permalink / raw)
  To: insecure; +Cc: jgarzik, lm, hannal, lse-tech, linux-kernel

insecure <insecure@mail.od.ua> wrote:
>
> So:
>  * we hit a ceiling of ~133 Mb/s, no matter how many disks
>  * CPU utilization is 100%, spent mostly in copy_to_user
>  * RAM bandwidth is >1Gb/s
> 
>  These can't be true at once.

True.  But bear in mind that the data crosses the memory busses up to three
times: disk to pagecache, pagecache to CPU, CPU to user memory.

So top speed may be as little as 300 MB/sec.



