Re: scalable kmap (was Re: vm lock contention reduction) (fwd)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@zip.com.au>
To: Hanna Linder <hannal@us.ibm.com>
Cc: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>,
	Keith Mannthey <mannthey@us.ibm.com>,
	haveblue@us.ibm.com, lse-tech@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: scalable kmap (was Re: vm lock contention reduction) (fwd)
Date: Wed, 10 Jul 2002 20:06:22 -0700	[thread overview]
Message-ID: <3D2CF62E.949F20B4@zip.com.au> (raw)
In-Reply-To: 40740000.1026339488@w-hlinder

Hanna Linder wrote:
> 
> ...
> Andrew and Martin,
> 
>         I ran this updated patch on 2.5.25 with dbench on
> the 8-way with 4 Gb of memory compared to clean 2.5.25.
> I saw a significant improvement in throughput about 15%
> (averaged over 5 runs each).

Thanks, Hanna.

The kernel compile test isn't a particularly heavy user of
copy_to_user(), whereas with RAM-only dbench, copy_*_user()
is almost the only thing it does.  So that makes sense.

Tried dbench on the 2.5G 4xPIII Xeon: no improvement at all.
This thing seems to have quite poor memory bandwidth - maybe
250 megabyte/sec downhill with the wind at its tail.

>         Included is the pretty picture (akpm-2525.png) the
> data that picture came from (akpm-2525.data) and the raw
> results of the runs with min/max and timing results
> (2525akpmkmaphi and 2525clnhi).
>         I believe the drop at around 64 clients is caused by
> memory swapping leading to increased disk accesses since the
> time increased by 200% in direct correlation with the decreased
> throughput.

Yes.  The test went to disk.   There are two reasons why
it will do this:

1: Some dirty data was in memory for more than 30-35 seconds or

2: More than 40% of memory is dirty.

In your case, the 64-client run was taking 32 seconds.  After that
the disks lit up.  Once that happens, dbench isn't a very good
benchmark.  It's an excellent benchmark when it's RAM-only
though.  Very repeatable and hits lots of code paths which matter.

You can run more clients before the disk I/O cuts in by
increasing /proc/sys/vm/dirty_expire_centisecs and
/proc/sys/vm/dirty_*_ratio.

The patch you tested only uses the atomic kmap across generic_file_read.
It is reasonable to hope that another 15% or morecan be gained by holding
an atomic kmap across writes as well.  On your machine ;)

Here's what oprofile says about `dbench 40' with that patch:

c0140f1c 402      0.609543    __block_commit_write    
c013dfd4 413      0.626222    vfs_write               
c01402cc 431      0.653515    __find_get_block        
c013a895 472      0.715683    .text.lock.highmem      
c017fe30 494      0.749041    ext2_get_block          
c012cef0 564      0.85518     unlock_page             
c013ee80 564      0.85518     fget                    
c01079f4 571      0.865794    apic_timer_interrupt    
c01e8ecc 594      0.900669    radix_tree_lookup       
c013da90 597      0.905218    generic_file_llseek     
c01514b4 607      0.92038     __d_lookup              
c0106ff8 687      1.04168     system_call             
c013a02c 874      1.32523     kunmap_high             
c0148388 922      1.39801     link_path_walk          
c0140b00 1097     1.66336     __block_prepare_write   
c01346d0 1138     1.72552     rmqueue                 
c01127ac 1243     1.88473     smp_apic_timer_interrupt 
c0139eb8 1514     2.29564     kmap_high               
c0105368 6188     9.38272     poll_idle               
c012d8a8 9564     14.5017     file_read_actor         
c012ea70 21326    32.3361     generic_file_write      

Not taking a kmap in generic_file_write is a biggish patch - it
means changing the prepare_write/commit_write API and visiting
all filesystems.  The API change would be: core kernel no longer
holds a kmap across prepare/commit. If the filesystem wants one
for its own purposes then it gets to do it for itself, possibly in
its prepare_write().

I think I'd prefer to get some additional testing and understanding
before undertaking that work.  It arguably makes sense as a small
cleanup/speedup anyway, but that's not a burning issue.

hmm.  I'll do just ext2, and we can take another look then.

-

next prev parent reply	other threads:[~2002-07-11  2:56 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <237170000.1026317715@flay>
2002-07-10 22:18 ` scalable kmap (was Re: vm lock contention reduction) (fwd) Hanna Linder
2002-07-11  3:06   ` Andrew Morton [this message]
2002-07-11  5:19     ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3D2CF62E.949F20B4@zip.com.au \
    --to=akpm@zip.com.au \
    --cc=Martin.Bligh@us.ibm.com \
    --cc=hannal@us.ibm.com \
    --cc=haveblue@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lse-tech@lists.sourceforge.net \
    --cc=mannthey@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.