public inbox for linux-kernel@vger.kernel.org
* Re: scalable kmap (was Re: vm lock contention reduction) (fwd)
       [not found] <237170000.1026317715@flay>
@ 2002-07-10 22:18 ` Hanna Linder
  2002-07-11  3:06   ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Hanna Linder @ 2002-07-10 22:18 UTC (permalink / raw)
  To: Martin J. Bligh, akpm
  Cc: Keith Mannthey, hannal, haveblue, lse-tech, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

--On Wednesday, July 10, 2002 09:15:15 -0700 "Martin J. Bligh" <Martin.Bligh@us.ibm.com> wrote:

> 
> Updated patch below ...
> 
> 
>  arch/i386/kernel/i386_ksyms.c   |    5 ++
>  arch/i386/lib/usercopy.c        |   10 +++++
>  arch/i386/mm/fault.c            |   71 +++++++++++++++++++++++++++++++++++
>  fs/exec.c                       |   60 +++++++++++++++++++++---------
>  include/asm-i386/highmem.h      |    5 ++
>  include/asm-i386/kmap_types.h   |    3 +
>  include/asm-i386/processor.h    |    2 +
>  include/asm-ppc/kmap_types.h    |    1 +
>  include/asm-sparc/kmap_types.h  |    1 +
>  include/asm-x86_64/kmap_types.h |    1 +
>  include/linux/highmem.h         |   80 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/sched.h           |    5 ++
>  mm/filemap.c                    |   11 +++--
>  13 files changed, 232 insertions(+), 23 deletions(-)


Andrew and Martin,

	I ran this updated patch on 2.5.25 with dbench on
the 8-way with 4 GB of memory, compared against clean 2.5.25.
I saw a significant improvement in throughput of about 15%
(averaged over 5 runs each).
	Included are the pretty picture (akpm-2525.png), the
data that picture came from (akpm-2525.data), and the raw
output of the runs with min/max and timing figures
(2525akpmkmaphi and 2525clnhi).
	I believe the drop at around 64 clients is caused by
memory swapping leading to increased disk accesses, since the
run time increased by 200% in direct correlation with the
decreased throughput.

Hanna





[-- Attachment #2: akpm-2525.png --]
[-- Type: image/png, Size: 3587 bytes --]

[-- Attachment #3: 2525akpmkmaphi --]
[-- Type: application/octet-stream, Size: 4599 bytes --]

[root@elm3b96 dbench]# ./test_dbench-avg.sh 1 36 5 hikmap
Running 1 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 3 lo is 2
thru hi is 81.5701 lo is 76.9333
        avg run time 3
        avg throughput 79.6601666666667
Running 5 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 4 lo is 3
thru hi is 277.74 lo is 243.343
        avg run time 3.33333333333333
        avg throughput 264.646333333333
Running 9 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 6 lo is 5
thru hi is 295.914 lo is 288.236
        avg run time 5
        avg throughput 283.635333333333
Running 13 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 8 lo is 7
thru hi is 291.79 lo is 246.807
        avg run time 7
        avg throughput 284.074666666667
Running 17 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 9 lo is 9
thru hi is 288.754 lo is 279.672
        avg run time 9
        avg throughput 282.871666666667
Running 21 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 11 lo is 10
thru hi is 282.556 lo is 279.696
        avg run time 11
        avg throughput 280.028666666667
Running 25 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 13 lo is 13
thru hi is 283.141 lo is 270.631
        avg run time 13
        avg throughput 279.706666666667
Running 29 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 16 lo is 14
thru hi is 287.336 lo is 282.406
        avg run time 14.6666666666667
        avg throughput 274.366
Running 33 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 17 lo is 16
thru hi is 280.63 lo is 274.984
        avg run time 16.6666666666667
        avg throughput 277.988

[root@elm3b96 dbench]# ./test_dbench-avg.sh 36 64 5 akpmhi2     
Running 36 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 19 lo is 18
thru hi is 284.684 lo is 272.75
        avg run time 18
        avg throughput 274.103
Running 40 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 22 lo is 20
thru hi is 279.052 lo is 274.726
        avg run time 20
        avg throughput 270.671666666667
Running 44 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 23 lo is 22
thru hi is 275.928 lo is 261.419
        avg run time 22.6666666666667
        avg throughput 270.660333333333
Running 48 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 25 lo is 24
thru hi is 276.622 lo is 271.115
        avg run time 24
        avg throughput 274.449333333333
Running 52 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 28 lo is 25
thru hi is 279.96 lo is 256.425
        avg run time 26
        avg throughput 273.272333333333
Running 56 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 30 lo is 28
thru hi is 270.907 lo is 268.947
        avg run time 28.3333333333333
        avg throughput 267.339666666667
Running 60 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 31 lo is 30
thru hi is 273.963 lo is 266.724
        avg run time 30
        avg throughput 272.487333333333
Running 64 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 883 lo is 33
thru hi is 262.225 lo is 9.57893
        avg run time 147.333333333333
        avg throughput 117.4783


[-- Attachment #4: 2525clnhi --]
[-- Type: application/octet-stream, Size: 4189 bytes --]


[root@elm3b96 dbench]# ./test_dbench-avg.sh 1 36 5 2525hi
Running 1 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 3 lo is 2
thru hi is 82.3442 lo is 77.5547
        avg run time 2.66666666666667
        avg throughput 81.7611333333333
Running 5 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 4 lo is 3
thru hi is 260.515 lo is 229.134
        avg run time 3.66666666666667
        avg throughput 242.980333333333
Running 9 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 6 lo is 5
thru hi is 256.654 lo is 253.473
        avg run time 6
        avg throughput 250.254666666667
Running 13 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 9 lo is 7
thru hi is 258.22 lo is 207.652
        avg run time 8.33333333333333
        avg throughput 246.309666666667
Running 17 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 11 lo is 10
thru hi is 252.47 lo is 228.667
        avg run time 10
        avg throughput 250.657666666667
Running 21 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 13 lo is 12
thru hi is 250.391 lo is 242.239
        avg run time 12
        avg throughput 244.346333333333
Running 25 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 15 lo is 15
thru hi is 249.358 lo is 226.142
        avg run time 15
        avg throughput 238.363333333333
Running 29 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 18 lo is 16
thru hi is 256.472 lo is 228.52
        avg run time 16.6666666666667
        avg throughput 242.994666666667
Running 33 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 20 lo is 18
thru hi is 254.148 lo is 234.099
        avg run time 19
        avg throughput 241.595333333333
[root@elm3b96 dbench]# ./test_dbench-avg.sh 36 64 5 2525hi
Running 36 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 21 lo is 20
thru hi is 250.669 lo is 240.105
        avg run time 20.3333333333333
        avg throughput 247.037333333333
Running 40 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 23 lo is 22
thru hi is 247.32 lo is 241.098
        avg run time 22.6666666666667
        avg throughput 243.157333333333
Running 44 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 26 lo is 25
thru hi is 245.733 lo is 237.135
        avg run time 25
        avg throughput 241.037333333333
Running 48 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 28 lo is 27
thru hi is 244.339 lo is 237.92
        avg run time 27
        avg throughput 242.203
Running 52 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 30 lo is 29
thru hi is 245.097 lo is 242.522
        avg run time 29.3333333333333
        avg throughput 240.211
Running 56 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 32 lo is 31
thru hi is 245.913 lo is 236.625
        avg run time 31.6666666666667
        avg throughput 241.393666666667
Running 60 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 35 lo is 34
thru hi is 243.014 lo is 231.635
        avg run time 34.6666666666667
        avg throughput 235.398333333333
Running 64 clients
        test no. 0
        test no. 1
        test no. 2
        test no. 3
        test no. 4
time hi is 710 lo is 309
thru hi is 237.763 lo is 11.9168
        avg run time 131
        avg throughput 83.1543666666667


[-- Attachment #5: akpm-2525.data --]
[-- Type: application/octet-stream, Size: 305 bytes --]

#clnts	#kmap	#cln

1	79.66	81.76
5	264.65	242.98
9	283.07	250.25
13	284.07	246.31
17	282.87	250.66
21	280.03	244.35
25	279.70	238.36
29	274.37	242.99
33	277.99	241.60
36	274.10	247.04
40	270.67	243.16
44	270.66	241.04	
48	274.45	242.20
52	273.27	240.21
56	267.34	241.39
60	272.49	235.40
64	117.48	83.15
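For what it's worth, the ~15% figure can be sanity-checked straight from
the table above. This is a hedged sketch (POSIX awk, data copied verbatim
from akpm-2525.data into a here-document) that averages the per-row
throughput ratio, skipping the 1-client row and the disk-bound 64-client
row:

```shell
# Mean of (kmap throughput / clean throughput) per client count,
# excluding the 1-client and 64-client rows; the awk filter also
# skips the header line, whose first field is non-numeric.
result=$(awk 'NF == 3 && $1 + 0 > 1 && $1 + 0 < 64 { sum += $2 / $3; n++ }
              END { printf "%.1f", (sum / n - 1) * 100 }' <<'EOF'
1	79.66	81.76
5	264.65	242.98
9	283.07	250.25
13	284.07	246.31
17	282.87	250.66
21	280.03	244.35
25	279.70	238.36
29	274.37	242.99
33	277.99	241.60
36	274.10	247.04
40	270.67	243.16
44	270.66	241.04
48	274.45	242.20
52	273.27	240.21
56	267.34	241.39
60	272.49	235.40
64	117.48	83.15
EOF
)
echo "mean throughput gain: ${result}%"
```

The mean works out to roughly 13%, consistent with the "about 15%"
claim once run-to-run noise is considered.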




* Re: scalable kmap (was Re: vm lock contention reduction) (fwd)
  2002-07-10 22:18 ` scalable kmap (was Re: vm lock contention reduction) (fwd) Hanna Linder
@ 2002-07-11  3:06   ` Andrew Morton
  2002-07-11  5:19     ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2002-07-11  3:06 UTC (permalink / raw)
  To: Hanna Linder
  Cc: Martin J. Bligh, Keith Mannthey, haveblue, lse-tech, linux-kernel

Hanna Linder wrote:
> 
> ...
> Andrew and Martin,
> 
>         I ran this updated patch on 2.5.25 with dbench on
> the 8-way with 4 GB of memory, compared against clean 2.5.25.
> I saw a significant improvement in throughput of about 15%
> (averaged over 5 runs each).

Thanks, Hanna.

The kernel compile test isn't a particularly heavy user of
copy_to_user(), whereas with RAM-only dbench, copy_*_user()
is almost the only thing it does.  So that makes sense.

Tried dbench on the 2.5G 4xPIII Xeon: no improvement at all.
This thing seems to have quite poor memory bandwidth - maybe
250 megabyte/sec downhill with the wind at its tail.

>         Included are the pretty picture (akpm-2525.png), the
> data that picture came from (akpm-2525.data), and the raw
> output of the runs with min/max and timing figures
> (2525akpmkmaphi and 2525clnhi).
>         I believe the drop at around 64 clients is caused by
> memory swapping leading to increased disk accesses, since the
> run time increased by 200% in direct correlation with the
> decreased throughput.

Yes.  The test went to disk.   There are two reasons why
it will do this:

1: Some dirty data was in memory for more than 30-35 seconds or

2: More than 40% of memory is dirty.

In your case, the 64-client run was taking 32 seconds.  After that
the disks lit up.  Once that happens, dbench isn't a very good
benchmark.  It's an excellent benchmark when it's RAM-only
though.  Very repeatable and hits lots of code paths which matter.

You can run more clients before the disk I/O cuts in by
increasing /proc/sys/vm/dirty_expire_centisecs and
/proc/sys/vm/dirty_*_ratio.
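As a concrete (and hedged) sketch of that tuning, run as root: only
dirty_expire_centisecs is named explicitly above, and the exact
dirty_*_ratio file names vary between 2.5 kernels, so list the directory
first rather than trusting any name here.

```shell
# See which writeback knobs this kernel actually exposes.
ls /proc/sys/vm/
cat /proc/sys/vm/dirty_expire_centisecs

# Double the dirty-data age limit (value is in centiseconds), so a
# run has to exceed ~60s before condition 1 above pushes it to disk.
echo 6000 > /proc/sys/vm/dirty_expire_centisecs
```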

The patch you tested only uses the atomic kmap across generic_file_read.
It is reasonable to hope that another 15% or more can be gained by holding
an atomic kmap across writes as well.  On your machine ;)

Here's what oprofile says about `dbench 40' with that patch:

c0140f1c 402      0.609543    __block_commit_write    
c013dfd4 413      0.626222    vfs_write               
c01402cc 431      0.653515    __find_get_block        
c013a895 472      0.715683    .text.lock.highmem      
c017fe30 494      0.749041    ext2_get_block          
c012cef0 564      0.85518     unlock_page             
c013ee80 564      0.85518     fget                    
c01079f4 571      0.865794    apic_timer_interrupt    
c01e8ecc 594      0.900669    radix_tree_lookup       
c013da90 597      0.905218    generic_file_llseek     
c01514b4 607      0.92038     __d_lookup              
c0106ff8 687      1.04168     system_call             
c013a02c 874      1.32523     kunmap_high             
c0148388 922      1.39801     link_path_walk          
c0140b00 1097     1.66336     __block_prepare_write   
c01346d0 1138     1.72552     rmqueue                 
c01127ac 1243     1.88473     smp_apic_timer_interrupt 
c0139eb8 1514     2.29564     kmap_high               
c0105368 6188     9.38272     poll_idle               
c012d8a8 9564     14.5017     file_read_actor         
c012ea70 21326    32.3361     generic_file_write      


Not taking a kmap in generic_file_write is a biggish patch - it
means changing the prepare_write/commit_write API and visiting
all filesystems.  The API change would be: core kernel no longer
holds a kmap across prepare/commit. If the filesystem wants one
for its own purposes then it gets to do it for itself, possibly in
its prepare_write().

I think I'd prefer to get some additional testing and understanding
before undertaking that work.  It arguably makes sense as a small
cleanup/speedup anyway, but that's not a burning issue.

hmm.  I'll do just ext2, and we can take another look then.

-


* Re: scalable kmap (was Re: vm lock contention reduction) (fwd)
  2002-07-11  3:06   ` Andrew Morton
@ 2002-07-11  5:19     ` Andrew Morton
  0 siblings, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2002-07-11  5:19 UTC (permalink / raw)
  To: Hanna Linder, Martin J. Bligh, Keith Mannthey, haveblue, lse-tech,
	linux-kernel

Andrew Morton wrote:
> 
> Not taking a kmap in generic_file_write is a biggish patch

OK, so I'm full of it.  It's actually quite simple and clean.

This patch is incremental to the one which you just tested.  It
takes an atomic kmap across generic_file_write().

There's a copy of these patches at
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.25/

The patch also teaches ext2 and ext3 about the new API. If you're
using any other filesystems they will probably oops.

ext2 is still using regular kmap() for directory operations.

And this time, it actually buys 3% improvement on my lumbering hulk,
so your machine will hopefully see nice improvements.

Profile looks better too.  kmap_high and the IPI have vanished.


c013c164 304      0.50923     __set_page_dirty_nobuffers 
c0141008 355      0.59466     __block_commit_write    
c013e004 382      0.639887    vfs_write               
c01402fc 410      0.68679     __find_get_block        
c017fd70 471      0.788971    ext2_get_block          
c01e8eec 530      0.887802    radix_tree_lookup       
c012cef0 542      0.907903    unlock_page             
c013eeb0 562      0.941405    fget                    
c013dac0 585      0.979932    generic_file_llseek     
c01514e4 600      1.00506     __d_lookup              
c0106ff8 635      1.06369     system_call             
c01483b8 872      1.46069     link_path_walk          
c0134700 1014     1.69855     rmqueue                 
c0140b30 1191     1.99504     __block_prepare_write   
c0105368 4029     6.74897     poll_idle               
c012d8a8 9520     15.9469     file_read_actor         
c012ea70 23301    39.0315     generic_file_write      


 fs/buffer.c     |   57 +++++++++++++++++++++++++++++++-------------------------
 fs/ext2/inode.c |   16 +++++++++++++--
 fs/ext3/inode.c |   30 +++++------------------------
 mm/filemap.c    |    6 ++---
 4 files changed, 55 insertions(+), 54 deletions(-)

--- 2.5.25/mm/filemap.c~kmap_atomic_writes	Wed Jul 10 21:18:19 2002
+++ 2.5.25-akpm/mm/filemap.c	Wed Jul 10 21:24:41 2002
@@ -2228,6 +2228,7 @@ generic_file_write(struct file *file, co
 		unsigned long offset;
 		long page_fault;
 		char *kaddr;
+		struct copy_user_state copy_user_state;
 
 		offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
 		index = pos >> PAGE_CACHE_SHIFT;
@@ -2252,22 +2253,22 @@ generic_file_write(struct file *file, co
 			break;
 		}
 
-		kaddr = kmap(page);
 		status = a_ops->prepare_write(file, page, offset, offset+bytes);
 		if (unlikely(status)) {
 			/*
 			 * prepare_write() may have instantiated a few blocks
 			 * outside i_size.  Trim these off again.
 			 */
-			kunmap(page);
 			unlock_page(page);
 			page_cache_release(page);
 			if (pos + bytes > inode->i_size)
 				vmtruncate(inode, inode->i_size);
 			break;
 		}
+		kaddr = kmap_copy_user(&copy_user_state, page, KM_FILEMAP, 0);
 		page_fault = __copy_from_user(kaddr + offset, buf, bytes);
 		flush_dcache_page(page);
+		kunmap_copy_user(&copy_user_state);
 		status = a_ops->commit_write(file, page, offset, offset+bytes);
 		if (unlikely(page_fault)) {
 			status = -EFAULT;
@@ -2282,7 +2283,6 @@ generic_file_write(struct file *file, co
 				buf += status;
 			}
 		}
-		kunmap(page);
 		if (!PageReferenced(page))
 			SetPageReferenced(page);
 		unlock_page(page);
--- 2.5.25/fs/buffer.c~kmap_atomic_writes	Wed Jul 10 21:25:11 2002
+++ 2.5.25-akpm/fs/buffer.c	Wed Jul 10 21:33:06 2002
@@ -1804,7 +1804,6 @@ static int __block_prepare_write(struct 
 	int err = 0;
 	unsigned blocksize, bbits;
 	struct buffer_head *bh, *head, *wait[2], **wait_bh=wait;
-	char *kaddr = kmap(page);
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(from > PAGE_CACHE_SIZE);
@@ -1845,13 +1844,19 @@ static int __block_prepare_write(struct 
 					set_buffer_uptodate(bh);
 					continue;
 				}
-				if (block_end > to)
-					memset(kaddr+to, 0, block_end-to);
-				if (block_start < from)
-					memset(kaddr+block_start,
-						0, from-block_start);
-				if (block_end > to || block_start < from)
+				if (block_end > to || block_start < from) {
+					void *kaddr;
+
+					kaddr = kmap_atomic(page, KM_USER0);
+					if (block_end > to)
+						memset(kaddr+to, 0,
+							block_end-to);
+					if (block_start < from)
+						memset(kaddr+block_start,
+							0, from-block_start);
 					flush_dcache_page(page);
+					kunmap_atomic(kaddr, KM_USER0);
+				}
 				continue;
 			}
 		}
@@ -1890,10 +1895,14 @@ out:
 		if (block_start >= to)
 			break;
 		if (buffer_new(bh)) {
+			void *kaddr;
+
 			clear_buffer_new(bh);
 			if (buffer_uptodate(bh))
 				buffer_error();
+			kaddr = kmap_atomic(page, KM_USER0);
 			memset(kaddr+block_start, 0, bh->b_size);
+			kunmap_atomic(kaddr, KM_USER0);
 			set_buffer_uptodate(bh);
 			mark_buffer_dirty(bh);
 		}
@@ -1979,9 +1988,10 @@ int block_read_full_page(struct page *pa
 					SetPageError(page);
 			}
 			if (!buffer_mapped(bh)) {
-				memset(kmap(page) + i*blocksize, 0, blocksize);
+				void *kaddr = kmap_atomic(page, KM_USER0);
+				memset(kaddr + i * blocksize, 0, blocksize);
 				flush_dcache_page(page);
-				kunmap(page);
+				kunmap_atomic(kaddr, KM_USER0);
 				set_buffer_uptodate(bh);
 				continue;
 			}
@@ -2089,7 +2099,7 @@ int cont_prepare_write(struct page *page
 	long status;
 	unsigned zerofrom;
 	unsigned blocksize = 1 << inode->i_blkbits;
-	char *kaddr;
+	void *kaddr;
 
 	while(page->index > (pgpos = *bytes>>PAGE_CACHE_SHIFT)) {
 		status = -ENOMEM;
@@ -2111,12 +2121,12 @@ int cont_prepare_write(struct page *page
 						PAGE_CACHE_SIZE, get_block);
 		if (status)
 			goto out_unmap;
-		kaddr = page_address(new_page);
+		kaddr = kmap_atomic(new_page, KM_USER0);
 		memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
 		flush_dcache_page(new_page);
+		kunmap_atomic(kaddr, KM_USER0);
 		__block_commit_write(inode, new_page,
 				zerofrom, PAGE_CACHE_SIZE);
-		kunmap(new_page);
 		unlock_page(new_page);
 		page_cache_release(new_page);
 	}
@@ -2141,21 +2151,20 @@ int cont_prepare_write(struct page *page
 	status = __block_prepare_write(inode, page, zerofrom, to, get_block);
 	if (status)
 		goto out1;
-	kaddr = page_address(page);
 	if (zerofrom < offset) {
+		kaddr = kmap_atomic(page, KM_USER0);
 		memset(kaddr+zerofrom, 0, offset-zerofrom);
 		flush_dcache_page(page);
+		kunmap_atomic(kaddr, KM_USER0);
 		__block_commit_write(inode, page, zerofrom, offset);
 	}
 	return 0;
 out1:
 	ClearPageUptodate(page);
-	kunmap(page);
 	return status;
 
 out_unmap:
 	ClearPageUptodate(new_page);
-	kunmap(new_page);
 	unlock_page(new_page);
 	page_cache_release(new_page);
 out:
@@ -2167,10 +2176,8 @@ int block_prepare_write(struct page *pag
 {
 	struct inode *inode = page->mapping->host;
 	int err = __block_prepare_write(inode, page, from, to, get_block);
-	if (err) {
+	if (err)
 		ClearPageUptodate(page);
-		kunmap(page);
-	}
 	return err;
 }
 
@@ -2178,7 +2185,6 @@ int block_commit_write(struct page *page
 {
 	struct inode *inode = page->mapping->host;
 	__block_commit_write(inode,page,from,to);
-	kunmap(page);
 	return 0;
 }
 
@@ -2188,7 +2194,6 @@ int generic_commit_write(struct file *fi
 	struct inode *inode = page->mapping->host;
 	loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
 	__block_commit_write(inode,page,from,to);
-	kunmap(page);
 	if (pos > inode->i_size) {
 		inode->i_size = pos;
 		mark_inode_dirty(inode);
@@ -2205,6 +2210,7 @@ int block_truncate_page(struct address_s
 	struct inode *inode = mapping->host;
 	struct page *page;
 	struct buffer_head *bh;
+	void *kaddr;
 	int err;
 
 	blocksize = 1 << inode->i_blkbits;
@@ -2257,9 +2263,10 @@ int block_truncate_page(struct address_s
 			goto unlock;
 	}
 
-	memset(kmap(page) + offset, 0, length);
+	kaddr = kmap_atomic(page, KM_USER0);
+	memset(kaddr + offset, 0, length);
 	flush_dcache_page(page);
-	kunmap(page);
+	kunmap_atomic(kaddr, KM_USER0);
 
 	mark_buffer_dirty(bh);
 	err = 0;
@@ -2279,7 +2286,7 @@ int block_write_full_page(struct page *p
 	struct inode * const inode = page->mapping->host;
 	const unsigned long end_index = inode->i_size >> PAGE_CACHE_SHIFT;
 	unsigned offset;
-	char *kaddr;
+	void *kaddr;
 
 	/* Is the page fully inside i_size? */
 	if (page->index < end_index)
@@ -2293,10 +2300,10 @@ int block_write_full_page(struct page *p
 	}
 
 	/* The page straddles i_size */
-	kaddr = kmap(page);
+	kaddr = kmap_atomic(page, KM_USER0);
 	memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
 	flush_dcache_page(page);
-	kunmap(page);
+	kunmap_atomic(kaddr, KM_USER0);
 	return __block_write_full_page(inode, page, get_block);
 }
 
--- 2.5.25/fs/ext3/inode.c~kmap_atomic_writes	Wed Jul 10 21:34:16 2002
+++ 2.5.25-akpm/fs/ext3/inode.c	Wed Jul 10 21:35:08 2002
@@ -1048,16 +1048,6 @@ static int ext3_prepare_write(struct fil
 	if (ext3_should_journal_data(inode)) {
 		ret = walk_page_buffers(handle, page_buffers(page),
 				from, to, NULL, do_journal_get_write_access);
-		if (ret) {
-			/*
-			 * We're going to fail this prepare_write(),
-			 * so commit_write() will not be called.
-			 * We need to undo block_prepare_write()'s kmap().
-			 * AKPM: Do we need to clear PageUptodate?  I don't
-			 * think so.
-			 */
-			kunmap(page);
-		}
 	}
 prepare_write_failed:
 	if (ret)
@@ -1117,7 +1107,6 @@ static int ext3_commit_write(struct file
 			from, to, &partial, commit_write_fn);
 		if (!partial)
 			SetPageUptodate(page);
-		kunmap(page);
 		if (pos > inode->i_size)
 			inode->i_size = pos;
 		EXT3_I(inode)->i_state |= EXT3_STATE_JDATA;
@@ -1128,17 +1117,8 @@ static int ext3_commit_write(struct file
 		}
 		/* Be careful here if generic_commit_write becomes a
 		 * required invocation after block_prepare_write. */
-		if (ret == 0) {
+		if (ret == 0)
 			ret = generic_commit_write(file, page, from, to);
-		} else {
-			/*
-			 * block_prepare_write() was called, but we're not
-			 * going to call generic_commit_write().  So we
-			 * need to perform generic_commit_write()'s kunmap
-			 * by hand.
-			 */
-			kunmap(page);
-		}
 	}
 	if (inode->i_size > EXT3_I(inode)->i_disksize) {
 		EXT3_I(inode)->i_disksize = inode->i_size;
@@ -1433,6 +1413,7 @@ static int ext3_block_truncate_page(hand
 	struct page *page;
 	struct buffer_head *bh;
 	int err;
+	void *kaddr;
 
 	blocksize = inode->i_sb->s_blocksize;
 	length = offset & (blocksize - 1);
@@ -1488,10 +1469,11 @@ static int ext3_block_truncate_page(hand
 		if (err)
 			goto unlock;
 	}
-	
-	memset(kmap(page) + offset, 0, length);
+
+	kaddr = kmap_atomic(page, KM_USER0);
+	memset(kaddr + offset, 0, length);
 	flush_dcache_page(page);
-	kunmap(page);
+	kunmap_atomic(kaddr, KM_USER0);
 
 	BUFFER_TRACE(bh, "zeroed end of block");
 
--- 2.5.25/fs/ext2/inode.c~kmap_atomic_writes	Wed Jul 10 21:35:31 2002
+++ 2.5.25-akpm/fs/ext2/inode.c	Wed Jul 10 21:42:34 2002
@@ -595,12 +595,24 @@ ext2_readpages(struct address_space *map
 }
 
 static int
-ext2_prepare_write(struct file *file, struct page *page,
+ext2_prepare_write(struct file *file_may_be_null, struct page *page,
 			unsigned from, unsigned to)
 {
+	if (S_ISDIR(page->mapping->host->i_mode))
+		kmap(page);
 	return block_prepare_write(page,from,to,ext2_get_block);
 }
 
+static int ext2_commit_write(struct file *file, struct page *page,
+			unsigned from, unsigned to)
+{
+	int ret = generic_commit_write(file, page, from, to);
+
+	if (S_ISDIR(page->mapping->host->i_mode))
+		kunmap(page);
+	return ret;
+}
+
 static int ext2_bmap(struct address_space *mapping, long block)
 {
 	return generic_block_bmap(mapping,block,ext2_get_block);
@@ -633,7 +645,7 @@ struct address_space_operations ext2_aop
 	writepage:		ext2_writepage,
 	sync_page:		block_sync_page,
 	prepare_write:		ext2_prepare_write,
-	commit_write:		generic_commit_write,
+	commit_write:		ext2_commit_write,
 	bmap:			ext2_bmap,
 	direct_IO:		ext2_direct_IO,
 	writepages:		ext2_writepages,

-

