* Re: scalable kmap (was Re: vm lock contention reduction) (fwd)
[not found] <237170000.1026317715@flay>
@ 2002-07-10 22:18 ` Hanna Linder
2002-07-11 3:06 ` Andrew Morton
0 siblings, 1 reply; 3+ messages in thread
From: Hanna Linder @ 2002-07-10 22:18 UTC (permalink / raw)
To: Martin J. Bligh, akpm
Cc: Keith Mannthey, hannal, haveblue, lse-tech, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]
--On Wednesday, July 10, 2002 09:15:15 -0700 "Martin J. Bligh" <Martin.Bligh@us.ibm.com> wrote:
>
> Updated patch below ...
>
>
> arch/i386/kernel/i386_ksyms.c | 5 ++
> arch/i386/lib/usercopy.c | 10 +++++
> arch/i386/mm/fault.c | 71 +++++++++++++++++++++++++++++++++++
> fs/exec.c | 60 +++++++++++++++++++++---------
> include/asm-i386/highmem.h | 5 ++
> include/asm-i386/kmap_types.h | 3 +
> include/asm-i386/processor.h | 2 +
> include/asm-ppc/kmap_types.h | 1
> include/asm-sparc/kmap_types.h | 1
> include/asm-x86_64/kmap_types.h | 1
> include/linux/highmem.h | 80 ++++++++++++++++++++++++++++++++++++++++
> include/linux/sched.h | 5 ++
> mm/filemap.c | 11 +++--
> 13 files changed, 232 insertions(+), 23 deletions(-)
Andrew and Martin,
I ran this updated patch on 2.5.25 with dbench on
the 8-way with 4 GB of memory, compared to clean 2.5.25.
I saw a significant improvement in throughput of about 15%
(averaged over 5 runs each).
Included are the pretty picture (akpm-2525.png), the
data that picture came from (akpm-2525.data), and the raw
results of the runs with min/max and timing results
(2525akpmkmaphi and 2525clnhi).
I believe the drop at around 64 clients is caused by
memory swapping leading to increased disk accesses, since the
time increased by 200% in direct correlation with the decreased
throughput.
Hanna
[-- Attachment #2: akpm-2525.png --]
[-- Type: image/png, Size: 3587 bytes --]
[-- Attachment #3: 2525akpmkmaphi --]
[-- Type: application/octet-stream, Size: 4599 bytes --]
[root@elm3b96 dbench]# ./test_dbench-avg.sh 1 36 5 hikmap
Running 1 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 3 lo is 2
thru hi is 81.5701 lo is 76.9333
avg run time 3
avg throughput 79.6601666666667
Running 5 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 4 lo is 3
thru hi is 277.74 lo is 243.343
avg run time 3.33333333333333
avg throughput 264.646333333333
Running 9 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 6 lo is 5
thru hi is 295.914 lo is 288.236
avg run time 5
avg throughput 283.635333333333
Running 13 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 8 lo is 7
thru hi is 291.79 lo is 246.807
avg run time 7
avg throughput 284.074666666667
Running 17 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 9 lo is 9
thru hi is 288.754 lo is 279.672
avg run time 9
avg throughput 282.871666666667
Running 21 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 11 lo is 10
thru hi is 282.556 lo is 279.696
avg run time 11
avg throughput 280.028666666667
Running 25 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 13 lo is 13
thru hi is 283.141 lo is 270.631
avg run time 13
avg throughput 279.706666666667
Running 29 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 16 lo is 14
thru hi is 287.336 lo is 282.406
avg run time 14.6666666666667
avg throughput 274.366
Running 33 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 17 lo is 16
thru hi is 280.63 lo is 274.984
avg run time 16.6666666666667
avg throughput 277.988
[root@elm3b96 dbench]# ./test_dbench-avg.sh 36 64 5 akpmhi2
Running 36 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 19 lo is 18
thru hi is 284.684 lo is 272.75
avg run time 18
avg throughput 274.103
Running 40 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 22 lo is 20
thru hi is 279.052 lo is 274.726
avg run time 20
avg throughput 270.671666666667
Running 44 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 23 lo is 22
thru hi is 275.928 lo is 261.419
avg run time 22.6666666666667
avg throughput 270.660333333333
Running 48 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 25 lo is 24
thru hi is 276.622 lo is 271.115
avg run time 24
avg throughput 274.449333333333
Running 52 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 28 lo is 25
thru hi is 279.96 lo is 256.425
avg run time 26
avg throughput 273.272333333333
Running 56 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 30 lo is 28
thru hi is 270.907 lo is 268.947
avg run time 28.3333333333333
avg throughput 267.339666666667
Running 60 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 31 lo is 30
thru hi is 273.963 lo is 266.724
avg run time 30
avg throughput 272.487333333333
Running 64 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 883 lo is 33
thru hi is 262.225 lo is 9.57893
avg run time 147.333333333333
avg throughput 117.4783
[-- Attachment #4: 2525clnhi --]
[-- Type: application/octet-stream, Size: 4189 bytes --]
[root@elm3b96 dbench]# ./test_dbench-avg.sh 1 36 5 2525hi
Running 1 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 3 lo is 2
thru hi is 82.3442 lo is 77.5547
avg run time 2.66666666666667
avg throughput 81.7611333333333
Running 5 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 4 lo is 3
thru hi is 260.515 lo is 229.134
avg run time 3.66666666666667
avg throughput 242.980333333333
Running 9 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 6 lo is 5
thru hi is 256.654 lo is 253.473
avg run time 6
avg throughput 250.254666666667
Running 13 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 9 lo is 7
thru hi is 258.22 lo is 207.652
avg run time 8.33333333333333
avg throughput 246.309666666667
Running 17 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 11 lo is 10
thru hi is 252.47 lo is 228.667
avg run time 10
avg throughput 250.657666666667
Running 21 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 13 lo is 12
thru hi is 250.391 lo is 242.239
avg run time 12
avg throughput 244.346333333333
Running 25 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 15 lo is 15
thru hi is 249.358 lo is 226.142
avg run time 15
avg throughput 238.363333333333
Running 29 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 18 lo is 16
thru hi is 256.472 lo is 228.52
avg run time 16.6666666666667
avg throughput 242.994666666667
Running 33 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 20 lo is 18
thru hi is 254.148 lo is 234.099
avg run time 19
avg throughput 241.595333333333
[root@elm3b96 dbench]# ./test_dbench-avg.sh 36 64 5 2525hi
Running 36 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 21 lo is 20
thru hi is 250.669 lo is 240.105
avg run time 20.3333333333333
avg throughput 247.037333333333
Running 40 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 23 lo is 22
thru hi is 247.32 lo is 241.098
avg run time 22.6666666666667
avg throughput 243.157333333333
Running 44 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 26 lo is 25
thru hi is 245.733 lo is 237.135
avg run time 25
avg throughput 241.037333333333
Running 48 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 28 lo is 27
thru hi is 244.339 lo is 237.92
avg run time 27
avg throughput 242.203
Running 52 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 30 lo is 29
thru hi is 245.097 lo is 242.522
avg run time 29.3333333333333
avg throughput 240.211
Running 56 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 32 lo is 31
thru hi is 245.913 lo is 236.625
avg run time 31.6666666666667
avg throughput 241.393666666667
Running 60 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 35 lo is 34
thru hi is 243.014 lo is 231.635
avg run time 34.6666666666667
avg throughput 235.398333333333
Running 64 clients
test no. 0
test no. 1
test no. 2
test no. 3
test no. 4
time hi is 710 lo is 309
thru hi is 237.763 lo is 11.9168
avg run time 131
avg throughput 83.1543666666667
[-- Attachment #5: akpm-2525.data --]
[-- Type: application/octet-stream, Size: 305 bytes --]
#clnts #kmap #cln
1 79.66 81.76
5 264.65 242.98
9 283.07 250.25
13 284.07 246.31
17 282.87 250.66
21 280.03 244.35
25 279.70 238.36
29 274.37 242.99
33 277.99 241.60
36 274.10 247.04
40 270.67 243.16
44 270.66 241.04
48 274.45 242.20
52 273.27 240.21
56 267.34 241.39
60 272.49 235.40
64 117.48 83.15
* Re: scalable kmap (was Re: vm lock contention reduction) (fwd)
2002-07-10 22:18 ` scalable kmap (was Re: vm lock contention reduction) (fwd) Hanna Linder
@ 2002-07-11 3:06 ` Andrew Morton
2002-07-11 5:19 ` Andrew Morton
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2002-07-11 3:06 UTC (permalink / raw)
To: Hanna Linder
Cc: Martin J. Bligh, Keith Mannthey, haveblue, lse-tech, linux-kernel
Hanna Linder wrote:
>
> ...
> Andrew and Martin,
>
> I ran this updated patch on 2.5.25 with dbench on
> the 8-way with 4 GB of memory, compared to clean 2.5.25.
> I saw a significant improvement in throughput of about 15%
> (averaged over 5 runs each).
Thanks, Hanna.
The kernel compile test isn't a particularly heavy user of
copy_to_user(), whereas with RAM-only dbench, copy_*_user()
is almost the only thing it does. So that makes sense.
Tried dbench on the 2.5G 4xPIII Xeon: no improvement at all.
This thing seems to have quite poor memory bandwidth - maybe
250 megabyte/sec downhill with the wind at its tail.
> Included are the pretty picture (akpm-2525.png), the
> data that picture came from (akpm-2525.data), and the raw
> results of the runs with min/max and timing results
> (2525akpmkmaphi and 2525clnhi).
> I believe the drop at around 64 clients is caused by
> memory swapping leading to increased disk accesses, since the
> time increased by 200% in direct correlation with the decreased
> throughput.
Yes. The test went to disk. There are two reasons why
it will do this:
1: Some dirty data was in memory for more than 30-35 seconds or
2: More than 40% of memory is dirty.
In your case, the 64-client run was taking 32 seconds. After that
the disks lit up. Once that happens, dbench isn't a very good
benchmark. It's an excellent benchmark when it's RAM-only
though. Very repeatable and hits lots of code paths which matter.
You can run more clients before the disk I/O cuts in by
increasing /proc/sys/vm/dirty_expire_centisecs and
/proc/sys/vm/dirty_*_ratio.
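As a hedged sketch, that tuning might look like the following; the values are
illustrative only, and the exact set of dirty_*_ratio knobs varies by kernel
version, hence the glob:

```shell
# Illustrative values only; defaults differ across 2.5.x kernels.
# Let dirty data live in memory for 60s instead of ~30s:
echo 6000 > /proc/sys/vm/dirty_expire_centisecs
# Raise the dirty-memory thresholds; the glob matches whichever
# dirty_*_ratio knobs this kernel exposes:
for f in /proc/sys/vm/dirty_*_ratio; do echo 60 > "$f"; done
```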
The patch you tested only uses the atomic kmap across generic_file_read.
It is reasonable to hope that another 15% or more can be gained by holding
an atomic kmap across writes as well. On your machine ;)
Here's what oprofile says about `dbench 40' with that patch:
c0140f1c 402 0.609543 __block_commit_write
c013dfd4 413 0.626222 vfs_write
c01402cc 431 0.653515 __find_get_block
c013a895 472 0.715683 .text.lock.highmem
c017fe30 494 0.749041 ext2_get_block
c012cef0 564 0.85518 unlock_page
c013ee80 564 0.85518 fget
c01079f4 571 0.865794 apic_timer_interrupt
c01e8ecc 594 0.900669 radix_tree_lookup
c013da90 597 0.905218 generic_file_llseek
c01514b4 607 0.92038 __d_lookup
c0106ff8 687 1.04168 system_call
c013a02c 874 1.32523 kunmap_high
c0148388 922 1.39801 link_path_walk
c0140b00 1097 1.66336 __block_prepare_write
c01346d0 1138 1.72552 rmqueue
c01127ac 1243 1.88473 smp_apic_timer_interrupt
c0139eb8 1514 2.29564 kmap_high
c0105368 6188 9.38272 poll_idle
c012d8a8 9564 14.5017 file_read_actor
c012ea70 21326 32.3361 generic_file_write
Not taking a kmap in generic_file_write is a biggish patch - it
means changing the prepare_write/commit_write API and visiting
all filesystems. The API change would be: core kernel no longer
holds a kmap across prepare/commit. If the filesystem wants one
for its own purposes then it gets to do it for itself, possibly in
its prepare_write().
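A minimal sketch of that contract, using a hypothetical "examplefs"; the
kmap()/kunmap() placement is the point, not the helper names:

```c
/* Sketch only: under the proposed API the core kernel no longer wraps
 * prepare_write/commit_write in kmap()/kunmap().  A filesystem that
 * still wants a persistent mapping takes and releases it itself. */
static int examplefs_prepare_write(struct file *file, struct page *page,
				   unsigned from, unsigned to)
{
	kmap(page);	/* fs-private mapping, only if this fs needs one */
	return block_prepare_write(page, from, to, examplefs_get_block);
}

static int examplefs_commit_write(struct file *file, struct page *page,
				  unsigned from, unsigned to)
{
	int ret = generic_commit_write(file, page, from, to);

	kunmap(page);	/* balances the kmap() in prepare_write */
	return ret;
}
```

A filesystem that never needs the mapping for regular-file writes would simply
omit both calls.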
I think I'd prefer to get some additional testing and understanding
before undertaking that work. It arguably makes sense as a small
cleanup/speedup anyway, but that's not a burning issue.
hmm. I'll do just ext2, and we can take another look then.
-
* Re: scalable kmap (was Re: vm lock contention reduction) (fwd)
2002-07-11 3:06 ` Andrew Morton
@ 2002-07-11 5:19 ` Andrew Morton
0 siblings, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2002-07-11 5:19 UTC (permalink / raw)
To: Hanna Linder, Martin J. Bligh, Keith Mannthey, haveblue, lse-tech,
linux-kernel
Andrew Morton wrote:
>
> Not taking a kmap in generic_file_write is a biggish patch
OK, so I'm full of it. It's actually quite simple and clean.
This patch is incremental to the one which you just tested. It
takes an atomic kmap across generic_file_write().
There's a copy of these patches at
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.25/
The patch also teaches ext2 and ext3 about the new API. If you're
using any other filesystems they will probably oops.
ext2 is still using regular kmap() for directory operations.
And this time, it actually buys 3% improvement on my lumbering hulk,
so your machine will hopefully see nice improvements.
Profile looks better too. kmap_high and the IPI have vanished.
c013c164 304 0.50923 __set_page_dirty_nobuffers
c0141008 355 0.59466 __block_commit_write
c013e004 382 0.639887 vfs_write
c01402fc 410 0.68679 __find_get_block
c017fd70 471 0.788971 ext2_get_block
c01e8eec 530 0.887802 radix_tree_lookup
c012cef0 542 0.907903 unlock_page
c013eeb0 562 0.941405 fget
c013dac0 585 0.979932 generic_file_llseek
c01514e4 600 1.00506 __d_lookup
c0106ff8 635 1.06369 system_call
c01483b8 872 1.46069 link_path_walk
c0134700 1014 1.69855 rmqueue
c0140b30 1191 1.99504 __block_prepare_write
c0105368 4029 6.74897 poll_idle
c012d8a8 9520 15.9469 file_read_actor
c012ea70 23301 39.0315 generic_file_write
fs/buffer.c | 57 +++++++++++++++++++++++++++++++-------------------------
fs/ext2/inode.c | 16 +++++++++++++--
fs/ext3/inode.c | 30 +++++------------------------
mm/filemap.c | 6 ++---
4 files changed, 55 insertions(+), 54 deletions(-)
--- 2.5.25/mm/filemap.c~kmap_atomic_writes Wed Jul 10 21:18:19 2002
+++ 2.5.25-akpm/mm/filemap.c Wed Jul 10 21:24:41 2002
@@ -2228,6 +2228,7 @@ generic_file_write(struct file *file, co
unsigned long offset;
long page_fault;
char *kaddr;
+ struct copy_user_state copy_user_state;
offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
index = pos >> PAGE_CACHE_SHIFT;
@@ -2252,22 +2253,22 @@ generic_file_write(struct file *file, co
break;
}
- kaddr = kmap(page);
status = a_ops->prepare_write(file, page, offset, offset+bytes);
if (unlikely(status)) {
/*
* prepare_write() may have instantiated a few blocks
* outside i_size. Trim these off again.
*/
- kunmap(page);
unlock_page(page);
page_cache_release(page);
if (pos + bytes > inode->i_size)
vmtruncate(inode, inode->i_size);
break;
}
+ kaddr = kmap_copy_user(&copy_user_state, page, KM_FILEMAP, 0);
page_fault = __copy_from_user(kaddr + offset, buf, bytes);
flush_dcache_page(page);
+ kunmap_copy_user(&copy_user_state);
status = a_ops->commit_write(file, page, offset, offset+bytes);
if (unlikely(page_fault)) {
status = -EFAULT;
@@ -2282,7 +2283,6 @@ generic_file_write(struct file *file, co
buf += status;
}
}
- kunmap(page);
if (!PageReferenced(page))
SetPageReferenced(page);
unlock_page(page);
--- 2.5.25/fs/buffer.c~kmap_atomic_writes Wed Jul 10 21:25:11 2002
+++ 2.5.25-akpm/fs/buffer.c Wed Jul 10 21:33:06 2002
@@ -1804,7 +1804,6 @@ static int __block_prepare_write(struct
int err = 0;
unsigned blocksize, bbits;
struct buffer_head *bh, *head, *wait[2], **wait_bh=wait;
- char *kaddr = kmap(page);
BUG_ON(!PageLocked(page));
BUG_ON(from > PAGE_CACHE_SIZE);
@@ -1845,13 +1844,19 @@ static int __block_prepare_write(struct
set_buffer_uptodate(bh);
continue;
}
- if (block_end > to)
- memset(kaddr+to, 0, block_end-to);
- if (block_start < from)
- memset(kaddr+block_start,
- 0, from-block_start);
- if (block_end > to || block_start < from)
+ if (block_end > to || block_start < from) {
+ void *kaddr;
+
+ kaddr = kmap_atomic(page, KM_USER0);
+ if (block_end > to)
+ memset(kaddr+to, 0,
+ block_end-to);
+ if (block_start < from)
+ memset(kaddr+block_start,
+ 0, from-block_start);
flush_dcache_page(page);
+ kunmap_atomic(kaddr, KM_USER0);
+ }
continue;
}
}
@@ -1890,10 +1895,14 @@ out:
if (block_start >= to)
break;
if (buffer_new(bh)) {
+ void *kaddr;
+
clear_buffer_new(bh);
if (buffer_uptodate(bh))
buffer_error();
+ kaddr = kmap_atomic(page, KM_USER0);
memset(kaddr+block_start, 0, bh->b_size);
+ kunmap_atomic(kaddr, KM_USER0);
set_buffer_uptodate(bh);
mark_buffer_dirty(bh);
}
@@ -1979,9 +1988,10 @@ int block_read_full_page(struct page *pa
SetPageError(page);
}
if (!buffer_mapped(bh)) {
- memset(kmap(page) + i*blocksize, 0, blocksize);
+ void *kaddr = kmap_atomic(page, KM_USER0);
+ memset(kaddr + i * blocksize, 0, blocksize);
flush_dcache_page(page);
- kunmap(page);
+ kunmap_atomic(kaddr, KM_USER0);
set_buffer_uptodate(bh);
continue;
}
@@ -2089,7 +2099,7 @@ int cont_prepare_write(struct page *page
long status;
unsigned zerofrom;
unsigned blocksize = 1 << inode->i_blkbits;
- char *kaddr;
+ void *kaddr;
while(page->index > (pgpos = *bytes>>PAGE_CACHE_SHIFT)) {
status = -ENOMEM;
@@ -2111,12 +2121,12 @@ int cont_prepare_write(struct page *page
PAGE_CACHE_SIZE, get_block);
if (status)
goto out_unmap;
- kaddr = page_address(new_page);
+ kaddr = kmap_atomic(new_page, KM_USER0);
memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
flush_dcache_page(new_page);
+ kunmap_atomic(kaddr, KM_USER0);
__block_commit_write(inode, new_page,
zerofrom, PAGE_CACHE_SIZE);
- kunmap(new_page);
unlock_page(new_page);
page_cache_release(new_page);
}
@@ -2141,21 +2151,20 @@ int cont_prepare_write(struct page *page
status = __block_prepare_write(inode, page, zerofrom, to, get_block);
if (status)
goto out1;
- kaddr = page_address(page);
if (zerofrom < offset) {
+ kaddr = kmap_atomic(page, KM_USER0);
memset(kaddr+zerofrom, 0, offset-zerofrom);
flush_dcache_page(page);
+ kunmap_atomic(kaddr, KM_USER0);
__block_commit_write(inode, page, zerofrom, offset);
}
return 0;
out1:
ClearPageUptodate(page);
- kunmap(page);
return status;
out_unmap:
ClearPageUptodate(new_page);
- kunmap(new_page);
unlock_page(new_page);
page_cache_release(new_page);
out:
@@ -2167,10 +2176,8 @@ int block_prepare_write(struct page *pag
{
struct inode *inode = page->mapping->host;
int err = __block_prepare_write(inode, page, from, to, get_block);
- if (err) {
+ if (err)
ClearPageUptodate(page);
- kunmap(page);
- }
return err;
}
@@ -2178,7 +2185,6 @@ int block_commit_write(struct page *page
{
struct inode *inode = page->mapping->host;
__block_commit_write(inode,page,from,to);
- kunmap(page);
return 0;
}
@@ -2188,7 +2194,6 @@ int generic_commit_write(struct file *fi
struct inode *inode = page->mapping->host;
loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
__block_commit_write(inode,page,from,to);
- kunmap(page);
if (pos > inode->i_size) {
inode->i_size = pos;
mark_inode_dirty(inode);
@@ -2205,6 +2210,7 @@ int block_truncate_page(struct address_s
struct inode *inode = mapping->host;
struct page *page;
struct buffer_head *bh;
+ void *kaddr;
int err;
blocksize = 1 << inode->i_blkbits;
@@ -2257,9 +2263,10 @@ int block_truncate_page(struct address_s
goto unlock;
}
- memset(kmap(page) + offset, 0, length);
+ kaddr = kmap_atomic(page, KM_USER0);
+ memset(kaddr + offset, 0, length);
flush_dcache_page(page);
- kunmap(page);
+ kunmap_atomic(kaddr, KM_USER0);
mark_buffer_dirty(bh);
err = 0;
@@ -2279,7 +2286,7 @@ int block_write_full_page(struct page *p
struct inode * const inode = page->mapping->host;
const unsigned long end_index = inode->i_size >> PAGE_CACHE_SHIFT;
unsigned offset;
- char *kaddr;
+ void *kaddr;
/* Is the page fully inside i_size? */
if (page->index < end_index)
@@ -2293,10 +2300,10 @@ int block_write_full_page(struct page *p
}
/* The page straddles i_size */
- kaddr = kmap(page);
+ kaddr = kmap_atomic(page, KM_USER0);
memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
flush_dcache_page(page);
- kunmap(page);
+ kunmap_atomic(kaddr, KM_USER0);
return __block_write_full_page(inode, page, get_block);
}
--- 2.5.25/fs/ext3/inode.c~kmap_atomic_writes Wed Jul 10 21:34:16 2002
+++ 2.5.25-akpm/fs/ext3/inode.c Wed Jul 10 21:35:08 2002
@@ -1048,16 +1048,6 @@ static int ext3_prepare_write(struct fil
if (ext3_should_journal_data(inode)) {
ret = walk_page_buffers(handle, page_buffers(page),
from, to, NULL, do_journal_get_write_access);
- if (ret) {
- /*
- * We're going to fail this prepare_write(),
- * so commit_write() will not be called.
- * We need to undo block_prepare_write()'s kmap().
- * AKPM: Do we need to clear PageUptodate? I don't
- * think so.
- */
- kunmap(page);
- }
}
prepare_write_failed:
if (ret)
@@ -1117,7 +1107,6 @@ static int ext3_commit_write(struct file
from, to, &partial, commit_write_fn);
if (!partial)
SetPageUptodate(page);
- kunmap(page);
if (pos > inode->i_size)
inode->i_size = pos;
EXT3_I(inode)->i_state |= EXT3_STATE_JDATA;
@@ -1128,17 +1117,8 @@ static int ext3_commit_write(struct file
}
/* Be careful here if generic_commit_write becomes a
* required invocation after block_prepare_write. */
- if (ret == 0) {
+ if (ret == 0)
ret = generic_commit_write(file, page, from, to);
- } else {
- /*
- * block_prepare_write() was called, but we're not
- * going to call generic_commit_write(). So we
- * need to perform generic_commit_write()'s kunmap
- * by hand.
- */
- kunmap(page);
- }
}
if (inode->i_size > EXT3_I(inode)->i_disksize) {
EXT3_I(inode)->i_disksize = inode->i_size;
@@ -1433,6 +1413,7 @@ static int ext3_block_truncate_page(hand
struct page *page;
struct buffer_head *bh;
int err;
+ void *kaddr;
blocksize = inode->i_sb->s_blocksize;
length = offset & (blocksize - 1);
@@ -1488,10 +1469,11 @@ static int ext3_block_truncate_page(hand
if (err)
goto unlock;
}
-
- memset(kmap(page) + offset, 0, length);
+
+ kaddr = kmap_atomic(page, KM_USER0);
+ memset(kaddr + offset, 0, length);
flush_dcache_page(page);
- kunmap(page);
+ kunmap_atomic(kaddr, KM_USER0);
BUFFER_TRACE(bh, "zeroed end of block");
--- 2.5.25/fs/ext2/inode.c~kmap_atomic_writes Wed Jul 10 21:35:31 2002
+++ 2.5.25-akpm/fs/ext2/inode.c Wed Jul 10 21:42:34 2002
@@ -595,12 +595,24 @@ ext2_readpages(struct address_space *map
}
static int
-ext2_prepare_write(struct file *file, struct page *page,
+ext2_prepare_write(struct file *file_may_be_null, struct page *page,
unsigned from, unsigned to)
{
+ if (S_ISDIR(page->mapping->host->i_mode))
+ kmap(page);
return block_prepare_write(page,from,to,ext2_get_block);
}
+static int ext2_commit_write(struct file *file, struct page *page,
+ unsigned from, unsigned to)
+{
+ int ret = generic_commit_write(file, page, from, to);
+
+ if (S_ISDIR(page->mapping->host->i_mode))
+ kunmap(page);
+ return ret;
+}
+
static int ext2_bmap(struct address_space *mapping, long block)
{
return generic_block_bmap(mapping,block,ext2_get_block);
@@ -633,7 +645,7 @@ struct address_space_operations ext2_aop
writepage: ext2_writepage,
sync_page: block_sync_page,
prepare_write: ext2_prepare_write,
- commit_write: generic_commit_write,
+ commit_write: ext2_commit_write,
bmap: ext2_bmap,
direct_IO: ext2_direct_IO,
writepages: ext2_writepages,
-