* Re: scalable kmap (was Re: vm lock contention reduction) (fwd)
  [not found] <237170000.1026317715@flay>

From: Hanna Linder @ 2002-07-10 22:18 UTC (permalink / raw)
To: Martin J. Bligh, akpm
Cc: Keith Mannthey, hannal, haveblue, lse-tech, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

--On Wednesday, July 10, 2002 09:15:15 -0700 "Martin J. Bligh" <Martin.Bligh@us.ibm.com> wrote:

> Updated patch below ...
>
>  arch/i386/kernel/i386_ksyms.c   |    5 ++
>  arch/i386/lib/usercopy.c        |   10 +++++
>  arch/i386/mm/fault.c            |   71 +++++++++++++++++++++++++++++++++++
>  fs/exec.c                       |   60 +++++++++++++++++++++---------
>  include/asm-i386/highmem.h      |    5 ++
>  include/asm-i386/kmap_types.h   |    3 +
>  include/asm-i386/processor.h    |    2 +
>  include/asm-ppc/kmap_types.h    |    1
>  include/asm-sparc/kmap_types.h  |    1
>  include/asm-x86_64/kmap_types.h |    1
>  include/linux/highmem.h         |   80 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/sched.h           |    5 ++
>  mm/filemap.c                    |   11 +++--
>  13 files changed, 232 insertions(+), 23 deletions(-)

Andrew and Martin,

I ran this updated patch on 2.5.25 with dbench on the 8-way with 4GB of
memory, compared to clean 2.5.25. I saw a significant improvement in
throughput of about 15% (averaged over 5 runs each).

Included are the pretty picture (akpm-2525.png), the data that picture
came from (akpm-2525.data), and the raw results of the runs with min/max
and timing results (2525akpmkmaphi and 2525clnhi).

I believe the drop at around 64 clients is caused by memory swapping
leading to increased disk accesses, since the run time increased by 200%
in direct correlation with the decreased throughput.

Hanna

[-- Attachment #2: akpm-2525.png --]
[-- Type: image/png, Size: 3587 bytes --]

[-- Attachment #3: 2525akpmkmaphi --]
[-- Type: application/octet-stream, Size: 4599 bytes --]

[root@elm3b96 dbench]# ./test_dbench-avg.sh 1 36 5 hikmap
Running  1 clients: tests 0-4; time hi is 3,  lo is 2;  thru hi is 81.5701,  lo is 76.9333;  avg run time 3;                avg throughput 79.6601666666667
Running  5 clients: tests 0-4; time hi is 4,  lo is 3;  thru hi is 277.74,   lo is 243.343;  avg run time 3.33333333333333; avg throughput 264.646333333333
Running  9 clients: tests 0-4; time hi is 6,  lo is 5;  thru hi is 295.914,  lo is 288.236;  avg run time 5;                avg throughput 283.635333333333
Running 13 clients: tests 0-4; time hi is 8,  lo is 7;  thru hi is 291.79,   lo is 246.807;  avg run time 7;                avg throughput 284.074666666667
Running 17 clients: tests 0-4; time hi is 9,  lo is 9;  thru hi is 288.754,  lo is 279.672;  avg run time 9;                avg throughput 282.871666666667
Running 21 clients: tests 0-4; time hi is 11, lo is 10; thru hi is 282.556,  lo is 279.696;  avg run time 11;               avg throughput 280.028666666667
Running 25 clients: tests 0-4; time hi is 13, lo is 13; thru hi is 283.141,  lo is 270.631;  avg run time 13;               avg throughput 279.706666666667
Running 29 clients: tests 0-4; time hi is 16, lo is 14; thru hi is 287.336,  lo is 282.406;  avg run time 14.6666666666667; avg throughput 274.366
Running 33 clients: tests 0-4; time hi is 17, lo is 16; thru hi is 280.63,   lo is 274.984;  avg run time 16.6666666666667; avg throughput 277.988
[root@elm3b96 dbench]# ./test_dbench-avg.sh 36 64 5 akpmhi2
Running 36 clients: tests 0-4; time hi is 19, lo is 18; thru hi is 284.684,  lo is 272.75;   avg run time 18;               avg throughput 274.103
Running 40 clients: tests 0-4; time hi is 22, lo is 20; thru hi is 279.052,  lo is 274.726;  avg run time 20;               avg throughput 270.671666666667
Running 44 clients: tests 0-4; time hi is 23, lo is 22; thru hi is 275.928,  lo is 261.419;  avg run time 22.6666666666667; avg throughput 270.660333333333
Running 48 clients: tests 0-4; time hi is 25, lo is 24; thru hi is 276.622,  lo is 271.115;  avg run time 24;               avg throughput 274.449333333333
Running 52 clients: tests 0-4; time hi is 28, lo is 25; thru hi is 279.96,   lo is 256.425;  avg run time 26;               avg throughput 273.272333333333
Running 56 clients: tests 0-4; time hi is 30, lo is 28; thru hi is 270.907,  lo is 268.947;  avg run time 28.3333333333333; avg throughput 267.339666666667
Running 60 clients: tests 0-4; time hi is 31, lo is 30; thru hi is 273.963,  lo is 266.724;  avg run time 30;               avg throughput 272.487333333333
Running 64 clients: tests 0-4; time hi is 883, lo is 33; thru hi is 262.225, lo is 9.57893;  avg run time 147.333333333333; avg throughput 117.4783

[-- Attachment #4: 2525clnhi --]
[-- Type: application/octet-stream, Size: 4189 bytes --]

[root@elm3b96 dbench]# ./test_dbench-avg.sh 1 36 5 2525hi
Running  1 clients: tests 0-4; time hi is 3,  lo is 2;  thru hi is 82.3442,  lo is 77.5547;  avg run time 2.66666666666667; avg throughput 81.7611333333333
Running  5 clients: tests 0-4; time hi is 4,  lo is 3;  thru hi is 260.515,  lo is 229.134;  avg run time 3.66666666666667; avg throughput 242.980333333333
Running  9 clients: tests 0-4; time hi is 6,  lo is 5;  thru hi is 256.654,  lo is 253.473;  avg run time 6;                avg throughput 250.254666666667
Running 13 clients: tests 0-4; time hi is 9,  lo is 7;  thru hi is 258.22,   lo is 207.652;  avg run time 8.33333333333333; avg throughput 246.309666666667
Running 17 clients: tests 0-4; time hi is 11, lo is 10; thru hi is 252.47,   lo is 228.667;  avg run time 10;               avg throughput 250.657666666667
Running 21 clients: tests 0-4; time hi is 13, lo is 12; thru hi is 250.391,  lo is 242.239;  avg run time 12;               avg throughput 244.346333333333
Running 25 clients: tests 0-4; time hi is 15, lo is 15; thru hi is 249.358,  lo is 226.142;  avg run time 15;               avg throughput 238.363333333333
Running 29 clients: tests 0-4; time hi is 18, lo is 16; thru hi is 256.472,  lo is 228.52;   avg run time 16.6666666666667; avg throughput 242.994666666667
Running 33 clients: tests 0-4; time hi is 20, lo is 18; thru hi is 254.148,  lo is 234.099;  avg run time 19;               avg throughput 241.595333333333
[root@elm3b96 dbench]# ./test_dbench-avg.sh 36 64 5 2525hi
Running 36 clients: tests 0-4; time hi is 21, lo is 20; thru hi is 250.669,  lo is 240.105;  avg run time 20.3333333333333; avg throughput 247.037333333333
Running 40 clients: tests 0-4; time hi is 23, lo is 22; thru hi is 247.32,   lo is 241.098;  avg run time 22.6666666666667; avg throughput 243.157333333333
Running 44 clients: tests 0-4; time hi is 26, lo is 25; thru hi is 245.733,  lo is 237.135;  avg run time 25;               avg throughput 241.037333333333
Running 48 clients: tests 0-4; time hi is 28, lo is 27; thru hi is 244.339,  lo is 237.92;   avg run time 27;               avg throughput 242.203
Running 52 clients: tests 0-4; time hi is 30, lo is 29; thru hi is 245.097,  lo is 242.522;  avg run time 29.3333333333333; avg throughput 240.211
Running 56 clients: tests 0-4; time hi is 32, lo is 31; thru hi is 245.913,  lo is 236.625;  avg run time 31.6666666666667; avg throughput 241.393666666667
Running 60 clients: tests 0-4; time hi is 35, lo is 34; thru hi is 243.014,  lo is 231.635;  avg run time 34.6666666666667; avg throughput 235.398333333333
Running 64 clients: tests 0-4; time hi is 710, lo is 309; thru hi is 237.763, lo is 11.9168; avg run time 131;              avg throughput 83.1543666666667

[-- Attachment #5: akpm-2525.data --]
[-- Type: application/octet-stream, Size: 305 bytes --]

#clnts  #kmap   #cln
1        79.66   81.76
5       264.65  242.98
9       283.07  250.25
13      284.07  246.31
17      282.87  250.66
21      280.03  244.35
25      279.70  238.36
29      274.37  242.99
33      277.99  241.60
36      274.10  247.04
40      270.67  243.16
44      270.66  241.04
48      274.45  242.20
52      273.27  240.21
56      267.34  241.39
60      272.49  235.40
64      117.48   83.15
* Re: scalable kmap (was Re: vm lock contention reduction) (fwd)

From: Andrew Morton @ 2002-07-11 3:06 UTC (permalink / raw)
To: Hanna Linder
Cc: Martin J. Bligh, Keith Mannthey, haveblue, lse-tech, linux-kernel

Hanna Linder wrote:
>
> ...
> Andrew and Martin,
>
> I ran this updated patch on 2.5.25 with dbench on
> the 8-way with 4 Gb of memory compared to clean 2.5.25.
> I saw a significant improvement in throughput about 15%
> (averaged over 5 runs each).

Thanks, Hanna.

The kernel compile test isn't a particularly heavy user of copy_to_user(),
whereas with RAM-only dbench, copy_*_user() is almost the only thing it
does. So that makes sense.

Tried dbench on the 2.5G 4xPIII Xeon: no improvement at all. This thing
seems to have quite poor memory bandwidth - maybe 250 megabytes/sec
downhill with the wind at its tail.

> Included is the pretty picture (akpm-2525.png) the
> data that picture came from (akpm-2525.data) and the raw
> results of the runs with min/max and timing results
> (2525akpmkmaphi and 2525clnhi).
>
> I believe the drop at around 64 clients is caused by
> memory swapping leading to increased disk accesses since the
> time increased by 200% in direct correlation with the decreased
> throughput.

Yes. The test went to disk. There are two reasons why it will do this:

1: Some dirty data was in memory for more than 30-35 seconds, or
2: More than 40% of memory is dirty.

In your case, the 64-client run was taking 32 seconds. After that the
disks lit up. Once that happens, dbench isn't a very good benchmark.
It's an excellent benchmark when it's RAM-only though. Very repeatable,
and it hits lots of code paths which matter.

You can run more clients before the disk I/O cuts in by increasing
/proc/sys/vm/dirty_expire_centisecs and /proc/sys/vm/dirty_*_ratio.
The patch you tested only uses the atomic kmap across generic_file_read().
It is reasonable to hope that another 15% or more can be gained by holding
an atomic kmap across writes as well. On your machine ;)

Here's what oprofile says about `dbench 40' with that patch:

c0140f1c   402   0.609543  __block_commit_write
c013dfd4   413   0.626222  vfs_write
c01402cc   431   0.653515  __find_get_block
c013a895   472   0.715683  .text.lock.highmem
c017fe30   494   0.749041  ext2_get_block
c012cef0   564   0.85518   unlock_page
c013ee80   564   0.85518   fget
c01079f4   571   0.865794  apic_timer_interrupt
c01e8ecc   594   0.900669  radix_tree_lookup
c013da90   597   0.905218  generic_file_llseek
c01514b4   607   0.92038   __d_lookup
c0106ff8   687   1.04168   system_call
c013a02c   874   1.32523   kunmap_high
c0148388   922   1.39801   link_path_walk
c0140b00  1097   1.66336   __block_prepare_write
c01346d0  1138   1.72552   rmqueue
c01127ac  1243   1.88473   smp_apic_timer_interrupt
c0139eb8  1514   2.29564   kmap_high
c0105368  6188   9.38272   poll_idle
c012d8a8  9564  14.5017    file_read_actor
c012ea70 21326  32.3361    generic_file_write

Not taking a kmap in generic_file_write is a biggish patch - it means
changing the prepare_write/commit_write API and visiting all filesystems.

The API change would be: the core kernel no longer holds a kmap across
prepare/commit. If the filesystem wants one for its own purposes then it
gets to do it for itself, possibly in its prepare_write().

I think I'd prefer to get some additional testing and understanding
before undertaking that work. It arguably makes sense as a small
cleanup/speedup anyway, but that's not a burning issue.

hmm. I'll do just ext2, and we can take another look then.

-
* Re: scalable kmap (was Re: vm lock contention reduction) (fwd)

From: Andrew Morton @ 2002-07-11 5:19 UTC (permalink / raw)
To: Hanna Linder, Martin J. Bligh, Keith Mannthey, haveblue, lse-tech, linux-kernel

Andrew Morton wrote:
>
> Not taking a kmap in generic_file_write is a biggish patch

OK, so I'm full of it. It's actually quite simple and clean. This patch
is incremental to the one which you just tested. It takes an atomic kmap
across generic_file_write().

There's a copy of these patches at
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.25/

The patch also teaches ext2 and ext3 about the new API. If you're using
any other filesystems they will probably oops. ext2 is still using
regular kmap() for directory operations.

And this time, it actually buys a 3% improvement on my lumbering hulk, so
your machine will hopefully see nice improvements.

Profile looks better too. kmap_high and the IPI have vanished.
c013c164   304   0.50923   __set_page_dirty_nobuffers
c0141008   355   0.59466   __block_commit_write
c013e004   382   0.639887  vfs_write
c01402fc   410   0.68679   __find_get_block
c017fd70   471   0.788971  ext2_get_block
c01e8eec   530   0.887802  radix_tree_lookup
c012cef0   542   0.907903  unlock_page
c013eeb0   562   0.941405  fget
c013dac0   585   0.979932  generic_file_llseek
c01514e4   600   1.00506   __d_lookup
c0106ff8   635   1.06369   system_call
c01483b8   872   1.46069   link_path_walk
c0134700  1014   1.69855   rmqueue
c0140b30  1191   1.99504   __block_prepare_write
c0105368  4029   6.74897   poll_idle
c012d8a8  9520  15.9469    file_read_actor
c012ea70 23301  39.0315    generic_file_write

 fs/buffer.c     |   57 +++++++++++++++++++++++++++++++-------------------------
 fs/ext2/inode.c |   16 +++++++++++++--
 fs/ext3/inode.c |   30 +++++------------------------
 mm/filemap.c    |    6 ++---
 4 files changed, 55 insertions(+), 54 deletions(-)

--- 2.5.25/mm/filemap.c~kmap_atomic_writes	Wed Jul 10 21:18:19 2002
+++ 2.5.25-akpm/mm/filemap.c	Wed Jul 10 21:24:41 2002
@@ -2228,6 +2228,7 @@ generic_file_write(struct file *file, co
 	unsigned long	offset;
 	long		page_fault;
 	char		*kaddr;
+	struct copy_user_state copy_user_state;
 
 	offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
 	index = pos >> PAGE_CACHE_SHIFT;
@@ -2252,22 +2253,22 @@ generic_file_write(struct file *file, co
 			break;
 		}
 
-		kaddr = kmap(page);
 		status = a_ops->prepare_write(file, page, offset, offset+bytes);
 		if (unlikely(status)) {
 			/*
 			 * prepare_write() may have instantiated a few blocks
 			 * outside i_size.  Trim these off again.
 			 */
-			kunmap(page);
 			unlock_page(page);
 			page_cache_release(page);
 			if (pos + bytes > inode->i_size)
 				vmtruncate(inode, inode->i_size);
 			break;
 		}
+		kaddr = kmap_copy_user(&copy_user_state, page, KM_FILEMAP, 0);
 		page_fault = __copy_from_user(kaddr + offset, buf, bytes);
 		flush_dcache_page(page);
+		kunmap_copy_user(&copy_user_state);
 		status = a_ops->commit_write(file, page, offset, offset+bytes);
 		if (unlikely(page_fault)) {
 			status = -EFAULT;
@@ -2282,7 +2283,6 @@ generic_file_write(struct file *file, co
 				buf += status;
 			}
 		}
-		kunmap(page);
 		if (!PageReferenced(page))
 			SetPageReferenced(page);
 		unlock_page(page);
--- 2.5.25/fs/buffer.c~kmap_atomic_writes	Wed Jul 10 21:25:11 2002
+++ 2.5.25-akpm/fs/buffer.c	Wed Jul 10 21:33:06 2002
@@ -1804,7 +1804,6 @@ static int __block_prepare_write(struct
 	int err = 0;
 	unsigned blocksize, bbits;
 	struct buffer_head *bh, *head, *wait[2], **wait_bh=wait;
-	char *kaddr = kmap(page);
 
 	BUG_ON(!PageLocked(page));
 	BUG_ON(from > PAGE_CACHE_SIZE);
@@ -1845,13 +1844,19 @@ static int __block_prepare_write(struct
 				set_buffer_uptodate(bh);
 				continue;
 			}
-			if (block_end > to)
-				memset(kaddr+to, 0, block_end-to);
-			if (block_start < from)
-				memset(kaddr+block_start,
-					0, from-block_start);
-			if (block_end > to || block_start < from)
+			if (block_end > to || block_start < from) {
+				void *kaddr;
+
+				kaddr = kmap_atomic(page, KM_USER0);
+				if (block_end > to)
+					memset(kaddr+to, 0,
+						block_end-to);
+				if (block_start < from)
+					memset(kaddr+block_start,
+						0, from-block_start);
 				flush_dcache_page(page);
+				kunmap_atomic(kaddr, KM_USER0);
+			}
 			continue;
 		}
 	}
@@ -1890,10 +1895,14 @@ out:
 		if (block_start >= to)
 			break;
 		if (buffer_new(bh)) {
+			void *kaddr;
+
 			clear_buffer_new(bh);
 			if (buffer_uptodate(bh))
 				buffer_error();
+			kaddr = kmap_atomic(page, KM_USER0);
 			memset(kaddr+block_start, 0, bh->b_size);
+			kunmap_atomic(kaddr, KM_USER0);
 			set_buffer_uptodate(bh);
 			mark_buffer_dirty(bh);
 		}
@@ -1979,9 +1988,10 @@ int block_read_full_page(struct page *pa
 			SetPageError(page);
 		}
 		if (!buffer_mapped(bh)) {
-			memset(kmap(page) + i*blocksize, 0, blocksize);
+			void *kaddr = kmap_atomic(page, KM_USER0);
+			memset(kaddr + i * blocksize, 0, blocksize);
 			flush_dcache_page(page);
-			kunmap(page);
+			kunmap_atomic(kaddr, KM_USER0);
 			set_buffer_uptodate(bh);
 			continue;
 		}
@@ -2089,7 +2099,7 @@ int cont_prepare_write(struct page *page
 	long status;
 	unsigned zerofrom;
 	unsigned blocksize = 1 << inode->i_blkbits;
-	char *kaddr;
+	void *kaddr;
 
 	while(page->index > (pgpos = *bytes>>PAGE_CACHE_SHIFT)) {
 		status = -ENOMEM;
@@ -2111,12 +2121,12 @@ int cont_prepare_write(struct page *page
 				PAGE_CACHE_SIZE, get_block);
 		if (status)
 			goto out_unmap;
-		kaddr = page_address(new_page);
+		kaddr = kmap_atomic(new_page, KM_USER0);
 		memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
 		flush_dcache_page(new_page);
+		kunmap_atomic(kaddr, KM_USER0);
 		__block_commit_write(inode, new_page, zerofrom,
 				PAGE_CACHE_SIZE);
-		kunmap(new_page);
 		unlock_page(new_page);
 		page_cache_release(new_page);
 	}
@@ -2141,21 +2151,20 @@ int cont_prepare_write(struct page *page
 	status = __block_prepare_write(inode, page, zerofrom, to, get_block);
 	if (status)
 		goto out1;
-	kaddr = page_address(page);
 	if (zerofrom < offset) {
+		kaddr = kmap_atomic(page, KM_USER0);
 		memset(kaddr+zerofrom, 0, offset-zerofrom);
 		flush_dcache_page(page);
+		kunmap_atomic(kaddr, KM_USER0);
 		__block_commit_write(inode, page, zerofrom, offset);
 	}
 	return 0;
out1:
 	ClearPageUptodate(page);
-	kunmap(page);
 	return status;
 
out_unmap:
 	ClearPageUptodate(new_page);
-	kunmap(new_page);
 	unlock_page(new_page);
 	page_cache_release(new_page);
out:
@@ -2167,10 +2176,8 @@ int block_prepare_write(struct page *pag
 {
 	struct inode *inode = page->mapping->host;
 	int err = __block_prepare_write(inode, page, from, to, get_block);
-	if (err) {
+	if (err)
 		ClearPageUptodate(page);
-		kunmap(page);
-	}
 	return err;
 }
@@ -2178,7 +2185,6 @@ int block_commit_write(struct page *page
 {
 	struct inode *inode = page->mapping->host;
 	__block_commit_write(inode,page,from,to);
-	kunmap(page);
 	return 0;
 }
@@ -2188,7 +2194,6 @@ int generic_commit_write(struct file *fi
 	struct inode *inode = page->mapping->host;
 	loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
 	__block_commit_write(inode,page,from,to);
-	kunmap(page);
 	if (pos > inode->i_size) {
 		inode->i_size = pos;
 		mark_inode_dirty(inode);
@@ -2205,6 +2210,7 @@ int block_truncate_page(struct address_s
 	struct inode *inode = mapping->host;
 	struct page *page;
 	struct buffer_head *bh;
+	void *kaddr;
 	int err;
 
 	blocksize = 1 << inode->i_blkbits;
@@ -2257,9 +2263,10 @@ int block_truncate_page(struct address_s
 		goto unlock;
 	}
 
-	memset(kmap(page) + offset, 0, length);
+	kaddr = kmap_atomic(page, KM_USER0);
+	memset(kaddr + offset, 0, length);
 	flush_dcache_page(page);
-	kunmap(page);
+	kunmap_atomic(kaddr, KM_USER0);
 
 	mark_buffer_dirty(bh);
 	err = 0;
@@ -2279,7 +2286,7 @@ int block_write_full_page(struct page *p
 	struct inode * const inode = page->mapping->host;
 	const unsigned long end_index = inode->i_size >> PAGE_CACHE_SHIFT;
 	unsigned offset;
-	char *kaddr;
+	void *kaddr;
 
 	/* Is the page fully inside i_size? */
 	if (page->index < end_index)
@@ -2293,10 +2300,10 @@ int block_write_full_page(struct page *p
 	}
 
 	/* The page straddles i_size */
-	kaddr = kmap(page);
+	kaddr = kmap_atomic(page, KM_USER0);
 	memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
 	flush_dcache_page(page);
-	kunmap(page);
+	kunmap_atomic(kaddr, KM_USER0);
 	return __block_write_full_page(inode, page, get_block);
 }
--- 2.5.25/fs/ext3/inode.c~kmap_atomic_writes	Wed Jul 10 21:34:16 2002
+++ 2.5.25-akpm/fs/ext3/inode.c	Wed Jul 10 21:35:08 2002
@@ -1048,16 +1048,6 @@ static int ext3_prepare_write(struct fil
 	if (ext3_should_journal_data(inode)) {
 		ret = walk_page_buffers(handle, page_buffers(page),
 			from, to, NULL, do_journal_get_write_access);
-		if (ret) {
-			/*
-			 * We're going to fail this prepare_write(),
-			 * so commit_write() will not be called.
-			 * We need to undo block_prepare_write()'s kmap().
-			 * AKPM: Do we need to clear PageUptodate? I don't
-			 * think so.
-			 */
-			kunmap(page);
-		}
 	}
prepare_write_failed:
 	if (ret)
@@ -1117,7 +1107,6 @@ static int ext3_commit_write(struct file
 			from, to, &partial, commit_write_fn);
 		if (!partial)
 			SetPageUptodate(page);
-		kunmap(page);
 		if (pos > inode->i_size)
 			inode->i_size = pos;
 		EXT3_I(inode)->i_state |= EXT3_STATE_JDATA;
@@ -1128,17 +1117,8 @@ static int ext3_commit_write(struct file
 	}
 		/* Be careful here if generic_commit_write becomes a
 		 * required invocation after block_prepare_write. */
-		if (ret == 0) {
+		if (ret == 0)
 			ret = generic_commit_write(file, page, from, to);
-		} else {
-			/*
-			 * block_prepare_write() was called, but we're not
-			 * going to call generic_commit_write().  So we
-			 * need to perform generic_commit_write()'s kunmap
-			 * by hand.
-			 */
-			kunmap(page);
-		}
 	}
 	if (inode->i_size > EXT3_I(inode)->i_disksize) {
 		EXT3_I(inode)->i_disksize = inode->i_size;
@@ -1433,6 +1413,7 @@ static int ext3_block_truncate_page(hand
 	struct page *page;
 	struct buffer_head *bh;
 	int err;
+	void *kaddr;
 
 	blocksize = inode->i_sb->s_blocksize;
 	length = offset & (blocksize - 1);
@@ -1488,10 +1469,11 @@ static int ext3_block_truncate_page(hand
 		if (err)
 			goto unlock;
 	}
 
-	memset(kmap(page) + offset, 0, length);
+	kaddr = kmap_atomic(page, KM_USER0);
+	memset(kaddr + offset, 0, length);
 	flush_dcache_page(page);
-	kunmap(page);
+	kunmap_atomic(kaddr, KM_USER0);
 
 	BUFFER_TRACE(bh, "zeroed end of block");
--- 2.5.25/fs/ext2/inode.c~kmap_atomic_writes	Wed Jul 10 21:35:31 2002
+++ 2.5.25-akpm/fs/ext2/inode.c	Wed Jul 10 21:42:34 2002
@@ -595,12 +595,24 @@ ext2_readpages(struct address_space *map
 }
 
 static int
-ext2_prepare_write(struct file *file, struct page *page,
+ext2_prepare_write(struct file *file_may_be_null, struct page *page,
 			unsigned from, unsigned to)
 {
+	if (S_ISDIR(page->mapping->host->i_mode))
+		kmap(page);
 	return block_prepare_write(page,from,to,ext2_get_block);
 }
 
+static int ext2_commit_write(struct file *file, struct page *page,
+			unsigned from, unsigned to)
+{
+	int ret = generic_commit_write(file, page, from, to);
+
+	if (S_ISDIR(page->mapping->host->i_mode))
+		kunmap(page);
+	return ret;
+}
+
 static int ext2_bmap(struct address_space *mapping, long block)
 {
 	return generic_block_bmap(mapping,block,ext2_get_block);
@@ -633,7 +645,7 @@ struct address_space_operations ext2_aop
 	writepage: ext2_writepage,
 	sync_page: block_sync_page,
 	prepare_write: ext2_prepare_write,
-	commit_write: generic_commit_write,
+	commit_write: ext2_commit_write,
 	bmap: ext2_bmap,
 	direct_IO: ext2_direct_IO,
 	writepages: ext2_writepages,

-