* [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2
@ 2007-05-14 6:06 npiggin
2007-05-14 6:06 ` [patch 01/41] mm: revert KERNEL_DS buffered write optimisation npiggin
` (41 more replies)
0 siblings, 42 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel
--
Here is an update against 2.6.21-mm2. Unfortunately UML broke for me, so
test coverage isn't as good as the last time I posted the series. Also,
several filesystems had significant clashes. Considering the amount of
time it took to get them working, I haven't fixed them up again. They
aren't _broken_ as such, they'll just run slowly (but without the
deadlock). The OCFS2 patch seemed to have some clashes too, so I've left
it out. I'm sure Mark will take a look at it quickly if this patchset
were to get merged.
Thanks to Neil for some documentation suggestions and catching a bug, and
to Vladimir for the reiserfs implementation (not 100% done yet, but it is
a good start).
^ permalink raw reply [flat|nested] 48+ messages in thread
* [patch 01/41] mm: revert KERNEL_DS buffered write optimisation
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6 npiggin
` (40 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management, Neil Brown
[-- Attachment #1: mm-revert-nfsd-writev-opt.patch --]
[-- Type: text/plain, Size: 2023 bytes --]
Revert the patch from Neil Brown to optimise NFSD writev handling.
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
mm/filemap.c | 32 +++++++++++++-------------------
1 file changed, 13 insertions(+), 19 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1936,27 +1936,21 @@ generic_file_buffered_write(struct kiocb
/* Limit the size of the copy to the caller's write size */
bytes = min(bytes, count);
- /* We only need to worry about prefaulting when writes are from
- * user-space. NFSd uses vfs_writev with several non-aligned
- * segments in the vector, and limiting to one segment a time is
- * a noticeable performance for re-write
+ /*
+ * Limit the size of the copy to that of the current segment,
+ * because fault_in_pages_readable() doesn't know how to walk
+ * segments.
*/
- if (!segment_eq(get_fs(), KERNEL_DS)) {
- /*
- * Limit the size of the copy to that of the current
- * segment, because fault_in_pages_readable() doesn't
- * know how to walk segments.
- */
- bytes = min(bytes, cur_iov->iov_len - iov_base);
+ bytes = min(bytes, cur_iov->iov_len - iov_base);
+
+ /*
+ * Bring in the user page that we will copy from _first_.
+ * Otherwise there's a nasty deadlock on copying from the
+ * same page as we're writing to, without it being marked
+ * up-to-date.
+ */
+ fault_in_pages_readable(buf, bytes);
- /*
- * Bring in the user page that we will copy from
- * _first_. Otherwise there's a nasty deadlock on
- * copying from the same page as we're writing to,
- * without it being marked up-to-date.
- */
- fault_in_pages_readable(buf, bytes);
- }
page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
if (!page) {
status = -ENOMEM;
--
* [patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
2007-05-14 6:06 ` [patch 01/41] mm: revert KERNEL_DS buffered write optimisation npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 19:06 ` Dave Jones
2007-05-14 6:06 ` [patch 03/41] Revert 6527c2bdf1f833cc18e8f42bd97973d583e4aa83 npiggin
` (39 subsequent siblings)
41 siblings, 1 reply; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management, Andrew Morton
[-- Attachment #1: mm-revert-buffered-write-zero-length-iov.patch --]
[-- Type: text/plain, Size: 1660 bytes --]
From: Andrew Morton <akpm@osdl.org>
This was a bugfix against 6527c2bdf1f833cc18e8f42bd97973d583e4aa83, which we
also revert.
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
mm/filemap.c | 9 +--------
mm/filemap.h | 4 ++--
2 files changed, 3 insertions(+), 10 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1957,12 +1957,6 @@ generic_file_buffered_write(struct kiocb
break;
}
- if (unlikely(bytes == 0)) {
- status = 0;
- copied = 0;
- goto zero_length_segment;
- }
-
status = a_ops->prepare_write(file, page, offset, offset+bytes);
if (unlikely(status)) {
loff_t isize = i_size_read(inode);
@@ -1992,8 +1986,7 @@ generic_file_buffered_write(struct kiocb
page_cache_release(page);
continue;
}
-zero_length_segment:
- if (likely(copied >= 0)) {
+ if (likely(copied > 0)) {
if (!status)
status = copied;
Index: linux-2.6/mm/filemap.h
===================================================================
--- linux-2.6.orig/mm/filemap.h
+++ linux-2.6/mm/filemap.h
@@ -87,7 +87,7 @@ filemap_set_next_iovec(const struct iove
const struct iovec *iov = *iovp;
size_t base = *basep;
- do {
+ while (bytes) {
int copy = min(bytes, iov->iov_len - base);
bytes -= copy;
@@ -96,7 +96,7 @@ filemap_set_next_iovec(const struct iove
iov++;
base = 0;
}
- } while (bytes);
+ }
*iovp = iov;
*basep = base;
}
--
* [patch 03/41] Revert 6527c2bdf1f833cc18e8f42bd97973d583e4aa83
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
2007-05-14 6:06 ` [patch 01/41] mm: revert KERNEL_DS buffered write optimisation npiggin
2007-05-14 6:06 ` [patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6 npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 04/41] mm: clean up buffered write code npiggin
` (38 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management, Andrew Morton
[-- Attachment #1: mm-revert-buffered-write-deadlock-fix.patch --]
[-- Type: text/plain, Size: 2922 bytes --]
From: Andrew Morton <akpm@osdl.org>
This patch fixed the following bug:
When prefaulting in the pages in generic_file_buffered_write(), we only
faulted in the pages for the first segment of the iovec. If the second or
a subsequent segment described a mapping of the page into which we're
write()ing, and that page is not up-to-date, the fault handler tries to lock
the already-locked page (to bring it up to date) and deadlocks.
An exploit for this bug is in writev-deadlock-demo.c, in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
(These demos assume blocksize < PAGE_CACHE_SIZE).
The problem with this fix is that it takes the kernel back to doing a single
prepare_write()/commit_write() per iovec segment. So in the worst case we'll
run prepare_write+commit_write 1024 times where we previously would have run
it once. The other problem is that the fix didn't fix all the locking
problems anyway.
<insert numbers obtained via ext3-tools's writev-speed.c here>
And apparently this change killed NFS overwrite performance, because, I
suppose, it talks to the server for each prepare_write+commit_write.
So just back that patch out - we'll be fixing the deadlock by other means.
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Nick says: also it only ever actually papered over the bug, because after
faulting in the pages, they might be unmapped or reclaimed.
Signed-off-by: Nick Piggin <npiggin@suse.de>
mm/filemap.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1927,21 +1927,14 @@ generic_file_buffered_write(struct kiocb
do {
unsigned long index;
unsigned long offset;
+ unsigned long maxlen;
size_t copied;
offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
index = pos >> PAGE_CACHE_SHIFT;
bytes = PAGE_CACHE_SIZE - offset;
-
- /* Limit the size of the copy to the caller's write size */
- bytes = min(bytes, count);
-
- /*
- * Limit the size of the copy to that of the current segment,
- * because fault_in_pages_readable() doesn't know how to walk
- * segments.
- */
- bytes = min(bytes, cur_iov->iov_len - iov_base);
+ if (bytes > count)
+ bytes = count;
/*
* Bring in the user page that we will copy from _first_.
@@ -1949,7 +1942,10 @@ generic_file_buffered_write(struct kiocb
* same page as we're writing to, without it being marked
* up-to-date.
*/
- fault_in_pages_readable(buf, bytes);
+ maxlen = cur_iov->iov_len - iov_base;
+ if (maxlen > bytes)
+ maxlen = bytes;
+ fault_in_pages_readable(buf, maxlen);
page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
if (!page) {
--
* [patch 04/41] mm: clean up buffered write code
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (2 preceding siblings ...)
2007-05-14 6:06 ` [patch 03/41] Revert 6527c2bdf1f833cc18e8f42bd97973d583e4aa83 npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 05/41] mm: debug write deadlocks npiggin
` (37 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management, Andrew Morton
[-- Attachment #1: mm-generic_file_buffered_write-cleanup.patch --]
[-- Type: text/plain, Size: 3417 bytes --]
From: Andrew Morton <akpm@osdl.org>
Rename some variables and fix some types.
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
mm/filemap.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1900,16 +1900,15 @@ generic_file_buffered_write(struct kiocb
size_t count, ssize_t written)
{
struct file *file = iocb->ki_filp;
- struct address_space * mapping = file->f_mapping;
+ struct address_space *mapping = file->f_mapping;
const struct address_space_operations *a_ops = mapping->a_ops;
struct inode *inode = mapping->host;
long status = 0;
struct page *page;
struct page *cached_page = NULL;
- size_t bytes;
struct pagevec lru_pvec;
const struct iovec *cur_iov = iov; /* current iovec */
- size_t iov_base = 0; /* offset in the current iovec */
+ size_t iov_offset = 0; /* offset in the current iovec */
char __user *buf;
pagevec_init(&lru_pvec, 0);
@@ -1920,31 +1919,33 @@ generic_file_buffered_write(struct kiocb
if (likely(nr_segs == 1))
buf = iov->iov_base + written;
else {
- filemap_set_next_iovec(&cur_iov, &iov_base, written);
- buf = cur_iov->iov_base + iov_base;
+ filemap_set_next_iovec(&cur_iov, &iov_offset, written);
+ buf = cur_iov->iov_base + iov_offset;
}
do {
- unsigned long index;
- unsigned long offset;
- unsigned long maxlen;
- size_t copied;
+ pgoff_t index; /* Pagecache index for current page */
+ unsigned long offset; /* Offset into pagecache page */
+ unsigned long maxlen; /* Bytes remaining in current iovec */
+ size_t bytes; /* Bytes to write to page */
+ size_t copied; /* Bytes copied from user */
- offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
+ offset = (pos & (PAGE_CACHE_SIZE - 1));
index = pos >> PAGE_CACHE_SHIFT;
bytes = PAGE_CACHE_SIZE - offset;
if (bytes > count)
bytes = count;
+ maxlen = cur_iov->iov_len - iov_offset;
+ if (maxlen > bytes)
+ maxlen = bytes;
+
/*
* Bring in the user page that we will copy from _first_.
* Otherwise there's a nasty deadlock on copying from the
* same page as we're writing to, without it being marked
* up-to-date.
*/
- maxlen = cur_iov->iov_len - iov_base;
- if (maxlen > bytes)
- maxlen = bytes;
fault_in_pages_readable(buf, maxlen);
page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
@@ -1975,7 +1976,7 @@ generic_file_buffered_write(struct kiocb
buf, bytes);
else
copied = filemap_copy_from_user_iovec(page, offset,
- cur_iov, iov_base, bytes);
+ cur_iov, iov_offset, bytes);
flush_dcache_page(page);
status = a_ops->commit_write(file, page, offset, offset+bytes);
if (status == AOP_TRUNCATED_PAGE) {
@@ -1993,12 +1994,12 @@ generic_file_buffered_write(struct kiocb
buf += status;
if (unlikely(nr_segs > 1)) {
filemap_set_next_iovec(&cur_iov,
- &iov_base, status);
+ &iov_offset, status);
if (count)
buf = cur_iov->iov_base +
- iov_base;
+ iov_offset;
} else {
- iov_base += status;
+ iov_offset += status;
}
}
}
--
* [patch 05/41] mm: debug write deadlocks
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (3 preceding siblings ...)
2007-05-14 6:06 ` [patch 04/41] mm: clean up buffered write code npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 06/41] mm: trim more holes npiggin
` (36 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management
[-- Attachment #1: mm-debug-write-deadlocks.patch --]
[-- Type: text/plain, Size: 1136 bytes --]
Allow CONFIG_DEBUG_VM to switch off the prefaulting logic, to simulate the
difficult race where the page may be unmapped before calling copy_from_user.
Makes the race much easier to hit.
This is useful for demonstration and testing purposes, but is removed in a
subsequent patch.
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
mm/filemap.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1940,6 +1940,7 @@ generic_file_buffered_write(struct kiocb
if (maxlen > bytes)
maxlen = bytes;
+#ifndef CONFIG_DEBUG_VM
/*
* Bring in the user page that we will copy from _first_.
* Otherwise there's a nasty deadlock on copying from the
@@ -1947,6 +1948,7 @@ generic_file_buffered_write(struct kiocb
* up-to-date.
*/
fault_in_pages_readable(buf, maxlen);
+#endif
page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
if (!page) {
--
* [patch 06/41] mm: trim more holes
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (4 preceding siblings ...)
2007-05-14 6:06 ` [patch 05/41] mm: debug write deadlocks npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 07/41] mm: buffered write cleanup npiggin
` (35 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management
[-- Attachment #1: mm-trim-more-holes.patch --]
[-- Type: text/plain, Size: 3382 bytes --]
If prepare_write fails with AOP_TRUNCATED_PAGE, or if commit_write fails, then
we may have failed the write operation despite prepare_write having
instantiated blocks past i_size. Fix this, and consolidate the trimming into
one place.
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
mm/filemap.c | 80 +++++++++++++++++++++++++++++------------------------------
1 file changed, 40 insertions(+), 40 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1957,22 +1957,9 @@ generic_file_buffered_write(struct kiocb
}
status = a_ops->prepare_write(file, page, offset, offset+bytes);
- if (unlikely(status)) {
- loff_t isize = i_size_read(inode);
+ if (unlikely(status))
+ goto fs_write_aop_error;
- if (status != AOP_TRUNCATED_PAGE)
- unlock_page(page);
- page_cache_release(page);
- if (status == AOP_TRUNCATED_PAGE)
- continue;
- /*
- * prepare_write() may have instantiated a few blocks
- * outside i_size. Trim these off again.
- */
- if (pos + bytes > isize)
- vmtruncate(inode, isize);
- break;
- }
if (likely(nr_segs == 1))
copied = filemap_copy_from_user(page, offset,
buf, bytes);
@@ -1981,40 +1968,53 @@ generic_file_buffered_write(struct kiocb
cur_iov, iov_offset, bytes);
flush_dcache_page(page);
status = a_ops->commit_write(file, page, offset, offset+bytes);
- if (status == AOP_TRUNCATED_PAGE) {
- page_cache_release(page);
- continue;
+ if (unlikely(status < 0 || status == AOP_TRUNCATED_PAGE))
+ goto fs_write_aop_error;
+ if (unlikely(copied != bytes)) {
+ status = -EFAULT;
+ goto fs_write_aop_error;
}
- if (likely(copied > 0)) {
- if (!status)
- status = copied;
+ if (unlikely(status > 0)) /* filesystem did partial write */
+ copied = status;
- if (status >= 0) {
- written += status;
- count -= status;
- pos += status;
- buf += status;
- if (unlikely(nr_segs > 1)) {
- filemap_set_next_iovec(&cur_iov,
- &iov_offset, status);
- if (count)
- buf = cur_iov->iov_base +
- iov_offset;
- } else {
- iov_offset += status;
- }
+ if (likely(copied > 0)) {
+ written += copied;
+ count -= copied;
+ pos += copied;
+ buf += copied;
+ if (unlikely(nr_segs > 1)) {
+ filemap_set_next_iovec(&cur_iov,
+ &iov_offset, copied);
+ if (count)
+ buf = cur_iov->iov_base + iov_offset;
+ } else {
+ iov_offset += copied;
}
}
- if (unlikely(copied != bytes))
- if (status >= 0)
- status = -EFAULT;
unlock_page(page);
mark_page_accessed(page);
page_cache_release(page);
- if (status < 0)
- break;
balance_dirty_pages_ratelimited(mapping);
cond_resched();
+ continue;
+
+fs_write_aop_error:
+ if (status != AOP_TRUNCATED_PAGE)
+ unlock_page(page);
+ page_cache_release(page);
+
+ /*
+ * prepare_write() may have instantiated a few blocks
+ * outside i_size. Trim these off again. Don't need
+ * i_size_read because we hold i_mutex.
+ */
+ if (pos + bytes > inode->i_size)
+ vmtruncate(inode, inode->i_size);
+ if (status == AOP_TRUNCATED_PAGE)
+ continue;
+ else
+ break;
+
} while (count);
*ppos = pos;
--
* [patch 07/41] mm: buffered write cleanup
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (5 preceding siblings ...)
2007-05-14 6:06 ` [patch 06/41] mm: trim more holes npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 08/41] mm: write iovec cleanup npiggin
` (34 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management
[-- Attachment #1: mm-buffered-write-cleanup.patch --]
[-- Type: text/plain, Size: 11126 bytes --]
Quite a bit of code is used in maintaining these "cached pages" that are
probably pretty unlikely to get used. Creating the spare page would require
a narrow race where the page is inserted concurrently while this process is
allocating a page; then a multi-page write into an uncached part of the
file would be needed to actually make use of it.
Next, the buffered write path (and others) uses its own LRU pagevec when it
should be just using the per-CPU LRU pagevec (which will cut down on both data
and code size cacheline footprint). Also, these private LRU pagevecs are
emptied after just a very short time, in contrast with the per-CPU pagevecs
that are persistent. Net result: 7.3 times fewer lru_lock acquisitions required
to add the pages to pagecache for a bulk write (in 4K chunks).
[this gets rid of some cond_resched() calls in readahead.c and mpage.c due
to clashes in -mm. What put them there, and why? ]
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/mpage.c | 12 ----
mm/filemap.c | 144 ++++++++++++++++++++++-----------------------------------
mm/readahead.c | 28 +++--------
3 files changed, 66 insertions(+), 118 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -666,26 +666,22 @@ EXPORT_SYMBOL(find_lock_page);
struct page *find_or_create_page(struct address_space *mapping,
unsigned long index, gfp_t gfp_mask)
{
- struct page *page, *cached_page = NULL;
+ struct page *page;
int err;
repeat:
page = find_lock_page(mapping, index);
if (!page) {
- if (!cached_page) {
- cached_page = alloc_page(gfp_mask);
- if (!cached_page)
- return NULL;
- }
- err = add_to_page_cache_lru(cached_page, mapping,
- index, gfp_mask);
- if (!err) {
- page = cached_page;
- cached_page = NULL;
- } else if (err == -EEXIST)
- goto repeat;
+ page = alloc_page(gfp_mask);
+ if (!page)
+ return NULL;
+ err = add_to_page_cache_lru(page, mapping, index, gfp_mask);
+ if (unlikely(err)) {
+ page_cache_release(page);
+ page = NULL;
+ if (err == -EEXIST)
+ goto repeat;
+ }
}
- if (cached_page)
- page_cache_release(cached_page);
return page;
}
EXPORT_SYMBOL(find_or_create_page);
@@ -882,11 +878,9 @@ void do_generic_mapping_read(struct addr
unsigned long prev_index;
unsigned int prev_offset;
loff_t isize;
- struct page *cached_page;
int error;
struct file_ra_state ra = *_ra;
- cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
prev_index = ra.prev_index;
@@ -1053,23 +1047,20 @@ no_cached_page:
* Ok, it wasn't cached, so we need to create a new
* page..
*/
- if (!cached_page) {
- cached_page = page_cache_alloc_cold(mapping);
- if (!cached_page) {
- desc->error = -ENOMEM;
- goto out;
- }
+ page = page_cache_alloc_cold(mapping);
+ if (!page) {
+ desc->error = -ENOMEM;
+ goto out;
}
- error = add_to_page_cache_lru(cached_page, mapping,
+ error = add_to_page_cache_lru(page, mapping,
index, GFP_KERNEL);
if (error) {
+ page_cache_release(page);
if (error == -EEXIST)
goto find_page;
desc->error = error;
goto out;
}
- page = cached_page;
- cached_page = NULL;
goto readpage;
}
@@ -1077,8 +1068,6 @@ out:
*_ra = ra;
*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
- if (cached_page)
- page_cache_release(cached_page);
if (filp)
file_accessed(filp);
}
@@ -1561,35 +1550,28 @@ static struct page *__read_cache_page(st
int (*filler)(void *,struct page*),
void *data)
{
- struct page *page, *cached_page = NULL;
+ struct page *page;
int err;
repeat:
page = find_get_page(mapping, index);
if (!page) {
- if (!cached_page) {
- cached_page = page_cache_alloc_cold(mapping);
- if (!cached_page)
- return ERR_PTR(-ENOMEM);
- }
- err = add_to_page_cache_lru(cached_page, mapping,
- index, GFP_KERNEL);
- if (err == -EEXIST)
- goto repeat;
- if (err < 0) {
+ page = page_cache_alloc_cold(mapping);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+ err = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
+ if (unlikely(err)) {
+ page_cache_release(page);
+ if (err == -EEXIST)
+ goto repeat;
/* Presumably ENOMEM for radix tree node */
- page_cache_release(cached_page);
return ERR_PTR(err);
}
- page = cached_page;
- cached_page = NULL;
err = filler(data, page);
if (err < 0) {
page_cache_release(page);
page = ERR_PTR(err);
}
}
- if (cached_page)
- page_cache_release(cached_page);
return page;
}
@@ -1667,40 +1649,6 @@ struct page *read_cache_page(struct addr
EXPORT_SYMBOL(read_cache_page);
/*
- * If the page was newly created, increment its refcount and add it to the
- * caller's lru-buffering pagevec. This function is specifically for
- * generic_file_write().
- */
-static inline struct page *
-__grab_cache_page(struct address_space *mapping, unsigned long index,
- struct page **cached_page, struct pagevec *lru_pvec)
-{
- int err;
- struct page *page;
-repeat:
- page = find_lock_page(mapping, index);
- if (!page) {
- if (!*cached_page) {
- *cached_page = page_cache_alloc(mapping);
- if (!*cached_page)
- return NULL;
- }
- err = add_to_page_cache(*cached_page, mapping,
- index, GFP_KERNEL);
- if (err == -EEXIST)
- goto repeat;
- if (err == 0) {
- page = *cached_page;
- page_cache_get(page);
- if (!pagevec_add(lru_pvec, page))
- __pagevec_lru_add(lru_pvec);
- *cached_page = NULL;
- }
- }
- return page;
-}
-
-/*
* The logic we want is
*
* if suid or (sgid and xgrp)
@@ -1894,6 +1842,33 @@ generic_file_direct_write(struct kiocb *
}
EXPORT_SYMBOL(generic_file_direct_write);
+/*
+ * Find or create a page at the given pagecache position. Return the locked
+ * page. This function is specifically for buffered writes.
+ */
+static struct page *__grab_cache_page(struct address_space *mapping,
+ pgoff_t index)
+{
+ int status;
+ struct page *page;
+repeat:
+ page = find_lock_page(mapping, index);
+ if (likely(page))
+ return page;
+
+ page = page_cache_alloc(mapping);
+ if (!page)
+ return NULL;
+ status = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
+ if (unlikely(status)) {
+ page_cache_release(page);
+ if (status == -EEXIST)
+ goto repeat;
+ return NULL;
+ }
+ return page;
+}
+
ssize_t
generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
unsigned long nr_segs, loff_t pos, loff_t *ppos,
@@ -1904,15 +1879,10 @@ generic_file_buffered_write(struct kiocb
const struct address_space_operations *a_ops = mapping->a_ops;
struct inode *inode = mapping->host;
long status = 0;
- struct page *page;
- struct page *cached_page = NULL;
- struct pagevec lru_pvec;
const struct iovec *cur_iov = iov; /* current iovec */
size_t iov_offset = 0; /* offset in the current iovec */
char __user *buf;
- pagevec_init(&lru_pvec, 0);
-
/*
* handle partial DIO write. Adjust cur_iov if needed.
*/
@@ -1924,6 +1894,7 @@ generic_file_buffered_write(struct kiocb
}
do {
+ struct page *page;
pgoff_t index; /* Pagecache index for current page */
unsigned long offset; /* Offset into pagecache page */
unsigned long maxlen; /* Bytes remaining in current iovec */
@@ -1950,7 +1921,8 @@ generic_file_buffered_write(struct kiocb
fault_in_pages_readable(buf, maxlen);
#endif
- page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
+
+ page = __grab_cache_page(mapping, index);
if (!page) {
status = -ENOMEM;
break;
@@ -2018,9 +1990,6 @@ fs_write_aop_error:
} while (count);
*ppos = pos;
- if (cached_page)
- page_cache_release(cached_page);
-
/*
* For now, when the user asks for O_SYNC, we'll actually give O_DSYNC
*/
@@ -2040,7 +2009,6 @@ fs_write_aop_error:
if (unlikely(file->f_flags & O_DIRECT) && written)
status = filemap_write_and_wait(mapping);
- pagevec_lru_add(&lru_pvec);
return written ? written : status;
}
EXPORT_SYMBOL(generic_file_buffered_write);
Index: linux-2.6/fs/mpage.c
===================================================================
--- linux-2.6.orig/fs/mpage.c
+++ linux-2.6/fs/mpage.c
@@ -387,31 +387,25 @@ mpage_readpages(struct address_space *ma
struct bio *bio = NULL;
unsigned page_idx;
sector_t last_block_in_bio = 0;
- struct pagevec lru_pvec;
struct buffer_head map_bh;
unsigned long first_logical_block = 0;
clear_buffer_mapped(&map_bh);
- pagevec_init(&lru_pvec, 0);
for (page_idx = 0; page_idx < nr_pages; page_idx++) {
struct page *page = list_entry(pages->prev, struct page, lru);
prefetchw(&page->flags);
list_del(&page->lru);
- if (!add_to_page_cache(page, mapping,
+ if (!add_to_page_cache_lru(page, mapping,
page->index, GFP_KERNEL)) {
bio = do_mpage_readpage(bio, page,
nr_pages - page_idx,
&last_block_in_bio, &map_bh,
&first_logical_block,
get_block);
- if (!pagevec_add(&lru_pvec, page))
- __pagevec_lru_add(&lru_pvec);
- } else {
- page_cache_release(page);
}
+ page_cache_release(page);
}
- pagevec_lru_add(&lru_pvec);
BUG_ON(!list_empty(pages));
if (bio)
mpage_bio_submit(READ, bio);
Index: linux-2.6/mm/readahead.c
===================================================================
--- linux-2.6.orig/mm/readahead.c
+++ linux-2.6/mm/readahead.c
@@ -133,28 +133,25 @@ int read_cache_pages(struct address_spac
int (*filler)(void *, struct page *), void *data)
{
struct page *page;
- struct pagevec lru_pvec;
int ret = 0;
- pagevec_init(&lru_pvec, 0);
-
while (!list_empty(pages)) {
page = list_to_page(pages);
list_del(&page->lru);
- if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
+ if (add_to_page_cache_lru(page, mapping,
+ page->index, GFP_KERNEL)) {
page_cache_release(page);
continue;
}
+ page_cache_release(page);
+
ret = filler(data, page);
- if (!pagevec_add(&lru_pvec, page))
- __pagevec_lru_add(&lru_pvec);
- if (ret) {
+ if (unlikely(ret)) {
put_pages_list(pages);
break;
}
task_io_account_read(PAGE_CACHE_SIZE);
}
- pagevec_lru_add(&lru_pvec);
return ret;
}
@@ -164,7 +161,6 @@ static int read_pages(struct address_spa
struct list_head *pages, unsigned nr_pages)
{
unsigned page_idx;
- struct pagevec lru_pvec;
int ret;
if (mapping->a_ops->readpages) {
@@ -174,19 +170,15 @@ static int read_pages(struct address_spa
goto out;
}
- pagevec_init(&lru_pvec, 0);
for (page_idx = 0; page_idx < nr_pages; page_idx++) {
struct page *page = list_to_page(pages);
list_del(&page->lru);
- if (!add_to_page_cache(page, mapping,
+ if (!add_to_page_cache_lru(page, mapping,
page->index, GFP_KERNEL)) {
mapping->a_ops->readpage(filp, page);
- if (!pagevec_add(&lru_pvec, page))
- __pagevec_lru_add(&lru_pvec);
- } else
- page_cache_release(page);
+ }
+ page_cache_release(page);
}
- pagevec_lru_add(&lru_pvec);
ret = 0;
out:
return ret;
--
* [patch 08/41] mm: write iovec cleanup
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (6 preceding siblings ...)
2007-05-14 6:06 ` [patch 07/41] mm: buffered write cleanup npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 09/41] mm: fix pagecache write deadlocks npiggin
` (33 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management
[-- Attachment #1: mm-write-iov-cleanup.patch --]
[-- Type: text/plain, Size: 8606 bytes --]
Hide some of the open-coded nr_segs tests in the iovec helpers. This is
all to simplify generic_file_buffered_write, because it gets more complex
in the next patch.
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
mm/filemap.c | 36 +++++--------------
mm/filemap.h | 104 +++++++++++++++++++++++++++----------------------------
mm/filemap_xip.c | 17 +++-----
3 files changed, 69 insertions(+), 88 deletions(-)
Index: linux-2.6/mm/filemap.h
===================================================================
--- linux-2.6.orig/mm/filemap.h
+++ linux-2.6/mm/filemap.h
@@ -22,82 +22,82 @@ __filemap_copy_from_user_iovec_inatomic(
/*
* Copy as much as we can into the page and return the number of bytes which
- * were sucessfully copied. If a fault is encountered then clear the page
- * out to (offset+bytes) and return the number of bytes which were copied.
- *
- * NOTE: For this to work reliably we really want copy_from_user_inatomic_nocache
- * to *NOT* zero any tail of the buffer that it failed to copy. If it does,
- * and if the following non-atomic copy succeeds, then there is a small window
- * where the target page contains neither the data before the write, nor the
- * data after the write (it contains zero). A read at this time will see
- * data that is inconsistent with any ordering of the read and the write.
- * (This has been detected in practice).
+ * were sucessfully copied. If a fault is encountered then return the number of
+ * bytes which were copied.
*/
static inline size_t
-filemap_copy_from_user(struct page *page, unsigned long offset,
- const char __user *buf, unsigned bytes)
+filemap_copy_from_user_atomic(struct page *page, unsigned long offset,
+ const struct iovec *iov, unsigned long nr_segs,
+ size_t base, size_t bytes)
{
char *kaddr;
- int left;
+ size_t copied;
kaddr = kmap_atomic(page, KM_USER0);
- left = __copy_from_user_inatomic_nocache(kaddr + offset, buf, bytes);
+ if (likely(nr_segs == 1)) {
+ int left;
+ char __user *buf = iov->iov_base + base;
+ left = __copy_from_user_inatomic_nocache(kaddr + offset,
+ buf, bytes);
+ copied = bytes - left;
+ } else {
+ copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset,
+ iov, base, bytes);
+ }
kunmap_atomic(kaddr, KM_USER0);
- if (left != 0) {
- /* Do it the slow way */
- kaddr = kmap(page);
- left = __copy_from_user_nocache(kaddr + offset, buf, bytes);
- kunmap(page);
- }
- return bytes - left;
+ return copied;
}
/*
- * This has the same sideeffects and return value as filemap_copy_from_user().
- * The difference is that on a fault we need to memset the remainder of the
- * page (out to offset+bytes), to emulate filemap_copy_from_user()'s
- * single-segment behaviour.
+ * This has the same sideeffects and return value as
+ * filemap_copy_from_user_atomic().
+ * The difference is that it attempts to resolve faults.
*/
static inline size_t
-filemap_copy_from_user_iovec(struct page *page, unsigned long offset,
- const struct iovec *iov, size_t base, size_t bytes)
+filemap_copy_from_user(struct page *page, unsigned long offset,
+ const struct iovec *iov, unsigned long nr_segs,
+ size_t base, size_t bytes)
{
char *kaddr;
size_t copied;
- kaddr = kmap_atomic(page, KM_USER0);
- copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
- base, bytes);
- kunmap_atomic(kaddr, KM_USER0);
- if (copied != bytes) {
- kaddr = kmap(page);
- copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
- base, bytes);
- if (bytes - copied)
- memset(kaddr + offset + copied, 0, bytes - copied);
- kunmap(page);
+ kaddr = kmap(page);
+ if (likely(nr_segs == 1)) {
+ int left;
+ char __user *buf = iov->iov_base + base;
+ left = __copy_from_user_nocache(kaddr + offset, buf, bytes);
+ copied = bytes - left;
+ } else {
+ copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset,
+ iov, base, bytes);
}
+ kunmap(page);
return copied;
}
static inline void
-filemap_set_next_iovec(const struct iovec **iovp, size_t *basep, size_t bytes)
+filemap_set_next_iovec(const struct iovec **iovp, unsigned long nr_segs,
+ size_t *basep, size_t bytes)
{
- const struct iovec *iov = *iovp;
- size_t base = *basep;
-
- while (bytes) {
- int copy = min(bytes, iov->iov_len - base);
-
- bytes -= copy;
- base += copy;
- if (iov->iov_len == base) {
- iov++;
- base = 0;
+ if (likely(nr_segs == 1)) {
+ *basep += bytes;
+ } else {
+ const struct iovec *iov = *iovp;
+ size_t base = *basep;
+
+ while (bytes) {
+ int copy = min(bytes, iov->iov_len - base);
+
+ bytes -= copy;
+ base += copy;
+ if (iov->iov_len == base) {
+ iov++;
+ base = 0;
+ }
}
+ *iovp = iov;
+ *basep = base;
}
- *iovp = iov;
- *basep = base;
}
#endif
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1886,12 +1886,7 @@ generic_file_buffered_write(struct kiocb
/*
* handle partial DIO write. Adjust cur_iov if needed.
*/
- if (likely(nr_segs == 1))
- buf = iov->iov_base + written;
- else {
- filemap_set_next_iovec(&cur_iov, &iov_offset, written);
- buf = cur_iov->iov_base + iov_offset;
- }
+ filemap_set_next_iovec(&cur_iov, nr_segs, &iov_offset, written);
do {
struct page *page;
@@ -1901,6 +1896,7 @@ generic_file_buffered_write(struct kiocb
size_t bytes; /* Bytes to write to page */
size_t copied; /* Bytes copied from user */
+ buf = cur_iov->iov_base + iov_offset;
offset = (pos & (PAGE_CACHE_SIZE - 1));
index = pos >> PAGE_CACHE_SHIFT;
bytes = PAGE_CACHE_SIZE - offset;
@@ -1932,13 +1928,10 @@ generic_file_buffered_write(struct kiocb
if (unlikely(status))
goto fs_write_aop_error;
- if (likely(nr_segs == 1))
- copied = filemap_copy_from_user(page, offset,
- buf, bytes);
- else
- copied = filemap_copy_from_user_iovec(page, offset,
- cur_iov, iov_offset, bytes);
+ copied = filemap_copy_from_user(page, offset,
+ cur_iov, nr_segs, iov_offset, bytes);
flush_dcache_page(page);
+
status = a_ops->commit_write(file, page, offset, offset+bytes);
if (unlikely(status < 0 || status == AOP_TRUNCATED_PAGE))
goto fs_write_aop_error;
@@ -1949,20 +1942,11 @@ generic_file_buffered_write(struct kiocb
if (unlikely(status > 0)) /* filesystem did partial write */
copied = status;
- if (likely(copied > 0)) {
- written += copied;
- count -= copied;
- pos += copied;
- buf += copied;
- if (unlikely(nr_segs > 1)) {
- filemap_set_next_iovec(&cur_iov,
- &iov_offset, copied);
- if (count)
- buf = cur_iov->iov_base + iov_offset;
- } else {
- iov_offset += copied;
- }
- }
+ written += copied;
+ count -= copied;
+ pos += copied;
+ filemap_set_next_iovec(&cur_iov, nr_segs, &iov_offset, copied);
+
unlock_page(page);
mark_page_accessed(page);
page_cache_release(page);
Index: linux-2.6/mm/filemap_xip.c
===================================================================
--- linux-2.6.orig/mm/filemap_xip.c
+++ linux-2.6/mm/filemap_xip.c
@@ -14,7 +14,6 @@
#include <linux/uio.h>
#include <linux/rmap.h>
#include <asm/tlbflush.h>
-#include "filemap.h"
/*
* We do use our own empty page to avoid interference with other users
@@ -318,6 +317,7 @@ __xip_file_write(struct file *filp, cons
unsigned long index;
unsigned long offset;
size_t copied;
+ char *kaddr;
offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
index = pos >> PAGE_CACHE_SHIFT;
@@ -325,14 +325,6 @@ __xip_file_write(struct file *filp, cons
if (bytes > count)
bytes = count;
- /*
- * Bring in the user page that we will copy from _first_.
- * Otherwise there's a nasty deadlock on copying from the
- * same page as we're writing to, without it being marked
- * up-to-date.
- */
- fault_in_pages_readable(buf, bytes);
-
page = a_ops->get_xip_page(mapping,
index*(PAGE_SIZE/512), 0);
if (IS_ERR(page) && (PTR_ERR(page) == -ENODATA)) {
@@ -349,8 +341,13 @@ __xip_file_write(struct file *filp, cons
break;
}
- copied = filemap_copy_from_user(page, offset, buf, bytes);
+ fault_in_pages_readable(buf, bytes);
+ kaddr = kmap_atomic(page, KM_USER0);
+ copied = bytes -
+ __copy_from_user_inatomic_nocache(kaddr, buf, bytes);
+ kunmap_atomic(kaddr, KM_USER0);
flush_dcache_page(page);
+
if (likely(copied > 0)) {
status = copied;
--
* [patch 09/41] mm: fix pagecache write deadlocks
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (7 preceding siblings ...)
2007-05-14 6:06 ` [patch 08/41] mm: write iovec cleanup npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 10/41] mm: buffered write iterator npiggin
` (32 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management
[-- Attachment #1: mm-pagecache-write-deadlocks.patch --]
[-- Type: text/plain, Size: 9014 bytes --]
Modify the core write() code so that it won't take a pagefault while holding a
lock on the pagecache page. There are a number of different deadlocks possible
if we try to do such a thing:
1. generic_buffered_write
2. lock_page
3. prepare_write
4. unlock_page+vmtruncate
5. copy_from_user
6. mmap_sem(r)
7. handle_mm_fault
8. lock_page (filemap_nopage)
9. commit_write
10. unlock_page
a. sys_munmap / sys_mlock / others
b. mmap_sem(w)
c. make_pages_present
d. get_user_pages
e. handle_mm_fault
f. lock_page (filemap_nopage)
2,8 - recursive deadlock if the page is the same
2,8;2,8 - ABBA deadlock if the pages are different
2,6;b,f - ABBA deadlock if the page is the same
The solution is as follows:
1. If we find the destination page is uptodate, continue as normal, but use
atomic usercopies which do not take pagefaults and do not zero the uncopied
tail of the destination. The destination is already uptodate, so we can
commit_write the full length even if there was a partial copy: it does not
matter that the tail was not modified, because if it is dirtied and written
back to disk it will not cause any problems (uptodate *means* that the
destination page is as new or newer than the copy on disk).
1a. The above requires that fault_in_pages_readable correctly returns access
information, because atomic usercopies cannot distinguish non-present
pages in a readable mapping from the lack of a readable mapping.
2. If we find the destination page is not uptodate, unlock it (this could be
made slightly more optimal), then allocate a temporary page to copy the
source data into. Relock the destination page and continue with the copy.
However, instead of a usercopy (which might take a fault), copy the data
from the pinned temporary page via the kernel address space.
(also, rename maxlen to seglen, because it was confusing)
This increases the CPU/memory copy cost by almost 50% on the affected
workloads. That will be solved by introducing a new set of pagecache write
aops in a subsequent patch.
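The two copy paths of the solution can be modelled in userspace C. `write_to_page` below is an invented stand-in: the `uptodate` flag plays the role of PageUptodate(), the bounce buffer stands in for the temporary page allocated with alloc_page(), and locking and fault handling are elided.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two copy paths. Assumes bytes <= 4096.
 * Path 1 (uptodate): copy straight in; a short copy is acceptable
 * because the destination already holds valid data.
 * Path 2 (not uptodate): copy into a bounce buffer first, while the
 * page is unlocked and faults are safe; the second memcpy (kernel to
 * kernel in the real code) cannot fault under the page lock. */
static size_t write_to_page(char *page, int uptodate,
                            const char *user, size_t bytes)
{
    if (uptodate) {
        /* stands in for the atomic, non-faulting usercopy */
        memcpy(page, user, bytes);
        return bytes;
    } else {
        char bounce[4096];              /* stands in for alloc_page() */
        memcpy(bounce, user, bytes);    /* may fault: page is unlocked */
        /* ...relock the page here in the real code... */
        memcpy(page, bounce, bytes);    /* fault-free under the lock */
        return bytes;
    }
}
```

The extra memcpy on the not-uptodate path is the "almost 50%" copy cost mentioned above, which the later write_begin/write_end aops eliminate.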
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
include/linux/pagemap.h | 11 +++-
mm/filemap.c | 114 ++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 104 insertions(+), 21 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1889,11 +1889,12 @@ generic_file_buffered_write(struct kiocb
filemap_set_next_iovec(&cur_iov, nr_segs, &iov_offset, written);
do {
+ struct page *src_page;
struct page *page;
pgoff_t index; /* Pagecache index for current page */
unsigned long offset; /* Offset into pagecache page */
- unsigned long maxlen; /* Bytes remaining in current iovec */
- size_t bytes; /* Bytes to write to page */
+ unsigned long seglen; /* Bytes remaining in current iovec */
+ unsigned long bytes; /* Bytes to write to page */
size_t copied; /* Bytes copied from user */
buf = cur_iov->iov_base + iov_offset;
@@ -1903,20 +1904,30 @@ generic_file_buffered_write(struct kiocb
if (bytes > count)
bytes = count;
- maxlen = cur_iov->iov_len - iov_offset;
- if (maxlen > bytes)
- maxlen = bytes;
+ /*
+ * a non-NULL src_page indicates that we're doing the
+ * copy via get_user_pages and kmap.
+ */
+ src_page = NULL;
+
+ seglen = cur_iov->iov_len - iov_offset;
+ if (seglen > bytes)
+ seglen = bytes;
-#ifndef CONFIG_DEBUG_VM
/*
* Bring in the user page that we will copy from _first_.
* Otherwise there's a nasty deadlock on copying from the
* same page as we're writing to, without it being marked
* up-to-date.
+ *
+ * Not only is this an optimisation, but it is also required
+ * to check that the address is actually valid, when atomic
+ * usercopies are used, below.
*/
- fault_in_pages_readable(buf, maxlen);
-#endif
-
+ if (unlikely(fault_in_pages_readable(buf, seglen))) {
+ status = -EFAULT;
+ break;
+ }
page = __grab_cache_page(mapping, index);
if (!page) {
@@ -1924,32 +1935,104 @@ generic_file_buffered_write(struct kiocb
break;
}
+ /*
+ * non-uptodate pages cannot cope with short copies, and we
+ * cannot take a pagefault with the destination page locked.
+ * So pin the source page to copy it.
+ */
+ if (!PageUptodate(page)) {
+ unlock_page(page);
+
+ src_page = alloc_page(GFP_KERNEL);
+ if (!src_page) {
+ page_cache_release(page);
+ status = -ENOMEM;
+ break;
+ }
+
+ /*
+ * Cannot get_user_pages with a page locked for the
+ * same reason as we can't take a page fault with a
+ * page locked (as explained below).
+ */
+ copied = filemap_copy_from_user(src_page, offset,
+ cur_iov, nr_segs, iov_offset, bytes);
+ if (unlikely(copied == 0)) {
+ status = -EFAULT;
+ page_cache_release(page);
+ page_cache_release(src_page);
+ break;
+ }
+ bytes = copied;
+
+ lock_page(page);
+ /*
+ * Can't handle the page going uptodate here, because
+ * that means we would use non-atomic usercopies, which
+ * zero out the tail of the page, which can cause
+ * zeroes to become transiently visible. We could just
+ * use a non-zeroing copy, but the APIs aren't too
+ * consistent.
+ */
+ if (unlikely(!page->mapping || PageUptodate(page))) {
+ unlock_page(page);
+ page_cache_release(page);
+ page_cache_release(src_page);
+ continue;
+ }
+
+ }
+
status = a_ops->prepare_write(file, page, offset, offset+bytes);
if (unlikely(status))
goto fs_write_aop_error;
- copied = filemap_copy_from_user(page, offset,
+ if (!src_page) {
+ /*
+ * Must not enter the pagefault handler here, because
+ * we hold the page lock, so we might recursively
+ * deadlock on the same lock, or get an ABBA deadlock
+ * against a different lock, or against the mmap_sem
+ * (which nests outside the page lock). So increment
+ * preempt count, and use _atomic usercopies.
+ *
+ * The page is uptodate so we are OK to encounter a
+ * short copy: if unmodified parts of the page are
+ * marked dirty and written out to disk, it doesn't
+ * really matter.
+ */
+ pagefault_disable();
+ copied = filemap_copy_from_user_atomic(page, offset,
cur_iov, nr_segs, iov_offset, bytes);
+ pagefault_enable();
+ } else {
+ void *src, *dst;
+ src = kmap_atomic(src_page, KM_USER0);
+ dst = kmap_atomic(page, KM_USER1);
+ memcpy(dst + offset, src + offset, bytes);
+ kunmap_atomic(dst, KM_USER1);
+ kunmap_atomic(src, KM_USER0);
+ copied = bytes;
+ }
flush_dcache_page(page);
status = a_ops->commit_write(file, page, offset, offset+bytes);
if (unlikely(status < 0 || status == AOP_TRUNCATED_PAGE))
goto fs_write_aop_error;
- if (unlikely(copied != bytes)) {
- status = -EFAULT;
- goto fs_write_aop_error;
- }
if (unlikely(status > 0)) /* filesystem did partial write */
- copied = status;
+ copied = min_t(size_t, copied, status);
+
+ unlock_page(page);
+ mark_page_accessed(page);
+ page_cache_release(page);
+ if (src_page)
+ page_cache_release(src_page);
written += copied;
count -= copied;
pos += copied;
filemap_set_next_iovec(&cur_iov, nr_segs, &iov_offset, copied);
- unlock_page(page);
- mark_page_accessed(page);
- page_cache_release(page);
balance_dirty_pages_ratelimited(mapping);
cond_resched();
continue;
@@ -1958,6 +2041,8 @@ fs_write_aop_error:
if (status != AOP_TRUNCATED_PAGE)
unlock_page(page);
page_cache_release(page);
+ if (src_page)
+ page_cache_release(src_page);
/*
* prepare_write() may have instantiated a few blocks
@@ -1970,7 +2055,6 @@ fs_write_aop_error:
continue;
else
break;
-
} while (count);
*ppos = pos;
Index: linux-2.6/include/linux/pagemap.h
===================================================================
--- linux-2.6.orig/include/linux/pagemap.h
+++ linux-2.6/include/linux/pagemap.h
@@ -218,6 +218,9 @@ static inline int fault_in_pages_writeab
{
int ret;
+ if (unlikely(size == 0))
+ return 0;
+
/*
* Writing zeroes into userspace here is OK, because we know that if
* the zero gets there, we'll be overwriting it.
@@ -237,19 +240,23 @@ static inline int fault_in_pages_writeab
return ret;
}
-static inline void fault_in_pages_readable(const char __user *uaddr, int size)
+static inline int fault_in_pages_readable(const char __user *uaddr, int size)
{
volatile char c;
int ret;
+ if (unlikely(size == 0))
+ return 0;
+
ret = __get_user(c, uaddr);
if (ret == 0) {
const char __user *end = uaddr + size - 1;
if (((unsigned long)uaddr & PAGE_MASK) !=
((unsigned long)end & PAGE_MASK))
- __get_user(c, end);
+ ret = __get_user(c, end);
}
+ return ret;
}
#endif /* _LINUX_PAGEMAP_H */
--
* [patch 10/41] mm: buffered write iterator
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (8 preceding siblings ...)
2007-05-14 6:06 ` [patch 09/41] mm: fix pagecache write deadlocks npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 11/41] fs: fix data-loss on error npiggin
` (31 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management
[-- Attachment #1: fs-buffered-write-iterator.patch --]
[-- Type: text/plain, Size: 11006 bytes --]
Add an iterator data structure to operate over an iovec. Add usercopy
operators needed by generic_file_buffered_write, and convert that function
over.
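A userspace sketch of the iterator, assuming POSIX struct iovec: the field and function names mirror the patch, but the bodies here are simplified stand-ins (the assert mirrors the kernel's BUG_ON).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Sketch of the iov_iter cursor added by this patch. */
struct iov_iter {
    const struct iovec *iov;    /* current segment */
    unsigned long nr_segs;
    size_t iov_offset;          /* offset within current segment */
    size_t count;               /* total bytes remaining */
};

static void iov_iter_advance(struct iov_iter *i, size_t bytes)
{
    assert(i->count >= bytes);          /* mirrors BUG_ON() */
    i->count -= bytes;
    if (i->nr_segs == 1) {
        i->iov_offset += bytes;
        return;
    }
    while (bytes) {
        size_t left = i->iov->iov_len - i->iov_offset;
        size_t step = bytes < left ? bytes : left;

        bytes -= step;
        i->iov_offset += step;
        if (i->iov_offset == i->iov->iov_len) {
            i->iov++;                   /* move to the next segment */
            i->iov_offset = 0;
        }
    }
}

static void iov_iter_init(struct iov_iter *i, const struct iovec *iov,
                          unsigned long nr_segs, size_t count, size_t written)
{
    i->iov = iov;
    i->nr_segs = nr_segs;
    i->iov_offset = 0;
    i->count = count + written;
    iov_iter_advance(i, written);   /* skip what a partial DIO wrote */
}
```

Initialising with written > 0 replaces the old "handle partial DIO write" adjustment of cur_iov in generic_file_buffered_write.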
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
include/linux/fs.h | 33 ++++++++++++
mm/filemap.c | 144 +++++++++++++++++++++++++++++++++++++++++++----------
mm/filemap.h | 103 -------------------------------------
3 files changed, 150 insertions(+), 130 deletions(-)
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -404,6 +404,39 @@ struct page;
struct address_space;
struct writeback_control;
+struct iov_iter {
+ const struct iovec *iov;
+ unsigned long nr_segs;
+ size_t iov_offset;
+ size_t count;
+};
+
+size_t iov_iter_copy_from_user_atomic(struct page *page,
+ struct iov_iter *i, unsigned long offset, size_t bytes);
+size_t iov_iter_copy_from_user(struct page *page,
+ struct iov_iter *i, unsigned long offset, size_t bytes);
+void iov_iter_advance(struct iov_iter *i, size_t bytes);
+int iov_iter_fault_in_readable(struct iov_iter *i);
+size_t iov_iter_single_seg_count(struct iov_iter *i);
+
+static inline void iov_iter_init(struct iov_iter *i,
+ const struct iovec *iov, unsigned long nr_segs,
+ size_t count, size_t written)
+{
+ i->iov = iov;
+ i->nr_segs = nr_segs;
+ i->iov_offset = 0;
+ i->count = count + written;
+
+ iov_iter_advance(i, written);
+}
+
+static inline size_t iov_iter_count(struct iov_iter *i)
+{
+ return i->count;
+}
+
+
struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
int (*readpage)(struct file *, struct page *);
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -30,7 +30,7 @@
#include <linux/security.h>
#include <linux/syscalls.h>
#include <linux/cpuset.h>
-#include "filemap.h"
+#include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
#include "internal.h"
/*
@@ -1696,8 +1696,7 @@ int remove_suid(struct dentry *dentry)
}
EXPORT_SYMBOL(remove_suid);
-size_t
-__filemap_copy_from_user_iovec_inatomic(char *vaddr,
+static size_t __iovec_copy_from_user_inatomic(char *vaddr,
const struct iovec *iov, size_t base, size_t bytes)
{
size_t copied = 0, left = 0;
@@ -1720,6 +1719,110 @@ __filemap_copy_from_user_iovec_inatomic(
}
/*
+ * Copy as much as we can into the page and return the number of bytes which
+ * were sucessfully copied. If a fault is encountered then return the number of
+ * bytes which were copied.
+ */
+size_t iov_iter_copy_from_user_atomic(struct page *page,
+ struct iov_iter *i, unsigned long offset, size_t bytes)
+{
+ char *kaddr;
+ size_t copied;
+
+ BUG_ON(!in_atomic());
+ kaddr = kmap_atomic(page, KM_USER0);
+ if (likely(i->nr_segs == 1)) {
+ int left;
+ char __user *buf = i->iov->iov_base + i->iov_offset;
+ left = __copy_from_user_inatomic_nocache(kaddr + offset,
+ buf, bytes);
+ copied = bytes - left;
+ } else {
+ copied = __iovec_copy_from_user_inatomic(kaddr + offset,
+ i->iov, i->iov_offset, bytes);
+ }
+ kunmap_atomic(kaddr, KM_USER0);
+
+ return copied;
+}
+
+/*
+ * This has the same sideeffects and return value as
+ * iov_iter_copy_from_user_atomic().
+ * The difference is that it attempts to resolve faults.
+ * Page must not be locked.
+ */
+size_t iov_iter_copy_from_user(struct page *page,
+ struct iov_iter *i, unsigned long offset, size_t bytes)
+{
+ char *kaddr;
+ size_t copied;
+
+ kaddr = kmap(page);
+ if (likely(i->nr_segs == 1)) {
+ int left;
+ char __user *buf = i->iov->iov_base + i->iov_offset;
+ left = __copy_from_user_nocache(kaddr + offset, buf, bytes);
+ copied = bytes - left;
+ } else {
+ copied = __iovec_copy_from_user_inatomic(kaddr + offset,
+ i->iov, i->iov_offset, bytes);
+ }
+ kunmap(page);
+ return copied;
+}
+
+static void __iov_iter_advance_iov(struct iov_iter *i, size_t bytes)
+{
+ if (likely(i->nr_segs == 1)) {
+ i->iov_offset += bytes;
+ } else {
+ const struct iovec *iov = i->iov;
+ size_t base = i->iov_offset;
+
+ while (bytes) {
+ int copy = min(bytes, iov->iov_len - base);
+
+ bytes -= copy;
+ base += copy;
+ if (iov->iov_len == base) {
+ iov++;
+ base = 0;
+ }
+ }
+ i->iov = iov;
+ i->iov_offset = base;
+ }
+}
+
+void iov_iter_advance(struct iov_iter *i, size_t bytes)
+{
+ BUG_ON(i->count < bytes);
+
+ __iov_iter_advance_iov(i, bytes);
+ i->count -= bytes;
+}
+
+int iov_iter_fault_in_readable(struct iov_iter *i)
+{
+ size_t seglen = min(i->iov->iov_len - i->iov_offset, i->count);
+ char __user *buf = i->iov->iov_base + i->iov_offset;
+ return fault_in_pages_readable(buf, seglen);
+}
+
+/*
+ * Return the count of just the current iov_iter segment.
+ */
+size_t iov_iter_single_seg_count(struct iov_iter *i)
+{
+ const struct iovec *iov = i->iov;
+ if (i->nr_segs == 1)
+ return i->count;
+ else
+ return min(i->count, iov->iov_len - i->iov_offset);
+}
+
+/*
* Performs necessary checks before doing a write
*
* Can adjust writing position or amount of bytes to write.
@@ -1879,30 +1982,22 @@ generic_file_buffered_write(struct kiocb
const struct address_space_operations *a_ops = mapping->a_ops;
struct inode *inode = mapping->host;
long status = 0;
- const struct iovec *cur_iov = iov; /* current iovec */
- size_t iov_offset = 0; /* offset in the current iovec */
- char __user *buf;
+ struct iov_iter i;
- /*
- * handle partial DIO write. Adjust cur_iov if needed.
- */
- filemap_set_next_iovec(&cur_iov, nr_segs, &iov_offset, written);
+ iov_iter_init(&i, iov, nr_segs, count, written);
do {
struct page *src_page;
struct page *page;
pgoff_t index; /* Pagecache index for current page */
unsigned long offset; /* Offset into pagecache page */
- unsigned long seglen; /* Bytes remaining in current iovec */
unsigned long bytes; /* Bytes to write to page */
size_t copied; /* Bytes copied from user */
- buf = cur_iov->iov_base + iov_offset;
offset = (pos & (PAGE_CACHE_SIZE - 1));
index = pos >> PAGE_CACHE_SHIFT;
- bytes = PAGE_CACHE_SIZE - offset;
- if (bytes > count)
- bytes = count;
+ bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
+ iov_iter_count(&i));
/*
* a non-NULL src_page indicates that we're doing the
@@ -1910,10 +2005,6 @@ generic_file_buffered_write(struct kiocb
*/
src_page = NULL;
- seglen = cur_iov->iov_len - iov_offset;
- if (seglen > bytes)
- seglen = bytes;
-
/*
* Bring in the user page that we will copy from _first_.
* Otherwise there's a nasty deadlock on copying from the
@@ -1924,7 +2015,7 @@ generic_file_buffered_write(struct kiocb
* to check that the address is actually valid, when atomic
* usercopies are used, below.
*/
- if (unlikely(fault_in_pages_readable(buf, seglen))) {
+ if (unlikely(iov_iter_fault_in_readable(&i))) {
status = -EFAULT;
break;
}
@@ -1955,8 +2046,8 @@ generic_file_buffered_write(struct kiocb
* same reason as we can't take a page fault with a
* page locked (as explained below).
*/
- copied = filemap_copy_from_user(src_page, offset,
- cur_iov, nr_segs, iov_offset, bytes);
+ copied = iov_iter_copy_from_user(src_page, &i,
+ offset, bytes);
if (unlikely(copied == 0)) {
status = -EFAULT;
page_cache_release(page);
@@ -2002,8 +2093,8 @@ generic_file_buffered_write(struct kiocb
* really matter.
*/
pagefault_disable();
- copied = filemap_copy_from_user_atomic(page, offset,
- cur_iov, nr_segs, iov_offset, bytes);
+ copied = iov_iter_copy_from_user_atomic(page, &i,
+ offset, bytes);
pagefault_enable();
} else {
void *src, *dst;
@@ -2028,10 +2119,9 @@ generic_file_buffered_write(struct kiocb
if (src_page)
page_cache_release(src_page);
+ iov_iter_advance(&i, copied);
written += copied;
- count -= copied;
pos += copied;
- filemap_set_next_iovec(&cur_iov, nr_segs, &iov_offset, copied);
balance_dirty_pages_ratelimited(mapping);
cond_resched();
@@ -2055,7 +2145,7 @@ fs_write_aop_error:
continue;
else
break;
- } while (count);
+ } while (iov_iter_count(&i));
*ppos = pos;
/*
Index: linux-2.6/mm/filemap.h
===================================================================
--- linux-2.6.orig/mm/filemap.h
+++ /dev/null
@@ -1,103 +0,0 @@
-/*
- * linux/mm/filemap.h
- *
- * Copyright (C) 1994-1999 Linus Torvalds
- */
-
-#ifndef __FILEMAP_H
-#define __FILEMAP_H
-
-#include <linux/types.h>
-#include <linux/fs.h>
-#include <linux/mm.h>
-#include <linux/highmem.h>
-#include <linux/uio.h>
-#include <linux/uaccess.h>
-
-size_t
-__filemap_copy_from_user_iovec_inatomic(char *vaddr,
- const struct iovec *iov,
- size_t base,
- size_t bytes);
-
-/*
- * Copy as much as we can into the page and return the number of bytes which
- * were sucessfully copied. If a fault is encountered then return the number of
- * bytes which were copied.
- */
-static inline size_t
-filemap_copy_from_user_atomic(struct page *page, unsigned long offset,
- const struct iovec *iov, unsigned long nr_segs,
- size_t base, size_t bytes)
-{
- char *kaddr;
- size_t copied;
-
- kaddr = kmap_atomic(page, KM_USER0);
- if (likely(nr_segs == 1)) {
- int left;
- char __user *buf = iov->iov_base + base;
- left = __copy_from_user_inatomic_nocache(kaddr + offset,
- buf, bytes);
- copied = bytes - left;
- } else {
- copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset,
- iov, base, bytes);
- }
- kunmap_atomic(kaddr, KM_USER0);
-
- return copied;
-}
-
-/*
- * This has the same sideeffects and return value as
- * filemap_copy_from_user_atomic().
- * The difference is that it attempts to resolve faults.
- */
-static inline size_t
-filemap_copy_from_user(struct page *page, unsigned long offset,
- const struct iovec *iov, unsigned long nr_segs,
- size_t base, size_t bytes)
-{
- char *kaddr;
- size_t copied;
-
- kaddr = kmap(page);
- if (likely(nr_segs == 1)) {
- int left;
- char __user *buf = iov->iov_base + base;
- left = __copy_from_user_nocache(kaddr + offset, buf, bytes);
- copied = bytes - left;
- } else {
- copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset,
- iov, base, bytes);
- }
- kunmap(page);
- return copied;
-}
-
-static inline void
-filemap_set_next_iovec(const struct iovec **iovp, unsigned long nr_segs,
- size_t *basep, size_t bytes)
-{
- if (likely(nr_segs == 1)) {
- *basep += bytes;
- } else {
- const struct iovec *iov = *iovp;
- size_t base = *basep;
-
- while (bytes) {
- int copy = min(bytes, iov->iov_len - base);
-
- bytes -= copy;
- base += copy;
- if (iov->iov_len == base) {
- iov++;
- base = 0;
- }
- }
- *iovp = iov;
- *basep = base;
- }
-}
-#endif
--
* [patch 11/41] fs: fix data-loss on error
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (9 preceding siblings ...)
2007-05-14 6:06 ` [patch 10/41] mm: buffered write iterator npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 12/41] fs: introduce write_begin, write_end, and perform_write aops npiggin
` (30 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management
[-- Attachment #1: fs-dataloss-stop.patch --]
[-- Type: text/plain, Size: 1116 bytes --]
New buffers against uptodate pages are simply marked uptodate, while the
buffer_new bit remains set. This causes error-case code to zero out parts
of those buffers because it thinks they contain stale data: wrong, they
are actually uptodate, so this is a data-loss situation.
Fix this by clearing buffer_new and marking the buffer dirty. It
makes sense to always clear buffer_new before setting a buffer uptodate.
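As a toy model of the flag handling after the fix (flag names mirror the buffer_head bits; everything else here is invented for illustration):

```c
#include <assert.h>

enum { BH_NEW = 1, BH_UPTODATE = 2, BH_DIRTY = 4 };

/* Toy model: a new buffer over an already-uptodate page must drop the
 * "new" flag, so error paths will not zero it out, and be marked dirty
 * so its contents reach disk. */
static int fixup_new_buffer(int flags, int page_uptodate)
{
    if (page_uptodate) {
        flags &= ~BH_NEW;       /* clear_buffer_new()      */
        flags |= BH_UPTODATE;   /* set_buffer_uptodate()   */
        flags |= BH_DIRTY;      /* mark_buffer_dirty()     */
    }
    return flags;
}
```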
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/buffer.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1793,7 +1793,9 @@ static int __block_prepare_write(struct
unmap_underlying_metadata(bh->b_bdev,
bh->b_blocknr);
if (PageUptodate(page)) {
+ clear_buffer_new(bh);
set_buffer_uptodate(bh);
+ mark_buffer_dirty(bh);
continue;
}
if (block_end > to || block_start < from) {
--
* [patch 12/41] fs: introduce write_begin, write_end, and perform_write aops
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (10 preceding siblings ...)
2007-05-14 6:06 ` [patch 11/41] fs: fix data-loss on error npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 13/41] mm: restore KERNEL_DS optimisations npiggin
` (29 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management, Mark Fasheh
[-- Attachment #1: fs-new-write-aops.patch --]
[-- Type: text/plain, Size: 35854 bytes --]
These are intended to replace prepare_write and commit_write with more
flexible alternatives that are also able to avoid the buffered write
deadlock problems efficiently (which prepare_write is unable to do).
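The caller-side calling convention can be sketched in userspace C. `demo_write_begin`/`demo_write_end` are invented stand-ins that model only the (page, fsdata) handshake: begin hands back a page to copy into, the caller copies up to len bytes (possibly a short copy), and end reports how many bytes were accepted. Locking, pagecache lookup, and the filesystem's block work are all elided.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-ins: a "page" is a 4096-byte buffer; fsdata is unused. */
static char pagebuf[4096];

static int demo_write_begin(size_t pos, size_t len,
                            char **pagep, void **fsdata)
{
    (void)pos; (void)len; (void)fsdata;
    *pagep = pagebuf;               /* kernel: a locked pagecache page */
    return 0;
}

static int demo_write_end(size_t pos, size_t len, size_t copied,
                          char *page, void *fsdata)
{
    (void)pos; (void)len; (void)page; (void)fsdata;
    /* kernel: mark dirty, unlock; return bytes actually accepted */
    return (int)copied;
}

/* The caller's obligation: begin, copy (possibly short), end. */
static size_t buffered_write(size_t pos, const char *buf, size_t len)
{
    char *page;
    void *fsdata;

    if (demo_write_begin(pos, len, &page, &fsdata))
        return 0;
    memcpy(page + (pos % 4096), buf, len);  /* may be a short copy */
    return (size_t)demo_write_end(pos, len, len, page, fsdata);
}
```

Unlike prepare_write/commit_write, write_end takes the number of bytes actually copied, which is what lets the core loop tolerate short atomic usercopies without deadlocking.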
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
API design contributions, code review and fixes.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Documentation/filesystems/Locking | 9 -
Documentation/filesystems/vfs.txt | 48 +++++++
drivers/block/loop.c | 77 ++++--------
fs/buffer.c | 203 +++++++++++++++++++++++++++------
fs/libfs.c | 44 +++++++
fs/namei.c | 47 +------
fs/splice.c | 70 +----------
include/linux/buffer_head.h | 10 +
include/linux/fs.h | 28 ++++
include/linux/pagemap.h | 2
mm/filemap.c | 233 ++++++++++++++++++++++++++++++++++----
11 files changed, 561 insertions(+), 210 deletions(-)
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -397,6 +397,8 @@ enum positive_aop_returns {
AOP_TRUNCATED_PAGE = 0x80001,
};
+#define AOP_FLAG_UNINTERRUPTIBLE 0x0001 /* will not do a short write */
+
/*
* oh the beauties of C type declarations.
*/
@@ -457,6 +459,14 @@ struct address_space_operations {
*/
int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
+
+ int (*write_begin)(struct file *, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata);
+ int (*write_end)(struct file *, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata);
+
/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
sector_t (*bmap)(struct address_space *, sector_t);
void (*invalidatepage) (struct page *, unsigned long);
@@ -471,6 +481,18 @@ struct address_space_operations {
int (*launder_page) (struct page *);
};
+/*
+ * pagecache_write_begin/pagecache_write_end must be used by general code
+ * to write into the pagecache.
+ */
+int pagecache_write_begin(struct file *, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata);
+
+int pagecache_write_end(struct file *, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata);
+
struct backing_dev_info;
struct address_space {
struct inode *host; /* owner: inode, block_device */
@@ -1938,6 +1960,12 @@ extern int simple_prepare_write(struct f
unsigned offset, unsigned to);
extern int simple_commit_write(struct file *file, struct page *page,
unsigned offset, unsigned to);
+extern int simple_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata);
+extern int simple_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata);
extern struct dentry *simple_lookup(struct inode *, struct dentry *, struct nameidata *);
extern ssize_t generic_read_dir(struct file *, char __user *, size_t, loff_t *);
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1906,6 +1906,93 @@ inline int generic_write_checks(struct f
}
EXPORT_SYMBOL(generic_write_checks);
+int pagecache_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ const struct address_space_operations *aops = mapping->a_ops;
+
+ if (aops->write_begin) {
+ return aops->write_begin(file, mapping, pos, len, flags,
+ pagep, fsdata);
+ } else {
+ int ret;
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+ unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
+ struct inode *inode = mapping->host;
+ struct page *page;
+again:
+ page = __grab_cache_page(mapping, index);
+ *pagep = page;
+ if (!page)
+ return -ENOMEM;
+
+ if (flags & AOP_FLAG_UNINTERRUPTIBLE && !PageUptodate(page)) {
+ /*
+ * There is no way to resolve a short write situation
+ * for a !Uptodate page (except by double copying in
+ * the caller done by generic_perform_write_2copy).
+ *
+ * Instead, we have to bring it uptodate here.
+ */
+ ret = aops->readpage(file, page);
+ page_cache_release(page);
+ if (ret) {
+ if (ret == AOP_TRUNCATED_PAGE)
+ goto again;
+ return ret;
+ }
+ goto again;
+ }
+
+ ret = aops->prepare_write(file, page, offset, offset+len);
+ if (ret) {
+ if (ret != AOP_TRUNCATED_PAGE)
+ unlock_page(page);
+ page_cache_release(page);
+ if (pos + len > inode->i_size)
+ vmtruncate(inode, inode->i_size);
+ if (ret == AOP_TRUNCATED_PAGE)
+ goto again;
+ }
+ return ret;
+ }
+}
+EXPORT_SYMBOL(pagecache_write_begin);
+
+int pagecache_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ const struct address_space_operations *aops = mapping->a_ops;
+ int ret;
+
+ if (aops->write_begin) {
+ ret = aops->write_end(file, mapping, pos, len, copied,
+ page, fsdata);
+ } else {
+ unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
+ struct inode *inode = mapping->host;
+
+ flush_dcache_page(page);
+ ret = aops->commit_write(file, page, offset, offset+len);
+ unlock_page(page);
+ page_cache_release(page);
+ BUG_ON(ret == AOP_TRUNCATED_PAGE); /* can't deal with */
+
+ if (ret < 0) {
+ if (pos + len > inode->i_size)
+ vmtruncate(inode, inode->i_size);
+ } else if (ret > 0)
+ ret = min_t(size_t, copied, ret);
+ else
+ ret = copied;
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL(pagecache_write_end);
+
ssize_t
generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
unsigned long *nr_segs, loff_t pos, loff_t *ppos,
@@ -1949,8 +2036,7 @@ EXPORT_SYMBOL(generic_file_direct_write)
* Find or create a page at the given pagecache position. Return the locked
* page. This function is specifically for buffered writes.
*/
-static struct page *__grab_cache_page(struct address_space *mapping,
- pgoff_t index)
+struct page *__grab_cache_page(struct address_space *mapping, pgoff_t index)
{
int status;
struct page *page;
@@ -1971,20 +2057,16 @@ repeat:
}
return page;
}
+EXPORT_SYMBOL(__grab_cache_page);
-ssize_t
-generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
- unsigned long nr_segs, loff_t pos, loff_t *ppos,
- size_t count, ssize_t written)
+static ssize_t generic_perform_write_2copy(struct file *file,
+ struct iov_iter *i, loff_t pos)
{
- struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
const struct address_space_operations *a_ops = mapping->a_ops;
- struct inode *inode = mapping->host;
- long status = 0;
- struct iov_iter i;
-
- iov_iter_init(&i, iov, nr_segs, count, written);
+ struct inode *inode = mapping->host;
+ long status = 0;
+ ssize_t written = 0;
do {
struct page *src_page;
@@ -1997,7 +2079,7 @@ generic_file_buffered_write(struct kiocb
offset = (pos & (PAGE_CACHE_SIZE - 1));
index = pos >> PAGE_CACHE_SHIFT;
bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
- iov_iter_count(&i));
+ iov_iter_count(i));
/*
* a non-NULL src_page indicates that we're doing the
@@ -2015,7 +2097,7 @@ generic_file_buffered_write(struct kiocb
* to check that the address is actually valid, when atomic
* usercopies are used, below.
*/
- if (unlikely(iov_iter_fault_in_readable(&i))) {
+ if (unlikely(iov_iter_fault_in_readable(i))) {
status = -EFAULT;
break;
}
@@ -2046,7 +2128,7 @@ generic_file_buffered_write(struct kiocb
* same reason as we can't take a page fault with a
* page locked (as explained below).
*/
- copied = iov_iter_copy_from_user(src_page, &i,
+ copied = iov_iter_copy_from_user(src_page, i,
offset, bytes);
if (unlikely(copied == 0)) {
status = -EFAULT;
@@ -2071,7 +2153,6 @@ generic_file_buffered_write(struct kiocb
page_cache_release(src_page);
continue;
}
-
}
status = a_ops->prepare_write(file, page, offset, offset+bytes);
@@ -2093,7 +2174,7 @@ generic_file_buffered_write(struct kiocb
* really matter.
*/
pagefault_disable();
- copied = iov_iter_copy_from_user_atomic(page, &i,
+ copied = iov_iter_copy_from_user_atomic(page, i,
offset, bytes);
pagefault_enable();
} else {
@@ -2119,9 +2200,9 @@ generic_file_buffered_write(struct kiocb
if (src_page)
page_cache_release(src_page);
- iov_iter_advance(&i, copied);
- written += copied;
+ iov_iter_advance(i, copied);
pos += copied;
+ written += copied;
balance_dirty_pages_ratelimited(mapping);
cond_resched();
@@ -2145,13 +2226,117 @@ fs_write_aop_error:
continue;
else
break;
- } while (iov_iter_count(&i));
- *ppos = pos;
+ } while (iov_iter_count(i));
+
+ return written ? written : status;
+}
+
+static ssize_t generic_perform_write(struct file *file,
+ struct iov_iter *i, loff_t pos)
+{
+ struct address_space *mapping = file->f_mapping;
+ const struct address_space_operations *a_ops = mapping->a_ops;
+ long status = 0;
+ ssize_t written = 0;
+
+ do {
+ struct page *page;
+ pgoff_t index; /* Pagecache index for current page */
+ unsigned long offset; /* Offset into pagecache page */
+ unsigned long bytes; /* Bytes to write to page */
+ size_t copied; /* Bytes copied from user */
+ void *fsdata;
+
+ offset = (pos & (PAGE_CACHE_SIZE - 1));
+ index = pos >> PAGE_CACHE_SHIFT;
+ bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
+ iov_iter_count(i));
+
+again:
+
+ /*
+ * Bring in the user page that we will copy from _first_.
+ * Otherwise there's a nasty deadlock on copying from the
+ * same page as we're writing to, without it being marked
+ * up-to-date.
+ *
+ * Not only is this an optimisation, but it is also required
+ * to check that the address is actually valid, when atomic
+ * usercopies are used, below.
+ */
+ if (unlikely(iov_iter_fault_in_readable(i))) {
+ status = -EFAULT;
+ break;
+ }
+
+ status = a_ops->write_begin(file, mapping, pos, bytes, 0,
+ &page, &fsdata);
+ if (unlikely(status))
+ break;
+
+ pagefault_disable();
+ copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
+ pagefault_enable();
+ flush_dcache_page(page);
+
+ status = a_ops->write_end(file, mapping, pos, bytes, copied,
+ page, fsdata);
+ if (unlikely(status < 0))
+ break;
+ copied = status;
+
+ cond_resched();
+
+ if (unlikely(copied == 0)) {
+ /*
+ * If we were unable to copy any data at all, we must
+ * fall back to a single segment length write.
+ *
+ * If we didn't fall back here, we could livelock
+ * because not all segments in the iov can be copied at
+ * once without a pagefault.
+ */
+ bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
+ iov_iter_single_seg_count(i));
+ goto again;
+ }
+ iov_iter_advance(i, copied);
+ pos += copied;
+ written += copied;
+
+ balance_dirty_pages_ratelimited(mapping);
+
+ } while (iov_iter_count(i));
+
+ return written ? written : status;
+}
+
+ssize_t
+generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos, loff_t *ppos,
+ size_t count, ssize_t written)
+{
+ struct file *file = iocb->ki_filp;
+ struct address_space *mapping = file->f_mapping;
+ const struct address_space_operations *a_ops = mapping->a_ops;
+ struct inode *inode = mapping->host;
+ ssize_t status;
+ struct iov_iter i;
+
+ iov_iter_init(&i, iov, nr_segs, count, written);
+ if (a_ops->write_begin)
+ status = generic_perform_write(file, &i, pos);
+ else
+ status = generic_perform_write_2copy(file, &i, pos);
- /*
- * For now, when the user asks for O_SYNC, we'll actually give O_DSYNC
- */
if (likely(status >= 0)) {
+ written += status;
+ *ppos = pos + status;
+
+ /*
+ * For now, when the user asks for O_SYNC, we'll actually give
+ * O_DSYNC
+ */
if (unlikely((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
if (!a_ops->writepage || !is_sync_kiocb(iocb))
status = generic_osync_inode(inode, mapping,
Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1750,6 +1750,48 @@ recover:
goto done;
}
+/*
+ * If a page has any new buffers, zero them out here, and mark them uptodate
+ * and dirty so they'll be written out (in order to prevent uninitialised
+ * block data from leaking). And clear the new bit.
+ */
+void page_zero_new_buffers(struct page *page, unsigned from, unsigned to)
+{
+ unsigned int block_start, block_end;
+ struct buffer_head *head, *bh;
+
+ BUG_ON(!PageLocked(page));
+ if (!page_has_buffers(page))
+ return;
+
+ bh = head = page_buffers(page);
+ block_start = 0;
+ do {
+ block_end = block_start + bh->b_size;
+
+ if (buffer_new(bh)) {
+ if (block_end > from && block_start < to) {
+ if (!PageUptodate(page)) {
+ unsigned start, size;
+
+ start = max(from, block_start);
+ size = min(to, block_end) - start;
+
+ zero_user_page(page, start, size, KM_USER0);
+ set_buffer_uptodate(bh);
+ }
+
+ clear_buffer_new(bh);
+ mark_buffer_dirty(bh);
+ }
+ }
+
+ block_start = block_end;
+ bh = bh->b_this_page;
+ } while (bh != head);
+}
+EXPORT_SYMBOL(page_zero_new_buffers);
+
static int __block_prepare_write(struct inode *inode, struct page *page,
unsigned from, unsigned to, get_block_t *get_block)
{
@@ -1834,38 +1876,8 @@ static int __block_prepare_write(struct
if (!buffer_uptodate(*wait_bh))
err = -EIO;
}
- if (!err) {
- bh = head;
- do {
- if (buffer_new(bh))
- clear_buffer_new(bh);
- } while ((bh = bh->b_this_page) != head);
- return 0;
- }
- /* Error case: */
- /*
- * Zero out any newly allocated blocks to avoid exposing stale
- * data. If BH_New is set, we know that the block was newly
- * allocated in the above loop.
- */
- bh = head;
- block_start = 0;
- do {
- block_end = block_start+blocksize;
- if (block_end <= from)
- goto next_bh;
- if (block_start >= to)
- break;
- if (buffer_new(bh)) {
- clear_buffer_new(bh);
- zero_user_page(page, block_start, bh->b_size, KM_USER0);
- set_buffer_uptodate(bh);
- mark_buffer_dirty(bh);
- }
-next_bh:
- block_start = block_end;
- bh = bh->b_this_page;
- } while (bh != head);
+ if (unlikely(err))
+ page_zero_new_buffers(page, from, to);
return err;
}
@@ -1890,6 +1902,7 @@ static int __block_commit_write(struct i
set_buffer_uptodate(bh);
mark_buffer_dirty(bh);
}
+ clear_buffer_new(bh);
}
/*
@@ -1904,6 +1917,127 @@ static int __block_commit_write(struct i
}
/*
+ * block_write_begin takes care of the basic task of block allocation and
+ * bringing partial write blocks uptodate first.
+ *
+ * If *pagep is not NULL, then block_write_begin uses the locked page
+ * at *pagep rather than allocating its own. In this case, the page will
+ * not be unlocked or deallocated on failure.
+ */
+int block_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata,
+ get_block_t *get_block)
+{
+ struct inode *inode = mapping->host;
+ int status = 0;
+ struct page *page;
+ pgoff_t index;
+ unsigned start, end;
+ int ownpage = 0;
+
+ index = pos >> PAGE_CACHE_SHIFT;
+ start = pos & (PAGE_CACHE_SIZE - 1);
+ end = start + len;
+
+ page = *pagep;
+ if (page == NULL) {
+ ownpage = 1;
+ page = __grab_cache_page(mapping, index);
+ if (!page) {
+ status = -ENOMEM;
+ goto out;
+ }
+ *pagep = page;
+ } else
+ BUG_ON(!PageLocked(page));
+
+ status = __block_prepare_write(inode, page, start, end, get_block);
+ if (unlikely(status)) {
+ ClearPageUptodate(page);
+
+ if (ownpage) {
+ unlock_page(page);
+ page_cache_release(page);
+
+ /*
+ * prepare_write() may have instantiated a few blocks
+ * outside i_size. Trim these off again. Don't need
+ * i_size_read because we hold i_mutex.
+ */
+ if (pos + len > inode->i_size)
+ vmtruncate(inode, inode->i_size);
+ }
+ goto out;
+ }
+
+out:
+ return status;
+}
+EXPORT_SYMBOL(block_write_begin);
+
+int block_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ struct inode *inode = mapping->host;
+ unsigned start;
+
+ start = pos & (PAGE_CACHE_SIZE - 1);
+
+ if (unlikely(copied < len)) {
+ /*
+ * The buffers that were written will now be uptodate, so we
+ * don't have to worry about a readpage reading them and
+ * overwriting a partial write. However if we have encountered
+ * a short write and only partially written into a buffer, it
+ * will not be marked uptodate, so a readpage might come in and
+ * destroy our partial write.
+ *
+ * Do the simplest thing, and just treat any short write to a
+ * non uptodate page as a zero-length write, and force the
+ * caller to redo the whole thing.
+ */
+ if (!PageUptodate(page))
+ copied = 0;
+
+ page_zero_new_buffers(page, start+copied, start+len);
+ }
+ flush_dcache_page(page);
+
+ /* This could be a short (even 0-length) commit */
+ __block_commit_write(inode, page, start, start+copied);
+
+ return copied;
+}
+EXPORT_SYMBOL(block_write_end);
+
+int generic_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ struct inode *inode = mapping->host;
+
+ copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
+
+ unlock_page(page);
+ mark_page_accessed(page); /* XXX: put this in caller? */
+ page_cache_release(page);
+
+ /*
+ * No need to use i_size_read() here, the i_size
+ * cannot change under us because we hold i_mutex.
+ */
+ if (pos+copied > inode->i_size) {
+ i_size_write(inode, pos+copied);
+ mark_inode_dirty(inode);
+ }
+
+ return copied;
+}
+EXPORT_SYMBOL(generic_write_end);
+
+/*
* Generic "read page" function for block devices that have the normal
* get_block functionality. This is most of the block device filesystems.
* Reads the page asynchronously --- the unlock_buffer() and
Index: linux-2.6/include/linux/buffer_head.h
===================================================================
--- linux-2.6.orig/include/linux/buffer_head.h
+++ linux-2.6/include/linux/buffer_head.h
@@ -203,6 +203,16 @@ void block_invalidatepage(struct page *p
int block_write_full_page(struct page *page, get_block_t *get_block,
struct writeback_control *wbc);
int block_read_full_page(struct page*, get_block_t*);
+int block_write_begin(struct file *, struct address_space *,
+ loff_t, unsigned, unsigned,
+ struct page **, void **, get_block_t*);
+int block_write_end(struct file *, struct address_space *,
+ loff_t, unsigned, unsigned,
+ struct page *, void *);
+int generic_write_end(struct file *, struct address_space *,
+ loff_t, unsigned, unsigned,
+ struct page *, void *);
+void page_zero_new_buffers(struct page *page, unsigned from, unsigned to);
int block_prepare_write(struct page*, unsigned, unsigned, get_block_t*);
int cont_prepare_write(struct page*, unsigned, unsigned, get_block_t*,
loff_t *);
Index: linux-2.6/include/linux/pagemap.h
===================================================================
--- linux-2.6.orig/include/linux/pagemap.h
+++ linux-2.6/include/linux/pagemap.h
@@ -96,6 +96,8 @@ unsigned find_get_pages_contig(struct ad
unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
int tag, unsigned int nr_pages, struct page **pages);
+struct page *__grab_cache_page(struct address_space *mapping, pgoff_t index);
+
/*
* Returns locked page at given index in given cache, creating it if needed.
*/
Index: linux-2.6/fs/libfs.c
===================================================================
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -348,6 +348,26 @@ int simple_prepare_write(struct file *fi
return 0;
}
+int simple_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ struct page *page;
+ pgoff_t index;
+ unsigned from;
+
+ index = pos >> PAGE_CACHE_SHIFT;
+ from = pos & (PAGE_CACHE_SIZE - 1);
+
+ page = __grab_cache_page(mapping, index);
+ if (!page)
+ return -ENOMEM;
+
+ *pagep = page;
+
+ return simple_prepare_write(file, page, from, from+len);
+}
+
int simple_commit_write(struct file *file, struct page *page,
unsigned from, unsigned to)
{
@@ -366,6 +386,28 @@ int simple_commit_write(struct file *fil
return 0;
}
+int simple_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ unsigned from = pos & (PAGE_CACHE_SIZE - 1);
+
+ /* zero the stale part of the page if we did a short copy */
+ if (copied < len) {
+ void *kaddr = kmap_atomic(page, KM_USER0);
+ memset(kaddr + from + copied, 0, len - copied);
+ flush_dcache_page(page);
+ kunmap_atomic(kaddr, KM_USER0);
+ }
+
+ simple_commit_write(file, page, from, from+copied);
+
+ unlock_page(page);
+ page_cache_release(page);
+
+ return copied;
+}
+
/*
* the inodes created here are not hashed. If you use iunique to generate
* unique inode values later for this filesystem, then you must take care
@@ -639,6 +681,8 @@ EXPORT_SYMBOL(dcache_dir_open);
EXPORT_SYMBOL(dcache_readdir);
EXPORT_SYMBOL(generic_read_dir);
EXPORT_SYMBOL(get_sb_pseudo);
+EXPORT_SYMBOL(simple_write_begin);
+EXPORT_SYMBOL(simple_write_end);
EXPORT_SYMBOL(simple_commit_write);
EXPORT_SYMBOL(simple_dir_inode_operations);
EXPORT_SYMBOL(simple_dir_operations);
Index: linux-2.6/drivers/block/loop.c
===================================================================
--- linux-2.6.orig/drivers/block/loop.c
+++ linux-2.6/drivers/block/loop.c
@@ -203,14 +203,13 @@ lo_do_transfer(struct loop_device *lo, i
* do_lo_send_aops - helper for writing data to a loop device
*
* This is the fast version for backing filesystems which implement the address
- * space operations prepare_write and commit_write.
+ * space operations write_begin and write_end.
*/
static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec,
- int bsize, loff_t pos, struct page *page)
+ int bsize, loff_t pos, struct page *unused)
{
struct file *file = lo->lo_backing_file; /* kudos to NFsckingS */
struct address_space *mapping = file->f_mapping;
- const struct address_space_operations *aops = mapping->a_ops;
pgoff_t index;
unsigned offset, bv_offs;
int len, ret;
@@ -222,63 +221,47 @@ static int do_lo_send_aops(struct loop_d
len = bvec->bv_len;
while (len > 0) {
sector_t IV;
- unsigned size;
+ unsigned size, copied;
int transfer_result;
+ struct page *page;
+ void *fsdata;
IV = ((sector_t)index << (PAGE_CACHE_SHIFT - 9))+(offset >> 9);
size = PAGE_CACHE_SIZE - offset;
if (size > len)
size = len;
- page = grab_cache_page(mapping, index);
- if (unlikely(!page))
+
+ ret = pagecache_write_begin(file, mapping, pos, size, 0,
+ &page, &fsdata);
+ if (ret)
goto fail;
- ret = aops->prepare_write(file, page, offset,
- offset + size);
- if (unlikely(ret)) {
- if (ret == AOP_TRUNCATED_PAGE) {
- page_cache_release(page);
- continue;
- }
- goto unlock;
- }
+
transfer_result = lo_do_transfer(lo, WRITE, page, offset,
bvec->bv_page, bv_offs, size, IV);
- if (unlikely(transfer_result)) {
- /*
- * The transfer failed, but we still write the data to
- * keep prepare/commit calls balanced.
- */
- printk(KERN_ERR "loop: transfer error block %llu\n",
- (unsigned long long)index);
- zero_user_page(page, offset, size, KM_USER0);
- }
- flush_dcache_page(page);
- ret = aops->commit_write(file, page, offset,
- offset + size);
- if (unlikely(ret)) {
- if (ret == AOP_TRUNCATED_PAGE) {
- page_cache_release(page);
- continue;
- }
- goto unlock;
- }
+ copied = size;
if (unlikely(transfer_result))
- goto unlock;
- bv_offs += size;
- len -= size;
+ copied = 0;
+
+ ret = pagecache_write_end(file, mapping, pos, size, copied,
+ page, fsdata);
+ if (ret < 0)
+ goto fail;
+ if (ret < copied)
+ copied = ret;
+
+ if (unlikely(transfer_result))
+ goto fail;
+
+ bv_offs += copied;
+ len -= copied;
offset = 0;
index++;
- pos += size;
- unlock_page(page);
- page_cache_release(page);
+ pos += copied;
}
ret = 0;
out:
mutex_unlock(&mapping->host->i_mutex);
return ret;
-unlock:
- unlock_page(page);
- page_cache_release(page);
fail:
ret = -1;
goto out;
@@ -312,7 +295,7 @@ static int __do_lo_send_write(struct fil
* do_lo_send_direct_write - helper for writing data to a loop device
*
* This is the fast, non-transforming version for backing filesystems which do
- * not implement the address space operations prepare_write and commit_write.
+ * not implement the address space operations write_begin and write_end.
* It uses the write file operation which should be present on all writeable
* filesystems.
*/
@@ -331,7 +314,7 @@ static int do_lo_send_direct_write(struc
* do_lo_send_write - helper for writing data to a loop device
*
* This is the slow, transforming version for filesystems which do not
- * implement the address space operations prepare_write and commit_write. It
+ * implement the address space operations write_begin and write_end. It
* uses the write file operation which should be present on all writeable
* filesystems.
*
@@ -764,7 +747,7 @@ static int loop_set_fd(struct loop_devic
*/
if (!file->f_op->sendfile)
goto out_putf;
- if (aops->prepare_write && aops->commit_write)
+ if (aops->prepare_write || aops->write_begin)
lo_flags |= LO_FLAGS_USE_AOPS;
if (!(lo_flags & LO_FLAGS_USE_AOPS) && !file->f_op->write)
lo_flags |= LO_FLAGS_READ_ONLY;
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c
+++ linux-2.6/fs/namei.c
@@ -2718,53 +2718,30 @@ int __page_symlink(struct inode *inode,
{
struct address_space *mapping = inode->i_mapping;
struct page *page;
+ void *fsdata;
int err;
char *kaddr;
retry:
- err = -ENOMEM;
- page = find_or_create_page(mapping, 0, gfp_mask);
- if (!page)
- goto fail;
- err = mapping->a_ops->prepare_write(NULL, page, 0, len-1);
- if (err == AOP_TRUNCATED_PAGE) {
- page_cache_release(page);
- goto retry;
- }
+ err = pagecache_write_begin(NULL, mapping, 0, PAGE_CACHE_SIZE,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
if (err)
- goto fail_map;
+ goto fail;
+
kaddr = kmap_atomic(page, KM_USER0);
memcpy(kaddr, symname, len-1);
+ memset(kaddr+len-1, 0, PAGE_CACHE_SIZE-(len-1));
kunmap_atomic(kaddr, KM_USER0);
- err = mapping->a_ops->commit_write(NULL, page, 0, len-1);
- if (err == AOP_TRUNCATED_PAGE) {
- page_cache_release(page);
- goto retry;
- }
- if (err)
- goto fail_map;
- /*
- * Notice that we are _not_ going to block here - end of page is
- * unmapped, so this will only try to map the rest of page, see
- * that it is unmapped (typically even will not look into inode -
- * ->i_size will be enough for everything) and zero it out.
- * OTOH it's obviously correct and should make the page up-to-date.
- */
- if (!PageUptodate(page)) {
- err = mapping->a_ops->readpage(NULL, page);
- if (err != AOP_TRUNCATED_PAGE)
- wait_on_page_locked(page);
- } else {
- unlock_page(page);
- }
- page_cache_release(page);
+
+ err = pagecache_write_end(NULL, mapping, 0, PAGE_CACHE_SIZE, PAGE_CACHE_SIZE,
+ page, fsdata);
if (err < 0)
goto fail;
+ if (err < PAGE_CACHE_SIZE)
+ goto retry;
+
mark_inode_dirty(inode);
return 0;
-fail_map:
- unlock_page(page);
- page_cache_release(page);
fail:
return err;
}
Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c
+++ linux-2.6/fs/splice.c
@@ -558,7 +558,7 @@ static int pipe_to_file(struct pipe_inod
struct address_space *mapping = file->f_mapping;
unsigned int offset, this_len;
struct page *page;
- pgoff_t index;
+ void *fsdata;
int ret;
/*
@@ -568,49 +568,16 @@ static int pipe_to_file(struct pipe_inod
if (unlikely(ret))
return ret;
- index = sd->pos >> PAGE_CACHE_SHIFT;
offset = sd->pos & ~PAGE_CACHE_MASK;
this_len = sd->len;
if (this_len + offset > PAGE_CACHE_SIZE)
this_len = PAGE_CACHE_SIZE - offset;
-find_page:
- page = find_lock_page(mapping, index);
- if (!page) {
- ret = -ENOMEM;
- page = page_cache_alloc_cold(mapping);
- if (unlikely(!page))
- goto out_ret;
-
- /*
- * This will also lock the page
- */
- ret = add_to_page_cache_lru(page, mapping, index,
- GFP_KERNEL);
- if (unlikely(ret))
- goto out;
- }
-
- ret = mapping->a_ops->prepare_write(file, page, offset, offset+this_len);
- if (unlikely(ret)) {
- loff_t isize = i_size_read(mapping->host);
-
- if (ret != AOP_TRUNCATED_PAGE)
- unlock_page(page);
- page_cache_release(page);
- if (ret == AOP_TRUNCATED_PAGE)
- goto find_page;
-
- /*
- * prepare_write() may have instantiated a few blocks
- * outside i_size. Trim these off again.
- */
- if (sd->pos + this_len > isize)
- vmtruncate(mapping->host, isize);
-
- goto out_ret;
- }
+ ret = pagecache_write_begin(file, mapping, sd->pos, this_len,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
+ if (unlikely(ret))
+ goto out;
if (buf->page != page) {
/*
@@ -620,35 +587,14 @@ find_page:
char *dst = kmap_atomic(page, KM_USER1);
memcpy(dst + offset, src + buf->offset, this_len);
- flush_dcache_page(page);
kunmap_atomic(dst, KM_USER1);
buf->ops->unmap(pipe, buf, src);
}
- ret = mapping->a_ops->commit_write(file, page, offset, offset+this_len);
- if (ret) {
- if (ret == AOP_TRUNCATED_PAGE) {
- page_cache_release(page);
- goto find_page;
- }
- if (ret < 0)
- goto out;
- /*
- * Partial write has happened, so 'ret' already initialized by
- * number of bytes written, Where is nothing we have to do here.
- */
- } else
- ret = this_len;
- /*
- * Return the number of bytes written and mark page as
- * accessed, we are now done!
- */
- mark_page_accessed(page);
- balance_dirty_pages_ratelimited(mapping);
+ ret = pagecache_write_end(file, mapping, sd->pos, this_len, this_len, page, fsdata);
+
out:
- page_cache_release(page);
- unlock_page(page);
-out_ret:
+
return ret;
}
Index: linux-2.6/Documentation/filesystems/Locking
===================================================================
--- linux-2.6.orig/Documentation/filesystems/Locking
+++ linux-2.6/Documentation/filesystems/Locking
@@ -178,15 +178,18 @@ prototypes:
locking rules:
All except set_page_dirty may block
- BKL PageLocked(page)
+ BKL PageLocked(page) i_sem
writepage: no yes, unlocks (see below)
readpage: no yes, unlocks
sync_page: no maybe
writepages: no
set_page_dirty no no
readpages: no
-prepare_write: no yes
-commit_write: no yes
+prepare_write: no yes yes
+commit_write: no yes yes
+write_begin: no locks the page yes
+write_end: no yes, unlocks yes
+perform_write: no n/a yes
bmap: yes
invalidatepage: no yes
releasepage: no yes
Index: linux-2.6/Documentation/filesystems/vfs.txt
===================================================================
--- linux-2.6.orig/Documentation/filesystems/vfs.txt
+++ linux-2.6/Documentation/filesystems/vfs.txt
@@ -534,6 +534,12 @@ struct address_space_operations {
struct list_head *pages, unsigned nr_pages);
int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
+ int (*write_begin)(struct file *, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata);
+ int (*write_end)(struct file *, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
int (*invalidatepage) (struct page *, unsigned long);
int (*releasepage) (struct page *, int);
@@ -629,6 +635,45 @@ struct address_space_operations {
operations. It should avoid returning an error if possible -
errors should have been handled by prepare_write.
+ write_begin: This is intended as a replacement for prepare_write. The
+ key differences are that:
+ - it returns a locked page (in *pagep) rather than being
+ given a pre-locked page;
+ - it must be able to cope with short writes (where the
+ length passed to write_begin is greater than the number
+ of bytes copied into the page).
+
+ Called by the generic buffered write code to ask the filesystem to
+ prepare to write len bytes at the given offset in the file. The
+ address_space should check that the write will be able to complete,
+ by allocating space if necessary and doing any other internal
+ housekeeping. If the write will update parts of any basic-blocks on
+ storage, then those blocks should be pre-read (if they haven't been
+ read already) so that the updated blocks can be written out properly.
+
+ The filesystem must return the locked pagecache page for the specified
+ offset, in *pagep, for the caller to write into.
+
+ flags is a field for AOP_FLAG_xxx flags, described in
+ include/linux/fs.h.
+
+ A void * may be returned in fsdata, which then gets passed into
+ write_end.
+
+ Returns 0 on success; < 0 on failure (which is the error code), in
+ which case write_end is not called.
+
+ write_end: After a successful write_begin, and data copy, write_end must
+ be called. len is the original len passed to write_begin, and copied
+ is the amount that was able to be copied (copied == len is always true
+ if write_begin was called with the AOP_FLAG_UNINTERRUPTIBLE flag).
+
+ The filesystem must take care of unlocking the page, releasing its
+ refcount, and updating i_size.
+
+ Returns < 0 on failure, otherwise the number of bytes (<= 'copied')
+ that were able to be copied into pagecache.
+
bmap: called by the VFS to map a logical block offset within object to
physical block number. This method is used by the FIBMAP
ioctl and for working with swap-files. To be able to swap to
Index: linux-2.6/fs/revoked_inode.c
===================================================================
--- linux-2.6.orig/fs/revoked_inode.c
+++ linux-2.6/fs/revoked_inode.c
@@ -384,6 +384,20 @@ static int revoked_commit_write(struct f
return -EIO;
}
+static int revoked_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ return -EIO;
+}
+
+static int revoked_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ return -EIO;
+}
+
static ssize_t revoked_direct_IO(int rw, struct kiocb *iocb,
const struct iovec *iov, loff_t offset,
unsigned long nr_segs)
--
* [patch 13/41] mm: restore KERNEL_DS optimisations
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (11 preceding siblings ...)
2007-05-14 6:06 ` [patch 12/41] fs: introduce write_begin, write_end, and perform_write aops npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 14/41] implement simple fs aops npiggin
` (28 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Linux Memory Management
[-- Attachment #1: fs-kernel_ds-opt.patch --]
[-- Type: text/plain, Size: 1572 bytes --]
Restore the KERNEL_DS optimisation, which is especially helpful to the 2copy
write path.
This may be a pretty questionable gain in most cases, especially after the
legacy 2copy write path is removed, but it doesn't cost much.
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
mm/filemap.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -2113,7 +2113,7 @@ static ssize_t generic_perform_write_2co
* cannot take a pagefault with the destination page locked.
* So pin the source page to copy it.
*/
- if (!PageUptodate(page)) {
+ if (!PageUptodate(page) && !segment_eq(get_fs(), KERNEL_DS)) {
unlock_page(page);
src_page = alloc_page(GFP_KERNEL);
@@ -2238,6 +2238,13 @@ static ssize_t generic_perform_write(str
const struct address_space_operations *a_ops = mapping->a_ops;
long status = 0;
ssize_t written = 0;
+ unsigned int flags = 0;
+
+ /*
+ * Copies from kernel address space cannot fail (NFSD is a big user).
+ */
+ if (segment_eq(get_fs(), KERNEL_DS))
+ flags |= AOP_FLAG_UNINTERRUPTIBLE;
do {
struct page *page;
@@ -2269,7 +2276,7 @@ again:
break;
}
- status = a_ops->write_begin(file, mapping, pos, bytes, 0,
+ status = a_ops->write_begin(file, mapping, pos, bytes, flags,
&page, &fsdata);
if (unlikely(status))
break;
--
* [patch 14/41] implement simple fs aops
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (12 preceding siblings ...)
2007-05-14 6:06 ` [patch 13/41] mm: restore KERNEL_DS optimisations npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 15/41] block_dev convert to new aops npiggin
` (27 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel
[-- Attachment #1: fs-simple-aops.patch --]
[-- Type: text/plain, Size: 6104 bytes --]
Implement new aops for some of the simpler filesystems.
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/configfs/inode.c | 4 ++--
fs/hugetlbfs/inode.c | 16 ++++++++++------
fs/ramfs/file-mmu.c | 4 ++--
fs/ramfs/file-nommu.c | 4 ++--
fs/sysfs/inode.c | 4 ++--
mm/shmem.c | 35 ++++++++++++++++++++++++++++-------
6 files changed, 46 insertions(+), 21 deletions(-)
Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c
+++ linux-2.6/mm/shmem.c
@@ -1109,7 +1109,7 @@ static int shmem_getpage(struct inode *i
* Normally, filepage is NULL on entry, and either found
* uptodate immediately, or allocated and zeroed, or read
* in under swappage, which is then assigned to filepage.
- * But shmem_prepare_write passes in a locked filepage,
+ * But shmem_write_begin passes in a locked filepage,
* which may be found not uptodate by other callers too,
* and may need to be copied from the swappage read in.
*/
@@ -1454,14 +1454,35 @@ static const struct inode_operations shm
static const struct inode_operations shmem_symlink_inline_operations;
/*
- * Normally tmpfs makes no use of shmem_prepare_write, but it
+ * Normally tmpfs makes no use of shmem_write_begin, but it
* lets a tmpfs file be used read-write below the loop driver.
*/
static int
-shmem_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to)
+shmem_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ struct inode *inode = mapping->host;
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+ *pagep = NULL;
+ return shmem_getpage(inode, index, pagep, SGP_WRITE, NULL);
+}
+
+static int
+shmem_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
- struct inode *inode = page->mapping->host;
- return shmem_getpage(inode, page->index, &page, SGP_WRITE, NULL);
+ struct inode *inode = mapping->host;
+
+ set_page_dirty(page);
+ mark_page_accessed(page);
+ page_cache_release(page);
+
+ if (pos+copied > inode->i_size)
+ i_size_write(inode, pos+copied);
+
+ return copied;
}
static ssize_t
@@ -2357,8 +2378,8 @@ static const struct address_space_operat
.writepage = shmem_writepage,
.set_page_dirty = __set_page_dirty_no_writeback,
#ifdef CONFIG_TMPFS
- .prepare_write = shmem_prepare_write,
- .commit_write = simple_commit_write,
+ .write_begin = shmem_write_begin,
+ .write_end = shmem_write_end,
#endif
.migratepage = migrate_page,
};
Index: linux-2.6/fs/configfs/inode.c
===================================================================
--- linux-2.6.orig/fs/configfs/inode.c
+++ linux-2.6/fs/configfs/inode.c
@@ -40,8 +40,8 @@ extern struct super_block * configfs_sb;
static const struct address_space_operations configfs_aops = {
.readpage = simple_readpage,
- .prepare_write = simple_prepare_write,
- .commit_write = simple_commit_write
+ .write_begin = simple_write_begin,
+ .write_end = simple_write_end,
};
static struct backing_dev_info configfs_backing_dev_info = {
Index: linux-2.6/fs/sysfs/inode.c
===================================================================
--- linux-2.6.orig/fs/sysfs/inode.c
+++ linux-2.6/fs/sysfs/inode.c
@@ -20,8 +20,8 @@ extern struct super_block * sysfs_sb;
static const struct address_space_operations sysfs_aops = {
.readpage = simple_readpage,
- .prepare_write = simple_prepare_write,
- .commit_write = simple_commit_write
+ .write_begin = simple_write_begin,
+ .write_end = simple_write_end,
};
static struct backing_dev_info sysfs_backing_dev_info = {
Index: linux-2.6/fs/ramfs/file-mmu.c
===================================================================
--- linux-2.6.orig/fs/ramfs/file-mmu.c
+++ linux-2.6/fs/ramfs/file-mmu.c
@@ -29,8 +29,8 @@
const struct address_space_operations ramfs_aops = {
.readpage = simple_readpage,
- .prepare_write = simple_prepare_write,
- .commit_write = simple_commit_write,
+ .write_begin = simple_write_begin,
+ .write_end = simple_write_end,
.set_page_dirty = __set_page_dirty_no_writeback,
};
Index: linux-2.6/fs/ramfs/file-nommu.c
===================================================================
--- linux-2.6.orig/fs/ramfs/file-nommu.c
+++ linux-2.6/fs/ramfs/file-nommu.c
@@ -29,8 +29,8 @@ static int ramfs_nommu_setattr(struct de
const struct address_space_operations ramfs_aops = {
.readpage = simple_readpage,
- .prepare_write = simple_prepare_write,
- .commit_write = simple_commit_write,
+ .write_begin = simple_write_begin,
+ .write_end = simple_write_end,
.set_page_dirty = __set_page_dirty_no_writeback,
};
Index: linux-2.6/fs/hugetlbfs/inode.c
===================================================================
--- linux-2.6.orig/fs/hugetlbfs/inode.c
+++ linux-2.6/fs/hugetlbfs/inode.c
@@ -166,15 +166,19 @@ static int hugetlbfs_readpage(struct fil
return -EINVAL;
}
-static int hugetlbfs_prepare_write(struct file *file,
- struct page *page, unsigned offset, unsigned to)
+static int hugetlbfs_write_begin(struct file *file,
+ struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
return -EINVAL;
}
-static int hugetlbfs_commit_write(struct file *file,
- struct page *page, unsigned offset, unsigned to)
+static int hugetlbfs_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
+ BUG();
return -EINVAL;
}
@@ -546,8 +550,8 @@ static void hugetlbfs_destroy_inode(stru
static const struct address_space_operations hugetlbfs_aops = {
.readpage = hugetlbfs_readpage,
- .prepare_write = hugetlbfs_prepare_write,
- .commit_write = hugetlbfs_commit_write,
+ .write_begin = hugetlbfs_write_begin,
+ .write_end = hugetlbfs_write_end,
.set_page_dirty = hugetlbfs_set_page_dirty,
};
--
* [patch 15/41] block_dev convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (13 preceding siblings ...)
2007-05-14 6:06 ` [patch 14/41] implement simple fs aops npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 16/41] rd " npiggin
` (26 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel
[-- Attachment #1: fs-blkdev-aops.patch --]
[-- Type: text/plain, Size: 1799 bytes --]
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/block_dev.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -380,14 +380,26 @@ static int blkdev_readpage(struct file *
return block_read_full_page(page, blkdev_get_block);
}
-static int blkdev_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
-{
- return block_prepare_write(page, from, to, blkdev_get_block);
+static int blkdev_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ *pagep = NULL;
+ return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ blkdev_get_block);
}
-static int blkdev_commit_write(struct file *file, struct page *page, unsigned from, unsigned to)
+static int blkdev_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
- return block_commit_write(page, from, to);
+ int ret;
+ ret = block_write_end(file, mapping, pos, len, copied, page, fsdata);
+
+ unlock_page(page);
+ page_cache_release(page);
+
+ return ret;
}
/*
@@ -1333,8 +1345,8 @@ const struct address_space_operations de
.readpage = blkdev_readpage,
.writepage = blkdev_writepage,
.sync_page = block_sync_page,
- .prepare_write = blkdev_prepare_write,
- .commit_write = blkdev_commit_write,
+ .write_begin = blkdev_write_begin,
+ .write_end = blkdev_write_end,
.writepages = generic_writepages,
.direct_IO = blkdev_direct_IO,
};
--
* [patch 16/41] rd convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (14 preceding siblings ...)
2007-05-14 6:06 ` [patch 15/41] block_dev convert to new aops npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 17/41] ext2 " npiggin
` (25 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel
[-- Attachment #1: fs-rd-aops.patch --]
[-- Type: text/plain, Size: 5768 bytes --]
Also clean up various little things.
I removed the comment from akpm because, now that make_page_uptodate is only
called from two places, it is easy to see that the buffers are in an uptodate
state at the time of the call. It was actually fine before this patch as well,
since the memset is equivalent to a read from disk; however, it is now more
explicit where the updates come from.
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
drivers/block/rd.c | 125 ++++++++++++++++++++++++++++++-----------------------
1 file changed, 73 insertions(+), 52 deletions(-)
Index: linux-2.6/drivers/block/rd.c
===================================================================
--- linux-2.6.orig/drivers/block/rd.c
+++ linux-2.6/drivers/block/rd.c
@@ -104,50 +104,60 @@ static void make_page_uptodate(struct pa
struct buffer_head *head = bh;
do {
- if (!buffer_uptodate(bh)) {
- memset(bh->b_data, 0, bh->b_size);
- /*
- * akpm: I'm totally undecided about this. The
- * buffer has just been magically brought "up to
- * date", but nobody should want to be reading
- * it anyway, because it hasn't been used for
- * anything yet. It is still in a "not read
- * from disk yet" state.
- *
- * But non-uptodate buffers against an uptodate
- * page are against the rules. So do it anyway.
- */
+ if (!buffer_uptodate(bh))
set_buffer_uptodate(bh);
- }
} while ((bh = bh->b_this_page) != head);
- } else {
- memset(page_address(page), 0, PAGE_CACHE_SIZE);
}
- flush_dcache_page(page);
SetPageUptodate(page);
}
static int ramdisk_readpage(struct file *file, struct page *page)
{
- if (!PageUptodate(page))
+ if (!PageUptodate(page)) {
+ memclear_highpage_flush(page, 0, PAGE_CACHE_SIZE);
make_page_uptodate(page);
+ }
unlock_page(page);
return 0;
}
-static int ramdisk_prepare_write(struct file *file, struct page *page,
- unsigned offset, unsigned to)
-{
- if (!PageUptodate(page))
- make_page_uptodate(page);
+static int ramdisk_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ struct page *page;
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+
+ page = __grab_cache_page(mapping, index);
+ if (!page)
+ return -ENOMEM;
+ *pagep = page;
return 0;
}
-static int ramdisk_commit_write(struct file *file, struct page *page,
- unsigned offset, unsigned to)
-{
+static int ramdisk_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ if (!PageUptodate(page)) {
+ if (copied != PAGE_CACHE_SIZE) {
+ void *dst;
+ unsigned from = pos & (PAGE_CACHE_SIZE - 1);
+ unsigned to = from + copied;
+
+ dst = kmap_atomic(page, KM_USER0);
+ memset(dst, 0, from);
+ memset(dst + to, 0, PAGE_CACHE_SIZE - to);
+ flush_dcache_page(page);
+ kunmap_atomic(dst, KM_USER0);
+ }
+ make_page_uptodate(page);
+ }
+
set_page_dirty(page);
- return 0;
+ unlock_page(page);
+ page_cache_release(page);
+ return copied;
}
/*
@@ -191,8 +201,8 @@ static int ramdisk_set_page_dirty(struct
static const struct address_space_operations ramdisk_aops = {
.readpage = ramdisk_readpage,
- .prepare_write = ramdisk_prepare_write,
- .commit_write = ramdisk_commit_write,
+ .write_begin = ramdisk_write_begin,
+ .write_end = ramdisk_write_end,
.writepage = ramdisk_writepage,
.set_page_dirty = ramdisk_set_page_dirty,
.writepages = ramdisk_writepages,
@@ -201,13 +211,14 @@ static const struct address_space_operat
static int rd_blkdev_pagecache_IO(int rw, struct bio_vec *vec, sector_t sector,
struct address_space *mapping)
{
- pgoff_t index = sector >> (PAGE_CACHE_SHIFT - 9);
+ loff_t pos = sector << 9;
unsigned int vec_offset = vec->bv_offset;
- int offset = (sector << 9) & ~PAGE_CACHE_MASK;
int size = vec->bv_len;
int err = 0;
do {
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+ unsigned offset = pos & ~PAGE_CACHE_MASK;
int count;
struct page *page;
char *src;
@@ -216,40 +227,50 @@ static int rd_blkdev_pagecache_IO(int rw
count = PAGE_CACHE_SIZE - offset;
if (count > size)
count = size;
- size -= count;
-
- page = grab_cache_page(mapping, index);
- if (!page) {
- err = -ENOMEM;
- goto out;
- }
- if (!PageUptodate(page))
- make_page_uptodate(page);
+ if (rw == WRITE) {
+ err = pagecache_write_begin(NULL, mapping, pos, count,
+ 0, &page, NULL);
+ if (err)
+ goto out;
- index++;
+ src = kmap_atomic(vec->bv_page, KM_USER0) + vec_offset;
+ dst = kmap_atomic(page, KM_USER1) + offset;
+ } else {
+again:
+ page = __grab_cache_page(mapping, index);
+ if (!page) {
+ err = -ENOMEM;
+ goto out;
+ }
+ if (!PageUptodate(page)) {
+ mapping->a_ops->readpage(NULL, page);
+ goto again;
+ }
- if (rw == READ) {
src = kmap_atomic(page, KM_USER0) + offset;
dst = kmap_atomic(vec->bv_page, KM_USER1) + vec_offset;
- } else {
- src = kmap_atomic(vec->bv_page, KM_USER0) + vec_offset;
- dst = kmap_atomic(page, KM_USER1) + offset;
}
- offset = 0;
- vec_offset += count;
+
memcpy(dst, src, count);
kunmap_atomic(src, KM_USER0);
kunmap_atomic(dst, KM_USER1);
- if (rw == READ)
+ if (rw == READ) {
flush_dcache_page(vec->bv_page);
- else
- set_page_dirty(page);
- unlock_page(page);
- put_page(page);
+ unlock_page(page);
+ page_cache_release(page);
+ } else {
+ flush_dcache_page(page);
+ pagecache_write_end(NULL, mapping, pos, count,
+ count, page, NULL);
+ }
+
+ pos += count;
+ vec_offset += count;
+ size -= count;
} while (size);
out:
--
* [patch 17/41] ext2 convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (15 preceding siblings ...)
2007-05-14 6:06 ` [patch 16/41] rd " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-16 12:45 ` Nick Piggin
2007-05-14 6:06 ` [patch 18/41] ext3 " npiggin
` (24 subsequent siblings)
41 siblings, 1 reply; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, linux-ext4
[-- Attachment #1: fs-ext2-aops.patch --]
[-- Type: text/plain, Size: 7181 bytes --]
Cc: linux-ext4@vger.kernel.org
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/ext2/dir.c | 47 +++++++++++++++++++++++++++++------------------
fs/ext2/ext2.h | 3 +++
fs/ext2/inode.c | 24 +++++++++++++-----------
3 files changed, 45 insertions(+), 29 deletions(-)
Index: linux-2.6/fs/ext2/inode.c
===================================================================
--- linux-2.6.orig/fs/ext2/inode.c
+++ linux-2.6/fs/ext2/inode.c
@@ -726,18 +726,21 @@ ext2_readpages(struct file *file, struct
return mpage_readpages(mapping, pages, nr_pages, ext2_get_block);
}
-static int
-ext2_prepare_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+int __ext2_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- return block_prepare_write(page,from,to,ext2_get_block);
+ return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ ext2_get_block);
}
static int
-ext2_nobh_prepare_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+ext2_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- return nobh_prepare_write(page,from,to,ext2_get_block);
+ *pagep = NULL;
+ return __ext2_write_begin(file, mapping, pos, len, flags, pagep,fsdata);
}
static int ext2_nobh_writepage(struct page *page,
@@ -773,8 +776,8 @@ const struct address_space_operations ex
.readpages = ext2_readpages,
.writepage = ext2_writepage,
.sync_page = block_sync_page,
- .prepare_write = ext2_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = ext2_write_begin,
+ .write_end = generic_write_end,
.bmap = ext2_bmap,
.direct_IO = ext2_direct_IO,
.writepages = ext2_writepages,
@@ -791,8 +794,7 @@ const struct address_space_operations ex
.readpages = ext2_readpages,
.writepage = ext2_nobh_writepage,
.sync_page = block_sync_page,
- .prepare_write = ext2_nobh_prepare_write,
- .commit_write = nobh_commit_write,
+ /* XXX: todo */
.bmap = ext2_bmap,
.direct_IO = ext2_direct_IO,
.writepages = ext2_writepages,
Index: linux-2.6/fs/ext2/dir.c
===================================================================
--- linux-2.6.orig/fs/ext2/dir.c
+++ linux-2.6/fs/ext2/dir.c
@@ -22,6 +22,7 @@
*/
#include "ext2.h"
+#include <linux/buffer_head.h>
#include <linux/pagemap.h>
typedef struct ext2_dir_entry_2 ext2_dirent;
@@ -61,12 +62,14 @@ ext2_last_byte(struct inode *inode, unsi
return last_byte;
}
-static int ext2_commit_chunk(struct page *page, unsigned from, unsigned to)
+static int ext2_commit_chunk(struct page *page, loff_t pos, unsigned len)
{
- struct inode *dir = page->mapping->host;
+ struct address_space *mapping = page->mapping;
+ struct inode *dir = mapping->host;
int err = 0;
+
dir->i_version++;
- page->mapping->a_ops->commit_write(NULL, page, from, to);
+ block_write_end(NULL, mapping, pos, len, len, page, NULL);
if (IS_DIRSYNC(dir))
err = write_one_page(page, 1);
else
@@ -412,16 +415,18 @@ ino_t ext2_inode_by_name(struct inode *
void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
struct page *page, struct inode *inode)
{
- unsigned from = (char *) de - (char *) page_address(page);
- unsigned to = from + le16_to_cpu(de->rec_len);
+ loff_t pos = (page->index << PAGE_CACHE_SHIFT) +
+ (char *) de - (char *) page_address(page);
+ unsigned len = le16_to_cpu(de->rec_len);
int err;
lock_page(page);
- err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+ err = __ext2_write_begin(NULL, page->mapping, pos, len,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
BUG_ON(err);
de->inode = cpu_to_le32(inode->i_ino);
- ext2_set_de_type (de, inode);
- err = ext2_commit_chunk(page, from, to);
+ ext2_set_de_type(de, inode);
+ err = ext2_commit_chunk(page, pos, len);
ext2_put_page(page);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
@@ -444,7 +449,7 @@ int ext2_add_link (struct dentry *dentry
unsigned long npages = dir_pages(dir);
unsigned long n;
char *kaddr;
- unsigned from, to;
+ loff_t pos;
int err;
/*
@@ -497,9 +502,10 @@ int ext2_add_link (struct dentry *dentry
return -EINVAL;
got_it:
- from = (char*)de - (char*)page_address(page);
- to = from + rec_len;
- err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+ pos = (page->index << PAGE_CACHE_SHIFT) +
+ (char*)de - (char*)page_address(page);
+ err = __ext2_write_begin(NULL, page->mapping, pos, rec_len, 0,
+ &page, NULL);
if (err)
goto out_unlock;
if (de->inode) {
@@ -509,10 +515,10 @@ got_it:
de = de1;
}
de->name_len = namelen;
- memcpy (de->name, name, namelen);
+ memcpy(de->name, name, namelen);
de->inode = cpu_to_le32(inode->i_ino);
ext2_set_de_type (de, inode);
- err = ext2_commit_chunk(page, from, to);
+ err = ext2_commit_chunk(page, pos, rec_len);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
mark_inode_dirty(dir);
@@ -537,6 +543,7 @@ int ext2_delete_entry (struct ext2_dir_e
char *kaddr = page_address(page);
unsigned from = ((char*)dir - kaddr) & ~(ext2_chunk_size(inode)-1);
unsigned to = ((char*)dir - kaddr) + le16_to_cpu(dir->rec_len);
+ loff_t pos;
ext2_dirent * pde = NULL;
ext2_dirent * de = (ext2_dirent *) (kaddr + from);
int err;
@@ -553,13 +560,15 @@ int ext2_delete_entry (struct ext2_dir_e
}
if (pde)
from = (char*)pde - (char*)page_address(page);
+ pos = (page->index << PAGE_CACHE_SHIFT) + from;
lock_page(page);
- err = mapping->a_ops->prepare_write(NULL, page, from, to);
+ err = __ext2_write_begin(NULL, page->mapping, pos, to - from, 0,
+ &page, NULL);
BUG_ON(err);
if (pde)
- pde->rec_len = cpu_to_le16(to-from);
+ pde->rec_len = cpu_to_le16(to - from);
dir->inode = 0;
- err = ext2_commit_chunk(page, from, to);
+ err = ext2_commit_chunk(page, pos, to - from);
inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
EXT2_I(inode)->i_flags &= ~EXT2_BTREE_FL;
mark_inode_dirty(inode);
@@ -582,7 +591,9 @@ int ext2_make_empty(struct inode *inode,
if (!page)
return -ENOMEM;
- err = mapping->a_ops->prepare_write(NULL, page, 0, chunk_size);
+
+ err = __ext2_write_begin(NULL, page->mapping, 0, chunk_size, 0,
+ &page, NULL);
if (err) {
unlock_page(page);
goto fail;
Index: linux-2.6/fs/ext2/ext2.h
===================================================================
--- linux-2.6.orig/fs/ext2/ext2.h
+++ linux-2.6/fs/ext2/ext2.h
@@ -134,6 +134,9 @@ extern void ext2_truncate (struct inode
extern int ext2_setattr (struct dentry *, struct iattr *);
extern void ext2_set_inode_flags(struct inode *inode);
extern void ext2_get_inode_flags(struct ext2_inode_info *);
+int __ext2_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata);
/* ioctl.c */
extern int ext2_ioctl (struct inode *, struct file *, unsigned int,
--
* [patch 18/41] ext3 convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (16 preceding siblings ...)
2007-05-14 6:06 ` [patch 17/41] ext2 " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 19/41] ext4 " npiggin
` (23 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, linux-ext4, Badari Pulavarty
[-- Attachment #1: fs-ext3-aops.patch --]
[-- Type: text/plain, Size: 8747 bytes --]
Cc: linux-ext4@vger.kernel.org
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Various fixes and improvements
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
fs/ext3/inode.c | 136 ++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 88 insertions(+), 48 deletions(-)
Index: linux-2.6/fs/ext3/inode.c
===================================================================
--- linux-2.6.orig/fs/ext3/inode.c
+++ linux-2.6/fs/ext3/inode.c
@@ -1147,51 +1147,68 @@ static int do_journal_get_write_access(h
return ext3_journal_get_write_access(handle, bh);
}
-static int ext3_prepare_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int ext3_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- struct inode *inode = page->mapping->host;
+ struct inode *inode = mapping->host;
int ret, needed_blocks = ext3_writepage_trans_blocks(inode);
handle_t *handle;
int retries = 0;
+ struct page *page;
+ pgoff_t index;
+ unsigned from, to;
+
+ index = pos >> PAGE_CACHE_SHIFT;
+ from = pos & (PAGE_CACHE_SIZE - 1);
+ to = from + len;
retry:
+ page = __grab_cache_page(mapping, index);
+ if (!page)
+ return -ENOMEM;
+ *pagep = page;
+
handle = ext3_journal_start(inode, needed_blocks);
if (IS_ERR(handle)) {
+ unlock_page(page);
+ page_cache_release(page);
ret = PTR_ERR(handle);
goto out;
}
- if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode))
- ret = nobh_prepare_write(page, from, to, ext3_get_block);
- else
- ret = block_prepare_write(page, from, to, ext3_get_block);
+ ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ ext3_get_block);
if (ret)
- goto prepare_write_failed;
+ goto write_begin_failed;
if (ext3_should_journal_data(inode)) {
ret = walk_page_buffers(handle, page_buffers(page),
from, to, NULL, do_journal_get_write_access);
}
-prepare_write_failed:
- if (ret)
+write_begin_failed:
+ if (ret) {
ext3_journal_stop(handle);
+ unlock_page(page);
+ page_cache_release(page);
+ }
if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
goto retry;
out:
return ret;
}
+
int ext3_journal_dirty_data(handle_t *handle, struct buffer_head *bh)
{
int err = journal_dirty_data(handle, bh);
if (err)
ext3_journal_abort_handle(__FUNCTION__, __FUNCTION__,
- bh, handle,err);
+ bh, handle, err);
return err;
}
-/* For commit_write() in data=journal mode */
-static int commit_write_fn(handle_t *handle, struct buffer_head *bh)
+/* For write_end() in data=journal mode */
+static int write_end_fn(handle_t *handle, struct buffer_head *bh)
{
if (!buffer_mapped(bh) || buffer_freed(bh))
return 0;
@@ -1206,78 +1223,100 @@ static int commit_write_fn(handle_t *han
* ext3 never places buffers on inode->i_mapping->private_list. metadata
* buffers are managed internally.
*/
-static int ext3_ordered_commit_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int ext3_ordered_write_end(struct file *file,
+ struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
handle_t *handle = ext3_journal_current_handle();
- struct inode *inode = page->mapping->host;
+ struct inode *inode = file->f_mapping->host;
+ unsigned from, to;
int ret = 0, ret2;
+ from = pos & (PAGE_CACHE_SIZE - 1);
+ to = from + len;
+
ret = walk_page_buffers(handle, page_buffers(page),
from, to, NULL, ext3_journal_dirty_data);
if (ret == 0) {
/*
- * generic_commit_write() will run mark_inode_dirty() if i_size
+ * generic_write_end() will run mark_inode_dirty() if i_size
* changes. So let's piggyback the i_disksize mark_inode_dirty
* into that.
*/
loff_t new_i_size;
- new_i_size = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+ new_i_size = pos + copied;
if (new_i_size > EXT3_I(inode)->i_disksize)
EXT3_I(inode)->i_disksize = new_i_size;
- ret = generic_commit_write(file, page, from, to);
+ copied = generic_write_end(file, mapping, pos, len, copied,
+ page, fsdata);
+ if (copied < 0)
+ ret = copied;
+ } else {
+ unlock_page(page);
+ page_cache_release(page);
}
ret2 = ext3_journal_stop(handle);
if (!ret)
ret = ret2;
- return ret;
+ return ret ? ret : copied;
}
-static int ext3_writeback_commit_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int ext3_writeback_write_end(struct file *file,
+ struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
handle_t *handle = ext3_journal_current_handle();
- struct inode *inode = page->mapping->host;
+ struct inode *inode = file->f_mapping->host;
int ret = 0, ret2;
loff_t new_i_size;
- new_i_size = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+ new_i_size = pos + copied;
if (new_i_size > EXT3_I(inode)->i_disksize)
EXT3_I(inode)->i_disksize = new_i_size;
- if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode))
- ret = nobh_commit_write(file, page, from, to);
- else
- ret = generic_commit_write(file, page, from, to);
+ copied = generic_write_end(file, mapping, pos, len, copied,
+ page, fsdata);
+ if (copied < 0)
+ ret = copied;
ret2 = ext3_journal_stop(handle);
if (!ret)
ret = ret2;
- return ret;
+ return ret ? ret : copied;
}
-static int ext3_journalled_commit_write(struct file *file,
- struct page *page, unsigned from, unsigned to)
+static int ext3_journalled_write_end(struct file *file,
+ struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
handle_t *handle = ext3_journal_current_handle();
- struct inode *inode = page->mapping->host;
+ struct inode *inode = mapping->host;
int ret = 0, ret2;
int partial = 0;
- loff_t pos;
+ unsigned from, to;
- /*
- * Here we duplicate the generic_commit_write() functionality
- */
- pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+ from = pos & (PAGE_CACHE_SIZE - 1);
+ to = from + len;
+
+ if (copied < len) {
+ if (!PageUptodate(page))
+ copied = 0;
+ page_zero_new_buffers(page, from+copied, to);
+ }
ret = walk_page_buffers(handle, page_buffers(page), from,
- to, &partial, commit_write_fn);
+ to, &partial, write_end_fn);
if (!partial)
SetPageUptodate(page);
- if (pos > inode->i_size)
- i_size_write(inode, pos);
+ unlock_page(page);
+ page_cache_release(page);
+ if (pos+copied > inode->i_size)
+ i_size_write(inode, pos+copied);
EXT3_I(inode)->i_state |= EXT3_STATE_JDATA;
if (inode->i_size > EXT3_I(inode)->i_disksize) {
EXT3_I(inode)->i_disksize = inode->i_size;
@@ -1285,10 +1324,11 @@ static int ext3_journalled_commit_write(
if (!ret)
ret = ret2;
}
+
ret2 = ext3_journal_stop(handle);
if (!ret)
ret = ret2;
- return ret;
+ return ret ? ret : copied;
}
/*
@@ -1546,7 +1586,7 @@ static int ext3_journalled_writepage(str
PAGE_CACHE_SIZE, NULL, do_journal_get_write_access);
err = walk_page_buffers(handle, page_buffers(page), 0,
- PAGE_CACHE_SIZE, NULL, commit_write_fn);
+ PAGE_CACHE_SIZE, NULL, write_end_fn);
if (ret == 0)
ret = err;
EXT3_I(inode)->i_state |= EXT3_STATE_JDATA;
@@ -1706,8 +1746,8 @@ static const struct address_space_operat
.readpages = ext3_readpages,
.writepage = ext3_ordered_writepage,
.sync_page = block_sync_page,
- .prepare_write = ext3_prepare_write,
- .commit_write = ext3_ordered_commit_write,
+ .write_begin = ext3_write_begin,
+ .write_end = ext3_ordered_write_end,
.bmap = ext3_bmap,
.invalidatepage = ext3_invalidatepage,
.releasepage = ext3_releasepage,
@@ -1720,8 +1760,8 @@ static const struct address_space_operat
.readpages = ext3_readpages,
.writepage = ext3_writeback_writepage,
.sync_page = block_sync_page,
- .prepare_write = ext3_prepare_write,
- .commit_write = ext3_writeback_commit_write,
+ .write_begin = ext3_write_begin,
+ .write_end = ext3_writeback_write_end,
.bmap = ext3_bmap,
.invalidatepage = ext3_invalidatepage,
.releasepage = ext3_releasepage,
@@ -1734,8 +1774,8 @@ static const struct address_space_operat
.readpages = ext3_readpages,
.writepage = ext3_journalled_writepage,
.sync_page = block_sync_page,
- .prepare_write = ext3_prepare_write,
- .commit_write = ext3_journalled_commit_write,
+ .write_begin = ext3_write_begin,
+ .write_end = ext3_journalled_write_end,
.set_page_dirty = ext3_journalled_set_page_dirty,
.bmap = ext3_bmap,
.invalidatepage = ext3_invalidatepage,
--
^ permalink raw reply [flat|nested] 48+ messages in thread
* [patch 19/41] ext4 convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (17 preceding siblings ...)
2007-05-14 6:06 ` [patch 18/41] ext3 " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 20/41] xfs " npiggin
` (22 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, linux-ext4, Badari Pulavarty
[-- Attachment #1: fs-ext4-aops.patch --]
[-- Type: text/plain, Size: 8844 bytes --]
Cc: linux-ext4@vger.kernel.org
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Convert ext4 to use write_begin()/write_end() methods.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
fs/ext4/inode.c | 147 +++++++++++++++++++++++++++++++++++---------------------
1 file changed, 93 insertions(+), 54 deletions(-)
Index: linux-2.6/fs/ext4/inode.c
===================================================================
--- linux-2.6.orig/fs/ext4/inode.c
+++ linux-2.6/fs/ext4/inode.c
@@ -1146,34 +1146,50 @@ static int do_journal_get_write_access(h
return ext4_journal_get_write_access(handle, bh);
}
-static int ext4_prepare_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int ext4_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- struct inode *inode = page->mapping->host;
+ struct inode *inode = mapping->host;
int ret, needed_blocks = ext4_writepage_trans_blocks(inode);
handle_t *handle;
int retries = 0;
+ struct page *page;
+ pgoff_t index;
+ unsigned from, to;
+
+ index = pos >> PAGE_CACHE_SHIFT;
+ from = pos & (PAGE_CACHE_SIZE - 1);
+ to = from + len;
retry:
- handle = ext4_journal_start(inode, needed_blocks);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- goto out;
+ page = __grab_cache_page(mapping, index);
+ if (!page)
+ return -ENOMEM;
+ *pagep = page;
+
+ handle = ext4_journal_start(inode, needed_blocks);
+ if (IS_ERR(handle)) {
+ unlock_page(page);
+ page_cache_release(page);
+ ret = PTR_ERR(handle);
+ goto out;
}
- if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode))
- ret = nobh_prepare_write(page, from, to, ext4_get_block);
- else
- ret = block_prepare_write(page, from, to, ext4_get_block);
- if (ret)
- goto prepare_write_failed;
- if (ext4_should_journal_data(inode)) {
+ ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ ext4_get_block);
+
+ if (!ret && ext4_should_journal_data(inode)) {
ret = walk_page_buffers(handle, page_buffers(page),
from, to, NULL, do_journal_get_write_access);
}
-prepare_write_failed:
- if (ret)
+
+ if (ret) {
ext4_journal_stop(handle);
+ unlock_page(page);
+ page_cache_release(page);
+ }
+
if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry;
out:
@@ -1185,12 +1201,12 @@ int ext4_journal_dirty_data(handle_t *ha
int err = jbd2_journal_dirty_data(handle, bh);
if (err)
ext4_journal_abort_handle(__FUNCTION__, __FUNCTION__,
- bh, handle,err);
+ bh, handle, err);
return err;
}
-/* For commit_write() in data=journal mode */
-static int commit_write_fn(handle_t *handle, struct buffer_head *bh)
+/* For write_end() in data=journal mode */
+static int write_end_fn(handle_t *handle, struct buffer_head *bh)
{
if (!buffer_mapped(bh) || buffer_freed(bh))
return 0;
@@ -1205,78 +1221,100 @@ static int commit_write_fn(handle_t *han
* ext4 never places buffers on inode->i_mapping->private_list. metadata
* buffers are managed internally.
*/
-static int ext4_ordered_commit_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int ext4_ordered_write_end(struct file *file,
+ struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
handle_t *handle = ext4_journal_current_handle();
- struct inode *inode = page->mapping->host;
+ struct inode *inode = file->f_mapping->host;
+ unsigned from, to;
int ret = 0, ret2;
+ from = pos & (PAGE_CACHE_SIZE - 1);
+ to = from + len;
+
ret = walk_page_buffers(handle, page_buffers(page),
from, to, NULL, ext4_journal_dirty_data);
if (ret == 0) {
/*
- * generic_commit_write() will run mark_inode_dirty() if i_size
+ * generic_write_end() will run mark_inode_dirty() if i_size
* changes. So let's piggyback the i_disksize mark_inode_dirty
* into that.
*/
loff_t new_i_size;
- new_i_size = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+ new_i_size = pos + copied;
if (new_i_size > EXT4_I(inode)->i_disksize)
EXT4_I(inode)->i_disksize = new_i_size;
- ret = generic_commit_write(file, page, from, to);
+ copied = generic_write_end(file, mapping, pos, len, copied,
+ page, fsdata);
+ if (copied < 0)
+ ret = copied;
+ } else {
+ unlock_page(page);
+ page_cache_release(page);
}
ret2 = ext4_journal_stop(handle);
if (!ret)
ret = ret2;
- return ret;
+ return ret ? ret : copied;
}
-static int ext4_writeback_commit_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int ext4_writeback_write_end(struct file *file,
+ struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
handle_t *handle = ext4_journal_current_handle();
- struct inode *inode = page->mapping->host;
+ struct inode *inode = file->f_mapping->host;
int ret = 0, ret2;
loff_t new_i_size;
- new_i_size = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+ new_i_size = pos + copied;
if (new_i_size > EXT4_I(inode)->i_disksize)
EXT4_I(inode)->i_disksize = new_i_size;
- if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode))
- ret = nobh_commit_write(file, page, from, to);
- else
- ret = generic_commit_write(file, page, from, to);
+ copied = generic_write_end(file, mapping, pos, len, copied,
+ page, fsdata);
+ if (copied < 0)
+ ret = copied;
ret2 = ext4_journal_stop(handle);
if (!ret)
ret = ret2;
- return ret;
+ return ret ? ret : copied;
}
-static int ext4_journalled_commit_write(struct file *file,
- struct page *page, unsigned from, unsigned to)
+static int ext4_journalled_write_end(struct file *file,
+ struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
handle_t *handle = ext4_journal_current_handle();
- struct inode *inode = page->mapping->host;
+ struct inode *inode = mapping->host;
int ret = 0, ret2;
int partial = 0;
- loff_t pos;
+ unsigned from, to;
- /*
- * Here we duplicate the generic_commit_write() functionality
- */
- pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+ from = pos & (PAGE_CACHE_SIZE - 1);
+ to = from + len;
+
+ if (copied < len) {
+ if (!PageUptodate(page))
+ copied = 0;
+ page_zero_new_buffers(page, from+copied, to);
+ }
ret = walk_page_buffers(handle, page_buffers(page), from,
- to, &partial, commit_write_fn);
+ to, &partial, write_end_fn);
if (!partial)
SetPageUptodate(page);
- if (pos > inode->i_size)
- i_size_write(inode, pos);
+ unlock_page(page);
+ page_cache_release(page);
+ if (pos+copied > inode->i_size)
+ i_size_write(inode, pos+copied);
EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
if (inode->i_size > EXT4_I(inode)->i_disksize) {
EXT4_I(inode)->i_disksize = inode->i_size;
@@ -1284,10 +1322,11 @@ static int ext4_journalled_commit_write(
if (!ret)
ret = ret2;
}
+
ret2 = ext4_journal_stop(handle);
if (!ret)
ret = ret2;
- return ret;
+ return ret ? ret : copied;
}
/*
@@ -1545,7 +1584,7 @@ static int ext4_journalled_writepage(str
PAGE_CACHE_SIZE, NULL, do_journal_get_write_access);
err = walk_page_buffers(handle, page_buffers(page), 0,
- PAGE_CACHE_SIZE, NULL, commit_write_fn);
+ PAGE_CACHE_SIZE, NULL, write_end_fn);
if (ret == 0)
ret = err;
EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
@@ -1705,8 +1744,8 @@ static const struct address_space_operat
.readpages = ext4_readpages,
.writepage = ext4_ordered_writepage,
.sync_page = block_sync_page,
- .prepare_write = ext4_prepare_write,
- .commit_write = ext4_ordered_commit_write,
+ .write_begin = ext4_write_begin,
+ .write_end = ext4_ordered_write_end,
.bmap = ext4_bmap,
.invalidatepage = ext4_invalidatepage,
.releasepage = ext4_releasepage,
@@ -1719,8 +1758,8 @@ static const struct address_space_operat
.readpages = ext4_readpages,
.writepage = ext4_writeback_writepage,
.sync_page = block_sync_page,
- .prepare_write = ext4_prepare_write,
- .commit_write = ext4_writeback_commit_write,
+ .write_begin = ext4_write_begin,
+ .write_end = ext4_writeback_write_end,
.bmap = ext4_bmap,
.invalidatepage = ext4_invalidatepage,
.releasepage = ext4_releasepage,
@@ -1733,8 +1772,8 @@ static const struct address_space_operat
.readpages = ext4_readpages,
.writepage = ext4_journalled_writepage,
.sync_page = block_sync_page,
- .prepare_write = ext4_prepare_write,
- .commit_write = ext4_journalled_commit_write,
+ .write_begin = ext4_write_begin,
+ .write_end = ext4_journalled_write_end,
.set_page_dirty = ext4_journalled_set_page_dirty,
.bmap = ext4_bmap,
.invalidatepage = ext4_invalidatepage,
--
* [patch 20/41] xfs convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (18 preceding siblings ...)
2007-05-14 6:06 ` [patch 19/41] ext4 " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 21/41] fs: new cont helpers npiggin
` (21 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, xfs-masters
[-- Attachment #1: fs-xfs-aops.patch --]
[-- Type: text/plain, Size: 3035 bytes --]
Cc: xfs-masters@oss.sgi.com
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/xfs/linux-2.6/xfs_aops.c | 19 ++++++++++++-------
fs/xfs/linux-2.6/xfs_lrw.c | 35 ++++++++++++-----------------------
2 files changed, 24 insertions(+), 30 deletions(-)
Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
@@ -1479,13 +1479,18 @@ xfs_vm_direct_IO(
}
STATIC int
-xfs_vm_prepare_write(
+xfs_vm_write_begin(
struct file *file,
- struct page *page,
- unsigned int from,
- unsigned int to)
+ struct address_space *mapping,
+ loff_t pos,
+ unsigned len,
+ unsigned flags,
+ struct page **pagep,
+ void **fsdata)
{
- return block_prepare_write(page, from, to, xfs_get_blocks);
+ *pagep = NULL;
+ return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ xfs_get_blocks);
}
STATIC sector_t
@@ -1539,8 +1544,8 @@ const struct address_space_operations xf
.sync_page = block_sync_page,
.releasepage = xfs_vm_releasepage,
.invalidatepage = xfs_vm_invalidatepage,
- .prepare_write = xfs_vm_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = xfs_vm_write_begin,
+ .write_end = generic_write_end,
.bmap = xfs_vm_bmap,
.direct_IO = xfs_vm_direct_IO,
.migratepage = buffer_migrate_page,
Index: linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_lrw.c
+++ linux-2.6/fs/xfs/linux-2.6/xfs_lrw.c
@@ -134,45 +134,34 @@ xfs_iozero(
loff_t pos, /* offset in file */
size_t count) /* size of data to zero */
{
- unsigned bytes;
struct page *page;
struct address_space *mapping;
int status;
mapping = ip->i_mapping;
do {
- unsigned long index, offset;
+ unsigned offset, bytes;
+ void *fsdata;
offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
- index = pos >> PAGE_CACHE_SHIFT;
bytes = PAGE_CACHE_SIZE - offset;
if (bytes > count)
bytes = count;
- status = -ENOMEM;
- page = grab_cache_page(mapping, index);
- if (!page)
- break;
-
- status = mapping->a_ops->prepare_write(NULL, page, offset,
- offset + bytes);
+ status = pagecache_write_begin(NULL, mapping, pos, bytes,
+ AOP_FLAG_UNINTERRUPTIBLE,
+ &page, &fsdata);
if (status)
- goto unlock;
+ break;
zero_user_page(page, offset, bytes, KM_USER0);
- status = mapping->a_ops->commit_write(NULL, page, offset,
- offset + bytes);
- if (!status) {
- pos += bytes;
- count -= bytes;
- }
-
-unlock:
- unlock_page(page);
- page_cache_release(page);
- if (status)
- break;
+ status = pagecache_write_end(NULL, mapping, pos, bytes, bytes,
+ page, fsdata);
+ WARN_ON(status <= 0); /* can't return less than zero! */
+ pos += bytes;
+ count -= bytes;
+ status = 0;
} while (count);
return (-status);
--
* [patch 21/41] fs: new cont helpers
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (19 preceding siblings ...)
2007-05-14 6:06 ` [patch 20/41] xfs " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 22/41] fat convert to new aops npiggin
` (20 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, hirofumi
[-- Attachment #1: fs-cont-aops.patch --]
[-- Type: text/plain, Size: 10876 bytes --]
Rework the generic block "cont" routines to handle the new aops.
Supporting cont_prepare_write alongside the new aops would take quite a lot
of code, so remove it instead (later patches convert all its users over to
the new cont_write_begin helper).
write_begin gets passed AOP_FLAG_CONT_EXPAND when called from
generic_cont_expand, so filesystems can avoid the old hacks they used.
Cc: hirofumi@mail.parknet.co.jp
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/buffer.c | 204 +++++++++++++++++++++-----------------------
include/linux/buffer_head.h | 5 -
include/linux/fs.h | 1
mm/filemap.c | 5 +
4 files changed, 110 insertions(+), 105 deletions(-)
Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -2133,14 +2133,14 @@ int block_read_full_page(struct page *pa
}
/* utility function for filesystems that need to do work on expanding
- * truncates. Uses prepare/commit_write to allow the filesystem to
+ * truncates. Uses filesystem pagecache writes to allow the filesystem to
* deal with the hole.
*/
-static int __generic_cont_expand(struct inode *inode, loff_t size,
- pgoff_t index, unsigned int offset)
+int generic_cont_expand_simple(struct inode *inode, loff_t size)
{
struct address_space *mapping = inode->i_mapping;
struct page *page;
+ void *fsdata;
unsigned long limit;
int err;
@@ -2153,140 +2153,134 @@ static int __generic_cont_expand(struct
if (size > inode->i_sb->s_maxbytes)
goto out;
- err = -ENOMEM;
- page = grab_cache_page(mapping, index);
- if (!page)
- goto out;
- err = mapping->a_ops->prepare_write(NULL, page, offset, offset);
- if (err) {
- /*
- * ->prepare_write() may have instantiated a few blocks
- * outside i_size. Trim these off again.
- */
- unlock_page(page);
- page_cache_release(page);
- vmtruncate(inode, inode->i_size);
+ err = pagecache_write_begin(NULL, mapping, size, 0,
+ AOP_FLAG_UNINTERRUPTIBLE|AOP_FLAG_CONT_EXPAND,
+ &page, &fsdata);
+ if (err)
goto out;
- }
- err = mapping->a_ops->commit_write(NULL, page, offset, offset);
+ err = pagecache_write_end(NULL, mapping, size, 0, 0, page, fsdata);
+ BUG_ON(err > 0);
- unlock_page(page);
- page_cache_release(page);
- if (err > 0)
- err = 0;
out:
return err;
}
int generic_cont_expand(struct inode *inode, loff_t size)
{
- pgoff_t index;
unsigned int offset;
offset = (size & (PAGE_CACHE_SIZE - 1)); /* Within page */
/* ugh. in prepare/commit_write, if from==to==start of block, we
- ** skip the prepare. make sure we never send an offset for the start
- ** of a block
- */
+ * skip the prepare. make sure we never send an offset for the start
+ * of a block.
+ * XXX: actually, this should be handled in those filesystems by
+ * checking for the AOP_FLAG_CONT_EXPAND flag.
+ */
if ((offset & (inode->i_sb->s_blocksize - 1)) == 0) {
/* caller must handle this extra byte. */
- offset++;
+ size++;
}
- index = size >> PAGE_CACHE_SHIFT;
-
- return __generic_cont_expand(inode, size, index, offset);
-}
-
-int generic_cont_expand_simple(struct inode *inode, loff_t size)
-{
- loff_t pos = size - 1;
- pgoff_t index = pos >> PAGE_CACHE_SHIFT;
- unsigned int offset = (pos & (PAGE_CACHE_SIZE - 1)) + 1;
-
- /* prepare/commit_write can handle even if from==to==start of block. */
- return __generic_cont_expand(inode, size, index, offset);
+ return generic_cont_expand_simple(inode, size);
}
-/*
- * For moronic filesystems that do not allow holes in file.
- * We may have to extend the file.
- */
-
-int cont_prepare_write(struct page *page, unsigned offset,
- unsigned to, get_block_t *get_block, loff_t *bytes)
+int cont_expand_zero(struct file *file, struct address_space *mapping,
+ loff_t pos, loff_t *bytes)
{
- struct address_space *mapping = page->mapping;
struct inode *inode = mapping->host;
- struct page *new_page;
- pgoff_t pgpos;
- long status;
- unsigned zerofrom;
unsigned blocksize = 1 << inode->i_blkbits;
+ struct page *page;
+ void *fsdata;
+ pgoff_t index, curidx;
+ loff_t curpos;
+ unsigned zerofrom, offset, len;
+ int err = 0;
- while(page->index > (pgpos = *bytes>>PAGE_CACHE_SHIFT)) {
- status = -ENOMEM;
- new_page = grab_cache_page(mapping, pgpos);
- if (!new_page)
- goto out;
- /* we might sleep */
- if (*bytes>>PAGE_CACHE_SHIFT != pgpos) {
- unlock_page(new_page);
- page_cache_release(new_page);
- continue;
- }
- zerofrom = *bytes & ~PAGE_CACHE_MASK;
+ index = pos >> PAGE_CACHE_SHIFT;
+ offset = pos & ~PAGE_CACHE_MASK;
+
+ while (index > (curidx = (curpos = *bytes)>>PAGE_CACHE_SHIFT)) {
+ zerofrom = curpos & ~PAGE_CACHE_MASK;
if (zerofrom & (blocksize-1)) {
*bytes |= (blocksize-1);
(*bytes)++;
}
- status = __block_prepare_write(inode, new_page, zerofrom,
- PAGE_CACHE_SIZE, get_block);
- if (status)
- goto out_unmap;
- zero_user_page(page, zerofrom, PAGE_CACHE_SIZE - zerofrom,
- KM_USER0);
- generic_commit_write(NULL, new_page, zerofrom, PAGE_CACHE_SIZE);
- unlock_page(new_page);
- page_cache_release(new_page);
- }
+ len = PAGE_CACHE_SIZE - zerofrom;
- if (page->index < pgpos) {
- /* completely inside the area */
- zerofrom = offset;
- } else {
- /* page covers the boundary, find the boundary offset */
- zerofrom = *bytes & ~PAGE_CACHE_MASK;
+ err = pagecache_write_begin(file, mapping, curpos, len,
+ AOP_FLAG_UNINTERRUPTIBLE,
+ &page, &fsdata);
+ if (err)
+ goto out;
+ zero_user_page(page, zerofrom, len, KM_USER0);
+ err = pagecache_write_end(file, mapping, curpos, len, len,
+ page, fsdata);
+ if (err < 0)
+ goto out;
+ BUG_ON(err != len);
+ err = 0;
+ }
+ /* page covers the boundary, find the boundary offset */
+ if (index == curidx) {
+ zerofrom = curpos & ~PAGE_CACHE_MASK;
/* if we will expand the thing last block will be filled */
- if (to > zerofrom && (zerofrom & (blocksize-1))) {
+ if (offset <= zerofrom) {
+ goto out;
+ }
+ if (zerofrom & (blocksize-1)) {
*bytes |= (blocksize-1);
(*bytes)++;
}
+ len = offset - zerofrom;
- /* starting below the boundary? Nothing to zero out */
- if (offset <= zerofrom)
- zerofrom = offset;
- }
- status = __block_prepare_write(inode, page, zerofrom, to, get_block);
- if (status)
- goto out1;
- if (zerofrom < offset) {
- zero_user_page(page, zerofrom, offset - zerofrom, KM_USER0);
- __block_commit_write(inode, page, zerofrom, offset);
+ err = pagecache_write_begin(file, mapping, curpos, len,
+ AOP_FLAG_UNINTERRUPTIBLE,
+ &page, &fsdata);
+ if (err)
+ goto out;
+ zero_user_page(page, zerofrom, len, KM_USER0);
+ err = pagecache_write_end(file, mapping, curpos, len, len,
+ page, fsdata);
+ if (err < 0)
+ goto out;
+ BUG_ON(err != len);
+ err = 0;
}
- return 0;
-out1:
- ClearPageUptodate(page);
- return status;
-
-out_unmap:
- ClearPageUptodate(new_page);
- unlock_page(new_page);
- page_cache_release(new_page);
out:
- return status;
+ return err;
+}
+
+/*
+ * For moronic filesystems that do not allow holes in file.
+ * We may have to extend the file.
+ */
+int cont_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata,
+ get_block_t *get_block, loff_t *bytes)
+{
+ struct inode *inode = mapping->host;
+ unsigned blocksize = 1 << inode->i_blkbits;
+ unsigned zerofrom;
+ int err;
+
+ err = cont_expand_zero(file, mapping, pos, bytes);
+ if (err)
+ goto out;
+
+ zerofrom = *bytes & ~PAGE_CACHE_MASK;
+ if (pos+len > *bytes && zerofrom & (blocksize-1)) {
+ *bytes |= (blocksize-1);
+ (*bytes)++;
+ }
+
+ *pagep = NULL;
+ err = block_write_begin(file, mapping, pos, len,
+ flags, pagep, fsdata, get_block);
+out:
+ return err;
}
int block_prepare_write(struct page *page, unsigned from, unsigned to,
@@ -3111,7 +3105,7 @@ EXPORT_SYMBOL(block_read_full_page);
EXPORT_SYMBOL(block_sync_page);
EXPORT_SYMBOL(block_truncate_page);
EXPORT_SYMBOL(block_write_full_page);
-EXPORT_SYMBOL(cont_prepare_write);
+EXPORT_SYMBOL(cont_write_begin);
EXPORT_SYMBOL(end_buffer_read_sync);
EXPORT_SYMBOL(end_buffer_write_sync);
EXPORT_SYMBOL(file_fsync);
Index: linux-2.6/include/linux/buffer_head.h
===================================================================
--- linux-2.6.orig/include/linux/buffer_head.h
+++ linux-2.6/include/linux/buffer_head.h
@@ -214,8 +214,9 @@ int generic_write_end(struct file *, str
struct page *, void *);
void page_zero_new_buffers(struct page *page, unsigned from, unsigned to);
int block_prepare_write(struct page*, unsigned, unsigned, get_block_t*);
-int cont_prepare_write(struct page*, unsigned, unsigned, get_block_t*,
- loff_t *);
+int cont_write_begin(struct file *, struct address_space *, loff_t,
+ unsigned, unsigned, struct page **, void **,
+ get_block_t *, loff_t *);
int generic_cont_expand(struct inode *inode, loff_t size);
int generic_cont_expand_simple(struct inode *inode, loff_t size);
int block_commit_write(struct page *page, unsigned from, unsigned to);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -398,6 +398,7 @@ enum positive_aop_returns {
};
#define AOP_FLAG_UNINTERRUPTIBLE 0x0001 /* will not do a short write */
+#define AOP_FLAG_CONT_EXPAND 0x0002 /* called from cont_expand */
/*
* oh the beauties of C type declarations.
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1745,6 +1745,7 @@ size_t iov_iter_copy_from_user_atomic(st
return copied;
}
+EXPORT_SYMBOL(iov_iter_copy_from_user_atomic);
/*
* This has the same sideeffects and return value as
@@ -1771,6 +1772,7 @@ size_t iov_iter_copy_from_user(struct pa
kunmap(page);
return copied;
}
+EXPORT_SYMBOL(iov_iter_copy_from_user);
static void __iov_iter_advance_iov(struct iov_iter *i, size_t bytes)
{
@@ -1802,6 +1804,7 @@ void iov_iter_advance(struct iov_iter *i
__iov_iter_advance_iov(i, bytes);
i->count -= bytes;
}
+EXPORT_SYMBOL(iov_iter_advance);
int iov_iter_fault_in_readable(struct iov_iter *i)
{
@@ -1809,6 +1812,7 @@ int iov_iter_fault_in_readable(struct io
char __user *buf = i->iov->iov_base + i->iov_offset;
return fault_in_pages_readable(buf, seglen);
}
+EXPORT_SYMBOL(iov_iter_fault_in_readable);
/*
* Return the count of just the current iov_iter segment.
@@ -1821,6 +1825,7 @@ size_t iov_iter_single_seg_count(struct
else
return min(i->count, iov->iov_len - i->iov_offset);
}
+EXPORT_SYMBOL(iov_iter_single_seg_count);
/*
* Performs necessary checks before doing a write
--
* [patch 22/41] fat convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (20 preceding siblings ...)
2007-05-14 6:06 ` [patch 21/41] fs: new cont helpers npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 23/41] adfs " npiggin
` (19 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, hirofumi
[-- Attachment #1: fs-fat-aops.patch --]
[-- Type: text/plain, Size: 2164 bytes --]
Cc: hirofumi@mail.parknet.co.jp
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/fat/inode.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
Index: linux-2.6/fs/fat/inode.c
===================================================================
--- linux-2.6.orig/fs/fat/inode.c
+++ linux-2.6/fs/fat/inode.c
@@ -140,19 +140,24 @@ static int fat_readpages(struct file *fi
return mpage_readpages(mapping, pages, nr_pages, fat_get_block);
}
-static int fat_prepare_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int fat_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- return cont_prepare_write(page, from, to, fat_get_block,
- &MSDOS_I(page->mapping->host)->mmu_private);
+ *pagep = NULL;
+ return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ fat_get_block,
+ &MSDOS_I(mapping->host)->mmu_private);
}
-static int fat_commit_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int fat_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *pagep, void *fsdata)
{
- struct inode *inode = page->mapping->host;
- int err = generic_commit_write(file, page, from, to);
- if (!err && !(MSDOS_I(inode)->i_attrs & ATTR_ARCH)) {
+ struct inode *inode = mapping->host;
+ int err;
+ err = generic_write_end(file, mapping, pos, len, copied, pagep, fsdata);
+ if (!(err < 0) && !(MSDOS_I(inode)->i_attrs & ATTR_ARCH)) {
inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC;
MSDOS_I(inode)->i_attrs |= ATTR_ARCH;
mark_inode_dirty(inode);
@@ -201,8 +206,8 @@ static const struct address_space_operat
.writepage = fat_writepage,
.writepages = fat_writepages,
.sync_page = block_sync_page,
- .prepare_write = fat_prepare_write,
- .commit_write = fat_commit_write,
+ .write_begin = fat_write_begin,
+ .write_end = fat_write_end,
.direct_IO = fat_direct_IO,
.bmap = _fat_bmap
};
--
* [patch 23/41] adfs convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (21 preceding siblings ...)
2007-05-14 6:06 ` [patch 22/41] fat convert to new aops npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 24/41] hfs " npiggin
` (18 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, rmk
[-- Attachment #1: fs-adfs-aops.patch --]
[-- Type: text/plain, Size: 1446 bytes --]
Cc: rmk@arm.linux.org.uk
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/adfs/inode.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
Index: linux-2.6/fs/adfs/inode.c
===================================================================
--- linux-2.6.orig/fs/adfs/inode.c
+++ linux-2.6/fs/adfs/inode.c
@@ -61,10 +61,14 @@ static int adfs_readpage(struct file *fi
return block_read_full_page(page, adfs_get_block);
}
-static int adfs_prepare_write(struct file *file, struct page *page, unsigned int from, unsigned int to)
+static int adfs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- return cont_prepare_write(page, from, to, adfs_get_block,
- &ADFS_I(page->mapping->host)->mmu_private);
+ *pagep = NULL;
+ return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ adfs_get_block,
+ &ADFS_I(mapping->host)->mmu_private);
}
static sector_t _adfs_bmap(struct address_space *mapping, sector_t block)
@@ -76,8 +80,8 @@ static const struct address_space_operat
.readpage = adfs_readpage,
.writepage = adfs_writepage,
.sync_page = block_sync_page,
- .prepare_write = adfs_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = adfs_write_begin,
+ .write_end = generic_write_end,
.bmap = _adfs_bmap
};
--
* [patch 24/41] hfs convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (22 preceding siblings ...)
2007-05-14 6:06 ` [patch 23/41] adfs " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 25/41] hfsplus " npiggin
` (17 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, zippel
[-- Attachment #1: fs-hfs-aops.patch --]
[-- Type: text/plain, Size: 3099 bytes --]
Cc: zippel@linux-m68k.org
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/hfs/extent.c | 19 ++++++++-----------
fs/hfs/inode.c | 20 ++++++++++++--------
2 files changed, 20 insertions(+), 19 deletions(-)
Index: linux-2.6/fs/hfs/inode.c
===================================================================
--- linux-2.6.orig/fs/hfs/inode.c
+++ linux-2.6/fs/hfs/inode.c
@@ -34,10 +34,14 @@ static int hfs_readpage(struct file *fil
return block_read_full_page(page, hfs_get_block);
}
-static int hfs_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
-{
- return cont_prepare_write(page, from, to, hfs_get_block,
- &HFS_I(page->mapping->host)->phys_size);
+static int hfs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ *pagep = NULL;
+ return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ hfs_get_block,
+ &HFS_I(mapping->host)->phys_size);
}
static sector_t hfs_bmap(struct address_space *mapping, sector_t block)
@@ -118,8 +122,8 @@ const struct address_space_operations hf
.readpage = hfs_readpage,
.writepage = hfs_writepage,
.sync_page = block_sync_page,
- .prepare_write = hfs_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = hfs_write_begin,
+ .write_end = generic_write_end,
.bmap = hfs_bmap,
.releasepage = hfs_releasepage,
};
@@ -128,8 +132,8 @@ const struct address_space_operations hf
.readpage = hfs_readpage,
.writepage = hfs_writepage,
.sync_page = block_sync_page,
- .prepare_write = hfs_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = hfs_write_begin,
+ .write_end = generic_write_end,
.bmap = hfs_bmap,
.direct_IO = hfs_direct_IO,
.writepages = hfs_writepages,
Index: linux-2.6/fs/hfs/extent.c
===================================================================
--- linux-2.6.orig/fs/hfs/extent.c
+++ linux-2.6/fs/hfs/extent.c
@@ -464,23 +464,20 @@ void hfs_file_truncate(struct inode *ino
(long long)HFS_I(inode)->phys_size, inode->i_size);
if (inode->i_size > HFS_I(inode)->phys_size) {
struct address_space *mapping = inode->i_mapping;
+ void *fsdata;
struct page *page;
int res;
+ /* XXX: Can use generic_cont_expand? */
size = inode->i_size - 1;
- page = grab_cache_page(mapping, size >> PAGE_CACHE_SHIFT);
- if (!page)
- return;
- size &= PAGE_CACHE_SIZE - 1;
- size++;
- res = mapping->a_ops->prepare_write(NULL, page, size, size);
- if (!res)
- res = mapping->a_ops->commit_write(NULL, page, size, size);
+ res = pagecache_write_begin(NULL, mapping, size+1, 0,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
+ if (!res) {
+ res = pagecache_write_end(NULL, mapping, size+1, 0, 0,
+ page, fsdata);
+ }
if (res)
inode->i_size = HFS_I(inode)->phys_size;
- unlock_page(page);
- page_cache_release(page);
- mark_inode_dirty(inode);
return;
} else if (inode->i_size == HFS_I(inode)->phys_size)
return;
--

* [patch 25/41] hfsplus convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (23 preceding siblings ...)
2007-05-14 6:06 ` [patch 24/41] hfs " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 26/41] hpfs " npiggin
` (16 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, zippel
[-- Attachment #1: fs-hfsplus-aops.patch --]
[-- Type: text/plain, Size: 3167 bytes --]
Cc: zippel@linux-m68k.org
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/hfsplus/extents.c | 21 +++++++++------------
fs/hfsplus/inode.c | 20 ++++++++++++--------
2 files changed, 21 insertions(+), 20 deletions(-)
Index: linux-2.6/fs/hfsplus/inode.c
===================================================================
--- linux-2.6.orig/fs/hfsplus/inode.c
+++ linux-2.6/fs/hfsplus/inode.c
@@ -26,10 +26,14 @@ static int hfsplus_writepage(struct page
return block_write_full_page(page, hfsplus_get_block, wbc);
}
-static int hfsplus_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
-{
- return cont_prepare_write(page, from, to, hfsplus_get_block,
- &HFSPLUS_I(page->mapping->host).phys_size);
+static int hfsplus_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ *pagep = NULL;
+ return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ hfsplus_get_block,
+ &HFSPLUS_I(mapping->host).phys_size);
}
static sector_t hfsplus_bmap(struct address_space *mapping, sector_t block)
@@ -113,8 +117,8 @@ const struct address_space_operations hf
.readpage = hfsplus_readpage,
.writepage = hfsplus_writepage,
.sync_page = block_sync_page,
- .prepare_write = hfsplus_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = hfsplus_write_begin,
+ .write_end = generic_write_end,
.bmap = hfsplus_bmap,
.releasepage = hfsplus_releasepage,
};
@@ -123,8 +127,8 @@ const struct address_space_operations hf
.readpage = hfsplus_readpage,
.writepage = hfsplus_writepage,
.sync_page = block_sync_page,
- .prepare_write = hfsplus_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = hfsplus_write_begin,
+ .write_end = generic_write_end,
.bmap = hfsplus_bmap,
.direct_IO = hfsplus_direct_IO,
.writepages = hfsplus_writepages,
Index: linux-2.6/fs/hfsplus/extents.c
===================================================================
--- linux-2.6.orig/fs/hfsplus/extents.c
+++ linux-2.6/fs/hfsplus/extents.c
@@ -443,21 +443,18 @@ void hfsplus_file_truncate(struct inode
if (inode->i_size > HFSPLUS_I(inode).phys_size) {
struct address_space *mapping = inode->i_mapping;
struct page *page;
- u32 size = inode->i_size - 1;
+ void *fsdata;
+ u32 size = inode->i_size;
int res;
- page = grab_cache_page(mapping, size >> PAGE_CACHE_SHIFT);
- if (!page)
- return;
- size &= PAGE_CACHE_SIZE - 1;
- size++;
- res = mapping->a_ops->prepare_write(NULL, page, size, size);
- if (!res)
- res = mapping->a_ops->commit_write(NULL, page, size, size);
+ res = pagecache_write_begin(NULL, mapping, size, 0,
+ AOP_FLAG_UNINTERRUPTIBLE,
+ &page, &fsdata);
if (res)
- inode->i_size = HFSPLUS_I(inode).phys_size;
- unlock_page(page);
- page_cache_release(page);
+ return;
+ res = pagecache_write_end(NULL, mapping, size, 0, 0, page, fsdata);
+ if (res < 0)
+ return;
mark_inode_dirty(inode);
return;
} else if (inode->i_size == HFSPLUS_I(inode).phys_size)
--
* [patch 26/41] hpfs convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (24 preceding siblings ...)
2007-05-14 6:06 ` [patch 25/41] hfsplus " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 27/41] bfs " npiggin
` (15 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, mikulas
[-- Attachment #1: fs-hpfs-aops.patch --]
[-- Type: text/plain, Size: 1645 bytes --]
Cc: mikulas@artax.karlin.mff.cuni.cz
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/hpfs/file.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
Index: linux-2.6/fs/hpfs/file.c
===================================================================
--- linux-2.6.orig/fs/hpfs/file.c
+++ linux-2.6/fs/hpfs/file.c
@@ -86,25 +86,33 @@ static int hpfs_writepage(struct page *p
{
return block_write_full_page(page,hpfs_get_block, wbc);
}
+
static int hpfs_readpage(struct file *file, struct page *page)
{
return block_read_full_page(page,hpfs_get_block);
}
-static int hpfs_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
-{
- return cont_prepare_write(page,from,to,hpfs_get_block,
- &hpfs_i(page->mapping->host)->mmu_private);
+
+static int hpfs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ *pagep = NULL;
+ return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ hpfs_get_block,
+ &hpfs_i(mapping->host)->mmu_private);
}
+
static sector_t _hpfs_bmap(struct address_space *mapping, sector_t block)
{
return generic_block_bmap(mapping,block,hpfs_get_block);
}
+
const struct address_space_operations hpfs_aops = {
.readpage = hpfs_readpage,
.writepage = hpfs_writepage,
.sync_page = block_sync_page,
- .prepare_write = hpfs_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = hpfs_write_begin,
+ .write_end = generic_write_end,
.bmap = _hpfs_bmap
};
--
* [patch 27/41] bfs convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (25 preceding siblings ...)
2007-05-14 6:06 ` [patch 26/41] hpfs " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 28/41] qnx4 " npiggin
` (14 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, tigran
[-- Attachment #1: fs-bfs-aops.patch --]
[-- Type: text/plain, Size: 1341 bytes --]
Cc: tigran@aivazian.fsnet.co.uk
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/bfs/file.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
Index: linux-2.6/fs/bfs/file.c
===================================================================
--- linux-2.6.orig/fs/bfs/file.c
+++ linux-2.6/fs/bfs/file.c
@@ -145,9 +145,13 @@ static int bfs_readpage(struct file *fil
return block_read_full_page(page, bfs_get_block);
}
-static int bfs_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
+static int bfs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- return block_prepare_write(page, from, to, bfs_get_block);
+ *pagep = NULL;
+ return block_write_begin(file, mapping, pos, len, flags,
+ pagep, fsdata, bfs_get_block);
}
static sector_t bfs_bmap(struct address_space *mapping, sector_t block)
@@ -159,8 +163,8 @@ const struct address_space_operations bf
.readpage = bfs_readpage,
.writepage = bfs_writepage,
.sync_page = block_sync_page,
- .prepare_write = bfs_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = bfs_write_begin,
+ .write_end = generic_write_end,
.bmap = bfs_bmap,
};
--
* [patch 28/41] qnx4 convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (26 preceding siblings ...)
2007-05-14 6:06 ` [patch 27/41] bfs " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 10:37 ` Anders Larsen
2007-05-14 6:06 ` [patch 29/41] reiserfs use generic write npiggin
` (13 subsequent siblings)
41 siblings, 1 reply; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, al
[-- Attachment #1: fs-qnx4-aops.patch --]
[-- Type: text/plain, Size: 1694 bytes --]
Cc: al@alarsen.net
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/qnx4/inode.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
Index: linux-2.6/fs/qnx4/inode.c
===================================================================
--- linux-2.6.orig/fs/qnx4/inode.c
+++ linux-2.6/fs/qnx4/inode.c
@@ -433,16 +433,21 @@ static int qnx4_writepage(struct page *p
{
return block_write_full_page(page,qnx4_get_block, wbc);
}
+
static int qnx4_readpage(struct file *file, struct page *page)
{
return block_read_full_page(page,qnx4_get_block);
}
-static int qnx4_prepare_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
-{
- struct qnx4_inode_info *qnx4_inode = qnx4_i(page->mapping->host);
- return cont_prepare_write(page, from, to, qnx4_get_block,
- &qnx4_inode->mmu_private);
+
+static int qnx4_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ struct qnx4_inode_info *qnx4_inode = qnx4_i(mapping->host);
+ *pagep = NULL;
+ return cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ qnx4_get_block,
+ &qnx4_inode->mmu_private);
}
static sector_t qnx4_bmap(struct address_space *mapping, sector_t block)
{
@@ -452,8 +457,8 @@ static const struct address_space_operat
.readpage = qnx4_readpage,
.writepage = qnx4_writepage,
.sync_page = block_sync_page,
- .prepare_write = qnx4_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = qnx4_write_begin,
+ .write_end = generic_write_end,
.bmap = qnx4_bmap
};
--
* [patch 29/41] reiserfs use generic write.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (27 preceding siblings ...)
2007-05-14 6:06 ` [patch 28/41] qnx4 " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 30/41] reiserfs convert to new aops npiggin
` (12 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Vladimir Saveliev
[-- Attachment #1: fs-reiserfs-use-generic-write.patch --]
[-- Type: text/plain, Size: 45968 bytes --]
From: Vladimir Saveliev <vs@namesys.com>
Make reiserfs write via the generic routines.
The original reiserfs write path, optimized for large writes, is deadlock-prone.
Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/fs/reiserfs/file.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/file.c
+++ linux-2.6/fs/reiserfs/file.c
@@ -153,608 +153,6 @@ static int reiserfs_sync_file(struct fil
return (n_err < 0) ? -EIO : 0;
}
-/* I really do not want to play with memory shortage right now, so
- to simplify the code, we are not going to write more than this much pages at
- a time. This still should considerably improve performance compared to 4k
- at a time case. This is 32 pages of 4k size. */
-#define REISERFS_WRITE_PAGES_AT_A_TIME (128 * 1024) / PAGE_CACHE_SIZE
-
-/* Allocates blocks for a file to fulfil write request.
- Maps all unmapped but prepared pages from the list.
- Updates metadata with newly allocated blocknumbers as needed */
-static int reiserfs_allocate_blocks_for_region(struct reiserfs_transaction_handle *th, struct inode *inode, /* Inode we work with */
- loff_t pos, /* Writing position */
- int num_pages, /* number of pages write going
- to touch */
- int write_bytes, /* amount of bytes to write */
- struct page **prepared_pages, /* array of
- prepared pages
- */
- int blocks_to_allocate /* Amount of blocks we
- need to allocate to
- fit the data into file
- */
- )
-{
- struct cpu_key key; // cpu key of item that we are going to deal with
- struct item_head *ih; // pointer to item head that we are going to deal with
- struct buffer_head *bh; // Buffer head that contains items that we are going to deal with
- __le32 *item; // pointer to item we are going to deal with
- INITIALIZE_PATH(path); // path to item, that we are going to deal with.
- b_blocknr_t *allocated_blocks; // Pointer to a place where allocated blocknumbers would be stored.
- reiserfs_blocknr_hint_t hint; // hint structure for block allocator.
- size_t res; // return value of various functions that we call.
- int curr_block; // current block used to keep track of unmapped blocks.
- int i; // loop counter
- int itempos; // position in item
- unsigned int from = (pos & (PAGE_CACHE_SIZE - 1)); // writing position in
- // first page
- unsigned int to = ((pos + write_bytes - 1) & (PAGE_CACHE_SIZE - 1)) + 1; /* last modified byte offset in last page */
- __u64 hole_size; // amount of blocks for a file hole, if it needed to be created.
- int modifying_this_item = 0; // Flag for items traversal code to keep track
- // of the fact that we already prepared
- // current block for journal
- int will_prealloc = 0;
- RFALSE(!blocks_to_allocate,
- "green-9004: tried to allocate zero blocks?");
-
- /* only preallocate if this is a small write */
- if (REISERFS_I(inode)->i_prealloc_count ||
- (!(write_bytes & (inode->i_sb->s_blocksize - 1)) &&
- blocks_to_allocate <
- REISERFS_SB(inode->i_sb)->s_alloc_options.preallocsize))
- will_prealloc =
- REISERFS_SB(inode->i_sb)->s_alloc_options.preallocsize;
-
- allocated_blocks = kmalloc((blocks_to_allocate + will_prealloc) *
- sizeof(b_blocknr_t), GFP_NOFS);
- if (!allocated_blocks)
- return -ENOMEM;
-
- /* First we compose a key to point at the writing position, we want to do
- that outside of any locking region. */
- make_cpu_key(&key, inode, pos + 1, TYPE_ANY, 3 /*key length */ );
-
- /* If we came here, it means we absolutely need to open a transaction,
- since we need to allocate some blocks */
- reiserfs_write_lock(inode->i_sb); // Journaling stuff and we need that.
- res = journal_begin(th, inode->i_sb, JOURNAL_PER_BALANCE_CNT * 3 + 1 + 2 * REISERFS_QUOTA_TRANS_BLOCKS(inode->i_sb)); // Wish I know if this number enough
- if (res)
- goto error_exit;
- reiserfs_update_inode_transaction(inode);
-
- /* Look for the in-tree position of our write, need path for block allocator */
- res = search_for_position_by_key(inode->i_sb, &key, &path);
- if (res == IO_ERROR) {
- res = -EIO;
- goto error_exit;
- }
-
- /* Allocate blocks */
- /* First fill in "hint" structure for block allocator */
- hint.th = th; // transaction handle.
- hint.path = &path; // Path, so that block allocator can determine packing locality or whatever it needs to determine.
- hint.inode = inode; // Inode is needed by block allocator too.
- hint.search_start = 0; // We have no hint on where to search free blocks for block allocator.
- hint.key = key.on_disk_key; // on disk key of file.
- hint.block = inode->i_blocks >> (inode->i_sb->s_blocksize_bits - 9); // Number of disk blocks this file occupies already.
- hint.formatted_node = 0; // We are allocating blocks for unformatted node.
- hint.preallocate = will_prealloc;
-
- /* Call block allocator to allocate blocks */
- res =
- reiserfs_allocate_blocknrs(&hint, allocated_blocks,
- blocks_to_allocate, blocks_to_allocate);
- if (res != CARRY_ON) {
- if (res == NO_DISK_SPACE) {
- /* We flush the transaction in case of no space. This way some
- blocks might become free */
- SB_JOURNAL(inode->i_sb)->j_must_wait = 1;
- res = restart_transaction(th, inode, &path);
- if (res)
- goto error_exit;
-
- /* We might have scheduled, so search again */
- res =
- search_for_position_by_key(inode->i_sb, &key,
- &path);
- if (res == IO_ERROR) {
- res = -EIO;
- goto error_exit;
- }
-
- /* update changed info for hint structure. */
- res =
- reiserfs_allocate_blocknrs(&hint, allocated_blocks,
- blocks_to_allocate,
- blocks_to_allocate);
- if (res != CARRY_ON) {
- res = res == QUOTA_EXCEEDED ? -EDQUOT : -ENOSPC;
- pathrelse(&path);
- goto error_exit;
- }
- } else {
- res = res == QUOTA_EXCEEDED ? -EDQUOT : -ENOSPC;
- pathrelse(&path);
- goto error_exit;
- }
- }
-#ifdef __BIG_ENDIAN
- // Too bad, I have not found any way to convert a given region from
- // cpu format to little endian format
- {
- int i;
- for (i = 0; i < blocks_to_allocate; i++)
- allocated_blocks[i] = cpu_to_le32(allocated_blocks[i]);
- }
-#endif
-
- /* Blocks allocating well might have scheduled and tree might have changed,
- let's search the tree again */
- /* find where in the tree our write should go */
- res = search_for_position_by_key(inode->i_sb, &key, &path);
- if (res == IO_ERROR) {
- res = -EIO;
- goto error_exit_free_blocks;
- }
-
- bh = get_last_bh(&path); // Get a bufferhead for last element in path.
- ih = get_ih(&path); // Get a pointer to last item head in path.
- item = get_item(&path); // Get a pointer to last item in path
-
- /* Let's see what we have found */
- if (res != POSITION_FOUND) { /* position not found, this means that we
- might need to append file with holes
- first */
- // Since we are writing past the file's end, we need to find out if
- // there is a hole that needs to be inserted before our writing
- // position, and how many blocks it is going to cover (we need to
- // populate pointers to file blocks representing the hole with zeros)
-
- {
- int item_offset = 1;
- /*
- * if ih is stat data, its offset is 0 and we don't want to
- * add 1 to pos in the hole_size calculation
- */
- if (is_statdata_le_ih(ih))
- item_offset = 0;
- hole_size = (pos + item_offset -
- (le_key_k_offset
- (get_inode_item_key_version(inode),
- &(ih->ih_key)) + op_bytes_number(ih,
- inode->
- i_sb->
- s_blocksize)))
- >> inode->i_sb->s_blocksize_bits;
- }
-
- if (hole_size > 0) {
- int to_paste = min_t(__u64, hole_size, MAX_ITEM_LEN(inode->i_sb->s_blocksize) / UNFM_P_SIZE); // How much data to insert first time.
- /* area filled with zeroes, to supply as list of zero blocknumbers
- We allocate it outside of loop just in case loop would spin for
- several iterations. */
- char *zeros = kzalloc(to_paste * UNFM_P_SIZE, GFP_ATOMIC); // We cannot insert more than MAX_ITEM_LEN bytes anyway.
- if (!zeros) {
- res = -ENOMEM;
- goto error_exit_free_blocks;
- }
- do {
- to_paste =
- min_t(__u64, hole_size,
- MAX_ITEM_LEN(inode->i_sb->
- s_blocksize) /
- UNFM_P_SIZE);
- if (is_indirect_le_ih(ih)) {
- /* Ok, there is existing indirect item already. Need to append it */
- /* Calculate position past inserted item */
- make_cpu_key(&key, inode,
- le_key_k_offset
- (get_inode_item_key_version
- (inode),
- &(ih->ih_key)) +
- op_bytes_number(ih,
- inode->
- i_sb->
- s_blocksize),
- TYPE_INDIRECT, 3);
- res =
- reiserfs_paste_into_item(th, &path,
- &key,
- inode,
- (char *)
- zeros,
- UNFM_P_SIZE
- *
- to_paste);
- if (res) {
- kfree(zeros);
- goto error_exit_free_blocks;
- }
- } else if (is_statdata_le_ih(ih)) {
- /* No existing item, create it */
- /* item head for new item */
- struct item_head ins_ih;
-
- /* create a key for our new item */
- make_cpu_key(&key, inode, 1,
- TYPE_INDIRECT, 3);
-
- /* Create new item head for our new item */
- make_le_item_head(&ins_ih, &key,
- key.version, 1,
- TYPE_INDIRECT,
- to_paste *
- UNFM_P_SIZE,
- 0 /* free space */ );
-
- /* Find where such item should live in the tree */
- res =
- search_item(inode->i_sb, &key,
- &path);
- if (res != ITEM_NOT_FOUND) {
- /* item should not exist, otherwise we have error */
- if (res != -ENOSPC) {
- reiserfs_warning(inode->
- i_sb,
- "green-9008: search_by_key (%K) returned %d",
- &key,
- res);
- }
- res = -EIO;
- kfree(zeros);
- goto error_exit_free_blocks;
- }
- res =
- reiserfs_insert_item(th, &path,
- &key, &ins_ih,
- inode,
- (char *)zeros);
- } else {
- reiserfs_panic(inode->i_sb,
- "green-9011: Unexpected key type %K\n",
- &key);
- }
- if (res) {
- kfree(zeros);
- goto error_exit_free_blocks;
- }
- /* Now we want to check if transaction is too full, and if it is
- we restart it. This will also free the path. */
- if (journal_transaction_should_end
- (th, th->t_blocks_allocated)) {
- inode->i_size = cpu_key_k_offset(&key) +
- (to_paste << inode->i_blkbits);
- res =
- restart_transaction(th, inode,
- &path);
- if (res) {
- pathrelse(&path);
- kfree(zeros);
- goto error_exit;
- }
- }
-
- /* Well, need to recalculate path and stuff */
- set_cpu_key_k_offset(&key,
- cpu_key_k_offset(&key) +
- (to_paste << inode->
- i_blkbits));
- res =
- search_for_position_by_key(inode->i_sb,
- &key, &path);
- if (res == IO_ERROR) {
- res = -EIO;
- kfree(zeros);
- goto error_exit_free_blocks;
- }
- bh = get_last_bh(&path);
- ih = get_ih(&path);
- item = get_item(&path);
- hole_size -= to_paste;
- } while (hole_size);
- kfree(zeros);
- }
- }
- // Go through existing indirect items first
- // replace all zeroes with blocknumbers from list
- // Note that if no corresponding item was found, by previous search,
- // it means there are no existing in-tree representation for file area
- // we are going to overwrite, so there is nothing to scan through for holes.
- for (curr_block = 0, itempos = path.pos_in_item;
- curr_block < blocks_to_allocate && res == POSITION_FOUND;) {
- retry:
-
- if (itempos >= ih_item_len(ih) / UNFM_P_SIZE) {
- /* We run out of data in this indirect item, let's look for another
- one. */
- /* First if we are already modifying current item, log it */
- if (modifying_this_item) {
- journal_mark_dirty(th, inode->i_sb, bh);
- modifying_this_item = 0;
- }
- /* Then set the key to look for a new indirect item (offset of old
- item is added to old item length */
- set_cpu_key_k_offset(&key,
- le_key_k_offset
- (get_inode_item_key_version(inode),
- &(ih->ih_key)) +
- op_bytes_number(ih,
- inode->i_sb->
- s_blocksize));
- /* Search ofor position of new key in the tree. */
- res =
- search_for_position_by_key(inode->i_sb, &key,
- &path);
- if (res == IO_ERROR) {
- res = -EIO;
- goto error_exit_free_blocks;
- }
- bh = get_last_bh(&path);
- ih = get_ih(&path);
- item = get_item(&path);
- itempos = path.pos_in_item;
- continue; // loop to check all kinds of conditions and so on.
- }
- /* Ok, we have correct position in item now, so let's see if it is
- representing file hole (blocknumber is zero) and fill it if needed */
- if (!item[itempos]) {
- /* Ok, a hole. Now we need to check if we already prepared this
- block to be journaled */
- while (!modifying_this_item) { // loop until succeed
- /* Well, this item is not journaled yet, so we must prepare
- it for journal first, before we can change it */
- struct item_head tmp_ih; // We copy item head of found item,
- // here to detect if fs changed under
- // us while we were preparing for
- // journal.
- int fs_gen; // We store fs generation here to find if someone
- // changes fs under our feet
-
- copy_item_head(&tmp_ih, ih); // Remember itemhead
- fs_gen = get_generation(inode->i_sb); // remember fs generation
- reiserfs_prepare_for_journal(inode->i_sb, bh, 1); // Prepare a buffer within which indirect item is stored for changing.
- if (fs_changed(fs_gen, inode->i_sb)
- && item_moved(&tmp_ih, &path)) {
- // Sigh, fs was changed under us, we need to look for new
- // location of item we are working with
-
- /* unmark prepaerd area as journaled and search for it's
- new position */
- reiserfs_restore_prepared_buffer(inode->
- i_sb,
- bh);
- res =
- search_for_position_by_key(inode->
- i_sb,
- &key,
- &path);
- if (res == IO_ERROR) {
- res = -EIO;
- goto error_exit_free_blocks;
- }
- bh = get_last_bh(&path);
- ih = get_ih(&path);
- item = get_item(&path);
- itempos = path.pos_in_item;
- goto retry;
- }
- modifying_this_item = 1;
- }
- item[itempos] = allocated_blocks[curr_block]; // Assign new block
- curr_block++;
- }
- itempos++;
- }
-
- if (modifying_this_item) { // We need to log last-accessed block, if it
- // was modified, but not logged yet.
- journal_mark_dirty(th, inode->i_sb, bh);
- }
-
- if (curr_block < blocks_to_allocate) {
- // Oh, well need to append to indirect item, or to create indirect item
- // if there weren't any
- if (is_indirect_le_ih(ih)) {
- // Existing indirect item - append. First calculate key for append
- // position. We do not need to recalculate path as it should
- // already point to correct place.
- make_cpu_key(&key, inode,
- le_key_k_offset(get_inode_item_key_version
- (inode),
- &(ih->ih_key)) +
- op_bytes_number(ih,
- inode->i_sb->s_blocksize),
- TYPE_INDIRECT, 3);
- res =
- reiserfs_paste_into_item(th, &path, &key, inode,
- (char *)(allocated_blocks +
- curr_block),
- UNFM_P_SIZE *
- (blocks_to_allocate -
- curr_block));
- if (res) {
- goto error_exit_free_blocks;
- }
- } else if (is_statdata_le_ih(ih)) {
- // Last found item was statdata. That means we need to create indirect item.
- struct item_head ins_ih; /* itemhead for new item */
-
- /* create a key for our new item */
- make_cpu_key(&key, inode, 1, TYPE_INDIRECT, 3); // Position one,
- // because that's
- // where first
- // indirect item
- // begins
- /* Create new item head for our new item */
- make_le_item_head(&ins_ih, &key, key.version, 1,
- TYPE_INDIRECT,
- (blocks_to_allocate -
- curr_block) * UNFM_P_SIZE,
- 0 /* free space */ );
- /* Find where such item should live in the tree */
- res = search_item(inode->i_sb, &key, &path);
- if (res != ITEM_NOT_FOUND) {
- /* Well, if we have found such item already, or some error
- occured, we need to warn user and return error */
- if (res != -ENOSPC) {
- reiserfs_warning(inode->i_sb,
- "green-9009: search_by_key (%K) "
- "returned %d", &key,
- res);
- }
- res = -EIO;
- goto error_exit_free_blocks;
- }
- /* Insert item into the tree with the data as its body */
- res =
- reiserfs_insert_item(th, &path, &key, &ins_ih,
- inode,
- (char *)(allocated_blocks +
- curr_block));
- } else {
- reiserfs_panic(inode->i_sb,
- "green-9010: unexpected item type for key %K\n",
- &key);
- }
- }
- // the caller is responsible for closing the transaction
- // unless we return an error, they are also responsible for logging
- // the inode.
- //
- pathrelse(&path);
- /*
- * cleanup prellocation from previous writes
- * if this is a partial block write
- */
- if (write_bytes & (inode->i_sb->s_blocksize - 1))
- reiserfs_discard_prealloc(th, inode);
- reiserfs_write_unlock(inode->i_sb);
-
- // go through all the pages/buffers and map the buffers to newly allocated
- // blocks (so that system knows where to write these pages later).
- curr_block = 0;
- for (i = 0; i < num_pages; i++) {
- struct page *page = prepared_pages[i]; //current page
- struct buffer_head *head = page_buffers(page); // first buffer for a page
- int block_start, block_end; // in-page offsets for buffers.
-
- if (!page_buffers(page))
- reiserfs_panic(inode->i_sb,
- "green-9005: No buffers for prepared page???");
-
- /* For each buffer in page */
- for (bh = head, block_start = 0; bh != head || !block_start;
- block_start = block_end, bh = bh->b_this_page) {
- if (!bh)
- reiserfs_panic(inode->i_sb,
- "green-9006: Allocated but absent buffer for a page?");
- block_end = block_start + inode->i_sb->s_blocksize;
- if (i == 0 && block_end <= from)
- /* if this buffer is before requested data to map, skip it */
- continue;
- if (i == num_pages - 1 && block_start >= to)
- /* If this buffer is after requested data to map, abort
- processing of current page */
- break;
-
- if (!buffer_mapped(bh)) { // Ok, unmapped buffer, need to map it
- map_bh(bh, inode->i_sb,
- le32_to_cpu(allocated_blocks
- [curr_block]));
- curr_block++;
- set_buffer_new(bh);
- }
- }
- }
-
- RFALSE(curr_block > blocks_to_allocate,
- "green-9007: Used too many blocks? weird");
-
- kfree(allocated_blocks);
- return 0;
-
-// Need to deal with transaction here.
- error_exit_free_blocks:
- pathrelse(&path);
- // free blocks
- for (i = 0; i < blocks_to_allocate; i++)
- reiserfs_free_block(th, inode, le32_to_cpu(allocated_blocks[i]),
- 1);
-
- error_exit:
- if (th->t_trans_id) {
- int err;
- // update any changes we made to blk count
- mark_inode_dirty(inode);
- err =
- journal_end(th, inode->i_sb,
- JOURNAL_PER_BALANCE_CNT * 3 + 1 +
- 2 * REISERFS_QUOTA_TRANS_BLOCKS(inode->i_sb));
- if (err)
- res = err;
- }
- reiserfs_write_unlock(inode->i_sb);
- kfree(allocated_blocks);
-
- return res;
-}
-
-/* Unlock pages prepared by reiserfs_prepare_file_region_for_write */
-static void reiserfs_unprepare_pages(struct page **prepared_pages, /* list of locked pages */
- size_t num_pages /* amount of pages */ )
-{
- int i; // loop counter
-
- for (i = 0; i < num_pages; i++) {
- struct page *page = prepared_pages[i];
-
- try_to_free_buffers(page);
- unlock_page(page);
- page_cache_release(page);
- }
-}
-
-/* This function will copy data from userspace to specified pages within
- supplied byte range */
-static int reiserfs_copy_from_user_to_file_region(loff_t pos, /* In-file position */
- int num_pages, /* Number of pages affected */
- int write_bytes, /* Amount of bytes to write */
- struct page **prepared_pages, /* pointer to
- array to
- prepared pages
- */
- const char __user * buf /* Pointer to user-supplied
- data */
- )
-{
- long page_fault = 0; // status of copy_from_user.
- int i; // loop counter.
- int offset; // offset in page
-
- for (i = 0, offset = (pos & (PAGE_CACHE_SIZE - 1)); i < num_pages;
- i++, offset = 0) {
- size_t count = min_t(size_t, PAGE_CACHE_SIZE - offset, write_bytes); // How much of bytes to write to this page
- struct page *page = prepared_pages[i]; // Current page we process.
-
- fault_in_pages_readable(buf, count);
-
- /* Copy data from userspace to the current page */
- kmap(page);
- page_fault = __copy_from_user(page_address(page) + offset, buf, count); // Copy the data.
- /* Flush processor's dcache for this page */
- flush_dcache_page(page);
- kunmap(page);
- buf += count;
- write_bytes -= count;
-
- if (page_fault)
- break; // Was there a fault? abort.
- }
-
- return page_fault ? -EFAULT : 0;
-}
-
/* taken fs/buffer.c:__block_commit_write */
int reiserfs_commit_page(struct inode *inode, struct page *page,
unsigned from, unsigned to)
@@ -824,432 +222,6 @@ int reiserfs_commit_page(struct inode *i
return ret;
}
-/* Submit pages for write. This was separated from the actual file copying
- because we might want to allocate block numbers in between.
- This function assumes that the caller will adjust the file size to the correct value. */
-static int reiserfs_submit_file_region_for_write(struct reiserfs_transaction_handle *th, struct inode *inode, loff_t pos, /* Writing position offset */
- size_t num_pages, /* Number of pages to write */
- size_t write_bytes, /* number of bytes to write */
- struct page **prepared_pages /* list of pages */
- )
-{
- int status; // return status of block_commit_write.
- int retval = 0; // Return value we are going to return.
- int i; // loop counter
- int offset; // Writing offset in page.
- int orig_write_bytes = write_bytes;
- int sd_update = 0;
-
- for (i = 0, offset = (pos & (PAGE_CACHE_SIZE - 1)); i < num_pages;
- i++, offset = 0) {
- int count = min_t(int, PAGE_CACHE_SIZE - offset, write_bytes); // How much of bytes to write to this page
- struct page *page = prepared_pages[i]; // Current page we process.
-
- status =
- reiserfs_commit_page(inode, page, offset, offset + count);
- if (status)
- retval = status; // To not overcomplicate matters, we are going to
- // submit all the pages even if there was an error.
- // We only remember the error status to report it
- // on exit.
- write_bytes -= count;
- }
- /* now that we've gotten all the ordered buffers marked dirty,
- * we can safely update i_size and close any running transaction
- */
- if (pos + orig_write_bytes > inode->i_size) {
- inode->i_size = pos + orig_write_bytes; // Set new size
- /* If the file has grown so much that tail packing is no
- * longer possible, reset the "need to pack" flag */
- if ((have_large_tails(inode->i_sb) &&
- inode->i_size > i_block_size(inode) * 4) ||
- (have_small_tails(inode->i_sb) &&
- inode->i_size > i_block_size(inode)))
- REISERFS_I(inode)->i_flags &= ~i_pack_on_close_mask;
- else if ((have_large_tails(inode->i_sb) &&
- inode->i_size < i_block_size(inode) * 4) ||
- (have_small_tails(inode->i_sb) &&
- inode->i_size < i_block_size(inode)))
- REISERFS_I(inode)->i_flags |= i_pack_on_close_mask;
-
- if (th->t_trans_id) {
- reiserfs_write_lock(inode->i_sb);
- // this sets the proper flags for O_SYNC to trigger a commit
- mark_inode_dirty(inode);
- reiserfs_write_unlock(inode->i_sb);
- } else {
- reiserfs_write_lock(inode->i_sb);
- reiserfs_update_inode_transaction(inode);
- mark_inode_dirty(inode);
- reiserfs_write_unlock(inode->i_sb);
- }
-
- sd_update = 1;
- }
- if (th->t_trans_id) {
- reiserfs_write_lock(inode->i_sb);
- if (!sd_update)
- mark_inode_dirty(inode);
- status = journal_end(th, th->t_super, th->t_blocks_allocated);
- if (status)
- retval = status;
- reiserfs_write_unlock(inode->i_sb);
- }
- th->t_trans_id = 0;
-
- /*
- * we have to unlock the pages after updating i_size, otherwise
- * we race with writepage
- */
- for (i = 0; i < num_pages; i++) {
- struct page *page = prepared_pages[i];
- unlock_page(page);
- mark_page_accessed(page);
- page_cache_release(page);
- }
- return retval;
-}
-
-/* Check whether the write region is going to touch the file's tail
- (if it is present), and if it is, convert the tail to an unformatted node */
-static int reiserfs_check_for_tail_and_convert(struct inode *inode, /* inode to deal with */
- loff_t pos, /* Writing position */
- int write_bytes /* amount of bytes to write */
- )
-{
- INITIALIZE_PATH(path); // needed for search_for_position
- struct cpu_key key; // Key that would represent last touched writing byte.
- struct item_head *ih; // item header of found block;
- int res; // Return value of various functions we call.
- int cont_expand_offset; // We will put offset for generic_cont_expand here
- // This can be int just because tails are created
- // only for small files.
-
-/* this embodies a dependency on a particular tail policy */
- if (inode->i_size >= inode->i_sb->s_blocksize * 4) {
- /* such big files do not have tails, so we won't bother
- looking for tails; simply return */
- return 0;
- }
-
- reiserfs_write_lock(inode->i_sb);
- /* find the item containing the last byte to be written, or if
- * writing past the end of the file then the last item of the
- * file (and then we check its type). */
- make_cpu_key(&key, inode, pos + write_bytes + 1, TYPE_ANY,
- 3 /*key length */ );
- res = search_for_position_by_key(inode->i_sb, &key, &path);
- if (res == IO_ERROR) {
- reiserfs_write_unlock(inode->i_sb);
- return -EIO;
- }
- ih = get_ih(&path);
- res = 0;
- if (is_direct_le_ih(ih)) {
- /* Ok, closest item is file tail (tails are stored in "direct"
- * items), so we need to unpack it. */
- /* To not overcomplicate matters, we just call generic_cont_expand
- which will in turn call other stuff and finally will boil down to
- reiserfs_get_block() that would do necessary conversion. */
- cont_expand_offset =
- le_key_k_offset(get_inode_item_key_version(inode),
- &(ih->ih_key));
- pathrelse(&path);
- res = generic_cont_expand(inode, cont_expand_offset);
- } else
- pathrelse(&path);
-
- reiserfs_write_unlock(inode->i_sb);
- return res;
-}
-
-/* This function locks pages starting from @pos for @inode.
- @num_pages pages are locked and stored in the
- @prepared_pages array. Buffers are also allocated for these pages.
- The first and last pages of the region are read if they are only
- partially overwritten. If the last page did not exist before the write
- (file hole or file append), it is zeroed instead.
- Returns the number of unallocated blocks that should be allocated to
- cover the new file data. */
-static int reiserfs_prepare_file_region_for_write(struct inode *inode
- /* Inode of the file */ ,
- loff_t pos, /* position in the file */
- size_t num_pages, /* number of pages to
- prepare */
- size_t write_bytes, /* Amount of bytes to be
- overwritten from
- @pos */
- struct page **prepared_pages /* pointer to array
- where to store
- prepared pages */
- )
-{
- int res = 0; // Return values of different functions we call.
- unsigned long index = pos >> PAGE_CACHE_SHIFT; // Offset in file in pages.
- int from = (pos & (PAGE_CACHE_SIZE - 1)); // Writing offset in first page
- int to = ((pos + write_bytes - 1) & (PAGE_CACHE_SIZE - 1)) + 1;
- /* offset of last modified byte in last
- page */
- struct address_space *mapping = inode->i_mapping; // Pages are mapped here.
- int i; // Simple counter
- int blocks = 0; /* Return value (blocks that should be allocated) */
- struct buffer_head *bh, *head; // Current bufferhead and first bufferhead
- // of a page.
- unsigned block_start, block_end; // Starting and ending offsets of current
- // buffer in the page.
- struct buffer_head *wait[2], **wait_bh = wait; // Buffers for the page, if
- // the page appeared to be not
- // up to date. Note how we have
- // at most 2 buffers: we may
- // partially overwrite at most
- // two buffers for one page, one
- // at the beginning of the write
- // area and one at the end.
- // Everything in the middle gets
- // overwritten totally.
-
- struct cpu_key key; // cpu key of item that we are going to deal with
- struct item_head *ih = NULL; // pointer to item head that we are going to deal with
- struct buffer_head *itembuf = NULL; // Buffer head that contains items that we are going to deal with
- INITIALIZE_PATH(path); // path to item, that we are going to deal with.
- __le32 *item = NULL; // pointer to item we are going to deal with
- int item_pos = -1; /* Position in indirect item */
-
- if (num_pages < 1) {
- reiserfs_warning(inode->i_sb,
- "green-9001: reiserfs_prepare_file_region_for_write "
- "called with zero number of pages to process");
- return -EFAULT;
- }
-
- /* We have 2 loops for pages. In the first loop we grab and lock the pages, so
- that nobody can touch them until we release them. Then
- we start to deal with mapping buffers to blocks. */
- for (i = 0; i < num_pages; i++) {
- prepared_pages[i] = grab_cache_page(mapping, index + i); // locks the page
- if (!prepared_pages[i]) {
- res = -ENOMEM;
- goto failed_page_grabbing;
- }
- if (!page_has_buffers(prepared_pages[i]))
- create_empty_buffers(prepared_pages[i],
- inode->i_sb->s_blocksize, 0);
- }
-
- /* Let's count the number of blocks for the case where all the blocks
- overwritten are new (we will subtract already allocated blocks later) */
- if (num_pages > 2)
- /* These pages are fully overwritten, so all the blocks in
- them are counted as needing to be allocated */
- blocks =
- (num_pages - 2) << (PAGE_CACHE_SHIFT - inode->i_blkbits);
-
- /* count blocks needed for first page (possibly partially written) */
- blocks += ((PAGE_CACHE_SIZE - from) >> inode->i_blkbits) + !!(from & (inode->i_sb->s_blocksize - 1)); /* roundup */
-
- /* Now we account for the last page. If the last page == the first page (we
- overwrite only one page), we subtract all the blocks past the
- last writing position in a page out of the already calculated number
- of blocks */
- blocks += ((num_pages > 1) << (PAGE_CACHE_SHIFT - inode->i_blkbits)) -
- ((PAGE_CACHE_SIZE - to) >> inode->i_blkbits);
- /* Note how we do not round up here, since partial blocks
- still need to be allocated */
-
- /* Now if the whole write area lies past the file end, there is no point in
- mapping blocks, since there are none, so we just zero out the remaining
- parts of the first and last pages in the write area (if needed) */
- if ((pos & ~((loff_t) PAGE_CACHE_SIZE - 1)) > inode->i_size) {
- if (from != 0) /* First page needs to be partially zeroed */
- zero_user_page(prepared_pages[0], 0, from, KM_USER0);
-
- if (to != PAGE_CACHE_SIZE) /* Last page needs to be partially zeroed */
- zero_user_page(prepared_pages[num_pages-1], to,
- PAGE_CACHE_SIZE - to, KM_USER0);
-
- /* Since all blocks are new - use already calculated value */
- return blocks;
- }
-
- /* Well, since we write somewhere into the middle of a file, there is a
- possibility we are writing over some already allocated blocks, so
- let's map these blocks and subtract the number of such blocks from the
- number we need to allocate (calculated above) */
- /* Mask write position to start on blocksize, we do it out of the
- loop for performance reasons */
- pos &= ~((loff_t) inode->i_sb->s_blocksize - 1);
- /* Set cpu key to the starting position in a file (on left block boundary) */
- make_cpu_key(&key, inode,
- 1 + ((pos) & ~((loff_t) inode->i_sb->s_blocksize - 1)),
- TYPE_ANY, 3 /*key length */ );
-
- reiserfs_write_lock(inode->i_sb); // We need that for at least search_by_key()
- for (i = 0; i < num_pages; i++) {
-
- head = page_buffers(prepared_pages[i]);
- /* For each buffer in the page */
- for (bh = head, block_start = 0; bh != head || !block_start;
- block_start = block_end, bh = bh->b_this_page) {
- if (!bh)
- reiserfs_panic(inode->i_sb,
- "green-9002: Allocated but absent buffer for a page?");
- /* Find where this buffer ends */
- block_end = block_start + inode->i_sb->s_blocksize;
- if (i == 0 && block_end <= from)
- /* if this buffer is before requested data to map, skip it */
- continue;
-
- if (i == num_pages - 1 && block_start >= to) {
- /* If this buffer is after requested data to map, abort
- processing of current page */
- break;
- }
-
- if (buffer_mapped(bh) && bh->b_blocknr != 0) {
- /* This is an optimisation for the case where the buffer is
- mapped and has a block number assigned. If a significant
- number of such buffers are present, we may avoid some
- search_by_key calls.
- It would probably be possible to move parts of this code
- out of the BKL, but I'm afraid that would overcomplicate
- the code without any noticeable benefit.
- */
- item_pos++;
- /* Update the key */
- set_cpu_key_k_offset(&key,
- cpu_key_k_offset(&key) +
- inode->i_sb->s_blocksize);
- blocks--; // Decrease the amount of blocks that need to be
- // allocated
- continue; // Go to the next buffer
- }
-
- if (!itembuf || /* if first iteration */
- item_pos >= ih_item_len(ih) / UNFM_P_SIZE) { /* or if we progressed past the
- current unformatted_item */
- /* Try to find next item */
- res =
- search_for_position_by_key(inode->i_sb,
- &key, &path);
- /* Abort if no more items */
- if (res != POSITION_FOUND) {
- /* make sure later loops don't use this item */
- itembuf = NULL;
- item = NULL;
- break;
- }
-
- /* Update information about current indirect item */
- itembuf = get_last_bh(&path);
- ih = get_ih(&path);
- item = get_item(&path);
- item_pos = path.pos_in_item;
-
- RFALSE(!is_indirect_le_ih(ih),
- "green-9003: indirect item expected");
- }
-
- /* See if there is some block associated with the file
- at that position, map the buffer to this block */
- if (get_block_num(item, item_pos)) {
- map_bh(bh, inode->i_sb,
- get_block_num(item, item_pos));
- blocks--; // Decrease the amount of blocks that need to be
- // allocated
- }
- item_pos++;
- /* Update the key */
- set_cpu_key_k_offset(&key,
- cpu_key_k_offset(&key) +
- inode->i_sb->s_blocksize);
- }
- }
- pathrelse(&path); // Free the path
- reiserfs_write_unlock(inode->i_sb);
-
- /* Now zero out unmapped buffers for the first and last pages of the
- write area, or issue read requests if the page is mapped. */
- /* First page, see if it is not uptodate */
- if (!PageUptodate(prepared_pages[0])) {
- head = page_buffers(prepared_pages[0]);
-
- /* For each buffer in page */
- for (bh = head, block_start = 0; bh != head || !block_start;
- block_start = block_end, bh = bh->b_this_page) {
-
- if (!bh)
- reiserfs_panic(inode->i_sb,
- "green-9002: Allocated but absent buffer for a page?");
- /* Find where this buffer ends */
- block_end = block_start + inode->i_sb->s_blocksize;
- if (block_end <= from)
- /* if this buffer is before requested data to map, skip it */
- continue;
- if (block_start < from) { /* Aha, our partial buffer */
- if (buffer_mapped(bh)) { /* If it is mapped, we need to
- issue a READ request for it so we do
- not lose data */
- ll_rw_block(READ, 1, &bh);
- *wait_bh++ = bh;
- } else { /* Not mapped, zero it */
- zero_user_page(prepared_pages[0],
- block_start,
- from - block_start, KM_USER0);
- set_buffer_uptodate(bh);
- }
- }
- }
- }
-
- /* Last page, see if it is not uptodate, or if the last page is past the end of the file. */
- if (!PageUptodate(prepared_pages[num_pages - 1]) ||
- ((pos + write_bytes) >> PAGE_CACHE_SHIFT) >
- (inode->i_size >> PAGE_CACHE_SHIFT)) {
- head = page_buffers(prepared_pages[num_pages - 1]);
-
- /* for each buffer in page */
- for (bh = head, block_start = 0; bh != head || !block_start;
- block_start = block_end, bh = bh->b_this_page) {
-
- if (!bh)
- reiserfs_panic(inode->i_sb,
- "green-9002: Allocated but absent buffer for a page?");
- /* Find where this buffer ends */
- block_end = block_start + inode->i_sb->s_blocksize;
- if (block_start >= to)
- /* if this buffer is after requested data to map, skip it */
- break;
- if (block_end > to) { /* Aha, our partial buffer */
- if (buffer_mapped(bh)) { /* If it is mapped, we need to
- issue a READ request for it so we do
- not lose data */
- ll_rw_block(READ, 1, &bh);
- *wait_bh++ = bh;
- } else { /* Not mapped, zero it */
- zero_user_page(prepared_pages[num_pages-1],
- to, block_end - to, KM_USER0);
- set_buffer_uptodate(bh);
- }
- }
- }
- }
-
- /* Wait for read requests we made to happen, if necessary */
- while (wait_bh > wait) {
- wait_on_buffer(*--wait_bh);
- if (!buffer_uptodate(*wait_bh)) {
- res = -EIO;
- goto failed_read;
- }
- }
-
- return blocks;
- failed_page_grabbing:
- num_pages = i;
- failed_read:
- reiserfs_unprepare_pages(prepared_pages, num_pages);
- return res;
-}
-
/* Write @count bytes at position @ppos in a file indicated by @file
from the buffer @buf.
@@ -1284,14 +256,9 @@ static ssize_t reiserfs_file_write(struc
* new current position before returning. */
)
{
- size_t already_written = 0; // Number of bytes already written to the file.
- loff_t pos; // Current position in the file.
- ssize_t res; // return value of various functions that we call.
- int err = 0;
struct inode *inode = file->f_path.dentry->d_inode; // Inode of the file that we are writing to.
/* To simplify coding at this time, we store
locked pages in array for now */
- struct page *prepared_pages[REISERFS_WRITE_PAGES_AT_A_TIME];
struct reiserfs_transaction_handle th;
th.t_trans_id = 0;
@@ -1312,212 +279,7 @@ static ssize_t reiserfs_file_write(struc
count = MAX_NON_LFS - (unsigned long)*ppos;
}
- if (file->f_flags & O_DIRECT)
- return do_sync_write(file, buf, count, ppos);
-
- if (unlikely((ssize_t) count < 0))
- return -EINVAL;
-
- if (unlikely(!access_ok(VERIFY_READ, buf, count)))
- return -EFAULT;
-
- mutex_lock(&inode->i_mutex); // locks the entire file for just us
-
- pos = *ppos;
-
- /* Check if we can write to specified region of file, file
- is not overly big and this kind of stuff. Adjust pos and
- count, if needed */
- res = generic_write_checks(file, &pos, &count, 0);
- if (res)
- goto out;
-
- if (count == 0)
- goto out;
-
- res = remove_suid(file->f_path.dentry);
- if (res)
- goto out;
-
- file_update_time(file);
-
- // Ok, we are done with all the checks.
-
- // Now we should start real work
-
- /* If we are going to write past the file's packed tail, or if we are going
- to overwrite part of the tail, we need that tail to be converted into
- an unformatted node */
- res = reiserfs_check_for_tail_and_convert(inode, pos, count);
- if (res)
- goto out;
-
- while (count > 0) {
- /* This is the main loop, in which we run until some error occurs
- or until we have written all of the data. */
- size_t num_pages; /* amount of pages we are going to write this iteration */
- size_t write_bytes; /* amount of bytes to write during this iteration */
- size_t blocks_to_allocate; /* how much blocks we need to allocate for this iteration */
-
- /* (pos & (PAGE_CACHE_SIZE-1)) is an idiom for offset into a page of pos */
- num_pages = !!((pos + count) & (PAGE_CACHE_SIZE - 1)) + /* round up partial
- pages */
- ((count +
- (pos & (PAGE_CACHE_SIZE - 1))) >> PAGE_CACHE_SHIFT);
- /* convert size to amount of
- pages */
- reiserfs_write_lock(inode->i_sb);
- if (num_pages > REISERFS_WRITE_PAGES_AT_A_TIME
- || num_pages > reiserfs_can_fit_pages(inode->i_sb)) {
- /* If we were asked to write more data than we want to, or if there
- is not that much space, then we shorten the amount of data to write
- for this iteration. */
- num_pages =
- min_t(size_t, REISERFS_WRITE_PAGES_AT_A_TIME,
- reiserfs_can_fit_pages(inode->i_sb));
- /* Also we should not forget to set size in bytes accordingly */
- write_bytes = (num_pages << PAGE_CACHE_SHIFT) -
- (pos & (PAGE_CACHE_SIZE - 1));
- /* If the position is not at the
- start of the page, we need
- to subtract the offset
- within the page */
- } else
- write_bytes = count;
-
- /* reserve the blocks to be allocated later, so that later on
- we still have the space to write the blocks to */
- reiserfs_claim_blocks_to_be_allocated(inode->i_sb,
- num_pages <<
- (PAGE_CACHE_SHIFT -
- inode->i_blkbits));
- reiserfs_write_unlock(inode->i_sb);
-
- if (!num_pages) { /* If we do not have enough space even for a single page... */
- if (pos >
- inode->i_size + inode->i_sb->s_blocksize -
- (pos & (inode->i_sb->s_blocksize - 1))) {
- res = -ENOSPC;
- break; // In case we are writing past the end of the last file block, break.
- }
- // Otherwise we are possibly overwriting the file, so
- // let's set the write size to be equal to or less than the blocksize.
- // This way we get it correct for file holes.
- // But overwriting files on absolutely full volumes would not
- // be very efficient. Well, people are not supposed to fill
- // 100% of disk space anyway.
- write_bytes =
- min_t(size_t, count,
- inode->i_sb->s_blocksize -
- (pos & (inode->i_sb->s_blocksize - 1)));
- num_pages = 1;
- // No blocks were claimed before, so do it now.
- reiserfs_claim_blocks_to_be_allocated(inode->i_sb,
- 1 <<
- (PAGE_CACHE_SHIFT
- -
- inode->
- i_blkbits));
- }
-
- /* Prepare for writing into the region, read in all the
- partially overwritten pages, if needed. And lock the pages,
- so that nobody else can access these until we are done.
- We get number of actual blocks needed as a result. */
- res = reiserfs_prepare_file_region_for_write(inode, pos,
- num_pages,
- write_bytes,
- prepared_pages);
- if (res < 0) {
- reiserfs_release_claimed_blocks(inode->i_sb,
- num_pages <<
- (PAGE_CACHE_SHIFT -
- inode->i_blkbits));
- break;
- }
-
- blocks_to_allocate = res;
-
- /* First we correct our estimate of how many blocks we need */
- reiserfs_release_claimed_blocks(inode->i_sb,
- (num_pages <<
- (PAGE_CACHE_SHIFT -
- inode->i_sb->
- s_blocksize_bits)) -
- blocks_to_allocate);
-
- if (blocks_to_allocate > 0) { /*We only allocate blocks if we need to */
- /* Fill in all the possible holes and append the file if needed */
- res =
- reiserfs_allocate_blocks_for_region(&th, inode, pos,
- num_pages,
- write_bytes,
- prepared_pages,
- blocks_to_allocate);
- }
-
- /* well, we have allocated the blocks, so it is time to free
- the reservation we made earlier. */
- reiserfs_release_claimed_blocks(inode->i_sb,
- blocks_to_allocate);
- if (res) {
- reiserfs_unprepare_pages(prepared_pages, num_pages);
- break;
- }
-
-/* NOTE that allocating blocks and filling blocks can be done in reverse order
- and probably we would do that just to get rid of garbage in files after a
- crash */
-
- /* Copy data from user-supplied buffer to file's pages */
- res =
- reiserfs_copy_from_user_to_file_region(pos, num_pages,
- write_bytes,
- prepared_pages, buf);
- if (res) {
- reiserfs_unprepare_pages(prepared_pages, num_pages);
- break;
- }
-
- /* Send the pages to disk and unlock them. */
- res =
- reiserfs_submit_file_region_for_write(&th, inode, pos,
- num_pages,
- write_bytes,
- prepared_pages);
- if (res)
- break;
-
- already_written += write_bytes;
- buf += write_bytes;
- *ppos = pos += write_bytes;
- count -= write_bytes;
- balance_dirty_pages_ratelimited_nr(inode->i_mapping, num_pages);
- }
-
- /* this is only true on error */
- if (th.t_trans_id) {
- reiserfs_write_lock(inode->i_sb);
- err = journal_end(&th, th.t_super, th.t_blocks_allocated);
- reiserfs_write_unlock(inode->i_sb);
- if (err) {
- res = err;
- goto out;
- }
- }
-
- if (likely(res >= 0) &&
- (unlikely((file->f_flags & O_SYNC) || IS_SYNC(inode))))
- res = generic_osync_inode(inode, file->f_mapping,
- OSYNC_METADATA | OSYNC_DATA);
-
- mutex_unlock(&inode->i_mutex);
- reiserfs_async_progress_wait(inode->i_sb);
- return (already_written != 0) ? already_written : res;
-
- out:
- mutex_unlock(&inode->i_mutex); // unlock the file on exit.
- return res;
+ return do_sync_write(file, buf, count, ppos);
}
const struct file_operations reiserfs_file_operations = {
--
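Patch 29 replaces reiserfs's hand-rolled multi-page write loop with do_sync_write(), which drives the generic buffered write path: the file is written one page-sized chunk at a time, with ->write_begin before and ->write_end after each user copy. A minimal userspace sketch of that per-page chunking loop (`perform_write` and `filedata` are illustrative stand-ins, not kernel APIs; the aop calls are shown only as comments):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_CACHE_SIZE 4096UL	/* assuming 4 KiB pages */

/* A flat array standing in for the page cache of one file. */
static char filedata[4 * PAGE_CACHE_SIZE];

/* Sketch of the generic per-page write loop. */
static size_t perform_write(long long pos, const char *buf, size_t count)
{
	size_t written = 0;

	while (count) {
		/* offset of pos within its page, and how far we can go
		 * before crossing a page boundary */
		unsigned long offset = pos & (PAGE_CACHE_SIZE - 1);
		size_t bytes = PAGE_CACHE_SIZE - offset;

		if (bytes > count)
			bytes = count;
		/* ->write_begin would run here (grab + lock the page) */
		memcpy(filedata + pos, buf, bytes);	/* the user copy */
		/* ->write_end would run here (commit, unlock, release) */
		buf += bytes;
		pos += bytes;
		written += bytes;
		count -= bytes;
	}
	return written;
}
```

Note how the first iteration is shortened by the in-page offset and the last by the remaining count, exactly the `(pos & (PAGE_CACHE_SIZE - 1))` arithmetic the removed reiserfs code performed by hand.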
* [patch 30/41] reiserfs convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (28 preceding siblings ...)
2007-05-14 6:06 ` [patch 29/41] reiserfs use generic write npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 31/41] nfs " npiggin
` (11 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Vladimir Saveliev
[-- Attachment #1: fs-reiserfs-aops.patch --]
[-- Type: text/plain, Size: 5608 bytes --]
From: Vladimir Saveliev <vs@namesys.com>
[npiggin@suse.de: can't quite get rid of prepare_write/commit_write callbacks
yet! (xattrs and ioctl uses them), but this is a good start]
Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/fs/reiserfs/inode.c
===================================================================
--- linux-2.6.orig/fs/reiserfs/inode.c
+++ linux-2.6/fs/reiserfs/inode.c
@@ -16,6 +16,7 @@
#include <linux/mpage.h>
#include <linux/writeback.h>
#include <linux/quotaops.h>
+#include <linux/swap.h>
static int reiserfs_commit_write(struct file *f, struct page *page,
unsigned from, unsigned to);
@@ -2549,6 +2550,69 @@ static int reiserfs_writepage(struct pag
return reiserfs_write_full_page(page, wbc);
}
+static int reiserfs_write_begin(struct file *file,
+ struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ struct inode *inode;
+ struct page *page;
+ pgoff_t index;
+ int ret;
+ int old_ref = 0;
+
+ index = pos >> PAGE_CACHE_SHIFT;
+ page = __grab_cache_page(mapping, index);
+ if (!page)
+ return -ENOMEM;
+ *pagep = page;
+
+ inode = mapping->host;
+ reiserfs_wait_on_write_block(inode->i_sb);
+ fix_tail_page_for_writing(page);
+ if (reiserfs_transaction_running(inode->i_sb)) {
+ struct reiserfs_transaction_handle *th;
+ th = (struct reiserfs_transaction_handle *)current->
+ journal_info;
+ BUG_ON(!th->t_refcount);
+ BUG_ON(!th->t_trans_id);
+ old_ref = th->t_refcount;
+ th->t_refcount++;
+ }
+ ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ reiserfs_get_block);
+ if (ret && reiserfs_transaction_running(inode->i_sb)) {
+ struct reiserfs_transaction_handle *th = current->journal_info;
+ /* This gets a little ugly. If reiserfs_get_block returned an
+ * error and left a transaction running, we've got to close it,
+ * and we've got to free the handle if it was a persistent transaction.
+ *
+ * But, if we had nested into an existing transaction, we need
+ * to just drop the ref count on the handle.
+ *
+ * If old_ref == 0, the transaction is from reiserfs_get_block,
+ * and it was a persistent trans. Otherwise, it was nested above.
+ */
+ if (th->t_refcount > old_ref) {
+ if (old_ref)
+ th->t_refcount--;
+ else {
+ int err;
+ reiserfs_write_lock(inode->i_sb);
+ err = reiserfs_end_persistent_transaction(th);
+ reiserfs_write_unlock(inode->i_sb);
+ if (err)
+ ret = err;
+ }
+ }
+ }
+ if (ret) {
+ unlock_page(page);
+ page_cache_release(page);
+ }
+ return ret;
+}
+
static int reiserfs_prepare_write(struct file *f, struct page *page,
unsigned from, unsigned to)
{
@@ -2603,6 +2667,99 @@ static sector_t reiserfs_aop_bmap(struct
return generic_block_bmap(as, block, reiserfs_bmap);
}
+static int reiserfs_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ struct inode *inode = page->mapping->host;
+ int ret = 0;
+ int update_sd = 0;
+ struct reiserfs_transaction_handle *th;
+ unsigned start;
+
+
+ reiserfs_wait_on_write_block(inode->i_sb);
+ if (reiserfs_transaction_running(inode->i_sb))
+ th = current->journal_info;
+ else
+ th = NULL;
+
+ start = pos & (PAGE_CACHE_SIZE - 1);
+ if (unlikely(copied < len)) {
+ if (!PageUptodate(page))
+ copied = 0;
+
+ page_zero_new_buffers(page, start + copied, start + len);
+ }
+ flush_dcache_page(page);
+
+ reiserfs_commit_page(inode, page, start, start + copied);
+ unlock_page(page);
+ mark_page_accessed(page);
+ page_cache_release(page);
+
+ /* generic_commit_write does this for us, but does not update the
+ ** transaction tracking stuff when the size changes. So, we have
+ ** to do the i_size updates here.
+ */
+ pos += copied;
+ if (pos > inode->i_size) {
+ struct reiserfs_transaction_handle myth;
+ reiserfs_write_lock(inode->i_sb);
+ /* If the file has grown beyond the border where it
+ can have a tail, unmark it as needing tail
+ packing */
+ if ((have_large_tails(inode->i_sb)
+ && inode->i_size > i_block_size(inode) * 4)
+ || (have_small_tails(inode->i_sb)
+ && inode->i_size > i_block_size(inode)))
+ REISERFS_I(inode)->i_flags &= ~i_pack_on_close_mask;
+
+ ret = journal_begin(&myth, inode->i_sb, 1);
+ if (ret) {
+ reiserfs_write_unlock(inode->i_sb);
+ goto journal_error;
+ }
+ reiserfs_update_inode_transaction(inode);
+ inode->i_size = pos;
+ /*
+ * this will just nest into our transaction. It's important
+ * to use mark_inode_dirty so the inode gets pushed around on the
+ * dirty lists, and so that O_SYNC works as expected
+ */
+ mark_inode_dirty(inode);
+ reiserfs_update_sd(&myth, inode);
+ update_sd = 1;
+ ret = journal_end(&myth, inode->i_sb, 1);
+ reiserfs_write_unlock(inode->i_sb);
+ if (ret)
+ goto journal_error;
+ }
+ if (th) {
+ reiserfs_write_lock(inode->i_sb);
+ if (!update_sd)
+ mark_inode_dirty(inode);
+ ret = reiserfs_end_persistent_transaction(th);
+ reiserfs_write_unlock(inode->i_sb);
+ if (ret)
+ goto out;
+ }
+
+ out:
+ return ret == 0 ? copied : ret;
+
+ journal_error:
+ if (th) {
+ reiserfs_write_lock(inode->i_sb);
+ if (!update_sd)
+ reiserfs_update_sd(th, inode);
+ ret = reiserfs_end_persistent_transaction(th);
+ reiserfs_write_unlock(inode->i_sb);
+ }
+
+ return ret;
+}
+
static int reiserfs_commit_write(struct file *f, struct page *page,
unsigned from, unsigned to)
{
--
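A subtle part of reiserfs_write_end above is the short-copy contract: write_end receives both the requested `len` and the `copied` byte count, and when the copy faulted partway on a page that was not uptodate, it must treat `copied` as 0 so the generic loop can fault the user buffer in and retry. A hypothetical, simplified version of just that decision (`commit_copied` and `page_uptodate` are illustrative names, not kernel APIs):

```c
#include <assert.h>

/*
 * Simplified short-copy handling from a write_end implementation:
 * 'page_uptodate' stands in for PageUptodate(page).
 */
static unsigned commit_copied(int page_uptodate, unsigned len, unsigned copied)
{
	if (copied < len && !page_uptodate)
		return 0;	/* make the caller fault the buffer in and retry */
	return copied;	/* page was uptodate, or the copy completed */
}
```

An uptodate page may safely commit a partial copy, because the uncopied tail of the page already holds valid data; a non-uptodate page may not, which is why reiserfs_write_end zeroes the new buffers and reports 0 in that case.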
* [patch 31/41] nfs convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (29 preceding siblings ...)
2007-05-14 6:06 ` [patch 30/41] reiserfs convert to new aops npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 32/41] smb " npiggin
` (10 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, trond.myklebust
[-- Attachment #1: fs-nfs-aops.patch --]
[-- Type: text/plain, Size: 2763 bytes --]
Cc: trond.myklebust@fys.uio.no
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/nfs/file.c | 49 ++++++++++++++++++++++++++++++++++++-------------
1 file changed, 36 insertions(+), 13 deletions(-)
Index: linux-2.6/fs/nfs/file.c
===================================================================
--- linux-2.6.orig/fs/nfs/file.c
+++ linux-2.6/fs/nfs/file.c
@@ -282,27 +282,50 @@ nfs_fsync(struct file *file, struct dent
}
/*
- * This does the "real" work of the write. The generic routine has
- * allocated the page, locked it, done all the page alignment stuff
- * calculations etc. Now we should just copy the data from user
- * space and write it back to the real medium..
+ * This does the "real" work of the write. We must allocate and lock the
+ * page to be sent back to the generic routine, which then copies the
+ * data from user space.
*
* If the writer ends up delaying the write, the writer needs to
* increment the page use counts until he is done with the page.
*/
-static int nfs_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to)
-{
- return nfs_flush_incompatible(file, page);
+static int nfs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ int ret;
+ pgoff_t index;
+ struct page *page;
+ index = pos >> PAGE_CACHE_SHIFT;
+
+ page = __grab_cache_page(mapping, index);
+ if (!page)
+ return -ENOMEM;
+ *pagep = page;
+
+ ret = nfs_flush_incompatible(file, page);
+ if (ret) {
+ unlock_page(page);
+ page_cache_release(page);
+ }
+ return ret;
}
-static int nfs_commit_write(struct file *file, struct page *page, unsigned offset, unsigned to)
+static int nfs_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
- long status;
+ unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
+ int status;
lock_kernel();
- status = nfs_updatepage(file, page, offset, to-offset);
+ status = nfs_updatepage(file, page, offset, copied);
unlock_kernel();
- return status;
+
+ unlock_page(page);
+ page_cache_release(page);
+
+ return status < 0 ? status : copied;
}
static void nfs_invalidate_page(struct page *page, unsigned long offset)
@@ -330,8 +353,8 @@ const struct address_space_operations nf
.set_page_dirty = nfs_set_page_dirty,
.writepage = nfs_writepage,
.writepages = nfs_writepages,
- .prepare_write = nfs_prepare_write,
- .commit_write = nfs_commit_write,
+ .write_begin = nfs_write_begin,
+ .write_end = nfs_write_end,
.invalidatepage = nfs_invalidate_page,
.releasepage = nfs_release_page,
#ifdef CONFIG_NFS_DIRECTIO
--
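The nfs conversion above, like the others in this series, leans on two pieces of arithmetic: `index = pos >> PAGE_CACHE_SHIFT` in write_begin picks the page, and `offset = pos & (PAGE_CACHE_SIZE - 1)` in write_end finds the position within it. A tiny sketch of that arithmetic, assuming 4 KiB pages (the helper names are hypothetical, not kernel APIs):

```c
#include <assert.h>

#define PAGE_CACHE_SHIFT 12			/* assuming 4 KiB pages */
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)

/* which page cache page a file position falls in */
static unsigned long page_index(unsigned long long pos)
{
	return (unsigned long)(pos >> PAGE_CACHE_SHIFT);
}

/* offset of the position within that page */
static unsigned page_offset(unsigned long long pos)
{
	return (unsigned)(pos & (PAGE_CACHE_SIZE - 1));
}
```

Because PAGE_CACHE_SIZE is a power of two, the mask and shift partition `pos` exactly: `((unsigned long long)index << PAGE_CACHE_SHIFT) + offset` always reconstructs the original position.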
* [patch 32/41] smb convert to new aops.
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel
[-- Attachment #1: fs-smbfs-aops.patch --]
[-- Type: text/plain, Size: 1931 bytes --]
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/smbfs/file.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)
Index: linux-2.6/fs/smbfs/file.c
===================================================================
--- linux-2.6.orig/fs/smbfs/file.c
+++ linux-2.6/fs/smbfs/file.c
@@ -290,29 +290,45 @@ out:
* If the writer ends up delaying the write, the writer needs to
* increment the page use counts until he is done with the page.
*/
-static int smb_prepare_write(struct file *file, struct page *page,
- unsigned offset, unsigned to)
-{
+static int smb_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+ *pagep = __grab_cache_page(mapping, index);
+ if (!*pagep)
+ return -ENOMEM;
return 0;
}
-static int smb_commit_write(struct file *file, struct page *page,
- unsigned offset, unsigned to)
+static int smb_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
int status;
+ unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
- status = -EFAULT;
lock_kernel();
- status = smb_updatepage(file, page, offset, to-offset);
+ status = smb_updatepage(file, page, offset, copied);
unlock_kernel();
+
+ if (!status) {
+ if (!PageUptodate(page) && copied == PAGE_CACHE_SIZE)
+ SetPageUptodate(page);
+ status = copied;
+ }
+
+ unlock_page(page);
+ page_cache_release(page);
+
return status;
}
const struct address_space_operations smb_file_aops = {
.readpage = smb_readpage,
.writepage = smb_writepage,
- .prepare_write = smb_prepare_write,
- .commit_write = smb_commit_write
+ .write_begin = smb_write_begin,
+ .write_end = smb_write_end,
};
/*
--
* [patch 33/41] GFS2 convert to new aops.
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Steven Whitehouse
[-- Attachment #1: fs-gfs2-aops.patch --]
[-- Type: text/plain, Size: 9264 bytes --]
From: Steven Whitehouse <swhiteho@redhat.com>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
fs/gfs2/ops_address.c | 209 +++++++++++++++++++++++++++++---------------------
1 file changed, 125 insertions(+), 84 deletions(-)
Index: linux-2.6/fs/gfs2/ops_address.c
===================================================================
--- linux-2.6.orig/fs/gfs2/ops_address.c
+++ linux-2.6/fs/gfs2/ops_address.c
@@ -17,6 +17,7 @@
#include <linux/mpage.h>
#include <linux/fs.h>
#include <linux/writeback.h>
+#include <linux/swap.h>
#include <linux/gfs2_ondisk.h>
#include <linux/lm_interface.h>
@@ -348,45 +349,49 @@ out_unlock:
}
/**
- * gfs2_prepare_write - Prepare to write a page to a file
+ * gfs2_write_begin - Begin to write to a file
* @file: The file to write to
- * @page: The page which is to be prepared for writing
- * @from: From (byte range within page)
- * @to: To (byte range within page)
+ * @mapping: The mapping in which to write
+ * @pos: The file offset at which to start writing
+ * @len: Length of the write
+ * @flags: Various flags
+ * @pagep: Pointer to return the page
+ * @fsdata: Pointer to return fs data (unused by GFS2)
*
* Returns: errno
*/
-static int gfs2_prepare_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int gfs2_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- struct gfs2_inode *ip = GFS2_I(page->mapping->host);
- struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
+ struct gfs2_inode *ip = GFS2_I(mapping->host);
+ struct gfs2_sbd *sdp = GFS2_SB(mapping->host);
unsigned int data_blocks, ind_blocks, rblocks;
int alloc_required;
int error = 0;
- loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + from;
- loff_t end = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
struct gfs2_alloc *al;
- unsigned int write_len = to - from;
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+ unsigned from = pos & (PAGE_CACHE_SIZE - 1);
+ unsigned to = from + len;
+ struct page *page;
-
- gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_ATIME|LM_FLAG_TRY_1CB, &ip->i_gh);
+ gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, GL_ATIME, &ip->i_gh);
error = gfs2_glock_nq_atime(&ip->i_gh);
- if (unlikely(error)) {
- if (error == GLR_TRYFAILED) {
- unlock_page(page);
- error = AOP_TRUNCATED_PAGE;
- yield();
- }
+ if (unlikely(error))
goto out_uninit;
- }
- gfs2_write_calc_reserv(ip, write_len, &data_blocks, &ind_blocks);
+ error = -ENOMEM;
+ page = __grab_cache_page(mapping, index);
+ *pagep = page;
+ if (!page)
+ goto out_unlock;
+
+ gfs2_write_calc_reserv(ip, len, &data_blocks, &ind_blocks);
- error = gfs2_write_alloc_required(ip, pos, write_len, &alloc_required);
+ error = gfs2_write_alloc_required(ip, pos, len, &alloc_required);
if (error)
- goto out_unlock;
+ goto out_putpage;
ip->i_alloc.al_requested = 0;
@@ -418,7 +423,7 @@ static int gfs2_prepare_write(struct fil
goto out;
if (gfs2_is_stuffed(ip)) {
- if (end > sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode)) {
+ if (pos + len > sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode)) {
error = gfs2_unstuff_dinode(ip, page);
if (error == 0)
goto prepare_write;
@@ -440,6 +445,10 @@ out_qunlock:
out_alloc_put:
gfs2_alloc_put(ip);
}
+out_putpage:
+ page_cache_release(page);
+ if (pos + len > ip->i_inode.i_size)
+ vmtruncate(&ip->i_inode, ip->i_inode.i_size);
out_unlock:
gfs2_glock_dq_m(1, &ip->i_gh);
out_uninit:
@@ -450,96 +459,128 @@ out_uninit:
}
/**
- * gfs2_commit_write - Commit write to a file
+ * gfs2_stuffed_write_end - Write end for stuffed files
+ * @inode: The inode
+ * @dibh: The buffer_head containing the on-disk inode
+ * @pos: The file position
+ * @len: The length of the write
+ * @copied: How much was actually copied by the VFS
+ * @page: The page
+ *
+ * This copies the data from the page into the inode block after
+ * the inode data structure itself.
+ *
+ * Returns: errno
+ */
+static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page)
+{
+ struct gfs2_inode *ip = GFS2_I(inode);
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
+ u64 to = pos + copied;
+ void *kaddr;
+ unsigned char *buf = dibh->b_data + sizeof(struct gfs2_dinode);
+ struct gfs2_dinode *di = (struct gfs2_dinode *)dibh->b_data;
+
+ BUG_ON((pos + len) > (dibh->b_size - sizeof(struct gfs2_dinode)));
+ kaddr = kmap_atomic(page, KM_USER0);
+ memcpy(buf + pos, kaddr + pos, copied);
+ memset(kaddr + pos + copied, 0, len - copied);
+ flush_dcache_page(page);
+ kunmap_atomic(kaddr, KM_USER0);
+
+ if (!PageUptodate(page))
+ SetPageUptodate(page);
+ unlock_page(page);
+ mark_page_accessed(page);
+ page_cache_release(page);
+
+ if (inode->i_size < to) {
+ i_size_write(inode, to);
+ ip->i_di.di_size = inode->i_size;
+ di->di_size = cpu_to_be64(inode->i_size);
+ mark_inode_dirty(inode);
+ }
+
+ brelse(dibh);
+ gfs2_trans_end(sdp);
+ gfs2_glock_dq(&ip->i_gh);
+ gfs2_holder_uninit(&ip->i_gh);
+ return copied;
+}
+
+/**
+ * gfs2_write_end
* @file: The file to write to
- * @page: The page containing the data
- * @from: From (byte range within page)
- * @to: To (byte range within page)
+ * @mapping: The address space to write to
+ * @pos: The file position
+ * @len: The length of the data
+ * @copied:
+ * @page: The page that has been written
+ * @fsdata: The fsdata (unused in GFS2)
+ *
+ * The main write_end function for GFS2. We have a separate one for
+ * stuffed files as they are slightly different, otherwise we just
+ * put our locking around the VFS provided functions.
*
* Returns: errno
*/
-static int gfs2_commit_write(struct file *file, struct page *page,
- unsigned from, unsigned to)
+static int gfs2_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
struct inode *inode = page->mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_sbd *sdp = GFS2_SB(inode);
- int error = -EOPNOTSUPP;
struct buffer_head *dibh;
struct gfs2_alloc *al = &ip->i_alloc;
struct gfs2_dinode *di;
+ unsigned int from = pos & (PAGE_CACHE_SIZE - 1);
+ unsigned int to = from + len;
+ int ret;
- if (gfs2_assert_withdraw(sdp, gfs2_glock_is_locked_by_me(ip->i_gl)))
- goto fail_nounlock;
+ BUG_ON(gfs2_glock_is_locked_by_me(ip->i_gl) == 0);
- error = gfs2_meta_inode_buffer(ip, &dibh);
- if (error)
- goto fail_endtrans;
+ ret = gfs2_meta_inode_buffer(ip, &dibh);
+ if (unlikely(ret)) {
+ unlock_page(page);
+ page_cache_release(page);
+ goto failed;
+ }
gfs2_trans_add_bh(ip->i_gl, dibh, 1);
- di = (struct gfs2_dinode *)dibh->b_data;
- if (gfs2_is_stuffed(ip)) {
- u64 file_size;
- void *kaddr;
-
- file_size = ((u64)page->index << PAGE_CACHE_SHIFT) + to;
+ if (gfs2_is_stuffed(ip))
+ return gfs2_stuffed_write_end(inode, dibh, pos, len, copied, page);
- kaddr = kmap_atomic(page, KM_USER0);
- memcpy(dibh->b_data + sizeof(struct gfs2_dinode) + from,
- kaddr + from, to - from);
- kunmap_atomic(kaddr, KM_USER0);
+ if (sdp->sd_args.ar_data == GFS2_DATA_ORDERED || gfs2_is_jdata(ip))
+ gfs2_page_add_databufs(ip, page, from, to);
- SetPageUptodate(page);
+ ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
- if (inode->i_size < file_size) {
- i_size_write(inode, file_size);
+ if (likely(ret >= 0)) {
+ copied = ret;
+ if ((pos + copied) > inode->i_size) {
+ di = (struct gfs2_dinode *)dibh->b_data;
+ ip->i_di.di_size = inode->i_size;
+ di->di_size = cpu_to_be64(inode->i_size);
mark_inode_dirty(inode);
}
- } else {
- if (sdp->sd_args.ar_data == GFS2_DATA_ORDERED ||
- gfs2_is_jdata(ip))
- gfs2_page_add_databufs(ip, page, from, to);
- error = generic_commit_write(file, page, from, to);
- if (error)
- goto fail;
}
- if (ip->i_di.di_size < inode->i_size) {
- ip->i_di.di_size = inode->i_size;
- di->di_size = cpu_to_be64(inode->i_size);
- }
-
- brelse(dibh);
- gfs2_trans_end(sdp);
- if (al->al_requested) {
- gfs2_inplace_release(ip);
- gfs2_quota_unlock(ip);
- gfs2_alloc_put(ip);
- }
- unlock_page(page);
- gfs2_glock_dq_m(1, &ip->i_gh);
- lock_page(page);
- gfs2_holder_uninit(&ip->i_gh);
- return 0;
-
-fail:
brelse(dibh);
-fail_endtrans:
gfs2_trans_end(sdp);
+failed:
if (al->al_requested) {
gfs2_inplace_release(ip);
gfs2_quota_unlock(ip);
gfs2_alloc_put(ip);
}
- unlock_page(page);
- gfs2_glock_dq_m(1, &ip->i_gh);
- lock_page(page);
+ gfs2_glock_dq(&ip->i_gh);
gfs2_holder_uninit(&ip->i_gh);
-fail_nounlock:
- ClearPageUptodate(page);
- return error;
+ return ret;
}
/**
@@ -808,8 +849,8 @@ const struct address_space_operations gf
.readpage = gfs2_readpage,
.readpages = gfs2_readpages,
.sync_page = block_sync_page,
- .prepare_write = gfs2_prepare_write,
- .commit_write = gfs2_commit_write,
+ .write_begin = gfs2_write_begin,
+ .write_end = gfs2_write_end,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
.releasepage = gfs2_releasepage,
--
* [patch 34/41] fuse convert to new aops.
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Miklos Szeredi
[-- Attachment #1: fs-fuse-aops.patch --]
[-- Type: text/plain, Size: 2988 bytes --]
[mszeredi]
- don't send zero-length write requests
- it is not legal for the filesystem to return with zero bytes written
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
fs/fuse/file.c | 48 +++++++++++++++++++++++++++++++++---------------
1 file changed, 33 insertions(+), 15 deletions(-)
Index: linux-2.6/fs/fuse/file.c
===================================================================
--- linux-2.6.orig/fs/fuse/file.c
+++ linux-2.6/fs/fuse/file.c
@@ -443,22 +443,25 @@ static size_t fuse_send_write(struct fus
return outarg.size;
}
-static int fuse_prepare_write(struct file *file, struct page *page,
- unsigned offset, unsigned to)
-{
- /* No op */
+static int fuse_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+
+ *pagep = __grab_cache_page(mapping, index);
+ if (!*pagep)
+ return -ENOMEM;
return 0;
}
-static int fuse_commit_write(struct file *file, struct page *page,
- unsigned offset, unsigned to)
+static int fuse_buffered_write(struct file *file, struct inode *inode,
+ loff_t pos, unsigned count, struct page *page)
{
int err;
size_t nres;
- unsigned count = to - offset;
- struct inode *inode = page->mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
- loff_t pos = page_offset(page) + offset;
+ unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
struct fuse_req *req;
if (is_bad_inode(inode))
@@ -474,20 +477,35 @@ static int fuse_commit_write(struct file
nres = fuse_send_write(req, file, inode, pos, count);
err = req->out.h.error;
fuse_put_request(fc, req);
- if (!err && nres != count)
+ if (!err && !nres)
err = -EIO;
if (!err) {
- pos += count;
+ pos += nres;
spin_lock(&fc->lock);
if (pos > inode->i_size)
i_size_write(inode, pos);
spin_unlock(&fc->lock);
- if (offset == 0 && to == PAGE_CACHE_SIZE)
+ if (count == PAGE_CACHE_SIZE)
SetPageUptodate(page);
}
fuse_invalidate_attr(inode);
- return err;
+ return err ? err : nres;
+}
+
+static int fuse_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ struct inode *inode = mapping->host;
+ int res = 0;
+
+ if (copied)
+ res = fuse_buffered_write(file, inode, pos, copied, page);
+
+ unlock_page(page);
+ page_cache_release(page);
+ return res;
}
static void fuse_release_user_pages(struct fuse_req *req, int write)
@@ -816,8 +834,8 @@ static const struct file_operations fuse
static const struct address_space_operations fuse_file_aops = {
.readpage = fuse_readpage,
- .prepare_write = fuse_prepare_write,
- .commit_write = fuse_commit_write,
+ .write_begin = fuse_write_begin,
+ .write_end = fuse_write_end,
.readpages = fuse_readpages,
.set_page_dirty = fuse_set_page_dirty,
.bmap = fuse_bmap,
--
* [patch 35/41] hostfs convert to new aops.
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Jeff Dike
[-- Attachment #1: fs-hostfs-aops.patch --]
[-- Type: text/plain, Size: 3317 bytes --]
This also gets rid of a lot of now-unneeded read_file code, and optimises
the full page write case by marking a !uptodate page uptodate.
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/hostfs/hostfs_kern.c | 70 +++++++++++++++++++-----------------------------
1 file changed, 28 insertions(+), 42 deletions(-)
Index: linux-2.6/fs/hostfs/hostfs_kern.c
===================================================================
--- linux-2.6.orig/fs/hostfs/hostfs_kern.c
+++ linux-2.6/fs/hostfs/hostfs_kern.c
@@ -466,56 +466,42 @@ int hostfs_readpage(struct file *file, s
return err;
}
-int hostfs_prepare_write(struct file *file, struct page *page,
- unsigned int from, unsigned int to)
+int hostfs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- char *buffer;
- long long start, tmp;
- int err;
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
- start = (long long) page->index << PAGE_CACHE_SHIFT;
- buffer = kmap(page);
- if(from != 0){
- tmp = start;
- err = read_file(FILE_HOSTFS_I(file)->fd, &tmp, buffer,
- from);
- if(err < 0) goto out;
- }
- if(to != PAGE_CACHE_SIZE){
- start += to;
- err = read_file(FILE_HOSTFS_I(file)->fd, &start, buffer + to,
- PAGE_CACHE_SIZE - to);
- if(err < 0) goto out;
- }
- err = 0;
- out:
- kunmap(page);
- return err;
+ *pagep = __grab_cache_page(mapping, index);
+ if (!*pagep)
+ return -ENOMEM;
+ return 0;
}
-int hostfs_commit_write(struct file *file, struct page *page, unsigned from,
- unsigned to)
+int hostfs_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
- struct address_space *mapping = page->mapping;
struct inode *inode = mapping->host;
- char *buffer;
- long long start;
- int err = 0;
+ void *buffer;
+ unsigned from = pos & (PAGE_CACHE_SIZE - 1);
+ int err;
- start = (((long long) page->index) << PAGE_CACHE_SHIFT) + from;
buffer = kmap(page);
- err = write_file(FILE_HOSTFS_I(file)->fd, &start, buffer + from,
- to - from);
- if(err > 0) err = 0;
-
- /* Actually, if !err, write_file has added to-from to start, so, despite
- * the appearance, we are comparing i_size against the _last_ written
- * location, as we should. */
+ err = write_file(FILE_HOSTFS_I(file)->fd, &pos, buffer + from, copied);
+ kunmap(page);
- if(!err && (start > inode->i_size))
- inode->i_size = start;
+ if (!PageUptodate(page) && err == PAGE_CACHE_SIZE)
+ SetPageUptodate(page);
+ unlock_page(page);
+ page_cache_release(page);
+
+ /* If err > 0, write_file has added err to pos, so we are comparing
+ * i_size against the last byte written.
+ */
+ if (err > 0 && (pos > inode->i_size))
+ inode->i_size = pos;
- kunmap(page);
return err;
}
@@ -523,8 +509,8 @@ static const struct address_space_operat
.writepage = hostfs_writepage,
.readpage = hostfs_readpage,
.set_page_dirty = __set_page_dirty_nobuffers,
- .prepare_write = hostfs_prepare_write,
- .commit_write = hostfs_commit_write
+ .write_begin = hostfs_write_begin,
+ .write_end = hostfs_write_end,
};
static int init_inode(struct inode *inode, struct dentry *dentry)
--
* [patch 36/41] jffs2 convert to new aops.
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, dwmw2, jffs-dev
[-- Attachment #1: fs-jffs2-aops.patch --]
[-- Type: text/plain, Size: 7474 bytes --]
Cc: dwmw2@infradead.org
Cc: jffs-dev@axis.com
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/jffs2/file.c | 105 +++++++++++++++++++++++++++++++++++---------------------
1 file changed, 66 insertions(+), 39 deletions(-)
Index: linux-2.6/fs/jffs2/file.c
===================================================================
--- linux-2.6.orig/fs/jffs2/file.c
+++ linux-2.6/fs/jffs2/file.c
@@ -19,10 +19,12 @@
#include <linux/jffs2.h>
#include "nodelist.h"
-static int jffs2_commit_write (struct file *filp, struct page *pg,
- unsigned start, unsigned end);
-static int jffs2_prepare_write (struct file *filp, struct page *pg,
- unsigned start, unsigned end);
+static int jffs2_write_end(struct file *filp, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *pg, void *fsdata);
+static int jffs2_write_begin(struct file *filp, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata);
static int jffs2_readpage (struct file *filp, struct page *pg);
int jffs2_fsync(struct file *filp, struct dentry *dentry, int datasync)
@@ -65,8 +67,8 @@ const struct inode_operations jffs2_file
const struct address_space_operations jffs2_file_address_operations =
{
.readpage = jffs2_readpage,
- .prepare_write =jffs2_prepare_write,
- .commit_write = jffs2_commit_write
+ .write_begin = jffs2_write_begin,
+ .write_end = jffs2_write_end,
};
static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
@@ -119,15 +121,23 @@ static int jffs2_readpage (struct file *
return ret;
}
-static int jffs2_prepare_write (struct file *filp, struct page *pg,
- unsigned start, unsigned end)
+static int jffs2_write_begin(struct file *filp, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- struct inode *inode = pg->mapping->host;
+ struct page *pg;
+ struct inode *inode = mapping->host;
struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
- uint32_t pageofs = pg->index << PAGE_CACHE_SHIFT;
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+ uint32_t pageofs = pos & (PAGE_CACHE_SIZE - 1);
int ret = 0;
- D1(printk(KERN_DEBUG "jffs2_prepare_write()\n"));
+ pg = __grab_cache_page(mapping, index);
+ if (!pg)
+ return -ENOMEM;
+ *pagep = pg;
+
+ D1(printk(KERN_DEBUG "jffs2_write_begin()\n"));
if (pageofs > inode->i_size) {
/* Make new hole frag from old EOF to new page */
@@ -142,7 +152,7 @@ static int jffs2_prepare_write (struct f
ret = jffs2_reserve_space(c, sizeof(ri), &alloc_len,
ALLOC_NORMAL, JFFS2_SUMMARY_INODE_SIZE);
if (ret)
- return ret;
+ goto out_page;
down(&f->sem);
memset(&ri, 0, sizeof(ri));
@@ -172,7 +182,7 @@ static int jffs2_prepare_write (struct f
ret = PTR_ERR(fn);
jffs2_complete_reservation(c);
up(&f->sem);
- return ret;
+ goto out_page;
}
ret = jffs2_add_full_dnode_to_inode(c, f, fn);
if (f->metadata) {
@@ -181,65 +191,79 @@ static int jffs2_prepare_write (struct f
f->metadata = NULL;
}
if (ret) {
- D1(printk(KERN_DEBUG "Eep. add_full_dnode_to_inode() failed in prepare_write, returned %d\n", ret));
+ D1(printk(KERN_DEBUG "Eep. add_full_dnode_to_inode() failed in write_begin, returned %d\n", ret));
jffs2_mark_node_obsolete(c, fn->raw);
jffs2_free_full_dnode(fn);
jffs2_complete_reservation(c);
up(&f->sem);
- return ret;
+ goto out_page;
}
jffs2_complete_reservation(c);
inode->i_size = pageofs;
up(&f->sem);
}
- /* Read in the page if it wasn't already present, unless it's a whole page */
- if (!PageUptodate(pg) && (start || end < PAGE_CACHE_SIZE)) {
+ /*
+ * Read in the page if it wasn't already present. Cannot optimize away
+ * the whole page write case until jffs2_write_end can handle the
+ * case of a short-copy.
+ */
+ if (!PageUptodate(pg)) {
down(&f->sem);
ret = jffs2_do_readpage_nolock(inode, pg);
up(&f->sem);
+ if (ret)
+ goto out_page;
}
- D1(printk(KERN_DEBUG "end prepare_write(). pg->flags %lx\n", pg->flags));
+ D1(printk(KERN_DEBUG "end write_begin(). pg->flags %lx\n", pg->flags));
+ return ret;
+
+out_page:
+ unlock_page(pg);
+ page_cache_release(pg);
return ret;
}
-static int jffs2_commit_write (struct file *filp, struct page *pg,
- unsigned start, unsigned end)
+static int jffs2_write_end(struct file *filp, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *pg, void *fsdata)
{
/* Actually commit the write from the page cache page we're looking at.
* For now, we write the full page out each time. It sucks, but it's simple
*/
- struct inode *inode = pg->mapping->host;
+ struct inode *inode = mapping->host;
struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);
struct jffs2_raw_inode *ri;
+ unsigned start = pos & (PAGE_CACHE_SIZE - 1);
+ unsigned end = start + copied;
unsigned aligned_start = start & ~3;
int ret = 0;
uint32_t writtenlen = 0;
- D1(printk(KERN_DEBUG "jffs2_commit_write(): ino #%lu, page at 0x%lx, range %d-%d, flags %lx\n",
+ D1(printk(KERN_DEBUG "jffs2_write_end(): ino #%lu, page at 0x%lx, range %d-%d, flags %lx\n",
inode->i_ino, pg->index << PAGE_CACHE_SHIFT, start, end, pg->flags));
+ /* We need to avoid deadlock with page_cache_read() in
+ jffs2_garbage_collect_pass(). So the page must be
+ up to date to prevent page_cache_read() from trying
+ to re-lock it. */
+ BUG_ON(!PageUptodate(pg));
+
if (end == PAGE_CACHE_SIZE) {
- if (!start) {
- /* We need to avoid deadlock with page_cache_read() in
- jffs2_garbage_collect_pass(). So we have to mark the
- page up to date, to prevent page_cache_read() from
- trying to re-lock it. */
- SetPageUptodate(pg);
- } else {
- /* When writing out the end of a page, write out the
- _whole_ page. This helps to reduce the number of
- nodes in files which have many short writes, like
- syslog files. */
- start = aligned_start = 0;
- }
+ /* When writing out the end of a page, write out the
+ _whole_ page. This helps to reduce the number of
+ nodes in files which have many short writes, like
+ syslog files. */
+ start = aligned_start = 0;
}
ri = jffs2_alloc_raw_inode();
if (!ri) {
- D1(printk(KERN_DEBUG "jffs2_commit_write(): Allocation of raw inode failed\n"));
+ D1(printk(KERN_DEBUG "jffs2_write_end(): Allocation of raw inode failed\n"));
+ unlock_page(pg);
+ page_cache_release(pg);
return -ENOMEM;
}
@@ -287,11 +311,14 @@ static int jffs2_commit_write (struct fi
/* generic_file_write has written more to the page cache than we've
actually written to the medium. Mark the page !Uptodate so that
it gets reread */
- D1(printk(KERN_DEBUG "jffs2_commit_write(): Not all bytes written. Marking page !uptodate\n"));
+ D1(printk(KERN_DEBUG "jffs2_write_end(): Not all bytes written. Marking page !uptodate\n"));
SetPageError(pg);
ClearPageUptodate(pg);
}
- D1(printk(KERN_DEBUG "jffs2_commit_write() returning %d\n",start+writtenlen==end?0:ret));
- return start+writtenlen==end?0:ret;
+ D1(printk(KERN_DEBUG "jffs2_write_end() returning %d\n",
+ writtenlen > 0 ? writtenlen : ret));
+ unlock_page(pg);
+ page_cache_release(pg);
+ return writtenlen > 0 ? writtenlen : ret;
}
--
* [patch 37/41] ufs convert to new aops.
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, dushistov
[-- Attachment #1: fs-ufs-aops.patch --]
[-- Type: text/plain, Size: 6386 bytes --]
Cc: dushistov@mail.ru
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/ufs/dir.c | 50 +++++++++++++++++++++++++++++++-------------------
fs/ufs/inode.c | 23 +++++++++++++++++++----
2 files changed, 50 insertions(+), 23 deletions(-)
Index: linux-2.6/fs/ufs/inode.c
===================================================================
--- linux-2.6.orig/fs/ufs/inode.c
+++ linux-2.6/fs/ufs/inode.c
@@ -558,24 +558,39 @@ static int ufs_writepage(struct page *pa
{
return block_write_full_page(page,ufs_getfrag_block,wbc);
}
+
static int ufs_readpage(struct file *file, struct page *page)
{
return block_read_full_page(page,ufs_getfrag_block);
}
-static int ufs_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
+
+int __ufs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- return block_prepare_write(page,from,to,ufs_getfrag_block);
+ return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ ufs_getfrag_block);
}
+
+static int ufs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ *pagep = NULL;
+ return __ufs_write_begin(file, mapping, pos, len, flags, pagep, fsdata);
+}
+
static sector_t ufs_bmap(struct address_space *mapping, sector_t block)
{
return generic_block_bmap(mapping,block,ufs_getfrag_block);
}
+
const struct address_space_operations ufs_aops = {
.readpage = ufs_readpage,
.writepage = ufs_writepage,
.sync_page = block_sync_page,
- .prepare_write = ufs_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = ufs_write_begin,
+ .write_end = generic_write_end,
.bmap = ufs_bmap
};
Index: linux-2.6/fs/ufs/dir.c
===================================================================
--- linux-2.6.orig/fs/ufs/dir.c
+++ linux-2.6/fs/ufs/dir.c
@@ -38,12 +38,14 @@ static inline int ufs_match(struct super
return !memcmp(name, de->d_name, len);
}
-static int ufs_commit_chunk(struct page *page, unsigned from, unsigned to)
+static int ufs_commit_chunk(struct page *page, loff_t pos, unsigned len)
{
- struct inode *dir = page->mapping->host;
+ struct address_space *mapping = page->mapping;
+ struct inode *dir = mapping->host;
int err = 0;
+
dir->i_version++;
- page->mapping->a_ops->commit_write(NULL, page, from, to);
+ block_write_end(NULL, mapping, pos, len, len, page, NULL);
if (IS_DIRSYNC(dir))
err = write_one_page(page, 1);
else
@@ -81,16 +83,20 @@ ino_t ufs_inode_by_name(struct inode *di
void ufs_set_link(struct inode *dir, struct ufs_dir_entry *de,
struct page *page, struct inode *inode)
{
- unsigned from = (char *) de - (char *) page_address(page);
- unsigned to = from + fs16_to_cpu(dir->i_sb, de->d_reclen);
+ loff_t pos = (page->index << PAGE_CACHE_SHIFT) +
+ (char *) de - (char *) page_address(page);
+ unsigned len = fs16_to_cpu(dir->i_sb, de->d_reclen);
int err;
lock_page(page);
- err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+ err = __ufs_write_begin(NULL, page->mapping, pos, len,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
BUG_ON(err);
+
de->d_ino = cpu_to_fs32(dir->i_sb, inode->i_ino);
ufs_set_de_type(dir->i_sb, de, inode->i_mode);
- err = ufs_commit_chunk(page, from, to);
+
+ err = ufs_commit_chunk(page, pos, len);
ufs_put_page(page);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(dir);
@@ -312,7 +318,7 @@ int ufs_add_link(struct dentry *dentry,
unsigned long npages = ufs_dir_pages(dir);
unsigned long n;
char *kaddr;
- unsigned from, to;
+ loff_t pos;
int err;
UFSD("ENTER, name %s, namelen %u\n", name, namelen);
@@ -367,9 +373,10 @@ int ufs_add_link(struct dentry *dentry,
return -EINVAL;
got_it:
- from = (char*)de - (char*)page_address(page);
- to = from + rec_len;
- err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+ pos = (page->index << PAGE_CACHE_SHIFT) +
+ (char*)de - (char*)page_address(page);
+ err = __ufs_write_begin(NULL, page->mapping, pos, rec_len,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
if (err)
goto out_unlock;
if (de->d_ino) {
@@ -386,7 +393,7 @@ got_it:
de->d_ino = cpu_to_fs32(sb, inode->i_ino);
ufs_set_de_type(sb, de, inode->i_mode);
- err = ufs_commit_chunk(page, from, to);
+ err = ufs_commit_chunk(page, pos, rec_len);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(dir);
@@ -509,6 +516,7 @@ int ufs_delete_entry(struct inode *inode
char *kaddr = page_address(page);
unsigned from = ((char*)dir - kaddr) & ~(UFS_SB(sb)->s_uspi->s_dirblksize - 1);
unsigned to = ((char*)dir - kaddr) + fs16_to_cpu(sb, dir->d_reclen);
+ loff_t pos;
struct ufs_dir_entry *pde = NULL;
struct ufs_dir_entry *de = (struct ufs_dir_entry *) (kaddr + from);
int err;
@@ -532,13 +540,16 @@ int ufs_delete_entry(struct inode *inode
}
if (pde)
from = (char*)pde - (char*)page_address(page);
+
+ pos = (page->index << PAGE_CACHE_SHIFT) + from;
lock_page(page);
- err = mapping->a_ops->prepare_write(NULL, page, from, to);
+ err = __ufs_write_begin(NULL, mapping, pos, to - from,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
BUG_ON(err);
if (pde)
- pde->d_reclen = cpu_to_fs16(sb, to-from);
+ pde->d_reclen = cpu_to_fs16(sb, to - from);
dir->d_ino = 0;
- err = ufs_commit_chunk(page, from, to);
+ err = ufs_commit_chunk(page, pos, to - from);
inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
mark_inode_dirty(inode);
out:
@@ -559,14 +570,15 @@ int ufs_make_empty(struct inode * inode,
if (!page)
return -ENOMEM;
- kmap(page);
- err = mapping->a_ops->prepare_write(NULL, page, 0, chunk_size);
+
+ err = __ufs_write_begin(NULL, mapping, 0, chunk_size,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
if (err) {
unlock_page(page);
goto fail;
}
-
+ kmap(page);
base = (char*)page_address(page);
memset(base, 0, PAGE_CACHE_SIZE);
@@ -584,10 +596,10 @@ int ufs_make_empty(struct inode * inode,
de->d_reclen = cpu_to_fs16(sb, chunk_size - UFS_DIR_REC_LEN(1));
ufs_set_de_namlen(sb, de, 2);
strcpy (de->d_name, "..");
+ kunmap(page);
err = ufs_commit_chunk(page, 0, chunk_size);
fail:
- kunmap(page);
page_cache_release(page);
return err;
}
--
^ permalink raw reply [flat|nested] 48+ messages in thread
* [patch 38/41] udf convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (36 preceding siblings ...)
2007-05-14 6:06 ` [patch 37/41] ufs " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 39/41] sysv " npiggin
` (3 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, bfennema
[-- Attachment #1: fs-udf-aops.patch --]
[-- Type: text/plain, Size: 3402 bytes --]
Convert udf to new aops. This also seems to fix pagecache corruption in
udf_adinicb_commit_write -- the page was marked uptodate when it was not.
Also fixed the silly setup where prepare_write did a kmap for later use by
commit_write: just do kmap_atomic in write_end instead. Use libfs helpers
to make this easier.
Cc: bfennema@falcon.csc.calpoly.edu
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/udf/file.c | 32 +++++++++++++-------------------
fs/udf/inode.c | 11 +++++++----
2 files changed, 20 insertions(+), 23 deletions(-)
Index: linux-2.6/fs/udf/file.c
===================================================================
--- linux-2.6.orig/fs/udf/file.c
+++ linux-2.6/fs/udf/file.c
@@ -73,34 +73,28 @@ static int udf_adinicb_writepage(struct
return 0;
}
-static int udf_adinicb_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to)
+static int udf_adinicb_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
{
- kmap(page);
- return 0;
-}
-
-static int udf_adinicb_commit_write(struct file *file, struct page *page, unsigned offset, unsigned to)
-{
- struct inode *inode = page->mapping->host;
- char *kaddr = page_address(page);
+ struct inode *inode = mapping->host;
+ unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
+ char *kaddr;
+ kaddr = kmap_atomic(page, KM_USER0);
memcpy(UDF_I_DATA(inode) + UDF_I_LENEATTR(inode) + offset,
- kaddr + offset, to - offset);
- mark_inode_dirty(inode);
- SetPageUptodate(page);
- kunmap(page);
- /* only one page here */
- if (to > inode->i_size)
- inode->i_size = to;
- return 0;
+ kaddr + offset, copied);
+ kunmap_atomic(kaddr, KM_USER0);
+
+ return simple_write_end(file, mapping, pos, len, copied, page, fsdata);
}
const struct address_space_operations udf_adinicb_aops = {
.readpage = udf_adinicb_readpage,
.writepage = udf_adinicb_writepage,
.sync_page = block_sync_page,
- .prepare_write = udf_adinicb_prepare_write,
- .commit_write = udf_adinicb_commit_write,
+ .write_begin = simple_write_begin,
+ .write_end = udf_adinicb_write_end,
};
static ssize_t udf_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
Index: linux-2.6/fs/udf/inode.c
===================================================================
--- linux-2.6.orig/fs/udf/inode.c
+++ linux-2.6/fs/udf/inode.c
@@ -122,9 +122,12 @@ static int udf_readpage(struct file *fil
return block_read_full_page(page, udf_get_block);
}
-static int udf_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
+static int udf_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- return block_prepare_write(page, from, to, udf_get_block);
+ return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ udf_get_block);
}
static sector_t udf_bmap(struct address_space *mapping, sector_t block)
@@ -136,8 +139,8 @@ const struct address_space_operations ud
.readpage = udf_readpage,
.writepage = udf_writepage,
.sync_page = block_sync_page,
- .prepare_write = udf_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = udf_write_begin,
+ .write_end = generic_write_end,
.bmap = udf_bmap,
};
--
^ permalink raw reply [flat|nested] 48+ messages in thread
* [patch 39/41] sysv convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (37 preceding siblings ...)
2007-05-14 6:06 ` [patch 38/41] udf " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:06 ` [patch 40/41] minix " npiggin
` (2 subsequent siblings)
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, hch
[-- Attachment #1: fs-sysv-aops.patch --]
[-- Type: text/plain, Size: 6120 bytes --]
Cc: hch@infradead.org
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/sysv/dir.c | 45 +++++++++++++++++++++++++--------------------
fs/sysv/itree.c | 23 +++++++++++++++++++----
2 files changed, 44 insertions(+), 24 deletions(-)
Index: linux-2.6/fs/sysv/itree.c
===================================================================
--- linux-2.6.orig/fs/sysv/itree.c
+++ linux-2.6/fs/sysv/itree.c
@@ -453,23 +453,38 @@ static int sysv_writepage(struct page *p
{
return block_write_full_page(page,get_block,wbc);
}
+
static int sysv_readpage(struct file *file, struct page *page)
{
return block_read_full_page(page,get_block);
}
-static int sysv_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
+
+int __sysv_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- return block_prepare_write(page,from,to,get_block);
+ return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ get_block);
}
+
+static int sysv_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ *pagep = NULL;
+ return __sysv_write_begin(file, mapping, pos, len, flags, pagep, fsdata);
+}
+
static sector_t sysv_bmap(struct address_space *mapping, sector_t block)
{
return generic_block_bmap(mapping,block,get_block);
}
+
const struct address_space_operations sysv_aops = {
.readpage = sysv_readpage,
.writepage = sysv_writepage,
.sync_page = block_sync_page,
- .prepare_write = sysv_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = sysv_write_begin,
+ .write_end = generic_write_end,
.bmap = sysv_bmap
};
Index: linux-2.6/fs/sysv/dir.c
===================================================================
--- linux-2.6.orig/fs/sysv/dir.c
+++ linux-2.6/fs/sysv/dir.c
@@ -37,12 +37,13 @@ static inline unsigned long dir_pages(st
return (inode->i_size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
}
-static int dir_commit_chunk(struct page *page, unsigned from, unsigned to)
+static int dir_commit_chunk(struct page *page, loff_t pos, unsigned len)
{
- struct inode *dir = (struct inode *)page->mapping->host;
+ struct address_space *mapping = page->mapping;
+ struct inode *dir = mapping->host;
int err = 0;
- page->mapping->a_ops->commit_write(NULL, page, from, to);
+ block_write_end(NULL, mapping, pos, len, len, page, NULL);
if (IS_DIRSYNC(dir))
err = write_one_page(page, 1);
else
@@ -186,7 +187,7 @@ int sysv_add_link(struct dentry *dentry,
unsigned long npages = dir_pages(dir);
unsigned long n;
char *kaddr;
- unsigned from, to;
+ loff_t pos;
int err;
/* We take care of directory expansion in the same loop */
@@ -212,16 +213,17 @@ int sysv_add_link(struct dentry *dentry,
return -EINVAL;
got_it:
- from = (char*)de - (char*)page_address(page);
- to = from + SYSV_DIRSIZE;
+ pos = (page->index << PAGE_CACHE_SHIFT) +
+ (char*)de - (char*)page_address(page);
lock_page(page);
- err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+ err = __sysv_write_begin(NULL, page->mapping, pos, SYSV_DIRSIZE,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
if (err)
goto out_unlock;
memcpy (de->name, name, namelen);
memset (de->name + namelen, 0, SYSV_DIRSIZE - namelen - 2);
de->inode = cpu_to_fs16(SYSV_SB(inode->i_sb), inode->i_ino);
- err = dir_commit_chunk(page, from, to);
+ err = dir_commit_chunk(page, pos, SYSV_DIRSIZE);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(dir);
out_page:
@@ -238,15 +240,15 @@ int sysv_delete_entry(struct sysv_dir_en
struct address_space *mapping = page->mapping;
struct inode *inode = (struct inode*)mapping->host;
char *kaddr = (char*)page_address(page);
- unsigned from = (char*)de - kaddr;
- unsigned to = from + SYSV_DIRSIZE;
+ loff_t pos = (page->index << PAGE_CACHE_SHIFT) + (char *)de - kaddr;
int err;
lock_page(page);
- err = mapping->a_ops->prepare_write(NULL, page, from, to);
+ err = __sysv_write_begin(NULL, mapping, pos, SYSV_DIRSIZE,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
BUG_ON(err);
de->inode = 0;
- err = dir_commit_chunk(page, from, to);
+ err = dir_commit_chunk(page, pos, SYSV_DIRSIZE);
dir_put_page(page);
inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
mark_inode_dirty(inode);
@@ -263,12 +265,13 @@ int sysv_make_empty(struct inode *inode,
if (!page)
return -ENOMEM;
- kmap(page);
- err = mapping->a_ops->prepare_write(NULL, page, 0, 2 * SYSV_DIRSIZE);
+ err = __sysv_write_begin(NULL, mapping, 0, 2 * SYSV_DIRSIZE,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
if (err) {
unlock_page(page);
goto fail;
}
+ kmap(page);
base = (char*)page_address(page);
memset(base, 0, PAGE_CACHE_SIZE);
@@ -280,9 +283,9 @@ int sysv_make_empty(struct inode *inode,
de->inode = cpu_to_fs16(SYSV_SB(inode->i_sb), dir->i_ino);
strcpy(de->name,"..");
+ kunmap(page);
err = dir_commit_chunk(page, 0, 2 * SYSV_DIRSIZE);
fail:
- kunmap(page);
page_cache_release(page);
return err;
}
@@ -336,16 +339,18 @@ not_empty:
void sysv_set_link(struct sysv_dir_entry *de, struct page *page,
struct inode *inode)
{
- struct inode *dir = (struct inode*)page->mapping->host;
- unsigned from = (char *)de-(char*)page_address(page);
- unsigned to = from + SYSV_DIRSIZE;
+ struct address_space *mapping = page->mapping;
+ struct inode *dir = mapping->host;
+ loff_t pos = (page->index << PAGE_CACHE_SHIFT) +
+ (char *)de-(char*)page_address(page);
int err;
lock_page(page);
- err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+ err = __sysv_write_begin(NULL, mapping, pos, SYSV_DIRSIZE,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
BUG_ON(err);
de->inode = cpu_to_fs16(SYSV_SB(inode->i_sb), inode->i_ino);
- err = dir_commit_chunk(page, from, to);
+ err = dir_commit_chunk(page, pos, SYSV_DIRSIZE);
dir_put_page(page);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(dir);
--
^ permalink raw reply [flat|nested] 48+ messages in thread
* [patch 40/41] minix convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (38 preceding siblings ...)
2007-05-14 6:06 ` [patch 39/41] sysv " npiggin
@ 2007-05-14 6:06 ` npiggin
2007-05-14 6:07 ` [patch 41/41] jfs " npiggin
[not found] ` <200705142014.27237.vs@namesys.com>
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, Andries Brouwer
[-- Attachment #1: fs-minix-aops.patch --]
[-- Type: text/plain, Size: 5827 bytes --]
Cc: Andries Brouwer <Andries.Brouwer@cwi.nl>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/minix/dir.c | 43 +++++++++++++++++++++++++------------------
fs/minix/inode.c | 23 +++++++++++++++++++----
2 files changed, 44 insertions(+), 22 deletions(-)
Index: linux-2.6/fs/minix/inode.c
===================================================================
--- linux-2.6.orig/fs/minix/inode.c
+++ linux-2.6/fs/minix/inode.c
@@ -347,24 +347,39 @@ static int minix_writepage(struct page *
{
return block_write_full_page(page, minix_get_block, wbc);
}
+
static int minix_readpage(struct file *file, struct page *page)
{
return block_read_full_page(page,minix_get_block);
}
-static int minix_prepare_write(struct file *file, struct page *page, unsigned from, unsigned to)
+
+int __minix_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
{
- return block_prepare_write(page,from,to,minix_get_block);
+ return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ minix_get_block);
}
+
+static int minix_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ *pagep = NULL;
+ return __minix_write_begin(file, mapping, pos, len, flags, pagep, fsdata);
+}
+
static sector_t minix_bmap(struct address_space *mapping, sector_t block)
{
return generic_block_bmap(mapping,block,minix_get_block);
}
+
static const struct address_space_operations minix_aops = {
.readpage = minix_readpage,
.writepage = minix_writepage,
.sync_page = block_sync_page,
- .prepare_write = minix_prepare_write,
- .commit_write = generic_commit_write,
+ .write_begin = minix_write_begin,
+ .write_end = generic_write_end,
.bmap = minix_bmap
};
Index: linux-2.6/fs/minix/dir.c
===================================================================
--- linux-2.6.orig/fs/minix/dir.c
+++ linux-2.6/fs/minix/dir.c
@@ -9,6 +9,7 @@
*/
#include "minix.h"
+#include <linux/buffer_head.h>
#include <linux/highmem.h>
#include <linux/smp_lock.h>
@@ -48,11 +49,12 @@ static inline unsigned long dir_pages(st
return (inode->i_size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
}
-static int dir_commit_chunk(struct page *page, unsigned from, unsigned to)
+static int dir_commit_chunk(struct page *page, loff_t pos, unsigned len)
{
- struct inode *dir = (struct inode *)page->mapping->host;
+ struct address_space *mapping = page->mapping;
+ struct inode *dir = mapping->host;
int err = 0;
- page->mapping->a_ops->commit_write(NULL, page, from, to);
+ block_write_end(NULL, mapping, pos, len, len, page, NULL);
if (IS_DIRSYNC(dir))
err = write_one_page(page, 1);
else
@@ -220,7 +222,7 @@ int minix_add_link(struct dentry *dentry
char *kaddr, *p;
minix_dirent *de;
minix3_dirent *de3;
- unsigned from, to;
+ loff_t pos;
int err;
char *namx = NULL;
__u32 inumber;
@@ -272,9 +274,9 @@ int minix_add_link(struct dentry *dentry
return -EINVAL;
got_it:
- from = p - (char*)page_address(page);
- to = from + sbi->s_dirsize;
- err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+ pos = (page->index << PAGE_CACHE_SHIFT) + p - (char*)page_address(page);
+ err = __minix_write_begin(NULL, page->mapping, pos, sbi->s_dirsize,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
if (err)
goto out_unlock;
memcpy (namx, name, namelen);
@@ -285,7 +287,7 @@ got_it:
memset (namx + namelen, 0, sbi->s_dirsize - namelen - 2);
de->inode = inode->i_ino;
}
- err = dir_commit_chunk(page, from, to);
+ err = dir_commit_chunk(page, pos, sbi->s_dirsize);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(dir);
out_put:
@@ -302,15 +304,16 @@ int minix_delete_entry(struct minix_dir_
struct address_space *mapping = page->mapping;
struct inode *inode = (struct inode*)mapping->host;
char *kaddr = page_address(page);
- unsigned from = (char*)de - kaddr;
- unsigned to = from + minix_sb(inode->i_sb)->s_dirsize;
+ loff_t pos = (page->index << PAGE_CACHE_SHIFT) + (char*)de - kaddr;
+ unsigned len = minix_sb(inode->i_sb)->s_dirsize;
int err;
lock_page(page);
- err = mapping->a_ops->prepare_write(NULL, page, from, to);
+ err = __minix_write_begin(NULL, mapping, pos, len,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
if (err == 0) {
de->inode = 0;
- err = dir_commit_chunk(page, from, to);
+ err = dir_commit_chunk(page, pos, len);
} else {
unlock_page(page);
}
@@ -330,7 +333,8 @@ int minix_make_empty(struct inode *inode
if (!page)
return -ENOMEM;
- err = mapping->a_ops->prepare_write(NULL, page, 0, 2 * sbi->s_dirsize);
+ err = __minix_write_begin(NULL, mapping, 0, 2 * sbi->s_dirsize,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
if (err) {
unlock_page(page);
goto fail;
@@ -421,17 +425,20 @@ not_empty:
void minix_set_link(struct minix_dir_entry *de, struct page *page,
struct inode *inode)
{
- struct inode *dir = (struct inode*)page->mapping->host;
+ struct address_space *mapping = page->mapping;
+ struct inode *dir = mapping->host;
struct minix_sb_info *sbi = minix_sb(dir->i_sb);
- unsigned from = (char *)de-(char*)page_address(page);
- unsigned to = from + sbi->s_dirsize;
+ loff_t pos = (page->index << PAGE_CACHE_SHIFT) +
+ (char *)de-(char*)page_address(page);
int err;
lock_page(page);
- err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+
+ err = __minix_write_begin(NULL, mapping, pos, sbi->s_dirsize,
+ AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
if (err == 0) {
de->inode = inode->i_ino;
- err = dir_commit_chunk(page, from, to);
+ err = dir_commit_chunk(page, pos, sbi->s_dirsize);
} else {
unlock_page(page);
}
--
^ permalink raw reply [flat|nested] 48+ messages in thread
* [patch 41/41] jfs convert to new aops.
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
` (39 preceding siblings ...)
2007-05-14 6:06 ` [patch 40/41] minix " npiggin
@ 2007-05-14 6:07 ` npiggin
[not found] ` <200705142014.27237.vs@namesys.com>
41 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-14 6:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, jfs-discussion, shaggy
[-- Attachment #1: fs-jfs-aops.patch --]
[-- Type: text/plain, Size: 2361 bytes --]
Cc: shaggy@austin.ibm.com
Cc: jfs-discussion@lists.sourceforge.net
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
fs/jfs/inode.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
Index: linux-2.6/fs/jfs/inode.c
===================================================================
--- linux-2.6.orig/fs/jfs/inode.c
+++ linux-2.6/fs/jfs/inode.c
@@ -255,7 +255,7 @@ int jfs_get_block(struct inode *ip, sect
static int jfs_writepage(struct page *page, struct writeback_control *wbc)
{
- return nobh_writepage(page, jfs_get_block, wbc);
+ return block_write_full_page(page, jfs_get_block, wbc);
}
static int jfs_writepages(struct address_space *mapping,
@@ -275,10 +275,13 @@ static int jfs_readpages(struct file *fi
return mpage_readpages(mapping, pages, nr_pages, jfs_get_block);
}
-static int jfs_prepare_write(struct file *file,
- struct page *page, unsigned from, unsigned to)
-{
- return nobh_prepare_write(page, from, to, jfs_get_block);
+static int jfs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ *pagep = NULL;
+ return block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ jfs_get_block);
}
static sector_t jfs_bmap(struct address_space *mapping, sector_t block)
@@ -302,8 +305,8 @@ const struct address_space_operations jf
.writepage = jfs_writepage,
.writepages = jfs_writepages,
.sync_page = block_sync_page,
- .prepare_write = jfs_prepare_write,
- .commit_write = nobh_commit_write,
+ .write_begin = jfs_write_begin,
+ .write_end = generic_write_end,
.bmap = jfs_bmap,
.direct_IO = jfs_direct_IO,
};
@@ -356,7 +359,7 @@ void jfs_truncate(struct inode *ip)
{
jfs_info("jfs_truncate: size = 0x%lx", (ulong) ip->i_size);
- nobh_truncate_page(ip->i_mapping, ip->i_size);
+ block_truncate_page(ip->i_mapping, ip->i_size, jfs_get_block);
IWRITE_LOCK(ip, RDWRLOCK_NORMAL);
jfs_truncate_nolock(ip, ip->i_size);
--
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [patch 28/41] qnx4 convert to new aops.
2007-05-14 6:06 ` [patch 28/41] qnx4 " npiggin
@ 2007-05-14 10:37 ` Anders Larsen
0 siblings, 0 replies; 48+ messages in thread
From: Anders Larsen @ 2007-05-14 10:37 UTC (permalink / raw)
To: npiggin; +Cc: Andrew Morton, linux-fsdevel
On 2007-05-14 08:06:47, npiggin@suse.de wrote:
> Cc: al@alarsen.net
> Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Anders Larsen <al@alarsen.net>
(although we might just as well do away with the 'write' methods completely,
since write-support is && BROKEN anyway)
Cheers
Anders
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6
2007-05-14 6:06 ` [patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6 npiggin
@ 2007-05-14 19:06 ` Dave Jones
2007-05-14 22:45 ` Nick Piggin
0 siblings, 1 reply; 48+ messages in thread
From: Dave Jones @ 2007-05-14 19:06 UTC (permalink / raw)
To: npiggin, Andrew Morton
Cc: Andrew Morton, linux-fsdevel, Linux Memory Management
On Mon, May 14, 2007 at 04:06:21PM +1000, npiggin@suse.de wrote:
> This was a bugfix against 6527c2bdf1f833cc18e8f42bd97973d583e4aa83, which we
> also revert.
changes like this play havoc with git-bisect. If you must revert stuff
before patching new code in, revert it all in a single diff.
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6
2007-05-14 19:06 ` Dave Jones
@ 2007-05-14 22:45 ` Nick Piggin
0 siblings, 0 replies; 48+ messages in thread
From: Nick Piggin @ 2007-05-14 22:45 UTC (permalink / raw)
To: Dave Jones
Cc: Andrew Morton, linux-fsdevel,
Linux Memory Management
On Mon, May 14, 2007 at 03:06:35PM -0400, Dave Jones wrote:
> On Mon, May 14, 2007 at 04:06:21PM +1000, npiggin@suse.de wrote:
> > This was a bugfix against 6527c2bdf1f833cc18e8f42bd97973d583e4aa83, which we
> > also revert.
>
> changes like this play havoc with git-bisect. If you must revert stuff
> before patching new code in, revert it all in a single diff.
It's all going to still boot and run...
But I guess there is no harm in merging them, however I'll send them
to Andrew individually and he can merge them if he likes.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [patch 17/41] ext2 convert to new aops.
2007-05-14 6:06 ` [patch 17/41] ext2 " npiggin
@ 2007-05-16 12:45 ` Nick Piggin
0 siblings, 0 replies; 48+ messages in thread
From: Nick Piggin @ 2007-05-16 12:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel, linux-ext4
On Mon, May 14, 2007 at 04:06:36PM +1000, npiggin@suse.de wrote:
> Cc: linux-ext4@vger.kernel.org
> Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
Found a problem in ext2 pagecache directory handling. Trivial fix follows.
Longer-term, it might be better to rework these things a bit so they can
directly use the pagecache_write_begin/pagecache_write_end accessors.
---
Index: linux-2.6/fs/ext2/dir.c
===================================================================
--- linux-2.6.orig/fs/ext2/dir.c
+++ linux-2.6/fs/ext2/dir.c
@@ -70,10 +70,18 @@ static int ext2_commit_chunk(struct page
dir->i_version++;
block_write_end(NULL, mapping, pos, len, len, page, NULL);
+
+ if (pos+len > dir->i_size) {
+ i_size_write(dir, pos+len);
+ mark_inode_dirty(dir);
+ }
+
if (IS_DIRSYNC(dir))
err = write_one_page(page, 1);
else
unlock_page(page);
+ mark_page_accessed(page);
+
return err;
}
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [patch 30/41] reiserfs convert to new aops.
[not found] ` <20070514224042.GD5531@wotan.suse.de>
@ 2007-05-16 17:22 ` Vladimir V. Saveliev
0 siblings, 0 replies; 48+ messages in thread
From: Vladimir V. Saveliev @ 2007-05-16 17:22 UTC (permalink / raw)
To: Nick Piggin, reiserfs-list, linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 233 bytes --]
Hello, Nick
This is a new version of the patch.
reiserfs_prepare_write and reiserfs_commit_write are still there, but they no longer appear in any struct address_space_operations instance;
xattrs and ioctl call them directly.
[-- Attachment #2: reiserfs-convert-to-new-aops.patch --]
[-- Type: text/x-diff, Size: 9294 bytes --]
From: Vladimir Saveliev <vs@namesys.com>
Convert reiserfs to new aops
Signed-off-by: Vladimir Saveliev <vs@namesys.com>
diff -puN fs/reiserfs/inode.c~reiserfs-convert-to-new-aops fs/reiserfs/inode.c
--- linux-2.6.21-mm2/fs/reiserfs/inode.c~reiserfs-convert-to-new-aops 2007-05-16 20:13:36.000000000 +0300
+++ linux-2.6.21-mm2-vs/fs/reiserfs/inode.c 2007-05-16 20:13:36.000000000 +0300
@@ -16,11 +16,12 @@
#include <linux/mpage.h>
#include <linux/writeback.h>
#include <linux/quotaops.h>
+#include <linux/swap.h>
-static int reiserfs_commit_write(struct file *f, struct page *page,
- unsigned from, unsigned to);
-static int reiserfs_prepare_write(struct file *f, struct page *page,
- unsigned from, unsigned to);
+int reiserfs_commit_write(struct file *f, struct page *page,
+ unsigned from, unsigned to);
+int reiserfs_prepare_write(struct file *f, struct page *page,
+ unsigned from, unsigned to);
void reiserfs_delete_inode(struct inode *inode)
{
@@ -2549,8 +2550,71 @@ static int reiserfs_writepage(struct pag
return reiserfs_write_full_page(page, wbc);
}
-static int reiserfs_prepare_write(struct file *f, struct page *page,
- unsigned from, unsigned to)
+static int reiserfs_write_begin(struct file *file,
+ struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned flags,
+ struct page **pagep, void **fsdata)
+{
+ struct inode *inode;
+ struct page *page;
+ pgoff_t index;
+ int ret;
+ int old_ref = 0;
+
+ index = pos >> PAGE_CACHE_SHIFT;
+ page = __grab_cache_page(mapping, index);
+ if (!page)
+ return -ENOMEM;
+ *pagep = page;
+
+ inode = mapping->host;
+ reiserfs_wait_on_write_block(inode->i_sb);
+ fix_tail_page_for_writing(page);
+ if (reiserfs_transaction_running(inode->i_sb)) {
+ struct reiserfs_transaction_handle *th;
+ th = (struct reiserfs_transaction_handle *)current->
+ journal_info;
+ BUG_ON(!th->t_refcount);
+ BUG_ON(!th->t_trans_id);
+ old_ref = th->t_refcount;
+ th->t_refcount++;
+ }
+ ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+ reiserfs_get_block);
+ if (ret && reiserfs_transaction_running(inode->i_sb)) {
+ struct reiserfs_transaction_handle *th = current->journal_info;
+ /* this gets a little ugly. If reiserfs_get_block returned an
+ * error and left a transacstion running, we've got to close it,
+ * and we've got to free handle if it was a persistent transaction.
+ *
+ * But, if we had nested into an existing transaction, we need
+ * to just drop the ref count on the handle.
+ *
+ * If old_ref == 0, the transaction is from reiserfs_get_block,
+ * and it was a persistent trans. Otherwise, it was nested above.
+ */
+ if (th->t_refcount > old_ref) {
+ if (old_ref)
+ th->t_refcount--;
+ else {
+ int err;
+ reiserfs_write_lock(inode->i_sb);
+ err = reiserfs_end_persistent_transaction(th);
+ reiserfs_write_unlock(inode->i_sb);
+ if (err)
+ ret = err;
+ }
+ }
+ }
+ if (ret) {
+ unlock_page(page);
+ page_cache_release(page);
+ }
+ return ret;
+}
+
+int reiserfs_prepare_write(struct file *f, struct page *page,
+ unsigned from, unsigned to)
{
struct inode *inode = page->mapping->host;
int ret;
@@ -2603,8 +2667,101 @@ static sector_t reiserfs_aop_bmap(struct
return generic_block_bmap(as, block, reiserfs_bmap);
}
-static int reiserfs_commit_write(struct file *f, struct page *page,
- unsigned from, unsigned to)
+static int reiserfs_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ struct inode *inode = page->mapping->host;
+ int ret = 0;
+ int update_sd = 0;
+ struct reiserfs_transaction_handle *th;
+ unsigned start;
+
+
+ reiserfs_wait_on_write_block(inode->i_sb);
+ if (reiserfs_transaction_running(inode->i_sb))
+ th = current->journal_info;
+ else
+ th = NULL;
+
+ start = pos & (PAGE_CACHE_SIZE - 1);
+ if (unlikely(copied < len)) {
+ if (!PageUptodate(page))
+ copied = 0;
+
+ page_zero_new_buffers(page, start + copied, start + len);
+ }
+ flush_dcache_page(page);
+
+ reiserfs_commit_page(inode, page, start, start + copied);
+ unlock_page(page);
+ mark_page_accessed(page);
+ page_cache_release(page);
+
+ /* generic_commit_write does this for us, but does not update the
+ ** transaction tracking stuff when the size changes. So, we have
+ ** to do the i_size updates here.
+ */
+ pos += copied;
+ if (pos > inode->i_size) {
+ struct reiserfs_transaction_handle myth;
+ reiserfs_write_lock(inode->i_sb);
+ /* If the file has grown beyond the border where it
+ can have a tail, unmark it as needing a tail
+ packing */
+ if ((have_large_tails(inode->i_sb)
+ && inode->i_size > i_block_size(inode) * 4)
+ || (have_small_tails(inode->i_sb)
+ && inode->i_size > i_block_size(inode)))
+ REISERFS_I(inode)->i_flags &= ~i_pack_on_close_mask;
+
+ ret = journal_begin(&myth, inode->i_sb, 1);
+ if (ret) {
+ reiserfs_write_unlock(inode->i_sb);
+ goto journal_error;
+ }
+ reiserfs_update_inode_transaction(inode);
+ inode->i_size = pos;
+ /*
+ * this will just nest into our transaction. It's important
+ * to use mark_inode_dirty so the inode gets pushed around on the
+ * dirty lists, and so that O_SYNC works as expected
+ */
+ mark_inode_dirty(inode);
+ reiserfs_update_sd(&myth, inode);
+ update_sd = 1;
+ ret = journal_end(&myth, inode->i_sb, 1);
+ reiserfs_write_unlock(inode->i_sb);
+ if (ret)
+ goto journal_error;
+ }
+ if (th) {
+ reiserfs_write_lock(inode->i_sb);
+ if (!update_sd)
+ mark_inode_dirty(inode);
+ ret = reiserfs_end_persistent_transaction(th);
+ reiserfs_write_unlock(inode->i_sb);
+ if (ret)
+ goto out;
+ }
+
+ out:
+ return ret == 0 ? copied : ret;
+
+ journal_error:
+ if (th) {
+ reiserfs_write_lock(inode->i_sb);
+ if (!update_sd)
+ reiserfs_update_sd(th, inode);
+ ret = reiserfs_end_persistent_transaction(th);
+ reiserfs_write_unlock(inode->i_sb);
+ }
+
+ return ret;
+}
+
+int reiserfs_commit_write(struct file *f, struct page *page,
+ unsigned from, unsigned to)
{
struct inode *inode = page->mapping->host;
loff_t pos = ((loff_t) page->index << PAGE_CACHE_SHIFT) + to;
@@ -2997,9 +3154,9 @@ const struct address_space_operations re
.readpages = reiserfs_readpages,
.releasepage = reiserfs_releasepage,
.invalidatepage = reiserfs_invalidatepage,
- .sync_page = block_sync_page,
- .prepare_write = reiserfs_prepare_write,
- .commit_write = reiserfs_commit_write,
+ .sync_page = block_sync_page,
+ .write_begin = reiserfs_write_begin,
+ .write_end = reiserfs_write_end,
.bmap = reiserfs_aop_bmap,
.direct_IO = reiserfs_direct_IO,
.set_page_dirty = reiserfs_set_page_dirty,
diff -puN fs/reiserfs/ioctl.c~reiserfs-convert-to-new-aops fs/reiserfs/ioctl.c
--- linux-2.6.21-mm2/fs/reiserfs/ioctl.c~reiserfs-convert-to-new-aops 2007-05-16 20:13:36.000000000 +0300
+++ linux-2.6.21-mm2-vs/fs/reiserfs/ioctl.c 2007-05-16 20:13:36.000000000 +0300
@@ -129,6 +129,10 @@ long reiserfs_compat_ioctl(struct file *
}
#endif
+int reiserfs_commit_write(struct file *f, struct page *page,
+ unsigned from, unsigned to);
+int reiserfs_prepare_write(struct file *f, struct page *page,
+ unsigned from, unsigned to);
/*
** reiserfs_unpack
** Function try to convert tail from direct item into indirect.
@@ -176,15 +180,13 @@ static int reiserfs_unpack(struct inode
if (!page) {
goto out;
}
- retval =
- mapping->a_ops->prepare_write(NULL, page, write_from, write_from);
+ retval = reiserfs_prepare_write(NULL, page, write_from, write_from);
if (retval)
goto out_unlock;
/* conversion can change page contents, must flush */
flush_dcache_page(page);
- retval =
- mapping->a_ops->commit_write(NULL, page, write_from, write_from);
+ retval = reiserfs_commit_write(NULL, page, write_from, write_from);
REISERFS_I(inode)->i_flags |= i_nopack_mask;
out_unlock:
diff -puN fs/reiserfs/xattr.c~reiserfs-convert-to-new-aops fs/reiserfs/xattr.c
--- linux-2.6.21-mm2/fs/reiserfs/xattr.c~reiserfs-convert-to-new-aops 2007-05-16 20:13:36.000000000 +0300
+++ linux-2.6.21-mm2-vs/fs/reiserfs/xattr.c 2007-05-16 20:13:36.000000000 +0300
@@ -426,6 +426,12 @@ static inline __u32 xattr_hash(const cha
return csum_partial(msg, len, 0);
}
+int reiserfs_commit_write(struct file *f, struct page *page,
+ unsigned from, unsigned to);
+int reiserfs_prepare_write(struct file *f, struct page *page,
+ unsigned from, unsigned to);
+
+
/* Generic extended attribute operations that can be used by xa plugins */
/*
@@ -512,15 +518,15 @@ reiserfs_xattr_set(struct inode *inode,
rxh->h_hash = cpu_to_le32(xahash);
}
- err = mapping->a_ops->prepare_write(fp, page, page_offset,
- page_offset + chunk + skip);
+ err = reiserfs_prepare_write(fp, page, page_offset,
+ page_offset + chunk + skip);
if (!err) {
if (buffer)
memcpy(data + skip, buffer + buffer_pos, chunk);
err =
- mapping->a_ops->commit_write(fp, page, page_offset,
- page_offset + chunk +
- skip);
+ reiserfs_commit_write(fp, page, page_offset,
+ page_offset + chunk +
+ skip);
}
unlock_page(page);
reiserfs_put_page(page);
_
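For reference, the contract this patch converts reiserfs to: `write_begin` readies a page, the generic caller copies data into it (possibly a short copy), and `write_end` is told how many bytes were actually copied. The sketch below is a minimal userspace model with stub types and trivial bodies, purely for illustration; the real kernel hooks take `struct file`, `struct address_space`, page/fsdata pointers, and different flags.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096

struct page { char data[PAGE_SIZE]; };

/* write_begin: get the page ready for a write of len bytes at pos.
 * Returns 0 on success, matching the kernel convention. */
static int write_begin(struct page *page, unsigned pos, unsigned len)
{
	(void)page; (void)pos; (void)len;
	return 0;
}

/* write_end: told how many bytes the caller actually copied (which may
 * be fewer than len if the copy was short); returns the count committed. */
static int write_end(struct page *page, unsigned pos, unsigned len,
		     unsigned copied)
{
	(void)page; (void)pos; (void)len;
	return (int)copied;
}

/* Generic caller: begin, copy (possibly short), then end with the actual
 * count -- the short-copy case the old prepare_write/commit_write pair
 * could not express. 'copyable' simulates a faulting source buffer. */
static unsigned buffered_write(struct page *page, unsigned pos,
			       const char *src, unsigned len,
			       unsigned copyable)
{
	unsigned copied;

	if (write_begin(page, pos, len))
		return 0;
	copied = copyable < len ? copyable : len;
	memcpy(page->data + pos, src, copied);
	return (unsigned)write_end(page, pos, len, copied);
}
```

This is why `reiserfs_write_end` above takes a `copied` argument and returns `copied` on success, where `reiserfs_commit_write` only got the fixed `from`/`to` range.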
* [patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6
2007-05-25 12:21 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.22-rc2-mm1 npiggin
@ 2007-05-25 12:21 ` npiggin
0 siblings, 0 replies; 48+ messages in thread
From: npiggin @ 2007-05-25 12:21 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-fsdevel, Mark Fasheh, Linux Memory Management,
Andrew Morton
[-- Attachment #1: mm-revert-buffered-write-zero-length-iov.patch --]
[-- Type: text/plain, Size: 1660 bytes --]
From: Andrew Morton <akpm@osdl.org>
This was a bugfix against 6527c2bdf1f833cc18e8f42bd97973d583e4aa83, which we
also revert.
Cc: Linux Memory Management <linux-mm@kvack.org>
Cc: Linux Filesystems <linux-fsdevel@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
mm/filemap.c | 9 +--------
mm/filemap.h | 4 ++--
2 files changed, 3 insertions(+), 10 deletions(-)
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1969,12 +1969,6 @@ generic_file_buffered_write(struct kiocb
break;
}
- if (unlikely(bytes == 0)) {
- status = 0;
- copied = 0;
- goto zero_length_segment;
- }
-
status = a_ops->prepare_write(file, page, offset, offset+bytes);
if (unlikely(status)) {
loff_t isize = i_size_read(inode);
@@ -2004,8 +1998,7 @@ generic_file_buffered_write(struct kiocb
page_cache_release(page);
continue;
}
-zero_length_segment:
- if (likely(copied >= 0)) {
+ if (likely(copied > 0)) {
if (!status)
status = copied;
Index: linux-2.6/mm/filemap.h
===================================================================
--- linux-2.6.orig/mm/filemap.h
+++ linux-2.6/mm/filemap.h
@@ -87,7 +87,7 @@ filemap_set_next_iovec(const struct iove
const struct iovec *iov = *iovp;
size_t base = *basep;
- do {
+ while (bytes) {
int copy = min(bytes, iov->iov_len - base);
bytes -= copy;
@@ -96,7 +96,7 @@ filemap_set_next_iovec(const struct iove
iov++;
base = 0;
}
- } while (bytes);
+ }
*iovp = iov;
*basep = base;
}
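The hunk above restores `while (bytes)` in place of `do { } while (bytes)`. The two differ only when `bytes == 0` with the current segment already fully consumed (`base == iov_len`): the do-while body would still run once, compute a zero-length copy, and advance to the next iovec, while the plain while leaves the position untouched. A userspace sketch of the restored loop (local stand-in types, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

struct iovec { void *iov_base; size_t iov_len; };

/* Restored form: a plain while loop, so bytes == 0 moves nothing,
 * even when the current segment is already fully consumed. */
static void set_next_iovec(const struct iovec **iovp, size_t *basep,
			   size_t bytes)
{
	const struct iovec *iov = *iovp;
	size_t base = *basep;

	while (bytes) {
		size_t room = iov->iov_len - base;
		size_t copy = bytes < room ? bytes : room;

		bytes -= copy;
		base += copy;
		if (iov->iov_len == base) {	/* segment consumed */
			iov++;
			base = 0;
		}
	}
	*iovp = iov;
	*basep = base;
}
```

Starting at a consumed segment, advancing by 0 bytes leaves the cursor in place; advancing by a positive count steps over the segment boundary as before.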
--
Thread overview: 48+ messages (newest: 2007-05-25 12:37 UTC)
2007-05-14 6:06 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.21-mm2 npiggin
2007-05-14 6:06 ` [patch 01/41] mm: revert KERNEL_DS buffered write optimisation npiggin
2007-05-14 6:06 ` [patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6 npiggin
2007-05-14 19:06 ` Dave Jones
2007-05-14 22:45 ` Nick Piggin
2007-05-14 6:06 ` [patch 03/41] Revert 6527c2bdf1f833cc18e8f42bd97973d583e4aa83 npiggin
2007-05-14 6:06 ` [patch 04/41] mm: clean up buffered write code npiggin
2007-05-14 6:06 ` [patch 05/41] mm: debug write deadlocks npiggin
2007-05-14 6:06 ` [patch 06/41] mm: trim more holes npiggin
2007-05-14 6:06 ` [patch 07/41] mm: buffered write cleanup npiggin
2007-05-14 6:06 ` [patch 08/41] mm: write iovec cleanup npiggin
2007-05-14 6:06 ` [patch 09/41] mm: fix pagecache write deadlocks npiggin
2007-05-14 6:06 ` [patch 10/41] mm: buffered write iterator npiggin
2007-05-14 6:06 ` [patch 11/41] fs: fix data-loss on error npiggin
2007-05-14 6:06 ` [patch 12/41] fs: introduce write_begin, write_end, and perform_write aops npiggin
2007-05-14 6:06 ` [patch 13/41] mm: restore KERNEL_DS optimisations npiggin
2007-05-14 6:06 ` [patch 14/41] implement simple fs aops npiggin
2007-05-14 6:06 ` [patch 15/41] block_dev convert to new aops npiggin
2007-05-14 6:06 ` [patch 16/41] rd " npiggin
2007-05-14 6:06 ` [patch 17/41] ext2 " npiggin
2007-05-16 12:45 ` Nick Piggin
2007-05-14 6:06 ` [patch 18/41] ext3 " npiggin
2007-05-14 6:06 ` [patch 19/41] ext4 " npiggin
2007-05-14 6:06 ` [patch 20/41] xfs " npiggin
2007-05-14 6:06 ` [patch 21/41] fs: new cont helpers npiggin
2007-05-14 6:06 ` [patch 22/41] fat convert to new aops npiggin
2007-05-14 6:06 ` [patch 23/41] adfs " npiggin
2007-05-14 6:06 ` [patch 24/41] hfs " npiggin
2007-05-14 6:06 ` [patch 25/41] hfsplus " npiggin
2007-05-14 6:06 ` [patch 26/41] hpfs " npiggin
2007-05-14 6:06 ` [patch 27/41] bfs " npiggin
2007-05-14 6:06 ` [patch 28/41] qnx4 " npiggin
2007-05-14 10:37 ` Anders Larsen
2007-05-14 6:06 ` [patch 29/41] reiserfs use generic write npiggin
2007-05-14 6:06 ` [patch 30/41] reiserfs convert to new aops npiggin
2007-05-14 6:06 ` [patch 31/41] nfs " npiggin
2007-05-14 6:06 ` [patch 32/41] smb " npiggin
2007-05-14 6:06 ` [patch 33/41] GFS2 " npiggin
2007-05-14 6:06 ` [patch 34/41] fuse " npiggin
2007-05-14 6:06 ` [patch 35/41] hostfs " npiggin
2007-05-14 6:06 ` [patch 36/41] jffs2 " npiggin
2007-05-14 6:06 ` [patch 37/41] ufs " npiggin
2007-05-14 6:06 ` [patch 38/41] udf " npiggin
2007-05-14 6:06 ` [patch 39/41] sysv " npiggin
2007-05-14 6:06 ` [patch 40/41] minix " npiggin
2007-05-14 6:07 ` [patch 41/41] jfs " npiggin
[not found] ` <200705142014.27237.vs@namesys.com>
[not found] ` <20070514224042.GD5531@wotan.suse.de>
2007-05-16 17:22 ` [patch 30/41] reiserfs " Vladimir V. Saveliev
-- strict thread matches above, loose matches on Subject: below --
2007-05-25 12:21 [patch 00/41] Buffered write deadlock fix and new aops for 2.6.22-rc2-mm1 npiggin
2007-05-25 12:21 ` [patch 02/41] Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6 npiggin