* [PATCH] nfs: fix NR_FILE_DIRTY underflow @ 2006-12-13 12:12 Peter Zijlstra 2006-12-13 12:26 ` Trond Myklebust 0 siblings, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2006-12-13 12:12 UTC (permalink / raw) To: Andrew Morton, Trond Myklebust; +Cc: linux-kernel Still testing this patch, but it looks good so far. --- Just setting PG_dirty can cause NR_FILE_DIRTY to underflow which is bad (TM). Use set_page_dirty() which will do the right thing. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- fs/nfs/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6-git2/fs/nfs/file.c =================================================================== --- linux-2.6-git2.orig/fs/nfs/file.c 2006-12-13 12:54:55.000000000 +0100 +++ linux-2.6-git2/fs/nfs/file.c 2006-12-13 12:55:12.000000000 +0100 @@ -321,7 +321,7 @@ static int nfs_release_page(struct page if (!(gfp & __GFP_FS)) return 0; /* Hack... Force nfs_wb_page() to write out the page */ - SetPageDirty(page); + set_page_dirty(page); return !nfs_wb_page(page->mapping->host, page); } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow 2006-12-13 12:12 [PATCH] nfs: fix NR_FILE_DIRTY underflow Peter Zijlstra @ 2006-12-13 12:26 ` Trond Myklebust 2006-12-13 15:01 ` Peter Zijlstra 2006-12-13 16:22 ` Peter Zijlstra 0 siblings, 2 replies; 9+ messages in thread From: Trond Myklebust @ 2006-12-13 12:26 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Andrew Morton, linux-kernel On Wed, 2006-12-13 at 13:12 +0100, Peter Zijlstra wrote: > Still testing this patch, but it looks good so far. > > --- > Just setting PG_dirty can cause NR_FILE_DIRTY to underflow > which is bad (TM). > > Use set_page_dirty() which will do the right thing. Actually, I'd prefer to have it do the right thing by getting rid of that call to test_clear_page_dirty() inside invalidate_inode_pages2_range(). That is causing loss of data integrity, and is what is causing us to have to hack NFS in the first place. Cheers Trond > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > fs/nfs/file.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Index: linux-2.6-git2/fs/nfs/file.c > =================================================================== > --- linux-2.6-git2.orig/fs/nfs/file.c 2006-12-13 12:54:55.000000000 +0100 > +++ linux-2.6-git2/fs/nfs/file.c 2006-12-13 12:55:12.000000000 +0100 > @@ -321,7 +321,7 @@ static int nfs_release_page(struct page > if (!(gfp & __GFP_FS)) > return 0; > /* Hack... Force nfs_wb_page() to write out the page */ > - SetPageDirty(page); > + set_page_dirty(page); > return !nfs_wb_page(page->mapping->host, page); > } > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow 2006-12-13 12:26 ` Trond Myklebust @ 2006-12-13 15:01 ` Peter Zijlstra 2006-12-13 17:40 ` Trond Myklebust 2006-12-13 16:22 ` Peter Zijlstra 1 sibling, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2006-12-13 15:01 UTC (permalink / raw) To: Trond Myklebust; +Cc: Andrew Morton, linux-kernel, Nick Piggin On Wed, 2006-12-13 at 07:26 -0500, Trond Myklebust wrote: > On Wed, 2006-12-13 at 13:12 +0100, Peter Zijlstra wrote: > > Still testing this patch, but it looks good so far. > > > > --- > > Just setting PG_dirty can cause NR_FILE_DIRTY to underflow > > which is bad (TM). > > > > Use set_page_dirty() which will do the right thing. > > Actually, I'd prefer to have it do the right thing by getting rid of > that call to test_clear_page_dirty() inside > invalidate_inode_pages2_range(). That is causing loss of data integrity, > and is what is causing us to have to hack NFS in the first place. Ah, I think I see what your problem is there. How about this totally untested patch: --- Delay clearing the dirty page state till after removing it from the mapping in invalidate_inode_pages2_range(). This will give try_to_release_pages() a shot to flush dirty data. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- fs/nfs/file.c | 2 -- include/linux/page-flags.h | 2 ++ mm/page-writeback.c | 17 +++++++++++------ mm/truncate.c | 11 +++-------- 4 files changed, 16 insertions(+), 16 deletions(-) Index: linux-2.6-git/fs/nfs/file.c =================================================================== --- linux-2.6-git.orig/fs/nfs/file.c 2006-12-13 15:31:26.000000000 +0100 +++ linux-2.6-git/fs/nfs/file.c 2006-12-13 15:39:33.000000000 +0100 @@ -320,8 +320,6 @@ static int nfs_release_page(struct page */ if (!(gfp & __GFP_FS)) return 0; - /* Hack... Force nfs_wb_page() to write out the page */ - SetPageDirty(page); return !nfs_wb_page(page_file_mapping(page)->host, page); } Index: linux-2.6-git/include/linux/page-flags.h =================================================================== --- linux-2.6-git.orig/include/linux/page-flags.h 2006-12-13 15:35:50.000000000 +0100 +++ linux-2.6-git/include/linux/page-flags.h 2006-12-13 15:36:14.000000000 +0100 @@ -252,7 +252,9 @@ static inline void SetPageUptodate(struc #define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags) struct page; /* forward declaration */ +struct address_space; +int __test_clear_page_dirty(struct address_space *mapping, struct page *page); int test_clear_page_dirty(struct page *page); int test_clear_page_writeback(struct page *page); int test_set_page_writeback(struct page *page); Index: linux-2.6-git/mm/page-writeback.c =================================================================== --- linux-2.6-git.orig/mm/page-writeback.c 2006-12-13 15:34:15.000000000 +0100 +++ linux-2.6-git/mm/page-writeback.c 2006-12-13 15:39:41.000000000 +0100 @@ -850,13 +850,8 @@ int set_page_dirty_lock(struct page *pag } EXPORT_SYMBOL(set_page_dirty_lock); -/* - * Clear a page's dirty flag, while caring for dirty memory accounting. - * Returns true if the page was previously dirty. - */ -int test_clear_page_dirty(struct page *page) +int __test_clear_page_dirty(struct address_space *mapping, struct page *page) { - struct address_space *mapping = page_mapping(page); unsigned long flags; if (!mapping) @@ -880,6 +875,16 @@ int test_clear_page_dirty(struct page *p write_unlock_irqrestore(&mapping->tree_lock, flags); return 0; } + +/* + * Clear a page's dirty flag, while caring for dirty memory accounting. + * Returns true if the page was previously dirty. + */ +int test_clear_page_dirty(struct page *page) +{ + struct address_space *mapping = page_mapping(page); + return __test_clear_page_dirty(mapping, page); +} EXPORT_SYMBOL(test_clear_page_dirty); /* Index: linux-2.6-git/mm/truncate.c =================================================================== --- linux-2.6-git.orig/mm/truncate.c 2006-12-13 15:36:38.000000000 +0100 +++ linux-2.6-git/mm/truncate.c 2006-12-13 15:44:01.000000000 +0100 @@ -307,18 +307,12 @@ invalidate_complete_page2(struct address return 0; write_lock_irq(&mapping->tree_lock); - if (PageDirty(page)) - goto failed; - BUG_ON(PagePrivate(page)); __remove_from_page_cache(page); write_unlock_irq(&mapping->tree_lock); ClearPageUptodate(page); page_cache_release(page); /* pagecache ref */ return 1; -failed: - write_unlock_irq(&mapping->tree_lock); - return 0; } /** @@ -386,12 +380,13 @@ int invalidate_inode_pages2_range(struct PAGE_CACHE_SIZE, 0); } } - was_dirty = test_clear_page_dirty(page); + was_dirty = PageDirty(page); if (!invalidate_complete_page2(mapping, page)) { if (was_dirty) set_page_dirty(page); ret = -EIO; - } + } else + __test_clear_page_dirty(mapping, page); unlock_page(page); } pagevec_release(&pvec); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow 2006-12-13 15:01 ` Peter Zijlstra @ 2006-12-13 17:40 ` Trond Myklebust 2006-12-13 18:48 ` Peter Zijlstra 0 siblings, 1 reply; 9+ messages in thread From: Trond Myklebust @ 2006-12-13 17:40 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Andrew Morton, linux-kernel, Nick Piggin On Wed, 2006-12-13 at 16:01 +0100, Peter Zijlstra wrote: > On Wed, 2006-12-13 at 07:26 -0500, Trond Myklebust wrote: > > On Wed, 2006-12-13 at 13:12 +0100, Peter Zijlstra wrote: > > > Still testing this patch, but it looks good so far. > > > > > > --- > > > Just setting PG_dirty can cause NR_FILE_DIRTY to underflow > > > which is bad (TM). > > > > > > Use set_page_dirty() which will do the right thing. > > > > Actually, I'd prefer to have it do the right thing by getting rid of > > that call to test_clear_page_dirty() inside > > invalidate_inode_pages2_range(). That is causing loss of data integrity, > > and is what is causing us to have to hack NFS in the first place. > > Ah, I think I see what your problem is there. > How about this totally untested patch: > > --- > Delay clearing the dirty page state till after removing it from the > mapping in invalidate_inode_pages2_range(). This will give > try_to_release_pages() a shot to flush dirty data. > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > fs/nfs/file.c | 2 -- > include/linux/page-flags.h | 2 ++ > mm/page-writeback.c | 17 +++++++++++------ > mm/truncate.c | 11 +++-------- > 4 files changed, 16 insertions(+), 16 deletions(-) > > Index: linux-2.6-git/fs/nfs/file.c > =================================================================== > --- linux-2.6-git.orig/fs/nfs/file.c 2006-12-13 15:31:26.000000000 +0100 > +++ linux-2.6-git/fs/nfs/file.c 2006-12-13 15:39:33.000000000 +0100 > @@ -320,8 +320,6 @@ static int nfs_release_page(struct page > */ > if (!(gfp & __GFP_FS)) > return 0; > - /* Hack... Force nfs_wb_page() to write out the page */ > - SetPageDirty(page); > return !nfs_wb_page(page_file_mapping(page)->host, page); > } > > Index: linux-2.6-git/include/linux/page-flags.h > =================================================================== > --- linux-2.6-git.orig/include/linux/page-flags.h 2006-12-13 15:35:50.000000000 +0100 > +++ linux-2.6-git/include/linux/page-flags.h 2006-12-13 15:36:14.000000000 +0100 > @@ -252,7 +252,9 @@ static inline void SetPageUptodate(struc > #define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags) > > struct page; /* forward declaration */ > +struct address_space; > > +int __test_clear_page_dirty(struct address_space *mapping, struct page *page); > int test_clear_page_dirty(struct page *page); > int test_clear_page_writeback(struct page *page); > int test_set_page_writeback(struct page *page); > Index: linux-2.6-git/mm/page-writeback.c > =================================================================== > --- linux-2.6-git.orig/mm/page-writeback.c 2006-12-13 15:34:15.000000000 +0100 > +++ linux-2.6-git/mm/page-writeback.c 2006-12-13 15:39:41.000000000 +0100 > @@ -850,13 +850,8 @@ int set_page_dirty_lock(struct page *pag > } > EXPORT_SYMBOL(set_page_dirty_lock); > > -/* > - * Clear a page's dirty flag, while caring for dirty memory accounting. > - * Returns true if the page was previously dirty. > - */ > -int test_clear_page_dirty(struct page *page) > +int __test_clear_page_dirty(struct address_space *mapping, struct page *page) > { > - struct address_space *mapping = page_mapping(page); > unsigned long flags; > > if (!mapping) > @@ -880,6 +875,16 @@ int test_clear_page_dirty(struct page *p > write_unlock_irqrestore(&mapping->tree_lock, flags); > return 0; > } > + > +/* > + * Clear a page's dirty flag, while caring for dirty memory accounting. > + * Returns true if the page was previously dirty. > + */ > +int test_clear_page_dirty(struct page *page) > +{ > + struct address_space *mapping = page_mapping(page); > + return __test_clear_page_dirty(mapping, page); > +} > EXPORT_SYMBOL(test_clear_page_dirty); > > /* > Index: linux-2.6-git/mm/truncate.c > =================================================================== > --- linux-2.6-git.orig/mm/truncate.c 2006-12-13 15:36:38.000000000 +0100 > +++ linux-2.6-git/mm/truncate.c 2006-12-13 15:44:01.000000000 +0100 > @@ -307,18 +307,12 @@ invalidate_complete_page2(struct address > return 0; > > write_lock_irq(&mapping->tree_lock); > - if (PageDirty(page)) > - goto failed; > - > BUG_ON(PagePrivate(page)); > __remove_from_page_cache(page); > write_unlock_irq(&mapping->tree_lock); > ClearPageUptodate(page); > page_cache_release(page); /* pagecache ref */ > return 1; > -failed: > - write_unlock_irq(&mapping->tree_lock); > - return 0; > } > > /** > @@ -386,12 +380,13 @@ int invalidate_inode_pages2_range(struct > PAGE_CACHE_SIZE, 0); > } > } > - was_dirty = test_clear_page_dirty(page); > + was_dirty = PageDirty(page); > if (!invalidate_complete_page2(mapping, page)) { > if (was_dirty) > set_page_dirty(page); ^^^^^^^^^^^^^^^^^^^^^ Why would you still need this? > ret = -EIO; > - } > + } else > + __test_clear_page_dirty(mapping, page); > unlock_page(page); > } > pagevec_release(&pvec); > Cheers Trond ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow 2006-12-13 17:40 ` Trond Myklebust @ 2006-12-13 18:48 ` Peter Zijlstra 2006-12-14 1:29 ` Andrew Morton 0 siblings, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2006-12-13 18:48 UTC (permalink / raw) To: Trond Myklebust; +Cc: Andrew Morton, linux-kernel, Nick Piggin On Wed, 2006-12-13 at 12:40 -0500, Trond Myklebust wrote: > On Wed, 2006-12-13 at 16:01 +0100, Peter Zijlstra wrote: > > @@ -386,12 +380,13 @@ int invalidate_inode_pages2_range(struct > > PAGE_CACHE_SIZE, 0); > > } > > } > > - was_dirty = test_clear_page_dirty(page); > > + was_dirty = PageDirty(page); > > if (!invalidate_complete_page2(mapping, page)) { > > if (was_dirty) > > set_page_dirty(page); > ^^^^^^^^^^^^^^^^^^^^^ > > Why would you still need this? Hmm yeah, that doesn't make sense. What kind of twist did I get my brain into. This should do. Hmm, does this agree with the DIO path? I just noticed generic_file_direct_IO() also uses invalidate_inode_pages2_range(). I think it does, its looks like all they want is to ensure nothing remains in the pagecache after a write, which itself ensures nothing is dirty to begin with. --- Delay clearing the dirty page state till after we've invalidated the page in invalidate_complete_page2(). This gives try_to_release_pages() a chance to flush dirty data. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- fs/nfs/file.c | 2 -- mm/truncate.c | 14 ++------------ 2 files changed, 2 insertions(+), 14 deletions(-) Index: linux-2.6-git/fs/nfs/file.c =================================================================== --- linux-2.6-git.orig/fs/nfs/file.c 2006-12-13 19:41:09.000000000 +0100 +++ linux-2.6-git/fs/nfs/file.c 2006-12-13 19:42:18.000000000 +0100 @@ -320,8 +320,6 @@ static int nfs_release_page(struct page */ if (!(gfp & __GFP_FS)) return 0; - /* Hack... Force nfs_wb_page() to write out the page */ - set_page_dirty(page); return !nfs_wb_page(page_file_mapping(page)->host, page); } Index: linux-2.6-git/mm/truncate.c =================================================================== --- linux-2.6-git.orig/mm/truncate.c 2006-12-13 19:41:09.000000000 +0100 +++ linux-2.6-git/mm/truncate.c 2006-12-13 19:42:56.000000000 +0100 @@ -306,19 +306,14 @@ invalidate_complete_page2(struct address if (PagePrivate(page) && !try_to_release_page(page, GFP_KERNEL)) return 0; + test_clear_page_dirty(page); write_lock_irq(&mapping->tree_lock); - if (PageDirty(page)) - goto failed; - BUG_ON(PagePrivate(page)); __remove_from_page_cache(page); write_unlock_irq(&mapping->tree_lock); ClearPageUptodate(page); page_cache_release(page); /* pagecache ref */ return 1; -failed: - write_unlock_irq(&mapping->tree_lock); - return 0; } /** @@ -350,7 +345,6 @@ int invalidate_inode_pages2_range(struct for (i = 0; !ret && i < pagevec_count(&pvec); i++) { struct page *page = pvec.pages[i]; pgoff_t page_index; - int was_dirty; lock_page(page); if (page->mapping != mapping) { @@ -386,12 +380,8 @@ int invalidate_inode_pages2_range(struct PAGE_CACHE_SIZE, 0); } } - was_dirty = test_clear_page_dirty(page); - if (!invalidate_complete_page2(mapping, page)) { - if (was_dirty) - set_page_dirty(page); + if (!invalidate_complete_page2(mapping, page)) ret = -EIO; - } unlock_page(page); } pagevec_release(&pvec); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow 2006-12-13 18:48 ` Peter Zijlstra @ 2006-12-14 1:29 ` Andrew Morton 2006-12-14 1:41 ` Andrew Morton 0 siblings, 1 reply; 9+ messages in thread From: Andrew Morton @ 2006-12-14 1:29 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Trond Myklebust, linux-kernel, Nick Piggin On Wed, 13 Dec 2006 19:48:34 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > --- linux-2.6-git.orig/mm/truncate.c 2006-12-13 19:41:09.000000000 +0100 > +++ linux-2.6-git/mm/truncate.c 2006-12-13 19:42:56.000000000 +0100 > @@ -306,19 +306,14 @@ invalidate_complete_page2(struct address > if (PagePrivate(page) && !try_to_release_page(page, GFP_KERNEL)) > return 0; > > + test_clear_page_dirty(page); > write_lock_irq(&mapping->tree_lock); > - if (PageDirty(page)) > - goto failed; > - > BUG_ON(PagePrivate(page)); > __remove_from_page_cache(page); > write_unlock_irq(&mapping->tree_lock); > ClearPageUptodate(page); > page_cache_release(page); /* pagecache ref */ > return 1; > -failed: > - write_unlock_irq(&mapping->tree_lock); > - return 0; > } > > /** > @@ -350,7 +345,6 @@ int invalidate_inode_pages2_range(struct > for (i = 0; !ret && i < pagevec_count(&pvec); i++) { > struct page *page = pvec.pages[i]; > pgoff_t page_index; > - int was_dirty; > > lock_page(page); > if (page->mapping != mapping) { > @@ -386,12 +380,8 @@ int invalidate_inode_pages2_range(struct > PAGE_CACHE_SIZE, 0); > } > } > - was_dirty = test_clear_page_dirty(page); > - if (!invalidate_complete_page2(mapping, page)) { > - if (was_dirty) > - set_page_dirty(page); > + if (!invalidate_complete_page2(mapping, page)) > ret = -EIO; > - } > unlock_page(page); > } > pagevec_release(&pvec); a) we're now calling try_to_release_page() with a potentially-dirty page, whereas it was previously clean. I wouldn't expect ->releasepage() implementations to go looking at PG_Dirty, because that's not what they're suppoed to be interested in. But they might do, dunno. b) If invalidate_complete_page2() failed due to, say, dirty buffer_heads then we now have a clean page with dirty buffers. That is an illegal state and the page will leak permanently. I _think_ that's what the was_dirty logic is in there for: to preserve the correct page-vs-buffers dirtiness coherency. But I'd need to do some 2.5.x changelog-dumpster-diving to be sure. Sigh. I don't know what invalidate_inode_pages2() is *supposed to do*. What are its semantics wrt unfreeable page metadata? Against page dirtiness? It was written for direct-io and had one set of semantics for that, then NFS came along and seemed to require a slightly different set of semantics, even though the earlier semantics weren't really defined, leading to a belief that the present semantics are "wrong", without a definition of what semantics NFS actually desires. So let's start again. Trond, please define precisely and completely and without reference to the existing implementation: what behaviour does NFS want? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow 2006-12-14 1:29 ` Andrew Morton @ 2006-12-14 1:41 ` Andrew Morton 2006-12-14 14:52 ` Trond Myklebust 0 siblings, 1 reply; 9+ messages in thread From: Andrew Morton @ 2006-12-14 1:41 UTC (permalink / raw) To: Peter Zijlstra, Trond Myklebust, linux-kernel, Nick Piggin On Wed, 13 Dec 2006 17:29:21 -0800 Andrew Morton <akpm@osdl.org> wrote: > a) we're now calling try_to_release_page() with a potentially-dirty > page, whereas it was previously clean. > > I wouldn't expect ->releasepage() implementations to go looking at > PG_Dirty, because that's not what they're suppoed to be interested in. > But they might do, dunno. Still an issue, probably minor. > b) If invalidate_complete_page2() failed due to, say, dirty buffer_heads > then we now have a clean page with dirty buffers. That is an illegal > state and the page will leak permanently. > > I _think_ that's what the was_dirty logic is in there for: to > preserve the correct page-vs-buffers dirtiness coherency. But I'd need > to do some 2.5.x changelog-dumpster-diving to be sure. no, that's bs. The patch looks OK from that POV: try_to_release_page() will be able to clear clean buffers from a dirty page. And in fact if it did that, it will then clean the page for us (see test_clear_page_dirty() in try_to_free_buffers()). But we still need the clear_page_dirty() in invalidate_complete_page2() in case we didn't call try_to_release_page() at all. > Trond, please define precisely and completely and without reference to > the existing implementation: what behaviour does NFS want? But this would be nice. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow 2006-12-14 1:41 ` Andrew Morton @ 2006-12-14 14:52 ` Trond Myklebust 0 siblings, 0 replies; 9+ messages in thread From: Trond Myklebust @ 2006-12-14 14:52 UTC (permalink / raw) To: Andrew Morton; +Cc: Peter Zijlstra, linux-kernel, Nick Piggin On Wed, 2006-12-13 at 17:41 -0800, Andrew Morton wrote: > > Trond, please define precisely and completely and without reference to > > the existing implementation: what behaviour does NFS want? Part of the behaviour is dictated by our needs for invalidate_inode_pages2(), part of the behaviour is dictated by the 2-stage NFS writeout mechanism. Starting with invalidate_inode_pages2(). The goal here is to have a function that is guaranteed to force all currently cached pages to be read in from the server afresh. In other words, we'd like to throw out the old mapping of the file, and start with a new one. However, we also need to guarantee that any modifications that a user may have made to the file are preserved. Throwing out a dirty page is forbidden unless the data has been written to disk first. What we're doing in try_to_release_page() in the current implementation of invalidate_inode_pages2() is therefore to fix races: the goal is to ensure that we write back any dirty data before the page is removed. This is made necessary by the fact that the VM may at any time override our call to unmap_mapping_range() and allow the page to be re-dirtied by the user. Next: about the 2-stage NFS writeout mechanism. A client may want to allow the server to cache writeback data for a while, so that it can flush several WRITE RPC calls to disk at the same time, maximising I/O efficiency in terms of elevator algorithms etc. It signals this to the server by means of the 'unstable' flag on the WRITE RPC call, which allows the former to just write the data into its pagecache. Before the client can actually free up the page it still needs to know that the data it wrote to the server has been flushed from the pagecache onto disk, and this is done by sending the COMMIT RPC request. The latter acts rather like fdatasync() does on local data: it guarantees that all file data within a specified range has been flushed to disk. Failure of the COMMIT request signals to the client that the server may have rebooted, and so the page needs to be written out again. Our second use of try_to_release_page() is therefore to ensure that the server has indeed flushed that dirty data from its pagecache to its disk before we allow the VM to release the page on the client. We don't send a COMMIT request for each and every page that is to be released, but we do ensure that at least one COMMIT has been sent that covers the data range represented by the page to be freed. Cheers Trond ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow 2006-12-13 12:26 ` Trond Myklebust 2006-12-13 15:01 ` Peter Zijlstra @ 2006-12-13 16:22 ` Peter Zijlstra 1 sibling, 0 replies; 9+ messages in thread From: Peter Zijlstra @ 2006-12-13 16:22 UTC (permalink / raw) To: Trond Myklebust; +Cc: Andrew Morton, linux-kernel, Nick Piggin, linux-mm - Sorry for possible duplicates, but I don't seem to be getting to lkml - On Wed, 2006-12-13 at 07:26 -0500, Trond Myklebust wrote: > On Wed, 2006-12-13 at 13:12 +0100, Peter Zijlstra wrote: > > Still testing this patch, but it looks good so far. > > > > --- > > Just setting PG_dirty can cause NR_FILE_DIRTY to underflow > > which is bad (TM). > > > > Use set_page_dirty() which will do the right thing. > > Actually, I'd prefer to have it do the right thing by getting rid of > that call to test_clear_page_dirty() inside > invalidate_inode_pages2_range(). That is causing loss of data integrity, > and is what is causing us to have to hack NFS in the first place. Ah, I think I see what your problem is there. How about this totally untested patch: (little update - it seems to compile and run, now testing if it fixes the problem too) --- Delay clearing the dirty page state till after removing it from the mapping in invalidate_inode_pages2_range(). This will give try_to_release_pages() a shot to flush dirty data. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- fs/nfs/file.c | 2 -- include/linux/page-flags.h | 2 ++ mm/page-writeback.c | 17 +++++++++++------ mm/truncate.c | 11 +++-------- 4 files changed, 16 insertions(+), 16 deletions(-) Index: linux-2.6-git/fs/nfs/file.c =================================================================== --- linux-2.6-git.orig/fs/nfs/file.c 2006-12-13 15:31:26.000000000 +0100 +++ linux-2.6-git/fs/nfs/file.c 2006-12-13 15:39:33.000000000 +0100 @@ -320,8 +320,6 @@ static int nfs_release_page(struct page */ if (!(gfp & __GFP_FS)) return 0; - /* Hack... Force nfs_wb_page() to write out the page */ - SetPageDirty(page); return !nfs_wb_page(page_file_mapping(page)->host, page); } Index: linux-2.6-git/include/linux/page-flags.h =================================================================== --- linux-2.6-git.orig/include/linux/page-flags.h 2006-12-13 15:35:50.000000000 +0100 +++ linux-2.6-git/include/linux/page-flags.h 2006-12-13 15:36:14.000000000 +0100 @@ -252,7 +252,9 @@ static inline void SetPageUptodate(struc #define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags) struct page; /* forward declaration */ +struct address_space; +int __test_clear_page_dirty(struct address_space *mapping, struct page *page); int test_clear_page_dirty(struct page *page); int test_clear_page_writeback(struct page *page); int test_set_page_writeback(struct page *page); Index: linux-2.6-git/mm/page-writeback.c =================================================================== --- linux-2.6-git.orig/mm/page-writeback.c 2006-12-13 15:34:15.000000000 +0100 +++ linux-2.6-git/mm/page-writeback.c 2006-12-13 15:39:41.000000000 +0100 @@ -850,13 +850,8 @@ int set_page_dirty_lock(struct page *pag } EXPORT_SYMBOL(set_page_dirty_lock); -/* - * Clear a page's dirty flag, while caring for dirty memory accounting. - * Returns true if the page was previously dirty. - */ -int test_clear_page_dirty(struct page *page) +int __test_clear_page_dirty(struct address_space *mapping, struct page *page) { - struct address_space *mapping = page_mapping(page); unsigned long flags; if (!mapping) @@ -880,6 +875,16 @@ int test_clear_page_dirty(struct page *p write_unlock_irqrestore(&mapping->tree_lock, flags); return 0; } + +/* + * Clear a page's dirty flag, while caring for dirty memory accounting. + * Returns true if the page was previously dirty. + */ +int test_clear_page_dirty(struct page *page) +{ + struct address_space *mapping = page_mapping(page); + return __test_clear_page_dirty(mapping, page); +} EXPORT_SYMBOL(test_clear_page_dirty); /* Index: linux-2.6-git/mm/truncate.c =================================================================== --- linux-2.6-git.orig/mm/truncate.c 2006-12-13 15:36:38.000000000 +0100 +++ linux-2.6-git/mm/truncate.c 2006-12-13 15:44:01.000000000 +0100 @@ -307,18 +307,12 @@ invalidate_complete_page2(struct address return 0; write_lock_irq(&mapping->tree_lock); - if (PageDirty(page)) - goto failed; - BUG_ON(PagePrivate(page)); __remove_from_page_cache(page); write_unlock_irq(&mapping->tree_lock); ClearPageUptodate(page); page_cache_release(page); /* pagecache ref */ return 1; -failed: - write_unlock_irq(&mapping->tree_lock); - return 0; } /** @@ -386,12 +380,13 @@ int invalidate_inode_pages2_range(struct PAGE_CACHE_SIZE, 0); } } - was_dirty = test_clear_page_dirty(page); + was_dirty = PageDirty(page); if (!invalidate_complete_page2(mapping, page)) { if (was_dirty) set_page_dirty(page); ret = -EIO; - } + } else + __test_clear_page_dirty(mapping, page); unlock_page(page); } pagevec_release(&pvec); ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2006-12-14 14:52 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-12-13 12:12 [PATCH] nfs: fix NR_FILE_DIRTY underflow Peter Zijlstra 2006-12-13 12:26 ` Trond Myklebust 2006-12-13 15:01 ` Peter Zijlstra 2006-12-13 17:40 ` Trond Myklebust 2006-12-13 18:48 ` Peter Zijlstra 2006-12-14 1:29 ` Andrew Morton 2006-12-14 1:41 ` Andrew Morton 2006-12-14 14:52 ` Trond Myklebust 2006-12-13 16:22 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox