* [PATCH] nfs: fix NR_FILE_DIRTY underflow
@ 2006-12-13 12:12 Peter Zijlstra
2006-12-13 12:26 ` Trond Myklebust
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2006-12-13 12:12 UTC (permalink / raw)
To: Andrew Morton, Trond Myklebust; +Cc: linux-kernel
Still testing this patch, but it looks good so far.
---
Just setting PG_dirty can cause NR_FILE_DIRTY to underflow
which is bad (TM).
Use set_page_dirty() which will do the right thing.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
fs/nfs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6-git2/fs/nfs/file.c
===================================================================
--- linux-2.6-git2.orig/fs/nfs/file.c 2006-12-13 12:54:55.000000000 +0100
+++ linux-2.6-git2/fs/nfs/file.c 2006-12-13 12:55:12.000000000 +0100
@@ -321,7 +321,7 @@ static int nfs_release_page(struct page
if (!(gfp & __GFP_FS))
return 0;
/* Hack... Force nfs_wb_page() to write out the page */
- SetPageDirty(page);
+ set_page_dirty(page);
return !nfs_wb_page(page->mapping->host, page);
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow
2006-12-13 12:12 [PATCH] nfs: fix NR_FILE_DIRTY underflow Peter Zijlstra
@ 2006-12-13 12:26 ` Trond Myklebust
2006-12-13 15:01 ` Peter Zijlstra
2006-12-13 16:22 ` Peter Zijlstra
0 siblings, 2 replies; 9+ messages in thread
From: Trond Myklebust @ 2006-12-13 12:26 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Andrew Morton, linux-kernel
On Wed, 2006-12-13 at 13:12 +0100, Peter Zijlstra wrote:
> Still testing this patch, but it looks good so far.
>
> ---
> Just setting PG_dirty can cause NR_FILE_DIRTY to underflow
> which is bad (TM).
>
> Use set_page_dirty() which will do the right thing.
Actually, I'd prefer to have it do the right thing by getting rid of
that call to test_clear_page_dirty() inside
invalidate_inode_pages2_range(). That is causing loss of data integrity,
and is what is causing us to have to hack NFS in the first place.
Cheers
Trond
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> fs/nfs/file.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6-git2/fs/nfs/file.c
> ===================================================================
> --- linux-2.6-git2.orig/fs/nfs/file.c 2006-12-13 12:54:55.000000000 +0100
> +++ linux-2.6-git2/fs/nfs/file.c 2006-12-13 12:55:12.000000000 +0100
> @@ -321,7 +321,7 @@ static int nfs_release_page(struct page
> if (!(gfp & __GFP_FS))
> return 0;
> /* Hack... Force nfs_wb_page() to write out the page */
> - SetPageDirty(page);
> + set_page_dirty(page);
> return !nfs_wb_page(page->mapping->host, page);
> }
>
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow
2006-12-13 12:26 ` Trond Myklebust
@ 2006-12-13 15:01 ` Peter Zijlstra
2006-12-13 17:40 ` Trond Myklebust
2006-12-13 16:22 ` Peter Zijlstra
1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2006-12-13 15:01 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Andrew Morton, linux-kernel, Nick Piggin
On Wed, 2006-12-13 at 07:26 -0500, Trond Myklebust wrote:
> On Wed, 2006-12-13 at 13:12 +0100, Peter Zijlstra wrote:
> > Still testing this patch, but it looks good so far.
> >
> > ---
> > Just setting PG_dirty can cause NR_FILE_DIRTY to underflow
> > which is bad (TM).
> >
> > Use set_page_dirty() which will do the right thing.
>
> Actually, I'd prefer to have it do the right thing by getting rid of
> that call to test_clear_page_dirty() inside
> invalidate_inode_pages2_range(). That is causing loss of data integrity,
> and is what is causing us to have to hack NFS in the first place.
Ah, I think I see what your problem is there.
How about this totally untested patch:
---
Delay clearing the dirty page state till after removing it from the
mapping in invalidate_inode_pages2_range(). This will give
try_to_release_pages() a shot to flush dirty data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
fs/nfs/file.c | 2 --
include/linux/page-flags.h | 2 ++
mm/page-writeback.c | 17 +++++++++++------
mm/truncate.c | 11 +++--------
4 files changed, 16 insertions(+), 16 deletions(-)
Index: linux-2.6-git/fs/nfs/file.c
===================================================================
--- linux-2.6-git.orig/fs/nfs/file.c 2006-12-13 15:31:26.000000000 +0100
+++ linux-2.6-git/fs/nfs/file.c 2006-12-13 15:39:33.000000000 +0100
@@ -320,8 +320,6 @@ static int nfs_release_page(struct page
*/
if (!(gfp & __GFP_FS))
return 0;
- /* Hack... Force nfs_wb_page() to write out the page */
- SetPageDirty(page);
return !nfs_wb_page(page_file_mapping(page)->host, page);
}
Index: linux-2.6-git/include/linux/page-flags.h
===================================================================
--- linux-2.6-git.orig/include/linux/page-flags.h 2006-12-13 15:35:50.000000000 +0100
+++ linux-2.6-git/include/linux/page-flags.h 2006-12-13 15:36:14.000000000 +0100
@@ -252,7 +252,9 @@ static inline void SetPageUptodate(struc
#define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags)
struct page; /* forward declaration */
+struct address_space;
+int __test_clear_page_dirty(struct address_space *mapping, struct page *page);
int test_clear_page_dirty(struct page *page);
int test_clear_page_writeback(struct page *page);
int test_set_page_writeback(struct page *page);
Index: linux-2.6-git/mm/page-writeback.c
===================================================================
--- linux-2.6-git.orig/mm/page-writeback.c 2006-12-13 15:34:15.000000000 +0100
+++ linux-2.6-git/mm/page-writeback.c 2006-12-13 15:39:41.000000000 +0100
@@ -850,13 +850,8 @@ int set_page_dirty_lock(struct page *pag
}
EXPORT_SYMBOL(set_page_dirty_lock);
-/*
- * Clear a page's dirty flag, while caring for dirty memory accounting.
- * Returns true if the page was previously dirty.
- */
-int test_clear_page_dirty(struct page *page)
+int __test_clear_page_dirty(struct address_space *mapping, struct page *page)
{
- struct address_space *mapping = page_mapping(page);
unsigned long flags;
if (!mapping)
@@ -880,6 +875,16 @@ int test_clear_page_dirty(struct page *p
write_unlock_irqrestore(&mapping->tree_lock, flags);
return 0;
}
+
+/*
+ * Clear a page's dirty flag, while caring for dirty memory accounting.
+ * Returns true if the page was previously dirty.
+ */
+int test_clear_page_dirty(struct page *page)
+{
+ struct address_space *mapping = page_mapping(page);
+ return __test_clear_page_dirty(mapping, page);
+}
EXPORT_SYMBOL(test_clear_page_dirty);
/*
Index: linux-2.6-git/mm/truncate.c
===================================================================
--- linux-2.6-git.orig/mm/truncate.c 2006-12-13 15:36:38.000000000 +0100
+++ linux-2.6-git/mm/truncate.c 2006-12-13 15:44:01.000000000 +0100
@@ -307,18 +307,12 @@ invalidate_complete_page2(struct address
return 0;
write_lock_irq(&mapping->tree_lock);
- if (PageDirty(page))
- goto failed;
-
BUG_ON(PagePrivate(page));
__remove_from_page_cache(page);
write_unlock_irq(&mapping->tree_lock);
ClearPageUptodate(page);
page_cache_release(page); /* pagecache ref */
return 1;
-failed:
- write_unlock_irq(&mapping->tree_lock);
- return 0;
}
/**
@@ -386,12 +380,13 @@ int invalidate_inode_pages2_range(struct
PAGE_CACHE_SIZE, 0);
}
}
- was_dirty = test_clear_page_dirty(page);
+ was_dirty = PageDirty(page);
if (!invalidate_complete_page2(mapping, page)) {
if (was_dirty)
set_page_dirty(page);
ret = -EIO;
- }
+ } else
+ __test_clear_page_dirty(mapping, page);
unlock_page(page);
}
pagevec_release(&pvec);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow
2006-12-13 12:26 ` Trond Myklebust
2006-12-13 15:01 ` Peter Zijlstra
@ 2006-12-13 16:22 ` Peter Zijlstra
1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2006-12-13 16:22 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Andrew Morton, linux-kernel, Nick Piggin, linux-mm
- Sorry for possible duplicates, but I don't seem to be getting to lkml -
On Wed, 2006-12-13 at 07:26 -0500, Trond Myklebust wrote:
> On Wed, 2006-12-13 at 13:12 +0100, Peter Zijlstra wrote:
> > Still testing this patch, but it looks good so far.
> >
> > ---
> > Just setting PG_dirty can cause NR_FILE_DIRTY to underflow
> > which is bad (TM).
> >
> > Use set_page_dirty() which will do the right thing.
>
> Actually, I'd prefer to have it do the right thing by getting rid of
> that call to test_clear_page_dirty() inside
> invalidate_inode_pages2_range(). That is causing loss of data integrity,
> and is what is causing us to have to hack NFS in the first place.
Ah, I think I see what your problem is there.
How about this totally untested patch:
(little update - it seems to compile and run, now testing if it fixes
the problem too)
---
Delay clearing the dirty page state till after removing it from the
mapping in invalidate_inode_pages2_range(). This will give
try_to_release_pages() a shot to flush dirty data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
fs/nfs/file.c | 2 --
include/linux/page-flags.h | 2 ++
mm/page-writeback.c | 17 +++++++++++------
mm/truncate.c | 11 +++--------
4 files changed, 16 insertions(+), 16 deletions(-)
Index: linux-2.6-git/fs/nfs/file.c
===================================================================
--- linux-2.6-git.orig/fs/nfs/file.c 2006-12-13 15:31:26.000000000 +0100
+++ linux-2.6-git/fs/nfs/file.c 2006-12-13 15:39:33.000000000 +0100
@@ -320,8 +320,6 @@ static int nfs_release_page(struct page
*/
if (!(gfp & __GFP_FS))
return 0;
- /* Hack... Force nfs_wb_page() to write out the page */
- SetPageDirty(page);
return !nfs_wb_page(page_file_mapping(page)->host, page);
}
Index: linux-2.6-git/include/linux/page-flags.h
===================================================================
--- linux-2.6-git.orig/include/linux/page-flags.h 2006-12-13 15:35:50.000000000 +0100
+++ linux-2.6-git/include/linux/page-flags.h 2006-12-13 15:36:14.000000000 +0100
@@ -252,7 +252,9 @@ static inline void SetPageUptodate(struc
#define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags)
struct page; /* forward declaration */
+struct address_space;
+int __test_clear_page_dirty(struct address_space *mapping, struct page *page);
int test_clear_page_dirty(struct page *page);
int test_clear_page_writeback(struct page *page);
int test_set_page_writeback(struct page *page);
Index: linux-2.6-git/mm/page-writeback.c
===================================================================
--- linux-2.6-git.orig/mm/page-writeback.c 2006-12-13 15:34:15.000000000 +0100
+++ linux-2.6-git/mm/page-writeback.c 2006-12-13 15:39:41.000000000 +0100
@@ -850,13 +850,8 @@ int set_page_dirty_lock(struct page *pag
}
EXPORT_SYMBOL(set_page_dirty_lock);
-/*
- * Clear a page's dirty flag, while caring for dirty memory accounting.
- * Returns true if the page was previously dirty.
- */
-int test_clear_page_dirty(struct page *page)
+int __test_clear_page_dirty(struct address_space *mapping, struct page *page)
{
- struct address_space *mapping = page_mapping(page);
unsigned long flags;
if (!mapping)
@@ -880,6 +875,16 @@ int test_clear_page_dirty(struct page *p
write_unlock_irqrestore(&mapping->tree_lock, flags);
return 0;
}
+
+/*
+ * Clear a page's dirty flag, while caring for dirty memory accounting.
+ * Returns true if the page was previously dirty.
+ */
+int test_clear_page_dirty(struct page *page)
+{
+ struct address_space *mapping = page_mapping(page);
+ return __test_clear_page_dirty(mapping, page);
+}
EXPORT_SYMBOL(test_clear_page_dirty);
/*
Index: linux-2.6-git/mm/truncate.c
===================================================================
--- linux-2.6-git.orig/mm/truncate.c 2006-12-13 15:36:38.000000000 +0100
+++ linux-2.6-git/mm/truncate.c 2006-12-13 15:44:01.000000000 +0100
@@ -307,18 +307,12 @@ invalidate_complete_page2(struct address
return 0;
write_lock_irq(&mapping->tree_lock);
- if (PageDirty(page))
- goto failed;
-
BUG_ON(PagePrivate(page));
__remove_from_page_cache(page);
write_unlock_irq(&mapping->tree_lock);
ClearPageUptodate(page);
page_cache_release(page); /* pagecache ref */
return 1;
-failed:
- write_unlock_irq(&mapping->tree_lock);
- return 0;
}
/**
@@ -386,12 +380,13 @@ int invalidate_inode_pages2_range(struct
PAGE_CACHE_SIZE, 0);
}
}
- was_dirty = test_clear_page_dirty(page);
+ was_dirty = PageDirty(page);
if (!invalidate_complete_page2(mapping, page)) {
if (was_dirty)
set_page_dirty(page);
ret = -EIO;
- }
+ } else
+ __test_clear_page_dirty(mapping, page);
unlock_page(page);
}
pagevec_release(&pvec);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow
2006-12-13 15:01 ` Peter Zijlstra
@ 2006-12-13 17:40 ` Trond Myklebust
2006-12-13 18:48 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2006-12-13 17:40 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Andrew Morton, linux-kernel, Nick Piggin
On Wed, 2006-12-13 at 16:01 +0100, Peter Zijlstra wrote:
> On Wed, 2006-12-13 at 07:26 -0500, Trond Myklebust wrote:
> > On Wed, 2006-12-13 at 13:12 +0100, Peter Zijlstra wrote:
> > > Still testing this patch, but it looks good so far.
> > >
> > > ---
> > > Just setting PG_dirty can cause NR_FILE_DIRTY to underflow
> > > which is bad (TM).
> > >
> > > Use set_page_dirty() which will do the right thing.
> >
> > Actually, I'd prefer to have it do the right thing by getting rid of
> > that call to test_clear_page_dirty() inside
> > invalidate_inode_pages2_range(). That is causing loss of data integrity,
> > and is what is causing us to have to hack NFS in the first place.
>
> Ah, I think I see what your problem is there.
> How about this totally untested patch:
>
> ---
> Delay clearing the dirty page state till after removing it from the
> mapping in invalidate_inode_pages2_range(). This will give
> try_to_release_pages() a shot to flush dirty data.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> fs/nfs/file.c | 2 --
> include/linux/page-flags.h | 2 ++
> mm/page-writeback.c | 17 +++++++++++------
> mm/truncate.c | 11 +++--------
> 4 files changed, 16 insertions(+), 16 deletions(-)
>
> Index: linux-2.6-git/fs/nfs/file.c
> ===================================================================
> --- linux-2.6-git.orig/fs/nfs/file.c 2006-12-13 15:31:26.000000000 +0100
> +++ linux-2.6-git/fs/nfs/file.c 2006-12-13 15:39:33.000000000 +0100
> @@ -320,8 +320,6 @@ static int nfs_release_page(struct page
> */
> if (!(gfp & __GFP_FS))
> return 0;
> - /* Hack... Force nfs_wb_page() to write out the page */
> - SetPageDirty(page);
> return !nfs_wb_page(page_file_mapping(page)->host, page);
> }
>
> Index: linux-2.6-git/include/linux/page-flags.h
> ===================================================================
> --- linux-2.6-git.orig/include/linux/page-flags.h 2006-12-13 15:35:50.000000000 +0100
> +++ linux-2.6-git/include/linux/page-flags.h 2006-12-13 15:36:14.000000000 +0100
> @@ -252,7 +252,9 @@ static inline void SetPageUptodate(struc
> #define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags)
>
> struct page; /* forward declaration */
> +struct address_space;
>
> +int __test_clear_page_dirty(struct address_space *mapping, struct page *page);
> int test_clear_page_dirty(struct page *page);
> int test_clear_page_writeback(struct page *page);
> int test_set_page_writeback(struct page *page);
> Index: linux-2.6-git/mm/page-writeback.c
> ===================================================================
> --- linux-2.6-git.orig/mm/page-writeback.c 2006-12-13 15:34:15.000000000 +0100
> +++ linux-2.6-git/mm/page-writeback.c 2006-12-13 15:39:41.000000000 +0100
> @@ -850,13 +850,8 @@ int set_page_dirty_lock(struct page *pag
> }
> EXPORT_SYMBOL(set_page_dirty_lock);
>
> -/*
> - * Clear a page's dirty flag, while caring for dirty memory accounting.
> - * Returns true if the page was previously dirty.
> - */
> -int test_clear_page_dirty(struct page *page)
> +int __test_clear_page_dirty(struct address_space *mapping, struct page *page)
> {
> - struct address_space *mapping = page_mapping(page);
> unsigned long flags;
>
> if (!mapping)
> @@ -880,6 +875,16 @@ int test_clear_page_dirty(struct page *p
> write_unlock_irqrestore(&mapping->tree_lock, flags);
> return 0;
> }
> +
> +/*
> + * Clear a page's dirty flag, while caring for dirty memory accounting.
> + * Returns true if the page was previously dirty.
> + */
> +int test_clear_page_dirty(struct page *page)
> +{
> + struct address_space *mapping = page_mapping(page);
> + return __test_clear_page_dirty(mapping, page);
> +}
> EXPORT_SYMBOL(test_clear_page_dirty);
>
> /*
> Index: linux-2.6-git/mm/truncate.c
> ===================================================================
> --- linux-2.6-git.orig/mm/truncate.c 2006-12-13 15:36:38.000000000 +0100
> +++ linux-2.6-git/mm/truncate.c 2006-12-13 15:44:01.000000000 +0100
> @@ -307,18 +307,12 @@ invalidate_complete_page2(struct address
> return 0;
>
> write_lock_irq(&mapping->tree_lock);
> - if (PageDirty(page))
> - goto failed;
> -
> BUG_ON(PagePrivate(page));
> __remove_from_page_cache(page);
> write_unlock_irq(&mapping->tree_lock);
> ClearPageUptodate(page);
> page_cache_release(page); /* pagecache ref */
> return 1;
> -failed:
> - write_unlock_irq(&mapping->tree_lock);
> - return 0;
> }
>
> /**
> @@ -386,12 +380,13 @@ int invalidate_inode_pages2_range(struct
> PAGE_CACHE_SIZE, 0);
> }
> }
> - was_dirty = test_clear_page_dirty(page);
> + was_dirty = PageDirty(page);
> if (!invalidate_complete_page2(mapping, page)) {
> if (was_dirty)
> set_page_dirty(page);
^^^^^^^^^^^^^^^^^^^^^
Why would you still need this?
> ret = -EIO;
> - }
> + } else
> + __test_clear_page_dirty(mapping, page);
> unlock_page(page);
> }
> pagevec_release(&pvec);
>
Cheers
Trond
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow
2006-12-13 17:40 ` Trond Myklebust
@ 2006-12-13 18:48 ` Peter Zijlstra
2006-12-14 1:29 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2006-12-13 18:48 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Andrew Morton, linux-kernel, Nick Piggin
On Wed, 2006-12-13 at 12:40 -0500, Trond Myklebust wrote:
> On Wed, 2006-12-13 at 16:01 +0100, Peter Zijlstra wrote:
> > @@ -386,12 +380,13 @@ int invalidate_inode_pages2_range(struct
> > PAGE_CACHE_SIZE, 0);
> > }
> > }
> > - was_dirty = test_clear_page_dirty(page);
> > + was_dirty = PageDirty(page);
> > if (!invalidate_complete_page2(mapping, page)) {
> > if (was_dirty)
> > set_page_dirty(page);
> ^^^^^^^^^^^^^^^^^^^^^
>
> Why would you still need this?
Hmm yeah, that doesn't make sense. What kind of twist did I get my brain
into.
This should do.
Hmm, does this agree with the DIO path? I just noticed
generic_file_direct_IO() also uses invalidate_inode_pages2_range().
I think it does, its looks like all they want is to ensure nothing
remains in the pagecache after a write, which itself ensures nothing is
dirty to begin with.
---
Delay clearing the dirty page state till after we've invalidated the
page in invalidate_complete_page2(). This gives try_to_release_pages() a
chance to flush dirty data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
fs/nfs/file.c | 2 --
mm/truncate.c | 14 ++------------
2 files changed, 2 insertions(+), 14 deletions(-)
Index: linux-2.6-git/fs/nfs/file.c
===================================================================
--- linux-2.6-git.orig/fs/nfs/file.c 2006-12-13 19:41:09.000000000 +0100
+++ linux-2.6-git/fs/nfs/file.c 2006-12-13 19:42:18.000000000 +0100
@@ -320,8 +320,6 @@ static int nfs_release_page(struct page
*/
if (!(gfp & __GFP_FS))
return 0;
- /* Hack... Force nfs_wb_page() to write out the page */
- set_page_dirty(page);
return !nfs_wb_page(page_file_mapping(page)->host, page);
}
Index: linux-2.6-git/mm/truncate.c
===================================================================
--- linux-2.6-git.orig/mm/truncate.c 2006-12-13 19:41:09.000000000 +0100
+++ linux-2.6-git/mm/truncate.c 2006-12-13 19:42:56.000000000 +0100
@@ -306,19 +306,14 @@ invalidate_complete_page2(struct address
if (PagePrivate(page) && !try_to_release_page(page, GFP_KERNEL))
return 0;
+ test_clear_page_dirty(page);
write_lock_irq(&mapping->tree_lock);
- if (PageDirty(page))
- goto failed;
-
BUG_ON(PagePrivate(page));
__remove_from_page_cache(page);
write_unlock_irq(&mapping->tree_lock);
ClearPageUptodate(page);
page_cache_release(page); /* pagecache ref */
return 1;
-failed:
- write_unlock_irq(&mapping->tree_lock);
- return 0;
}
/**
@@ -350,7 +345,6 @@ int invalidate_inode_pages2_range(struct
for (i = 0; !ret && i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
pgoff_t page_index;
- int was_dirty;
lock_page(page);
if (page->mapping != mapping) {
@@ -386,12 +380,8 @@ int invalidate_inode_pages2_range(struct
PAGE_CACHE_SIZE, 0);
}
}
- was_dirty = test_clear_page_dirty(page);
- if (!invalidate_complete_page2(mapping, page)) {
- if (was_dirty)
- set_page_dirty(page);
+ if (!invalidate_complete_page2(mapping, page))
ret = -EIO;
- }
unlock_page(page);
}
pagevec_release(&pvec);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow
2006-12-13 18:48 ` Peter Zijlstra
@ 2006-12-14 1:29 ` Andrew Morton
2006-12-14 1:41 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2006-12-14 1:29 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Trond Myklebust, linux-kernel, Nick Piggin
On Wed, 13 Dec 2006 19:48:34 +0100
Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> --- linux-2.6-git.orig/mm/truncate.c 2006-12-13 19:41:09.000000000 +0100
> +++ linux-2.6-git/mm/truncate.c 2006-12-13 19:42:56.000000000 +0100
> @@ -306,19 +306,14 @@ invalidate_complete_page2(struct address
> if (PagePrivate(page) && !try_to_release_page(page, GFP_KERNEL))
> return 0;
>
> + test_clear_page_dirty(page);
> write_lock_irq(&mapping->tree_lock);
> - if (PageDirty(page))
> - goto failed;
> -
> BUG_ON(PagePrivate(page));
> __remove_from_page_cache(page);
> write_unlock_irq(&mapping->tree_lock);
> ClearPageUptodate(page);
> page_cache_release(page); /* pagecache ref */
> return 1;
> -failed:
> - write_unlock_irq(&mapping->tree_lock);
> - return 0;
> }
>
> /**
> @@ -350,7 +345,6 @@ int invalidate_inode_pages2_range(struct
> for (i = 0; !ret && i < pagevec_count(&pvec); i++) {
> struct page *page = pvec.pages[i];
> pgoff_t page_index;
> - int was_dirty;
>
> lock_page(page);
> if (page->mapping != mapping) {
> @@ -386,12 +380,8 @@ int invalidate_inode_pages2_range(struct
> PAGE_CACHE_SIZE, 0);
> }
> }
> - was_dirty = test_clear_page_dirty(page);
> - if (!invalidate_complete_page2(mapping, page)) {
> - if (was_dirty)
> - set_page_dirty(page);
> + if (!invalidate_complete_page2(mapping, page))
> ret = -EIO;
> - }
> unlock_page(page);
> }
> pagevec_release(&pvec);
a) we're now calling try_to_release_page() with a potentially-dirty
page, whereas it was previously clean.
I wouldn't expect ->releasepage() implementations to go looking at
PG_Dirty, because that's not what they're suppoed to be interested in.
But they might do, dunno.
b) If invalidate_complete_page2() failed due to, say, dirty buffer_heads
then we now have a clean page with dirty buffers. That is an illegal
state and the page will leak permanently.
I _think_ that's what the was_dirty logic is in there for: to
preserve the correct page-vs-buffers dirtiness coherency. But I'd need
to do some 2.5.x changelog-dumpster-diving to be sure.
Sigh. I don't know what invalidate_inode_pages2() is *supposed to do*.
What are its semantics wrt unfreeable page metadata? Against page
dirtiness?
It was written for direct-io and had one set of semantics for that, then
NFS came along and seemed to require a slightly different set of semantics,
even though the earlier semantics weren't really defined, leading to a
belief that the present semantics are "wrong", without a definition of what
semantics NFS actually desires.
So let's start again.
Trond, please define precisely and completely and without reference to
the existing implementation: what behaviour does NFS want?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow
2006-12-14 1:29 ` Andrew Morton
@ 2006-12-14 1:41 ` Andrew Morton
2006-12-14 14:52 ` Trond Myklebust
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2006-12-14 1:41 UTC (permalink / raw)
To: Peter Zijlstra, Trond Myklebust, linux-kernel, Nick Piggin
On Wed, 13 Dec 2006 17:29:21 -0800
Andrew Morton <akpm@osdl.org> wrote:
> a) we're now calling try_to_release_page() with a potentially-dirty
> page, whereas it was previously clean.
>
> I wouldn't expect ->releasepage() implementations to go looking at
> PG_Dirty, because that's not what they're suppoed to be interested in.
> But they might do, dunno.
Still an issue, probably minor.
> b) If invalidate_complete_page2() failed due to, say, dirty buffer_heads
> then we now have a clean page with dirty buffers. That is an illegal
> state and the page will leak permanently.
>
> I _think_ that's what the was_dirty logic is in there for: to
> preserve the correct page-vs-buffers dirtiness coherency. But I'd need
> to do some 2.5.x changelog-dumpster-diving to be sure.
no, that's bs. The patch looks OK from that POV: try_to_release_page()
will be able to clear clean buffers from a dirty page.
And in fact if it did that, it will then clean the page for us (see
test_clear_page_dirty() in try_to_free_buffers()).
But we still need the clear_page_dirty() in invalidate_complete_page2() in
case we didn't call try_to_release_page() at all.
> Trond, please define precisely and completely and without reference to
> the existing implementation: what behaviour does NFS want?
But this would be nice.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nfs: fix NR_FILE_DIRTY underflow
2006-12-14 1:41 ` Andrew Morton
@ 2006-12-14 14:52 ` Trond Myklebust
0 siblings, 0 replies; 9+ messages in thread
From: Trond Myklebust @ 2006-12-14 14:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: Peter Zijlstra, linux-kernel, Nick Piggin
On Wed, 2006-12-13 at 17:41 -0800, Andrew Morton wrote:
> > Trond, please define precisely and completely and without reference to
> > the existing implementation: what behaviour does NFS want?
Part of the behaviour is dictated by our needs for
invalidate_inode_pages2(), part of the behaviour is dictated by the
2-stage NFS writeout mechanism.
Starting with invalidate_inode_pages2(). The goal here is to have a
function that is guaranteed to force all currently cached pages to be
read in from the server afresh. In other words, we'd like to throw out
the old mapping of the file, and start with a new one. However, we also
need to guarantee that any modifications that a user may have made to
the file are preserved. Throwing out a dirty page is forbidden unless
the data has been written to disk first.
What we're doing in try_to_release_page() in the current implementation
of invalidate_inode_pages2() is therefore to fix races: the goal is to
ensure that we write back any dirty data before the page is removed.
This is made necessary by the fact that the VM may at any time override
our call to unmap_mapping_range() and allow the page to be re-dirtied by
the user.
Next: about the 2-stage NFS writeout mechanism. A client may want to
allow the server to cache writeback data for a while, so that it can
flush several WRITE RPC calls to disk at the same time, maximising I/O
efficiency in terms of elevator algorithms etc.
It signals this to the server by means of the 'unstable' flag on the
WRITE RPC call, which allows the former to just write the data into its
pagecache. Before the client can actually free up the page it still
needs to know that the data it wrote to the server has been flushed from
the pagecache onto disk, and this is done by sending the COMMIT RPC
request. The latter acts rather like fdatasync() does on local data: it
guarantees that all file data within a specified range has been flushed
to disk. Failure of the COMMIT request signals to the client that the
server may have rebooted, and so the page needs to be written out again.
Our second use of try_to_release_page() is therefore to ensure that the
server has indeed flushed that dirty data from its pagecache to its disk
before we allow the VM to release the page on the client. We don't send
a COMMIT request for each and every page that is to be released, but we
do ensure that at least one COMMIT has been sent that covers the data
range represented by the page to be freed.
Cheers
Trond
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2006-12-14 14:52 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-13 12:12 [PATCH] nfs: fix NR_FILE_DIRTY underflow Peter Zijlstra
2006-12-13 12:26 ` Trond Myklebust
2006-12-13 15:01 ` Peter Zijlstra
2006-12-13 17:40 ` Trond Myklebust
2006-12-13 18:48 ` Peter Zijlstra
2006-12-14 1:29 ` Andrew Morton
2006-12-14 1:41 ` Andrew Morton
2006-12-14 14:52 ` Trond Myklebust
2006-12-13 16:22 ` Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox