* Re: Possible problem with commit a6305ddb080 : NFS: Fix a race with the new commit code
[not found] ` <20100427143542.001f8dbe-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2010-04-27 12:00 ` Trond Myklebust
[not found] ` <1272369635.16814.52.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2010-04-27 12:00 UTC
To: Neil Brown; +Cc: linux-nfs
On Tue, 2010-04-27 at 14:35 +1000, Neil Brown wrote:
> Hi Trond,
> I think the above mentioned commit might have added a new race to replace
> the old ....
>
> I have a report of a BUG in nfs_page_async_flush.
>
> It isn't a vanilla upstream kernel - there are a bunch of SUSE patches
> in there - so quoting the line-number won't help you, but it is the
> BUG_ON(ret != 0);
> after the call to nfs_set_page_writeback.
> (https://bugzilla.novell.com/show_bug.cgi?id=599628)
>
> This implies that nfs_find_and_lock_request got a new lock on the page,
> and then we found that it was already flagged for writeback.
That's odd. Callers such as write_cache_pages() should normally be doing
a wait_on_page_writeback() after taking the page lock but prior to
calling the filesystem.
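The ordering described above can be illustrated with a minimal userspace sketch in C. This is not kernel code: struct toy_page and every toy_* helper are hypothetical stand-ins for the page flags and for wait_on_page_writeback(), and the "wait" is modelled as simply completing the pending I/O. It only shows why setting writeback on a page that is still under writeback yields the non-zero value that BUG_ON(ret != 0) checks for, and why waiting first avoids it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the two flags discussed here. */
struct toy_page {
	bool locked;
	bool writeback;
};

/* Models a test-and-set of the writeback flag: returns the old value. */
static int toy_set_page_writeback(struct toy_page *p)
{
	int old = p->writeback;

	p->writeback = true;
	return old;
}

/* Models I/O completion clearing the writeback flag. */
static void toy_end_page_writeback(struct toy_page *p)
{
	p->writeback = false;
}

/* Models wait_on_page_writeback(): here the "wait" just completes the I/O. */
static void toy_wait_on_page_writeback(struct toy_page *p)
{
	if (p->writeback)
		toy_end_page_writeback(p);
}

/*
 * Flush a locked page. Returns the value that the BUG_ON(ret != 0)
 * mentioned in the thread would check: 0 is fine, non-zero would trip it.
 */
static int toy_flush(struct toy_page *p, bool wait_first)
{
	assert(p->locked);	/* the caller must hold the page lock */
	if (wait_first)
		toy_wait_on_page_writeback(p);
	return toy_set_page_writeback(p);
}
```

Calling toy_flush() with wait_first set to false on a page whose writeback flag is still set returns non-zero, which corresponds to the reported BUG; with wait_first set to true it returns 0.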
> The commit mentioned creates just such an opportunity. It reorders things
> so that a page is unlocked before writeback is cleared, thus creating a window
> for that BUG to fire.
>
> What is the race that you were trying to fix?
I want to ensure that the call to write_inode() immediately after
filemap_fdatawait() in the function writeback_single_inode() works
correctly. Prior to this fix, there was a race whereby the PG_writeback
flag could be cleared, but the nfs_page structure would still be locked.
This again would result in the page being skipped by nfs_scan_list() and
so it wouldn't be registered as COMMITed.
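The effect of that older race can be sketched in a few lines of userspace C. The sketch assumes only what the paragraph above states, namely that a request which is still locked is skipped by the scan and therefore not committed. struct toy_req and toy_commit_pass() are hypothetical names for illustration and do not mirror the real nfs_page or nfs_scan_list() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_page. */
struct toy_req {
	bool locked;	/* request lock still held by the write path */
	bool committed;	/* set once a commit pass has covered this request */
};

/*
 * Models one commit pass: requests that are still locked are skipped,
 * the rest are marked committed. Returns how many requests this pass
 * covered.
 */
static size_t toy_commit_pass(struct toy_req *reqs, size_t n)
{
	size_t done = 0;

	assert(reqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (reqs[i].locked)
			continue;	/* skipped, as described for nfs_scan_list() */
		reqs[i].committed = true;
		done++;
	}
	return done;
}
```

If the writeback flag is cleared while the request lock is still held, a waiter can reach the commit pass too early, and the locked request is left uncommitted until some later pass.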
Cheers
Trond
* Re: Possible problem with commit a6305ddb080 : NFS: Fix a race with the new commit code
[not found] ` <1272369635.16814.52.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2010-04-27 22:21 ` Trond Myklebust
[not found] ` <1272406873.14667.6.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2010-04-27 22:21 UTC
To: Neil Brown; +Cc: linux-nfs
On Tue, 2010-04-27 at 08:00 -0400, Trond Myklebust wrote:
> On Tue, 2010-04-27 at 14:35 +1000, Neil Brown wrote:
> > Hi Trond,
> > I think the above mentioned commit might have added a new race to replace
> > the old ....
> >
> > I have a report of a BUG in nfs_page_async_flush.
> >
> > It isn't a vanilla upstream kernel - there are a bunch of SUSE patches
> > in there - so quoting the line-number won't help you, but it is the
> > BUG_ON(ret != 0);
> > after the call to nfs_set_page_writeback.
> > (https://bugzilla.novell.com/show_bug.cgi?id=599628)
> >
> > This implies that nfs_find_and_lock_request got a new lock on the page,
> > and then we found that it was already flagged for writeback.
>
> That's odd. Callers such as write_cache_pages() should normally be doing
> a wait_on_page_writeback() after taking the page lock but prior to
> calling the filesystem.
The following patch ought to fix it. I suspect the same race exists in
the ->readpage() path, so it makes sense to fix nfs_wb_page() rather
than putting the wait_on_page_writeback call in
nfs_try_to_update_request().
Cheers
Trond
------------------------------------------------------------------------------------------------
NFS: Ensure that nfs_wb_page() waits for PG_writeback to clear
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Neil Brown reports that he is seeing the BUG_ON(ret != 0) trigger in
nfs_page_async_flush.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
fs/nfs/write.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ccde2ae..c700698 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1472,6 +1472,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
 
 	BUG_ON(!PageLocked(page));
 	for (;;) {
+		wait_on_page_writeback(page);
 		req = nfs_page_find_request(page);
 		if (req == NULL)
 			break;
@@ -1511,10 +1512,12 @@ int nfs_wb_page(struct inode *inode, struct page *page)
 	int ret;
 
 	while(PagePrivate(page)) {
+		wait_on_page_writeback(page);
 		if (clear_page_dirty_for_io(page)) {
 			ret = nfs_writepage_locked(page, &wbc);
 			if (ret < 0)
 				goto out_error;
+			continue;
 		}
 		req = nfs_find_and_lock_request(page);
 		if (!req)
* Re: Possible problem with commit a6305ddb080 : NFS: Fix a race with the new commit code
[not found] ` <1272406873.14667.6.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2010-04-27 22:35 ` Trond Myklebust
[not found] ` <1272407756.14667.17.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2010-04-27 22:35 UTC
To: Neil Brown; +Cc: linux-nfs
On Tue, 2010-04-27 at 18:21 -0400, Trond Myklebust wrote:
> On Tue, 2010-04-27 at 08:00 -0400, Trond Myklebust wrote:
> > On Tue, 2010-04-27 at 14:35 +1000, Neil Brown wrote:
> > > Hi Trond,
> > > I think the above mentioned commit might have added a new race to replace
> > > the old ....
> > >
> > > I have a report of a BUG in nfs_page_async_flush.
> > >
> > > It isn't a vanilla upstream kernel - there are a bunch of SUSE patches
> > > in there - so quoting the line-number won't help you, but it is the
> > > BUG_ON(ret != 0);
> > > after the call to nfs_set_page_writeback.
> > > (https://bugzilla.novell.com/show_bug.cgi?id=599628)
> > >
> > > This implies that nfs_find_and_lock_request got a new lock on the page,
> > > and then we found that it was already flagged for writeback.
> >
> > That's odd. Callers such as write_cache_pages() should normally be doing
> > a wait_on_page_writeback() after taking the page lock but prior to
> > calling the filesystem.
>
> The following patch ought to fix it. I suspect the same race exists in
> the ->readpage() path, so it makes sense to fix nfs_wb_page() rather
> than putting the wait_on_page_writeback call in
> nfs_try_to_update_request().
Actually, this patch is even better since it cleans up nfs_wb_page()
too.
Cheers
Trond
------------------------------------------------------------------------------------------
NFS: Ensure that nfs_wb_page() waits for PG_writeback to clear
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Neil Brown reports that he is seeing the BUG_ON(ret != 0) trigger in
nfs_page_async_flush. According to the trace in
https://bugzilla.novell.com/show_bug.cgi?id=599628
the problem appears to be due to nfs_wb_page() not waiting for the
PG_writeback flag to clear.
The same problem exists in nfs_wb_page_cancel().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
fs/nfs/write.c | 19 ++++---------------
1 files changed, 4 insertions(+), 15 deletions(-)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ccde2ae..3aea3ca 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1472,6 +1472,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
 
 	BUG_ON(!PageLocked(page));
 	for (;;) {
+		wait_on_page_writeback(page);
 		req = nfs_page_find_request(page);
 		if (req == NULL)
 			break;
@@ -1506,30 +1507,18 @@ int nfs_wb_page(struct inode *inode, struct page *page)
 		.range_start = range_start,
 		.range_end = range_end,
 	};
-	struct nfs_page *req;
-	int need_commit;
 	int ret;
 
 	while(PagePrivate(page)) {
+		wait_on_page_writeback(page);
 		if (clear_page_dirty_for_io(page)) {
 			ret = nfs_writepage_locked(page, &wbc);
 			if (ret < 0)
 				goto out_error;
 		}
-		req = nfs_find_and_lock_request(page);
-		if (!req)
-			break;
-		if (IS_ERR(req)) {
-			ret = PTR_ERR(req);
+		ret = sync_inode(inode, &wbc);
+		if (ret < 0)
 			goto out_error;
-		}
-		need_commit = test_bit(PG_CLEAN, &req->wb_flags);
-		nfs_clear_page_tag_locked(req);
-		if (need_commit) {
-			ret = nfs_commit_inode(inode, FLUSH_SYNC);
-			if (ret < 0)
-				goto out_error;
-		}
 	}
 	return 0;
 out_error:
* Re: Possible problem with commit a6305ddb080 : NFS: Fix a race with the new commit code
[not found] ` <1272407756.14667.17.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2010-05-03 1:34 ` Neil Brown
0 siblings, 0 replies; 4+ messages in thread
From: Neil Brown @ 2010-05-03 1:34 UTC
To: Trond Myklebust; +Cc: linux-nfs
On Tue, 27 Apr 2010 18:35:56 -0400
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> On Tue, 2010-04-27 at 18:21 -0400, Trond Myklebust wrote:
> > On Tue, 2010-04-27 at 08:00 -0400, Trond Myklebust wrote:
> > > On Tue, 2010-04-27 at 14:35 +1000, Neil Brown wrote:
> > > > Hi Trond,
> > > > I think the above mentioned commit might have added a new race to replace
> > > > the old ....
> > > >
> > > > I have a report of a BUG in nfs_page_async_flush.
> > > >
> > > > It isn't a vanilla upstream kernel - there are a bunch of SUSE patches
> > > > in there - so quoting the line-number won't help you, but it is the
> > > > BUG_ON(ret != 0);
> > > > after the call to nfs_set_page_writeback.
> > > > (https://bugzilla.novell.com/show_bug.cgi?id=599628)
> > > >
> > > > This implies that nfs_find_and_lock_request got a new lock on the page,
> > > > and then we found that it was already flagged for writeback.
> > >
> > > That's odd. Callers such as write_cache_pages() should normally be doing
> > > a wait_on_page_writeback() after taking the page lock but prior to
> > > calling the filesystem.
> >
> > The following patch ought to fix it. I suspect the same race exists in
> > the ->readpage() path, so it makes sense to fix nfs_wb_page() rather
> > than putting the wait_on_page_writeback call in
> > nfs_try_to_update_request().
>
> Actually, this patch is even better since it cleans up nfs_wb_page()
> too.
Thanks Trond!
I won't pretend to completely understand it, but it certainly looks credible
and removes some code, which is always nice!
I don't think the problem was easily reproducible so I cannot easily test
if this fixes it, so I'll just assume it does and let you know if I
hear otherwise.
Thanks,
NeilBrown
>
> Cheers
> Trond
> ------------------------------------------------------------------------------------------
> NFS: Ensure that nfs_wb_page() waits for PG_writeback to clear
>
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> Neil Brown reports that he is seeing the BUG_ON(ret != 0) trigger in
> nfs_page_async_flush. According to the trace in
> https://bugzilla.novell.com/show_bug.cgi?id=599628
> the problem appears to be due to nfs_wb_page() not waiting for the
> PG_writeback flag to clear.
>
> The same problem exists in nfs_wb_page_cancel().
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> fs/nfs/write.c | 19 ++++---------------
> 1 files changed, 4 insertions(+), 15 deletions(-)
>
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index ccde2ae..3aea3ca 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1472,6 +1472,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
>
>  	BUG_ON(!PageLocked(page));
>  	for (;;) {
> +		wait_on_page_writeback(page);
>  		req = nfs_page_find_request(page);
>  		if (req == NULL)
>  			break;
> @@ -1506,30 +1507,18 @@ int nfs_wb_page(struct inode *inode, struct page *page)
>  		.range_start = range_start,
>  		.range_end = range_end,
>  	};
> -	struct nfs_page *req;
> -	int need_commit;
>  	int ret;
>
>  	while(PagePrivate(page)) {
> +		wait_on_page_writeback(page);
>  		if (clear_page_dirty_for_io(page)) {
>  			ret = nfs_writepage_locked(page, &wbc);
>  			if (ret < 0)
>  				goto out_error;
>  		}
> -		req = nfs_find_and_lock_request(page);
> -		if (!req)
> -			break;
> -		if (IS_ERR(req)) {
> -			ret = PTR_ERR(req);
> +		ret = sync_inode(inode, &wbc);
> +		if (ret < 0)
>  			goto out_error;
> -		}
> -		need_commit = test_bit(PG_CLEAN, &req->wb_flags);
> -		nfs_clear_page_tag_locked(req);
> -		if (need_commit) {
> -			ret = nfs_commit_inode(inode, FLUSH_SYNC);
> -			if (ret < 0)
> -				goto out_error;
> -		}
>  	}
>  	return 0;
>  out_error:
>
end of thread (~2010-05-03 1:35 UTC)
Thread overview: 4+ messages
[not found] <20100427143542.001f8dbe@notabene.brown>
[not found] ` <20100427143542.001f8dbe-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-04-27 12:00 ` Possible problem with commit a6305ddb080 : NFS: Fix a race with the new commit code Trond Myklebust
[not found] ` <1272369635.16814.52.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-04-27 22:21 ` Trond Myklebust
[not found] ` <1272406873.14667.6.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-04-27 22:35 ` Trond Myklebust
[not found] ` <1272407756.14667.17.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-05-03 1:34 ` Neil Brown