* Re: Possible problem with commit a6305ddb080 : NFS: Fix a race with the new commit code
       [not found] ` <20100427143542.001f8dbe-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
@ 2010-04-27 12:00   ` Trond Myklebust
       [not found]     ` <1272369635.16814.52.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2010-04-27 12:00 UTC
To: Neil Brown; +Cc: linux-nfs

On Tue, 2010-04-27 at 14:35 +1000, Neil Brown wrote:
> Hi Trond,
> I think the above mentioned commit might have added a new race to replace
> the old ....
>
> I have a report of a BUG in nfs_page_async_flush.
>
> It isn't a vanilla upstream kernel - there are a bunch of SUSE patches
> in there - so quoting the line-number won't help you, but it is the
>     BUG_ON(ret != 0);
> after the call to nfs_set_page_writeback.
> (https://bugzilla.novell.com/show_bug.cgi?id=599628)
>
> This implies that nfs_find_and_lock_request got a new lock on the page,
> and then we found that it was already flagged for writeback.

That's odd. Callers such as write_cache_pages() should normally be doing
a wait_on_page_writeback() after taking the page lock but prior to
calling the filesystem.

> The commit mentioned creates just such an opportunity. It reorders things
> so that a page is unlocked before writeback is cleared, thus creating a window
> for that BUG to fire.
>
> What is the race that you were trying to fix?

I want to ensure that the call to write_inode() immediately after
filemap_fdatawait() in the function writeback_single_inode() works
correctly.
Prior to this fix, there was a race whereby the PG_writeback flag could
be cleared, but the nfs_page structure would still be locked. This again
would result in the page being skipped by nfs_scan_list() and so it
wouldn't be registered as COMMITed.

Cheers
  Trond
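The two orderings discussed in this message can be replayed in a small userspace model. This is only a sketch: `struct model_page`, `model_trylock_request()`, `model_set_writeback()` and `model_replay()` are hypothetical names invented for illustration, not kernel symbols, and the model collapses the request lock and PG_writeback into two booleans. It assumes completion happens in two steps and that a flusher runs between them without waiting on writeback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for kernel state; none of these names exist in the kernel. */
struct model_page {
	bool req_locked;	/* models the nfs_page request lock */
	bool writeback;		/* models the PG_writeback page flag */
};

/* Try to take the request lock; returns true on success. */
static bool model_trylock_request(struct model_page *p)
{
	if (p->req_locked)
		return false;
	p->req_locked = true;
	return true;
}

/* Test-and-set: returns nonzero if writeback was already set. */
static int model_set_writeback(struct model_page *p)
{
	int was_set = p->writeback;

	p->writeback = true;
	return was_set;
}

/*
 * Replay one interleaving. A write is in flight (request locked,
 * writeback set). Only the first completion step has run when a
 * flusher comes along. Returns the value the flusher would hand to
 * BUG_ON(ret != 0), or -1 if it could not lock the request at all.
 */
int model_replay(bool unlock_before_clear)
{
	struct model_page p = { .req_locked = true, .writeback = true };

	/* Completion, first step only. */
	if (unlock_before_clear)
		p.req_locked = false;	/* ordering after a6305ddb080 */
	else
		p.writeback = false;	/* ordering before it */

	/* Flusher runs inside the window, without waiting on writeback. */
	if (!model_trylock_request(&p))
		return -1;
	return model_set_writeback(&p);
}
```

In this model `model_replay(true)` returns a nonzero value, which corresponds to the BUG_ON Neil reports, while `model_replay(false)` returns -1, which corresponds to the older race in which writeback is already clear but the request is still locked and so gets skipped.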
[parent not found: <1272369635.16814.52.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>]
* Re: Possible problem with commit a6305ddb080 : NFS: Fix a race with the new commit code
       [not found]     ` <1272369635.16814.52.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2010-04-27 22:21       ` Trond Myklebust
       [not found]         ` <1272406873.14667.6.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2010-04-27 22:21 UTC
To: Neil Brown; +Cc: linux-nfs

On Tue, 2010-04-27 at 08:00 -0400, Trond Myklebust wrote:
> On Tue, 2010-04-27 at 14:35 +1000, Neil Brown wrote:
> > Hi Trond,
> > I think the above mentioned commit might have added a new race to replace
> > the old ....
> >
> > I have a report of a BUG in nfs_page_async_flush.
> >
> > It isn't a vanilla upstream kernel - there are a bunch of SUSE patches
> > in there - so quoting the line-number won't help you, but it is the
> >     BUG_ON(ret != 0);
> > after the call to nfs_set_page_writeback.
> > (https://bugzilla.novell.com/show_bug.cgi?id=599628)
> >
> > This implies that nfs_find_and_lock_request got a new lock on the page,
> > and then we found that it was already flagged for writeback.
>
> That's odd. Callers such as write_cache_pages() should normally be doing
> a wait_on_page_writeback() after taking the page lock but prior to
> calling the filesystem.

The following patch ought to fix it. I suspect the same race exists in
the ->readpage() path, so it makes sense to fix nfs_wb_page() rather
than putting the wait_on_page_writeback call in
nfs_try_to_update_request().

Cheers
  Trond
------------------------------------------------------------------------------------------------
NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Neil Brown reports that he is seeing the BUG_ON(ret != 0) trigger in
nfs_page_async_flush.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/write.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)


diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ccde2ae..c700698 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1472,6 +1472,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
 
 	BUG_ON(!PageLocked(page));
 	for (;;) {
+		wait_on_page_writeback(page);
 		req = nfs_page_find_request(page);
 		if (req == NULL)
 			break;
@@ -1511,10 +1512,12 @@ int nfs_wb_page(struct inode *inode, struct page *page)
 	int ret;
 
 	while(PagePrivate(page)) {
+		wait_on_page_writeback(page);
 		if (clear_page_dirty_for_io(page)) {
 			ret = nfs_writepage_locked(page, &wbc);
 			if (ret < 0)
 				goto out_error;
+			continue;
 		}
 		req = nfs_find_and_lock_request(page);
 		if (!req)
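The effect of the added wait_on_page_writeback() call can be illustrated with a userspace sketch. All `model_*` names below are hypothetical and exist only in this example. The sketch is single-threaded: where a real waiter would sleep until I/O completion clears the flag, the model simply runs the outstanding completion itself so that the result is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a page; not a kernel type. */
struct model_page {
	bool writeback;		/* models PG_writeback */
};

/* Models the I/O completion that clears PG_writeback. */
static void model_end_writeback(struct model_page *p)
{
	assert(p->writeback);
	p->writeback = false;
}

/*
 * Models wait_on_page_writeback(). A real waiter sleeps until the
 * completion path clears the flag; this model runs the outstanding
 * completion itself instead of blocking.
 */
static void model_wait_on_writeback(struct model_page *p)
{
	while (p->writeback)
		model_end_writeback(p);
}

/* Test-and-set: returns nonzero if the page was already under writeback. */
static int model_set_writeback(struct model_page *p)
{
	int was_set = p->writeback;

	p->writeback = true;
	return was_set;
}

/* Returns the value that BUG_ON(ret != 0) would inspect. */
int model_flush(struct model_page *p, bool wait_first)
{
	if (wait_first)
		model_wait_on_writeback(p);
	return model_set_writeback(p);
}
```

Starting from a page whose writeback flag is still set, `model_flush(&page, false)` returns nonzero (the failing case), while `model_flush(&page, true)` returns 0 because the flag has been allowed to clear before it is set again.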
[parent not found: <1272406873.14667.6.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>]
* Re: Possible problem with commit a6305ddb080 : NFS: Fix a race with the new commit code
       [not found]         ` <1272406873.14667.6.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2010-04-27 22:35           ` Trond Myklebust
       [not found]             ` <1272407756.14667.17.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2010-04-27 22:35 UTC
To: Neil Brown; +Cc: linux-nfs

On Tue, 2010-04-27 at 18:21 -0400, Trond Myklebust wrote:
> On Tue, 2010-04-27 at 08:00 -0400, Trond Myklebust wrote:
> > On Tue, 2010-04-27 at 14:35 +1000, Neil Brown wrote:
> > > Hi Trond,
> > > I think the above mentioned commit might have added a new race to replace
> > > the old ....
> > >
> > > I have a report of a BUG in nfs_page_async_flush.
> > >
> > > It isn't a vanilla upstream kernel - there are a bunch of SUSE patches
> > > in there - so quoting the line-number won't help you, but it is the
> > >     BUG_ON(ret != 0);
> > > after the call to nfs_set_page_writeback.
> > > (https://bugzilla.novell.com/show_bug.cgi?id=599628)
> > >
> > > This implies that nfs_find_and_lock_request got a new lock on the page,
> > > and then we found that it was already flagged for writeback.
> >
> > That's odd. Callers such as write_cache_pages() should normally be doing
> > a wait_on_page_writeback() after taking the page lock but prior to
> > calling the filesystem.
>
> The following patch ought to fix it. I suspect the same race exists in
> the ->readpage() path, so it makes sense to fix nfs_wb_page() rather
> than putting the wait_on_page_writeback call in
> nfs_try_to_update_request().

Actually, this patch is even better since it cleans up nfs_wb_page()
too.

Cheers
  Trond
------------------------------------------------------------------------------------------
NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Neil Brown reports that he is seeing the BUG_ON(ret != 0) trigger in
nfs_page_async_flush. According to the trace in
https://bugzilla.novell.com/show_bug.cgi?id=599628
the problem appears to be due to nfs_wb_page() not waiting for the
PG_writeback flag to clear.

There is a ditto problem in nfs_wb_page_cancel()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/write.c |   19 ++++---------------
 1 files changed, 4 insertions(+), 15 deletions(-)


diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ccde2ae..3aea3ca 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1472,6 +1472,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
 
 	BUG_ON(!PageLocked(page));
 	for (;;) {
+		wait_on_page_writeback(page);
 		req = nfs_page_find_request(page);
 		if (req == NULL)
 			break;
@@ -1506,30 +1507,18 @@ int nfs_wb_page(struct inode *inode, struct page *page)
 		.range_start = range_start,
 		.range_end = range_end,
 	};
-	struct nfs_page *req;
-	int need_commit;
 	int ret;
 
 	while(PagePrivate(page)) {
+		wait_on_page_writeback(page);
 		if (clear_page_dirty_for_io(page)) {
 			ret = nfs_writepage_locked(page, &wbc);
 			if (ret < 0)
 				goto out_error;
 		}
-		req = nfs_find_and_lock_request(page);
-		if (!req)
-			break;
-		if (IS_ERR(req)) {
-			ret = PTR_ERR(req);
+		ret = sync_inode(inode, &wbc);
+		if (ret < 0)
 			goto out_error;
-		}
-		need_commit = test_bit(PG_CLEAN, &req->wb_flags);
-		nfs_clear_page_tag_locked(req);
-		if (need_commit) {
-			ret = nfs_commit_inode(inode, FLUSH_SYNC);
-			if (ret < 0)
-				goto out_error;
-		}
 	}
 	return 0;
 out_error:
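The shape of the simplified nfs_wb_page() loop can be traced with a toy state machine. This is a sketch under stated assumptions, not the kernel's behaviour: every `model_*` name is hypothetical, and the model assumes that the sync_inode() step waits for the write and commits it, after which the request is detached from the page (the condition that PagePrivate() tests in the real loop).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page state; not a kernel type. */
struct model_page {
	bool has_request;	/* models PagePrivate(): a request is attached */
	bool dirty;		/* models the page dirty flag */
	bool writeback;		/* models PG_writeback */
};

/* Models wait_on_page_writeback(): any in-flight write has finished. */
static void model_wait_on_writeback(struct model_page *p)
{
	p->writeback = false;
}

/* Models clear_page_dirty_for_io(): returns whether the page was dirty. */
static bool model_clear_dirty_for_io(struct model_page *p)
{
	bool was_dirty = p->dirty;

	p->dirty = false;
	return was_dirty;
}

/* Models starting a write: the page goes under writeback. */
static int model_writepage(struct model_page *p)
{
	p->writeback = true;
	return 0;
}

/*
 * ASSUMPTION: stands in for sync_inode(). Here it waits for the write
 * and commits it, so a page that is no longer dirty loses its request.
 */
static int model_sync_inode(struct model_page *p)
{
	p->writeback = false;
	if (!p->dirty)
		p->has_request = false;
	return 0;
}

/* Mirrors the loop structure of the patch; returns passes, or < 0 on error. */
int model_wb_page(struct model_page *p)
{
	int ret, passes = 0;

	while (p->has_request) {
		model_wait_on_writeback(p);
		if (model_clear_dirty_for_io(p)) {
			ret = model_writepage(p);
			if (ret < 0)
				return ret;
		}
		ret = model_sync_inode(p);
		if (ret < 0)
			return ret;
		passes++;
	}
	return passes;
}
```

In this model a dirty page with an attached request is cleaned in one pass, and a page with no request attached never enters the loop. The real loop may need more passes if the page is redirtied in the meantime, which the model does not simulate.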
[parent not found: <1272407756.14667.17.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>]
* Re: Possible problem with commit a6305ddb080 : NFS: Fix a race with the new commit code
       [not found]             ` <1272407756.14667.17.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2010-05-03  1:34               ` Neil Brown
  0 siblings, 0 replies; 4+ messages in thread
From: Neil Brown @ 2010-05-03 1:34 UTC
To: Trond Myklebust; +Cc: linux-nfs

On Tue, 27 Apr 2010 18:35:56 -0400
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:

> On Tue, 2010-04-27 at 18:21 -0400, Trond Myklebust wrote:
> > On Tue, 2010-04-27 at 08:00 -0400, Trond Myklebust wrote:
> > > On Tue, 2010-04-27 at 14:35 +1000, Neil Brown wrote:
> > > > Hi Trond,
> > > > I think the above mentioned commit might have added a new race to replace
> > > > the old ....
> > > >
> > > > I have a report of a BUG in nfs_page_async_flush.
> > > >
> > > > It isn't a vanilla upstream kernel - there are a bunch of SUSE patches
> > > > in there - so quoting the line-number won't help you, but it is the
> > > >     BUG_ON(ret != 0);
> > > > after the call to nfs_set_page_writeback.
> > > > (https://bugzilla.novell.com/show_bug.cgi?id=599628)
> > > >
> > > > This implies that nfs_find_and_lock_request got a new lock on the page,
> > > > and then we found that it was already flagged for writeback.
> > >
> > > That's odd. Callers such as write_cache_pages() should normally be doing
> > > a wait_on_page_writeback() after taking the page lock but prior to
> > > calling the filesystem.
> >
> > The following patch ought to fix it. I suspect the same race exists in
> > the ->readpage() path, so it makes sense to fix nfs_wb_page() rather
> > than putting the wait_on_page_writeback call in
> > nfs_try_to_update_request().
>
> Actually, this patch is even better since it cleans up nfs_wb_page()
> too.

Thanks Trond!

I won't pretend to completely understand it, but it certainly looks credible
and removes some code, which is always nice!

I don't think the problem was easily reproducible so I cannot easily test if
this fixes it, so I'll just assume it does and let you know if I hear
otherwise.

Thanks,
NeilBrown
Thread overview: 4+ messages
[not found] <20100427143542.001f8dbe@notabene.brown>
[not found] ` <20100427143542.001f8dbe-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2010-04-27 12:00 ` Possible problem with commit a6305ddb080 : NFS: Fix a race with the new commit code Trond Myklebust
[not found] ` <1272369635.16814.52.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-04-27 22:21 ` Trond Myklebust
[not found] ` <1272406873.14667.6.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-04-27 22:35 ` Trond Myklebust
[not found] ` <1272407756.14667.17.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-05-03 1:34 ` Neil Brown