* [PATCH] writeback: Fix data corruption on NFS
From: Jan Kara @ 2013-12-11 19:59 UTC
To: Wu Fengguang
Cc: linux-fsdevel, Dan Duval, trond.myklebust, chuck.lever, Jan Kara
Commit 4f8ad655dbc8 "writeback: Refactor writeback_single_inode()" added
a condition to skip clean inodes. However, this is wrong in WB_SYNC_ALL
mode because there we also want to wait for outstanding writeback on a
possibly clean inode. This was causing occasional data corruption on
NFS, which uses sync_inode() to make sure all outstanding writes are
flushed to the server before truncating the inode. With sync_inode()
returning prematurely, the file was sometimes extended back by an
outstanding write after it was truncated.
So modify the test to also check for pages under writeback in
WB_SYNC_ALL mode.
CC: stable@vger.kernel.org # >= 3.5
Fixes: 4f8ad655dbc82cf05d2edc11e66b78a42d38bf93
Reported-and-tested-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
Fengguang, can you please merge this patch? Thanks!
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1f4a10ece2f1..e0259a163f98 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -516,13 +516,16 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
}
WARN_ON(inode->i_state & I_SYNC);
/*
- * Skip inode if it is clean. We don't want to mess with writeback
- * lists in this function since flusher thread may be doing for example
- * sync in parallel and if we move the inode, it could get skipped. So
- * here we make sure inode is on some writeback list and leave it there
- * unless we have completely cleaned the inode.
+ * Skip inode if it is clean and we have no outstanding writeback in
+ * WB_SYNC_ALL mode. We don't want to mess with writeback lists in this
+ * function since flusher thread may be doing for example sync in
+ * parallel and if we move the inode, it could get skipped. So here we
+ * make sure inode is on some writeback list and leave it there unless
+ * we have completely cleaned the inode.
*/
- if (!(inode->i_state & I_DIRTY))
+ if (!(inode->i_state & I_DIRTY) &&
+ (wbc->sync_mode != WB_SYNC_ALL ||
+ !mapping_tagged(inode->i_mapping, PAGECACHE_TAG_WRITEBACK)))
goto out;
inode->i_state |= I_SYNC;
spin_unlock(&inode->i_lock);
--
1.8.1.4
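The changed test can be illustrated outside the kernel with a minimal sketch. Everything below is a hypothetical model, not kernel code: `struct inode_model`, `enum sync_mode`, and both helper functions are names invented for illustration. The `dirty` field stands in for `inode->i_state & I_DIRTY`, and `pages_writeback` stands in for `mapping_tagged(inode->i_mapping, PAGECACHE_TAG_WRITEBACK)`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for WB_SYNC_NONE and WB_SYNC_ALL. */
enum sync_mode { SYNC_NONE, SYNC_ALL };

/* Hypothetical, simplified view of the state the skip test consults. */
struct inode_model {
    bool dirty;           /* models inode->i_state & I_DIRTY */
    bool pages_writeback; /* models mapping_tagged(..., PAGECACHE_TAG_WRITEBACK) */
};

/* Test before the patch: every clean inode is skipped, whatever the mode. */
static bool skip_before_patch(const struct inode_model *ino, enum sync_mode mode)
{
    (void)mode; /* the old test ignored the sync mode */
    return !ino->dirty;
}

/*
 * Test after the patch: a clean inode is skipped unless the caller asked
 * for SYNC_ALL and pages are still under writeback.
 */
static bool skip_after_patch(const struct inode_model *ino, enum sync_mode mode)
{
    return !ino->dirty &&
           (mode != SYNC_ALL || !ino->pages_writeback);
}
```

In this model, the case that matters for NFS is a clean inode with pages still under writeback in SYNC_ALL mode: the old test skips it, while the new test does not, so the caller goes on to wait for the outstanding writes.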
* Re: [PATCH] writeback: Fix data corruption on NFS
From: Fengguang Wu @ 2013-12-14 5:45 UTC
To: Jan Kara; +Cc: linux-fsdevel, Dan Duval, trond.myklebust, chuck.lever, Jan Kara
On Wed, Dec 11, 2013 at 08:59:36PM +0100, Jan Kara wrote:
> Commit 4f8ad655dbc8 "writeback: Refactor writeback_single_inode()" added
> a condition to skip clean inode. However this is wrong in WB_SYNC_ALL
> mode because there we also want to wait for outstanding writeback on
> possibly clean inode. This was causing occasional data corruption issues
> on NFS because it uses sync_inode() to make sure all outstanding writes
> are flushed to the server before truncating the inode and with
> sync_inode() returning prematurely file was sometimes extended back
> by an outstanding write after it was truncated.
>
> So modify the test to also check for pages under writeback in
> WB_SYNC_ALL mode.
Applied to the writeback tree. Thank you, Jan!
Regards,
Fengguang
* Re: [PATCH] writeback: Fix data corruption on NFS
From: Jan Kara @ 2014-01-13 10:58 UTC
To: Fengguang Wu
Cc: Jan Kara, linux-fsdevel, Dan Duval, trond.myklebust, chuck.lever
On Sat 14-12-13 13:45:39, Wu Fengguang wrote:
> On Wed, Dec 11, 2013 at 08:59:36PM +0100, Jan Kara wrote:
> > Commit 4f8ad655dbc8 "writeback: Refactor writeback_single_inode()" added
> > a condition to skip clean inode. However this is wrong in WB_SYNC_ALL
> > mode because there we also want to wait for outstanding writeback on
> > possibly clean inode. This was causing occasional data corruption issues
> > on NFS because it uses sync_inode() to make sure all outstanding writes
> > are flushed to the server before truncating the inode and with
> > sync_inode() returning prematurely file was sometimes extended back
> > by an outstanding write after it was truncated.
> >
> > So modify the test to also check for pages under writeback in
> > WB_SYNC_ALL mode.
>
> Applied to the writeback tree. Thank you, Jan!
Didn't you forget to send a pull request? I don't see the patch in mainline
yet, and since this is a data corruption issue, it would be nice to get it
out for 3.13...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [PATCH] writeback: Fix data corruption on NFS
From: Fengguang Wu @ 2014-01-13 11:22 UTC
To: Jan Kara
Cc: linux-fsdevel, Dan Duval, trond.myklebust, chuck.lever,
Linus Torvalds
On Mon, Jan 13, 2014 at 11:58:26AM +0100, Jan Kara wrote:
> On Sat 14-12-13 13:45:39, Wu Fengguang wrote:
> > On Wed, Dec 11, 2013 at 08:59:36PM +0100, Jan Kara wrote:
> > > Commit 4f8ad655dbc8 "writeback: Refactor writeback_single_inode()" added
> > > a condition to skip clean inode. However this is wrong in WB_SYNC_ALL
> > > mode because there we also want to wait for outstanding writeback on
> > > possibly clean inode. This was causing occasional data corruption issues
> > > on NFS because it uses sync_inode() to make sure all outstanding writes
> > > are flushed to the server before truncating the inode and with
> > > sync_inode() returning prematurely file was sometimes extended back
> > > by an outstanding write after it was truncated.
> > >
> > > So modify the test to also check for pages under writeback in
> > > WB_SYNC_ALL mode.
> >
> > Applied to the writeback tree. Thank you, Jan!
> Didn't you forget to send pull request? I don't see the patch in mainline
> yet and since this is a data corruption issue, it would be nice to get it
> out for 3.13...
Sorry, I had scheduled it for the next release. However, you are right that
it should have been merged earlier. I should have spoken up about the plan
so that we could notice and resolve the difference of opinion earlier.
I'll try to send a pull request to Linus and hope he still accepts
patches.
Fengguang