From: Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: Trond Myklebust
<Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
Steve Rago <sar-a+KepyhlMvJWk0Htik3J/w@public.gmane.org>,
"linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"jens.axboe" <jens.axboe-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
Peter Staubach <staubach-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Arjan van de Ven <arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>,
"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes
Date: Fri, 8 Jan 2010 09:53:19 +0800 [thread overview]
Message-ID: <20100108015319.GA9216@localhost> (raw)
In-Reply-To: <1262914651.2659.110.camel@localhost>
On Fri, Jan 08, 2010 at 09:37:31AM +0800, Trond Myklebust wrote:
> On Fri, 2010-01-08 at 09:17 +0800, Wu Fengguang wrote:
> > On Thu, Jan 07, 2010 at 11:10:22PM +0800, Trond Myklebust wrote:
> > > On Thu, 2010-01-07 at 22:56 +0800, Wu Fengguang wrote:
> > > > On Thu, Jan 07, 2010 at 12:38:02PM +0800, Myklebust, Trond wrote:
> > > >
> > > > > > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > > > > > index d171696..910be28 100644
> > > > > > > --- a/fs/nfs/write.c
> > > > > > > +++ b/fs/nfs/write.c
> > > > > > > @@ -441,7 +441,7 @@ nfs_mark_request_commit(struct nfs_page *req)
> > > > > > > spin_unlock(&inode->i_lock);
> > > > > > > inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
> > > > > > > inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_RECLAIMABLE);
> > > > > > > - __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> > > > > > > + mark_inode_unstable_pages(inode);
> > > > > >
> > > > > > Then we shall mark I_DIRTY_DATASYNC on other places that extend i_size.
> > > > >
> > > > > Why? The NFS client itself shouldn't ever set I_DIRTY_DATASYNC after
> > > > > this patch is applied. We won't ever need it.
> > > > >
> > > > > If the VM or VFS is doing it, then they ought to be fixed: there is no
> > > > > reason to assume that all filesystems need to sync their inodes on
> > > > > i_size changes.
> > > >
> > > > Sorry, one more question.
> > > >
> > > > It seems to me that you are replacing
> > > >
> > > > I_DIRTY_DATASYNC => write_inode()
> > > > with
> > > > I_UNSTABLE_PAGES => commit_unstable_pages()
> > > >
> > > > Is that change for the sake of clarity? Or to fix some problem?
> > > > (This patch does fix some problems, but do they inherently require
> > > > the above change?)
> > >
> > > As I said previously, the write_inode() call is done _before_ you sync
> > > the dirty pages to the server, whereas commit_unstable_pages() wants to
> > > be done _after_ syncing. So the two are not the same, and we cannot
> > > replace commit_unstable_pages() with write_inode().
> >
> > This is the ordering:
> >
> > 0 do_writepages()
> > 1 if (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
> > 2 write_inode()
> > 3 if (wait)
> > 4 filemap_fdatawait()
> > 5 if (I_UNSTABLE_PAGES)
> > 6 commit_unstable_pages()
> >
> > The page is synced to NFS server in line 0.
> >
> > The only difference is write_inode() is called before filemap_fdatawait(),
> > while commit_unstable_pages() is called after it.
> >
> > Note that filemap_fdatawait() will only be called on WB_SYNC_ALL, so I
> > still cannot understand the difference..
>
> The difference is precisely that...
Thanks, I got it.
> In the case of WB_SYNC_ALL we want the call to filemap_fdatawait() to
> occur before we call commit_unstable_pages(), so that we know that all
> the in-flight write rpc calls are done before we ask that they be
> committed to stable storage.
That's good order for WB_SYNC_ALL. However this is optimizing a minor
case, and what I cared is WB_SYNC_NONE :)
> In the case of WB_SYNC_NONE, there is no wait, and so we are forced to
> play games with heuristics and/or add the force_commit_unstable flag
> because we don't wait for the dirty pages to be cleaned. I don't like
> this, but those are the semantics that we've defined for WB_SYNC_NONE.
For WB_SYNC_NONE we will now also wait for WRITE completion with the
combination of NFS_PAGE_TAG_LOCKED-based-bail-out and redirty_tail().
This is retry based, so less elegant.
But that's not the whole story. The
I_UNSTABLE_PAGES+commit_unstable_pages() scheme seems elegant for
WB_SYNC_ALL, however it may break the pipeline for big files in a
perfect loop:
loop {
WRITE 4MB
COMMIT 4MB
}
While the retry based WB_SYNC_ALL will keep back-off COMMITs because
do_writepages() keep submit new WRITEs. So its loop would be
loop {
WRITE 4MB
<skip COMMIT>
WRITE 4MB
<skip COMMIT>
WRITE 4MB
<skip COMMIT>
WRITE 4MB
<skip COMMIT>
...
<redirty_tail timeout>
COMMIT 400MB
}
That can be improved by lifting the writeback chunk size from 4MB to >=128MB.
Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2010-01-08 1:53 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1261015420.1947.54.camel@serenity>
[not found] ` <1261037877.27920.36.camel@laptop>
[not found] ` <20091219122033.GA11360@localhost>
[not found] ` <1261232747.1947.194.camel@serenity>
2009-12-22 1:59 ` [PATCH] improve the performance of large sequential write NFS workloads Wu Fengguang
2009-12-22 12:35 ` Jan Kara
[not found] ` <20091222123538.GB604-jyMamyUUXNJG4ohzP4jBZS1Fcj925eT/@public.gmane.org>
2009-12-23 8:43 ` Christoph Hellwig
[not found] ` <20091223084302.GA14912-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2009-12-23 13:32 ` Jan Kara
[not found] ` <20091223133244.GB3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-24 5:25 ` Wu Fengguang
2009-12-24 1:26 ` Wu Fengguang
2009-12-22 16:41 ` Steve Rago
2009-12-24 1:21 ` Wu Fengguang
2009-12-24 14:49 ` Steve Rago
2009-12-25 7:37 ` Wu Fengguang
2009-12-23 14:21 ` Trond Myklebust
2009-12-23 18:05 ` Jan Kara
[not found] ` <20091223180551.GD3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-23 19:12 ` Trond Myklebust
2009-12-24 2:52 ` Wu Fengguang
2009-12-24 12:04 ` Trond Myklebust
2009-12-25 5:56 ` Wu Fengguang
2009-12-30 16:22 ` Trond Myklebust
2009-12-31 5:04 ` Wu Fengguang
2009-12-31 19:13 ` Trond Myklebust
2010-01-06 3:03 ` Wu Fengguang
2010-01-06 16:56 ` Trond Myklebust
2010-01-06 18:26 ` Trond Myklebust
2010-01-06 18:37 ` Peter Zijlstra
2010-01-06 18:52 ` Trond Myklebust
2010-01-06 19:07 ` Peter Zijlstra
2010-01-06 19:21 ` Trond Myklebust
2010-01-06 19:53 ` Trond Myklebust
2010-01-06 20:09 ` Jan Kara
[not found] ` <20100106200928.GB22781-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2010-01-06 20:51 ` [PATCH 0/6] " Trond Myklebust
[not found] ` <20100106205110.22547.85345.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-06 20:51 ` [PATCH 5/6] VM: Use per-bdi unstable accounting to improve use of wbc->force_commit Trond Myklebust
[not found] ` <20100106205110.22547.32584.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07 2:34 ` Wu Fengguang
2010-01-06 20:51 ` [PATCH 2/6] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-07 2:29 ` Wu Fengguang
2010-01-07 4:49 ` Trond Myklebust
2010-01-07 5:03 ` Wu Fengguang
2010-01-07 5:30 ` Trond Myklebust
2010-01-07 14:37 ` Wu Fengguang
2010-01-06 20:51 ` [PATCH 3/6] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
[not found] ` <20100106205110.22547.93554.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07 1:48 ` Wu Fengguang
2010-01-06 20:51 ` [PATCH 6/6] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
[not found] ` <20100106205110.22547.31434.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07 2:32 ` Wu Fengguang
2010-01-06 20:51 ` [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
[not found] ` <20100106205110.22547.17971.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-06 21:38 ` Jan Kara
[not found] ` <20100106213843.GD22781-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2010-01-06 21:48 ` Trond Myklebust
2010-01-07 2:18 ` Wu Fengguang
[not found] ` <1262839082.2185.15.camel@localhost>
2010-01-07 4:48 ` Wu Fengguang
2010-01-07 4:53 ` [PATCH 0/5] Re: [PATCH] improve the performance of large sequential write NFS workloads Trond Myklebust
[not found] ` <20100107045330.5986.55090.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07 4:53 ` [PATCH 3/5] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-07 4:53 ` [PATCH 1/5] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
2010-01-07 4:53 ` [PATCH 5/5] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
2010-01-07 4:53 ` [PATCH 2/5] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
2010-01-07 4:53 ` [PATCH 4/5] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-07 14:56 ` [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes Wu Fengguang
2010-01-07 15:10 ` Trond Myklebust
2010-01-08 1:17 ` Wu Fengguang
2010-01-08 1:37 ` Trond Myklebust
2010-01-08 1:53 ` Wu Fengguang [this message]
2010-01-08 9:25 ` Christoph Hellwig
2010-01-08 13:46 ` Trond Myklebust
2010-01-08 13:54 ` Christoph Hellwig
2010-01-08 14:15 ` Trond Myklebust
2010-01-06 20:51 ` [PATCH 4/6] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-07 1:56 ` Wu Fengguang
2010-01-06 21:44 ` [PATCH 0/6] Re: [PATCH] improve the performance of large sequential write NFS workloads Jan Kara
2010-01-06 22:03 ` Trond Myklebust
2010-01-07 8:16 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100108015319.GA9216@localhost \
--to=fengguang.wu-ral2jqcrhueavxtiumwx3w@public.gmane.org \
--cc=Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
--cc=arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
--cc=jack-AlSwsSmVLrQ@public.gmane.org \
--cc=jens.axboe-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
--cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=mingo-X9Un+BFzKDI@public.gmane.org \
--cc=peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
--cc=sar-a+KepyhlMvJWk0Htik3J/w@public.gmane.org \
--cc=staubach-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).