linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Steve Rago <sar-a+KepyhlMvJWk0Htik3J/w@public.gmane.org>
Cc: Jan Kara <jack@suse.cz>, Wu Fengguang <fengguang.wu@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jens.axboe" <jens.axboe@oracle.com>,
	Peter Staubach <staubach@redhat.com>
Subject: Re: [PATCH] improve the performance of large sequential write NFS workloads
Date: Wed, 23 Dec 2009 22:49:12 +0100	[thread overview]
Message-ID: <1261604952.18047.7.camel@localhost> (raw)
In-Reply-To: <1261599385.13028.142.camel@serenity>

On Wed, 2009-12-23 at 15:16 -0500, Steve Rago wrote: 
> On Wed, 2009-12-23 at 19:39 +0100, Jan Kara wrote:
> > On Tue 22-12-09 11:20:15, Steve Rago wrote:
> > > 
> > > On Tue, 2009-12-22 at 13:25 +0100, Jan Kara wrote:
> > > > > I originally spent several months playing with the balance_dirty_pages
> > > > > algorithm.  The main drawback is that it affects more than the inodes
> > > > > that the caller is writing and that the control of what to do is too
> > > >   Can you be more specific here please?
> > > 
> > > Sure;  balance_dirty_pages() will schedule writeback by the flusher
> > > thread once the number of dirty pages exceeds dirty_background_ratio.
> > > The flusher thread calls writeback_inodes_wb() to flush all dirty inodes
> > > associated with the bdi.  Similarly, the process dirtying the pages will
> > > call writeback_inodes_wbc() when it's bdi threshold has been exceeded.
> > > The first problem is that these functions process all dirty inodes with
> > > the same backing device, which can lead to excess (duplicate) flushing
> > > of the same inode.  Second, there is no distinction between pages that
> > > need to be committed and pages that have commits pending in
> > > NR_UNSTABLE_NFS/BDI_RECLAIMABLE (a page that has a commit pending won't
> > > be cleaned any faster by sending more commits).  This tends to overstate
> > > the amount of memory that can be cleaned, leading to additional commit
> > > requests.  Third, these functions generate a commit for each set of
> > > writes they do, which might not be appropriate.  For background writing,
> > > you'd like to delay the commit as long as possible.
> >   Ok, I get it. Thanks for explanation. The problem with more writing
> > threads bites us also for ordinary SATA drives (the IO pattern and thus
> > throughput gets worse and worse the more threads do writes). The plan is to
> > let only flusher thread do the IO and throttled thread in
> > balance_dirty_pages just waits for flusher thread to do the work. There
> > were even patches for this floating around but I'm not sure what's happened
> > to them. So that part of the problem should be easy to solve.
> >   Another part is about sending commits - if we have just one thread doing
> > flushing, we have no problems with excessive commits for one inode. You're
> > right that we may want to avoid sending commits for background writeback
> > but until we send the commit, pages are just accumulating in the unstable
> > state, aren't they? So we might want to periodically send the commit for
> > the inode anyway to get rid of those pages. So from this point of view,
> > sending commit after each writepages call does not seem like a so bad idea
> > - although it might be more appropriate to send it some time after the
> > writepages call when we are not close to dirty limit so that server has
> > more time to do more natural "unforced" writeback...
> 
> When to send the commit is a complex question to answer.  If you delay
> it long enough, the server's flusher threads will have already done most
> of the work for you, so commits can be cheap, but you don't have access
> to the necessary information to figure this out.  You can't delay it too
> long, though, because the unstable pages on the client will grow too
> large, creating memory pressure.  I have a second patch, which I haven't
> posted yet, that adds feedback piggy-backed on the NFS write response,
> which allows the NFS client to free pages proactively.  This greatly
> reduces the need to send commit messages, but it extends the protocol
> (in a backward-compatible manner), so it could be hard to convince
> people to accept.

There are only 2 cases when the client should send a COMMIT: 
     1. When it hits a synchronisation point (i.e. when the user calls
        f/sync(), or close(), or when the user sets/clears a file
        lock). 
     2. When memory pressure causes the VM to wants to free up those
        pages that are marked as clean but unstable.

We should never be sending COMMIT in any other situation, since that
would imply that the client somehow has better information on how to
manage dirty pages on the server than the server's own VM.

Cheers
  Trond

  reply	other threads:[~2009-12-23 21:49 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-17  2:03 [PATCH] improve the performance of large sequential write NFS workloads Steve Rago
2009-12-17  8:17 ` Peter Zijlstra
2009-12-18 19:33   ` Steve Rago
2009-12-18 19:41     ` Ingo Molnar
2009-12-18 21:20       ` Steve Rago
2009-12-18 22:07         ` Ingo Molnar
2009-12-18 22:46           ` Steve Rago
2009-12-19  8:08         ` Arjan van de Ven
2009-12-19 13:37           ` Steve Rago
2009-12-18 19:44     ` Peter Zijlstra
2009-12-19 12:20   ` Wu Fengguang
2009-12-19 14:25     ` Steve Rago
2009-12-22  1:59       ` Wu Fengguang
2009-12-22 12:35         ` Jan Kara
     [not found]           ` <20091222123538.GB604-jyMamyUUXNJG4ohzP4jBZS1Fcj925eT/@public.gmane.org>
2009-12-23  8:43             ` Christoph Hellwig
2009-12-23 13:32               ` Jan Kara
     [not found]                 ` <20091223133244.GB3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-24  5:25                   ` Wu Fengguang
2009-12-24  1:26           ` Wu Fengguang
2009-12-22 13:01         ` Martin Knoblauch
     [not found]           ` <787373.9318.qm-rpBZDh8Qtqs5A34FEqDeB/u2YVrzzGjVVpNB7YpNyf8@public.gmane.org>
2009-12-24  1:46             ` Wu Fengguang
2009-12-22 16:41         ` Steve Rago
2009-12-24  1:21           ` Wu Fengguang
2009-12-24 14:49             ` Steve Rago
2009-12-25  7:37               ` Wu Fengguang
2009-12-23 14:21         ` Trond Myklebust
2009-12-23 18:05           ` Jan Kara
     [not found]             ` <20091223180551.GD3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-23 19:12               ` Trond Myklebust
2009-12-24  2:52                 ` Wu Fengguang
2009-12-24 12:04                   ` Trond Myklebust
2009-12-25  5:56                     ` Wu Fengguang
2009-12-30 16:22                       ` Trond Myklebust
2009-12-31  5:04                         ` Wu Fengguang
2009-12-31 19:13                           ` Trond Myklebust
2010-01-06  3:03                             ` Wu Fengguang
2010-01-06 16:56                               ` Trond Myklebust
2010-01-06 18:26                                 ` Trond Myklebust
2010-01-06 18:37                                   ` Peter Zijlstra
2010-01-06 18:52                                     ` Trond Myklebust
2010-01-06 19:07                                       ` Peter Zijlstra
2010-01-06 19:21                                         ` Trond Myklebust
2010-01-06 19:53                                           ` Trond Myklebust
2010-01-06 20:09                                             ` Jan Kara
2010-01-06 20:51                                               ` [PATCH 0/6] " Trond Myklebust
     [not found]                                                 ` <20100106205110.22547.85345.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-06 20:51                                                   ` [PATCH 6/6] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
     [not found]                                                     ` <20100106205110.22547.31434.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  2:32                                                       ` Wu Fengguang
2010-01-06 20:51                                                   ` [PATCH 4/6] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-07  1:56                                                     ` Wu Fengguang
2010-01-06 20:51                                                   ` [PATCH 3/6] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
     [not found]                                                     ` <20100106205110.22547.93554.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  1:48                                                       ` Wu Fengguang
2010-01-06 20:51                                                   ` [PATCH 2/6] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-07  2:29                                                     ` Wu Fengguang
2010-01-07  4:49                                                       ` Trond Myklebust
2010-01-07  5:03                                                         ` Wu Fengguang
2010-01-07  5:30                                                           ` Trond Myklebust
2010-01-07 14:37                                                             ` Wu Fengguang
2010-01-07 14:41                                                               ` [PATCH 0/5] Re: [PATCH] improve the performance of large sequential write NFS workloads Trond Myklebust
     [not found]                                                                 ` <20100107144137.17158.53673.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07 14:41                                                                   ` [PATCH 3/5] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-07 14:41                                                                   ` [PATCH 1/5] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
2010-01-07 14:41                                                                   ` [PATCH 5/5] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
2010-01-07 14:41                                                                   ` [PATCH 2/5] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
2010-01-07 14:41                                                                   ` [PATCH 4/5] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-06 20:51                                                   ` [PATCH 5/6] VM: Use per-bdi unstable accounting to improve use of wbc->force_commit Trond Myklebust
     [not found]                                                     ` <20100106205110.22547.32584.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  2:34                                                       ` Wu Fengguang
2010-01-06 20:51                                                   ` [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
     [not found]                                                     ` <20100106205110.22547.17971.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-06 21:38                                                       ` Jan Kara
     [not found]                                                         ` <20100106213843.GD22781-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2010-01-06 21:48                                                           ` Trond Myklebust
2010-01-07  2:18                                                     ` Wu Fengguang
     [not found]                                                       ` <1262839082.2185.15.camel@localhost>
2010-01-07  4:48                                                         ` Wu Fengguang
2010-01-07  4:53                                                           ` [PATCH 0/5] Re: [PATCH] improve the performance of large sequential write NFS workloads Trond Myklebust
     [not found]                                                             ` <20100107045330.5986.55090.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  4:53                                                               ` [PATCH 5/5] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
2010-01-07  4:53                                                               ` [PATCH 2/5] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
2010-01-07  4:53                                                               ` [PATCH 1/5] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
2010-01-07  4:53                                                               ` [PATCH 3/5] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-07  4:53                                                               ` [PATCH 4/5] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-07 14:56                                                         ` [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes Wu Fengguang
2010-01-07 15:10                                                           ` Trond Myklebust
2010-01-08  1:17                                                             ` Wu Fengguang
2010-01-08  1:37                                                               ` Trond Myklebust
2010-01-08  1:53                                                                 ` Wu Fengguang
2010-01-08  9:25                                                             ` Christoph Hellwig
2010-01-08 13:46                                                               ` Trond Myklebust
2010-01-08 13:54                                                                 ` Christoph Hellwig
2010-01-08 14:15                                                                   ` Trond Myklebust
2010-01-06 21:44                                                   ` [PATCH 0/6] Re: [PATCH] improve the performance of large sequential write NFS workloads Jan Kara
2010-01-06 22:03                                                     ` Trond Myklebust
2010-01-07  8:16                                                 ` Peter Zijlstra
2009-12-22 12:25       ` Jan Kara
     [not found]         ` <20091222122557.GA604-jyMamyUUXNJG4ohzP4jBZS1Fcj925eT/@public.gmane.org>
2009-12-22 12:38           ` Peter Zijlstra
2009-12-22 12:55             ` Jan Kara
2009-12-22 16:20         ` Steve Rago
2009-12-23 18:39           ` Jan Kara
     [not found]             ` <20091223183912.GE3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-23 20:16               ` Steve Rago
2009-12-23 21:49                 ` Trond Myklebust [this message]
2009-12-23 23:13                   ` Steve Rago
2009-12-23 23:44                     ` Trond Myklebust
2009-12-24  4:30                       ` Steve Rago

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1261604952.18047.7.camel@localhost \
    --to=trond.myklebust@netapp.com \
    --cc=fengguang.wu@intel.com \
    --cc=jack@suse.cz \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=sar-a+KepyhlMvJWk0Htik3J/w@public.gmane.org \
    --cc=staubach@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).