linux-nfs.vger.kernel.org archive mirror
From: Wu Fengguang <fengguang.wu@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Steve Rago <sar-a+KepyhlMvJWk0Htik3J/w@public.gmane.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Trond.Myklebust@netapp.com" <Trond.Myklebust@netapp.com>,
	"jens.axboe" <jens.axboe@oracle.com>,
	Peter Staubach <staubach@redhat.com>
Subject: Re: [PATCH] improve the performance of large sequential write NFS workloads
Date: Sat, 19 Dec 2009 20:20:33 +0800
Message-ID: <20091219122033.GA11360@localhost>
In-Reply-To: <1261037877.27920.36.camel@laptop>

Hi Steve,

// I should really read the NFS code, but maybe you can help us better
// understand the problem :)

On Thu, Dec 17, 2009 at 04:17:57PM +0800, Peter Zijlstra wrote:
> On Wed, 2009-12-16 at 21:03 -0500, Steve Rago wrote:
> > Eager Writeback for NFS Clients
> > -------------------------------
> > Prevent applications that write large sequential streams of data (like backup, for example)
> > from entering into a memory pressure state, which degrades performance by falling back to
> > synchronous operations (both synchronous writes and additional commits).

What exactly is the "memory pressure state" condition? Which code
performs the "synchronous writes and additional commits", and how are
they triggered?

> > This is accomplished by preventing the client application from
> > dirtying pages faster than they can be written to the server:
> > clients write pages eagerly instead of lazily.

We already have global throttling based on balance_dirty_pages().
So what makes the performance difference in your proposed "per-inode" throttling?
balance_dirty_pages() does have a much larger threshold than yours.

> > The eager writeback is controlled by a sysctl: fs.nfs.nfs_max_woutstanding set to 0 disables
> > the feature.  Otherwise it contains the maximum number of outstanding NFS writes that can be
> > in flight for a given file.  This is used to block the application from dirtying more pages
> > until the writes are complete.

What if we do heuristic write-behind for sequential NFS writes?

Another related proposal, from Peter Staubach, is to start async writeback
(without the throttle in your proposal) when one inode has enough dirty
pages:

        Another approach that I suggested was to keep track of the
        number of pages which are dirty on a per-inode basis.  When
        enough pages are dirty to fill an over the wire transfer,
        then schedule an asynchronous write to transmit that data to
        the server.  This ties in with support to ensure that the
        server/network is not completely overwhelmed by the client
        by flow controlling the writing application to better match
        the bandwidth and latencies of the network and server.
        With this support, the NFS client tends not to fill memory
        with dirty pages and thus, does not depend upon the other
        parts of the system to flush these pages.

Can the above alternatives fix the same problem? (or perhaps, is the
per-inode throttling really necessary?)

> > This patch is based heavily (okay, almost entirely) on a prior patch by Peter Staubach.  For
> > the original patch, see http://article.gmane.org/gmane.linux.nfs/24323.
> >
> > The patch below applies to linux-2.6.32-rc7, but it should apply cleanly to vanilla linux-2.6.32.
> >
> > Performance data and tuning notes can be found on my web site (http://www.nec-labs.com/~sar).
> > With iozone, I see about 50% improvement for large sequential write workloads over a 1Gb Ethernet.
> > With an in-house micro-benchmark, I see 80% improvement for large, single-stream, sequential
> > workloads (where "large" is defined to be greater than the memory size on the client).

These are impressive numbers. I wonder what the minimal patch would be
(just hacking it to go fast, without all the auxiliary bits). Is it this
chunk that calls nfs_wb_eager()?

> > @@ -623,10 +635,21 @@ static ssize_t nfs_file_write(struct kio
> >       nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, count);
> >       result = generic_file_aio_write(iocb, iov, nr_segs, pos);
> >       /* Return error values for O_SYNC and IS_SYNC() */
> > -     if (result >= 0 && nfs_need_sync_write(iocb->ki_filp, inode)) {
> > -             int err = nfs_do_fsync(nfs_file_open_context(iocb->ki_filp), inode);
> > -             if (err < 0)
> > -                     result = err;
> > +     if (result >= 0) {
> > +             if (nfs_need_sync_write(iocb->ki_filp, inode)) {
> > +                     int err;
> > +
> > +                     err = nfs_do_fsync(nfs_file_open_context(iocb->ki_filp),
> > +                                        inode);
> > +                     if (err < 0)
> > +                             result = err;
> > +             } else if (nfs_max_woutstanding != 0 &&
> > +                  nfs_is_seqwrite(inode, pos) &&
> > +                  atomic_read(&nfsi->ndirty) >= NFS_SERVER(inode)->wpages) {
> > +                     nfs_wb_eager(inode);
> > +             }
> > +             if (result > 0)
> > +                     nfsi->wrpos = pos + result;
> >       }

Thanks,
Fengguang

