All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Steve Rago <sar@nec-labs.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Trond.Myklebust@netapp.com" <Trond.Myklebust@netapp.com>,
	"jens.axboe" <jens.axboe@oracle.com>,
	Peter Staubach <staubach@redhat.com>,
	Arjan van de Ven <arjan@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH] improve the performance of large sequential write NFS workloads
Date: Thu, 24 Dec 2009 09:26:06 +0800	[thread overview]
Message-ID: <20091224012606.GB8486@localhost> (raw)
In-Reply-To: <20091222123538.GB604@atrey.karlin.mff.cuni.cz>

On Tue, Dec 22, 2009 at 08:35:39PM +0800, Jan Kara wrote:
> > 2) NFS commit stops pipeline because it sleep&wait inside i_mutex,
> >    which blocks all other NFSDs trying to write/writeback the inode.
> > 
> >    nfsd_sync:
> >      take i_mutex
> >        filemap_fdatawrite
> >        filemap_fdatawait
> >      drop i_mutex
>   I believe this is unrelated to the problem Steve is trying to solve.
> When we get to doing sync writes the performance is busted so we better
> shouldn't get to that (unless user asked for that of course).

Yes, first priority is always to reduce the COMMITs and the number of
writeback pages they submitted under WB_SYNC_ALL. And I guess the
"increase write chunk beyond 128MB" patches can serve it well.

The i_mutex should impact NFS write performance for single big copy in
this way: pdflush submits many (4MB write, 1 commit) pairs, because
the write and commit each will take i_mutex, it effectively limits the
server side io queue depth to <=4MB: the next 4MB dirty data won't
reach page cache until the previous 4MB is completely synced to disk.

There are two kinds of inefficiency here:
- the small queue depth
- the interleaved use of CPU/DISK:
        loop {
                write 4MB       => normally only CPU
                writeback 4MB   => mostly disk
        }

When writing many small dirty files _plus_ one big file, there will
still be interleaved write/writeback: the 4MB write will be broken
into 8 NFS writes with the default wsize=524288. So there may be one
nfsd doing COMMIT, another 7 nfsd waiting for the big file's i_mutex.
All 8 nfsd are "busy" and pipeline is destroyed. Just a possibility.

> >    If filemap_fdatawait() can be moved out of i_mutex (or just remove
> >    the lock), we solve the root problem:
> > 
> >    nfsd_sync:
> >      [take i_mutex]
> >        filemap_fdatawrite  => can also be blocked, but less a problem
> >      [drop i_mutex]
> >        filemap_fdatawait
> >  
> >    Maybe it's a dumb question, but what's the purpose of i_mutex here?
> >    For correctness or to prevent livelock? I can imagine some livelock
> >    problem here (current implementation can easily wait for extra
> >    pages), however not too hard to fix.
>   Generally, most filesystems take i_mutex during fsync to
> a) avoid all sorts of livelocking problems
> b) serialize fsyncs for one inode (mostly for simplicity)
>   I don't see what advantage would it bring that we get rid of i_mutex
> for fdatawait - only that maybe writers could proceed while we are
> waiting but is that really the problem?

The i_mutex at least has some performance impact. Another one would be
the WB_SYNC_ALL. All are related to the COMMIT/sync write behavior.

Are there some other _direct_ causes?

Thanks,
Fengguang

  parent reply	other threads:[~2009-12-24  1:26 UTC|newest]

Thread overview: 175+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-17  2:03 [PATCH] improve the performance of large sequential write NFS workloads Steve Rago
2009-12-17  8:17 ` Peter Zijlstra
2009-12-18 19:33   ` Steve Rago
2009-12-18 19:41     ` Ingo Molnar
2009-12-18 19:41       ` Ingo Molnar
2009-12-18 21:20       ` Steve Rago
2009-12-18 21:20         ` Steve Rago
2009-12-18 22:07         ` Ingo Molnar
2009-12-18 22:07           ` Ingo Molnar
2009-12-18 22:46           ` Steve Rago
2009-12-18 22:46             ` Steve Rago
2009-12-19  8:08         ` Arjan van de Ven
2009-12-19  8:08           ` Arjan van de Ven
2009-12-19 13:37           ` Steve Rago
2009-12-19 13:37             ` Steve Rago
2009-12-18 19:44     ` Peter Zijlstra
2009-12-18 19:44       ` Peter Zijlstra
2009-12-19 12:20   ` Wu Fengguang
2009-12-19 12:20     ` Wu Fengguang
2009-12-19 14:25     ` Steve Rago
2009-12-22  1:59       ` Wu Fengguang
2009-12-22 12:35         ` Jan Kara
2009-12-22 12:35           ` Jan Kara
2009-12-22 12:35           ` Jan Kara
     [not found]           ` <20091222123538.GB604-jyMamyUUXNJG4ohzP4jBZS1Fcj925eT/@public.gmane.org>
2009-12-23  8:43             ` Christoph Hellwig
2009-12-23  8:43               ` Christoph Hellwig
2009-12-23  8:43               ` Christoph Hellwig
2009-12-23 13:32               ` Jan Kara
2009-12-23 13:32                 ` Jan Kara
2009-12-23 13:32                 ` Jan Kara
     [not found]                 ` <20091223133244.GB3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-24  5:25                   ` Wu Fengguang
2009-12-24  5:25                     ` Wu Fengguang
2009-12-24  5:25                     ` Wu Fengguang
2009-12-24  1:26           ` Wu Fengguang [this message]
2009-12-22 13:01         ` Martin Knoblauch
2009-12-22 13:01           ` Martin Knoblauch
     [not found]           ` <787373.9318.qm-rpBZDh8Qtqs5A34FEqDeB/u2YVrzzGjVVpNB7YpNyf8@public.gmane.org>
2009-12-24  1:46             ` Wu Fengguang
2009-12-24  1:46               ` Wu Fengguang
2009-12-22 16:41         ` Steve Rago
2009-12-22 16:41           ` Steve Rago
2009-12-24  1:21           ` Wu Fengguang
2009-12-24 14:49             ` Steve Rago
2009-12-24 14:49               ` Steve Rago
2009-12-24 14:49               ` Steve Rago
2009-12-25  7:37               ` Wu Fengguang
2009-12-23 14:21         ` Trond Myklebust
2009-12-23 14:21           ` Trond Myklebust
2009-12-23 14:21           ` Trond Myklebust
2009-12-23 18:05           ` Jan Kara
     [not found]             ` <20091223180551.GD3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-23 19:12               ` Trond Myklebust
2009-12-23 19:12                 ` Trond Myklebust
2009-12-23 19:12                 ` Trond Myklebust
2009-12-24  2:52                 ` Wu Fengguang
2009-12-24  2:52                   ` Wu Fengguang
2009-12-24  2:52                   ` Wu Fengguang
2009-12-24 12:04                   ` Trond Myklebust
2009-12-24 12:04                     ` Trond Myklebust
2009-12-24 12:04                     ` Trond Myklebust
2009-12-25  5:56                     ` Wu Fengguang
2009-12-25  5:56                       ` Wu Fengguang
2009-12-25  5:56                       ` Wu Fengguang
2009-12-30 16:22                       ` Trond Myklebust
2009-12-31  5:04                         ` Wu Fengguang
2009-12-31  5:04                           ` Wu Fengguang
2009-12-31  5:04                           ` Wu Fengguang
2009-12-31 19:13                           ` Trond Myklebust
2010-01-06  3:03                             ` Wu Fengguang
2010-01-06 16:56                               ` Trond Myklebust
2010-01-06 16:56                                 ` Trond Myklebust
2010-01-06 16:56                                 ` Trond Myklebust
2010-01-06 18:26                                 ` Trond Myklebust
2010-01-06 18:26                                   ` Trond Myklebust
2010-01-06 18:37                                   ` Peter Zijlstra
2010-01-06 18:37                                     ` Peter Zijlstra
2010-01-06 18:52                                     ` Trond Myklebust
2010-01-06 18:52                                       ` Trond Myklebust
2010-01-06 19:07                                       ` Peter Zijlstra
2010-01-06 19:21                                         ` Trond Myklebust
2010-01-06 19:21                                           ` Trond Myklebust
2010-01-06 19:21                                           ` Trond Myklebust
2010-01-06 19:53                                           ` Trond Myklebust
2010-01-06 19:53                                             ` Trond Myklebust
2010-01-06 20:09                                             ` Jan Kara
2010-01-06 20:09                                               ` Jan Kara
2010-01-06 20:51                                               ` [PATCH 0/6] " Trond Myklebust
2010-01-06 20:51                                                 ` Trond Myklebust
     [not found]                                                 ` <20100106205110.22547.85345.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-06 20:51                                                   ` [PATCH 2/6] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-06 20:51                                                     ` Trond Myklebust
2010-01-07  2:29                                                     ` Wu Fengguang
2010-01-07  4:49                                                       ` Trond Myklebust
2010-01-07  4:49                                                         ` Trond Myklebust
2010-01-07  5:03                                                         ` Wu Fengguang
2010-01-07  5:03                                                           ` Wu Fengguang
2010-01-07  5:30                                                           ` Trond Myklebust
2010-01-07  5:30                                                             ` Trond Myklebust
2010-01-07 14:37                                                             ` Wu Fengguang
2010-01-07 14:37                                                               ` Wu Fengguang
2010-01-07 14:41                                                               ` [PATCH 0/5] Re: [PATCH] improve the performance of large sequential write NFS workloads Trond Myklebust
     [not found]                                                                 ` <20100107144137.17158.53673.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07 14:41                                                                   ` [PATCH 4/5] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-07 14:41                                                                   ` [PATCH 2/5] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
2010-01-07 14:41                                                                   ` [PATCH 1/5] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
2010-01-07 14:41                                                                   ` [PATCH 5/5] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
2010-01-07 14:41                                                                   ` [PATCH 3/5] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-06 20:51                                                   ` [PATCH 4/6] " Trond Myklebust
2010-01-06 20:51                                                     ` Trond Myklebust
2010-01-07  1:56                                                     ` Wu Fengguang
2010-01-06 20:51                                                   ` [PATCH 3/6] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
2010-01-06 20:51                                                     ` Trond Myklebust
     [not found]                                                     ` <20100106205110.22547.93554.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  1:48                                                       ` Wu Fengguang
2010-01-07  1:48                                                         ` Wu Fengguang
2010-01-06 20:51                                                   ` [PATCH 6/6] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
2010-01-06 20:51                                                     ` Trond Myklebust
     [not found]                                                     ` <20100106205110.22547.31434.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  2:32                                                       ` Wu Fengguang
2010-01-07  2:32                                                         ` Wu Fengguang
2010-01-06 20:51                                                   ` [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
2010-01-06 20:51                                                     ` Trond Myklebust
     [not found]                                                     ` <20100106205110.22547.17971.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-06 21:38                                                       ` Jan Kara
2010-01-06 21:38                                                         ` Jan Kara
     [not found]                                                         ` <20100106213843.GD22781-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2010-01-06 21:48                                                           ` Trond Myklebust
2010-01-06 21:48                                                             ` Trond Myklebust
2010-01-07  2:18                                                     ` Wu Fengguang
2010-01-07  2:18                                                       ` Wu Fengguang
     [not found]                                                       ` <1262839082.2185.15.camel@localhost>
2010-01-07  4:48                                                         ` Wu Fengguang
2010-01-07  4:48                                                           ` Wu Fengguang
2010-01-07  4:53                                                           ` [PATCH 0/5] Re: [PATCH] improve the performance of large sequential write NFS workloads Trond Myklebust
2010-01-07  4:53                                                             ` Trond Myklebust
     [not found]                                                             ` <20100107045330.5986.55090.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  4:53                                                               ` [PATCH 5/5] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
2010-01-07  4:53                                                                 ` Trond Myklebust
2010-01-07  4:53                                                               ` [PATCH 3/5] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-07  4:53                                                                 ` Trond Myklebust
2010-01-07  4:53                                                               ` [PATCH 2/5] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
2010-01-07  4:53                                                                 ` Trond Myklebust
2010-01-07  4:53                                                               ` [PATCH 4/5] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-07  4:53                                                                 ` Trond Myklebust
2010-01-07  4:53                                                               ` [PATCH 1/5] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
2010-01-07  4:53                                                                 ` Trond Myklebust
2010-01-07 14:56                                                         ` [PATCH 1/6] " Wu Fengguang
2010-01-07 14:56                                                           ` Wu Fengguang
2010-01-07 15:10                                                           ` Trond Myklebust
2010-01-07 15:10                                                             ` Trond Myklebust
2010-01-08  1:17                                                             ` Wu Fengguang
2010-01-08  1:17                                                               ` Wu Fengguang
2010-01-08  1:37                                                               ` Trond Myklebust
2010-01-08  1:37                                                                 ` Trond Myklebust
2010-01-08  1:53                                                                 ` Wu Fengguang
2010-01-08  1:53                                                                   ` Wu Fengguang
2010-01-08  9:25                                                             ` Christoph Hellwig
2010-01-08  9:25                                                               ` Christoph Hellwig
2010-01-08 13:46                                                               ` Trond Myklebust
2010-01-08 13:54                                                                 ` Christoph Hellwig
2010-01-08 14:15                                                                   ` Trond Myklebust
2010-01-06 20:51                                                   ` [PATCH 5/6] VM: Use per-bdi unstable accounting to improve use of wbc->force_commit Trond Myklebust
2010-01-06 20:51                                                     ` Trond Myklebust
     [not found]                                                     ` <20100106205110.22547.32584.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  2:34                                                       ` Wu Fengguang
2010-01-07  2:34                                                         ` Wu Fengguang
2010-01-06 21:44                                                   ` [PATCH 0/6] Re: [PATCH] improve the performance of large sequential write NFS workloads Jan Kara
2010-01-06 21:44                                                     ` Jan Kara
2010-01-06 22:03                                                     ` Trond Myklebust
2010-01-07  8:16                                                 ` Peter Zijlstra
2009-12-22 12:25       ` Jan Kara
     [not found]         ` <20091222122557.GA604-jyMamyUUXNJG4ohzP4jBZS1Fcj925eT/@public.gmane.org>
2009-12-22 12:38           ` Peter Zijlstra
2009-12-22 12:38             ` Peter Zijlstra
2009-12-22 12:55             ` Jan Kara
2009-12-22 12:55               ` Jan Kara
2009-12-22 16:20         ` Steve Rago
2009-12-23 18:39           ` Jan Kara
2009-12-23 18:39             ` Jan Kara
     [not found]             ` <20091223183912.GE3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-23 20:16               ` Steve Rago
2009-12-23 20:16                 ` Steve Rago
2009-12-23 21:49                 ` Trond Myklebust
2009-12-23 21:49                   ` Trond Myklebust
2009-12-23 23:13                   ` Steve Rago
2009-12-23 23:44                     ` Trond Myklebust
2009-12-23 23:44                       ` Trond Myklebust
2009-12-24  4:30                       ` Steve Rago

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091224012606.GB8486@localhost \
    --to=fengguang.wu@intel.com \
    --cc=Trond.Myklebust@netapp.com \
    --cc=arjan@infradead.org \
    --cc=jack@suse.cz \
    --cc=jens.axboe@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=sar@nec-labs.com \
    --cc=staubach@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.