linux-fsdevel.vger.kernel.org archive mirror
From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>, Wu Fengguang <fengguang.wu@intel.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: NFS page states & writeback
Date: Sun, 27 Mar 2011 17:26:41 +0200
Message-ID: <1301239601.22136.23.camel@lade.trondhjem.org>
In-Reply-To: <20110326011806.GT26611@dastard>

On Sat, 2011-03-26 at 12:18 +1100, Dave Chinner wrote:
> Yes - though this only reduces the variance the client sees in
> steady state operation.  Realistically, we don't care if one commit
> takes 2s for 100MB and the next takes 0.2s for the next 100MB as
> long as we've been able to send 50MB/s of writes over the wire
> consistently. IOWs, what we need to care about is getting the data
> to the server as quickly as possible and decoupling that from the
> commit operation.  i.e. we need to maximise and smooth the rate at
> which we send dirty pages to the server, not the rate at which we
> convert unstable pages to stable. If the server can't handle the
> write rate we send it, it will slow down the rate at which it
> processes writes and we get congestion feedback that way (i.e. via
> the network channel).
> 
> Essentially what I'm trying to say is that I don't think
> unstable->clean operations (i.e. the commit) should affect or
> control the estimated bandwidth of the channel. A commit is an
> operation that can be tuned to optimise throughput, but because of
> its variance it's not really an operation that can be used to
> directly measure and control that throughput.

Agreed. However, as I have said before, most of the problem here is that
the Linux server is assuming that it should cache the data maximally as
if this were a local process.

Once the NFS client starts flushing data to the server, it is because
the client no longer wants to cache, but rather wants to see the data
put onto stable storage as quickly as possible.
At that point, the server should be focussing on doing the same. It should
not be setting the low water mark at 20% of total memory before starting
writeback, because that means that the COMMIT may have to wait for
several GB of data to hit the platter.
If the water mark was set at say 100MB or so, then writeback would be
much smoother...
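
For a rough feel of the numbers, here is a back-of-the-envelope sketch of
how long a COMMIT could wait under each water mark. The 16 GiB of server
memory and the 100 MiB/s sustained disk rate are assumptions chosen purely
for illustration, and the helper name is hypothetical; the model ignores
everything except draining the queued dirty data at a steady rate.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def commit_wait_seconds(dirty_bytes, disk_bytes_per_sec):
    """Time to drain dirty_bytes to disk at a steady rate (simplified model)."""
    return dirty_bytes / disk_bytes_per_sec


# Hypothetical server: both figures are assumptions, not measurements.
total_ram = 16 * GIB
disk_rate = 100 * MIB  # bytes per second of sustained writeback

ratio_mark = total_ram * 20 // 100  # water mark at 20% of total memory
fixed_mark = 100 * MIB              # water mark at a fixed 100MB

ratio_wait = commit_wait_seconds(ratio_mark, disk_rate)
fixed_wait = commit_wait_seconds(fixed_mark, disk_rate)

print(f"20% mark:   {ratio_mark / GIB:.1f} GiB queued, ~{ratio_wait:.0f}s to drain")
print(f"100MB mark: {fixed_mark / MIB:.0f} MiB queued, ~{fixed_wait:.0f}s to drain")
```

Under those assumed figures the 20% mark lets about 3.2 GiB pile up, which
is on the order of half a minute of disk time, against about one second
for the 100MB mark.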

> It is also worth remembering that some NFS servers return STABLE as
> the state of the data in their write response. This transitions the
> pages directly from writeback to clean, so there is no unstable
> state or need for a commit operation. Hence the bandwidth estimation
> in these cases is directly related to the network/protocol
> throughput. If we can run background commit operations triggered by
> write responses, then we have the same bandwidth estimation
> behaviour for writes regardless of whether they return as STABLE or
> UNSTABLE on the server...

If the server were doing its job of acting as a glorified disk instead
of trying to act as a caching device, then most of that data should
already be on disk before the client sends the COMMIT.
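
The client-side page states discussed in the quoted text can be sketched
as a tiny state machine. This is a simplified model for illustration only,
not the kernel's implementation; the state and function names are
hypothetical, and "stable" here just means the write reply reported the
data as already on stable storage.

```python
from enum import Enum


class PageState(Enum):
    DIRTY = "dirty"
    WRITEBACK = "writeback"
    UNSTABLE = "unstable"
    CLEAN = "clean"


def start_write(state):
    """Client sends a WRITE for a dirty page."""
    if state is not PageState.DIRTY:
        raise ValueError(f"cannot write a page in state {state}")
    return PageState.WRITEBACK


def on_write_reply(state, server_said_stable):
    """A stable reply cleans the page directly; otherwise a COMMIT is needed."""
    if state is not PageState.WRITEBACK:
        raise ValueError(f"unexpected write reply in state {state}")
    return PageState.CLEAN if server_said_stable else PageState.UNSTABLE


def on_commit_reply(state):
    """A successful COMMIT turns unstable pages clean."""
    if state is not PageState.UNSTABLE:
        raise ValueError(f"unexpected commit reply in state {state}")
    return PageState.CLEAN


# Stable path: dirty -> writeback -> clean, with no commit.
stable_path = on_write_reply(start_write(PageState.DIRTY), True)

# Unstable path: dirty -> writeback -> unstable -> clean via commit.
unstable_path = on_commit_reply(
    on_write_reply(start_write(PageState.DIRTY), False)
)
```

Both paths end in the clean state; the only difference is whether the page
has to pass through the unstable state and wait for a COMMIT.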

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

