Re: NFS page states & writeback

From: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
To: Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
	"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Subject: Re: NFS page states & writeback
Date: Fri, 25 Mar 2011 20:39:57 +1100
Message-ID: <20110325093957.GL26611@dastard>
In-Reply-To: <20110325070054.GA5970@localhost>

On Fri, Mar 25, 2011 at 03:00:54PM +0800, Wu Fengguang wrote:
> Hi Jan,
> 
> On Fri, Mar 25, 2011 at 09:28:03AM +0800, Jan Kara wrote:
> >   Hi,
> > 
> >   while working on changes to balance_dirty_pages() I was investigating why
> > NFS writeback is *so* bumpy when I do not call writeback_inodes_wb() from
> > balance_dirty_pages(). Take a single dd writing to NFS. What I can
> > see is that we quickly accumulate dirty pages upto limit - ~700 MB on that
> > machine. So flusher thread starts working and in an instant all these ~700
> > MB transition from Dirty state to Writeback state. Then, as server acks
> 
> That can be fixed by the following patch:
> 
>         [PATCH 09/27] nfs: writeback pages wait queue
>         https://lkml.org/lkml/2011/3/3/79

I don't think this is a good definition of write congestion for an
NFS (or any other network fs) client. Firstly, writeback congestion
is really dependent on the size of the network send window
remaining. That is, if you've filled the socket buffer with writes
and would block trying to queue more pages on the socket, then you
are congested. i.e. the measure of congestion is the rate at which
write requests can be sent to the server and processed by the server.
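
A minimal sketch of that definition, as a toy model rather than anything
the NFS client actually does: the client counts as congested exactly when
queueing one more write would block on the send buffer. The class, its
fields and the 4 MB / 1 MB sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SendWindow:
    """Toy model of a socket send buffer (all names are hypothetical)."""
    capacity: int      # total send buffer size in bytes
    queued: int = 0    # bytes queued and not yet processed by the server

    def remaining(self) -> int:
        return self.capacity - self.queued

    def would_block(self, nbytes: int) -> bool:
        # Congested: there is no room left to queue this write.
        return nbytes > self.remaining()

    def queue(self, nbytes: int) -> bool:
        # Queue a write if it fits; report whether it was queued.
        if self.would_block(nbytes):
            return False
        self.queued += nbytes
        return True

    def ack(self, nbytes: int) -> None:
        # The server processed nbytes, which frees window space.
        self.queued = max(0, self.queued - nbytes)


win = SendWindow(capacity=4 * 1024 * 1024)   # assumed 4 MB send buffer
wsize = 1024 * 1024                          # assumed 1 MB write requests

sent = 0
while win.queue(wsize):                      # write until we would block
    sent += 1

congested_before_ack = win.would_block(wsize)
win.ack(wsize)                               # server processes one request
congested_after_ack = win.would_block(wsize)
```

In this model congestion clears as soon as the server processes a request,
so the congested state tracks the rate at which the server drains writes
rather than a fixed count of pages under writeback.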

Secondly, the big problem that causes the lumpiness is that we only
send commits when we reach a large threshold of unstable pages.
Because most servers tend to cache large writes in RAM,
the server might have a long commit latency because it has to write
hundreds of MB of data to disk to complete the commit.

IOWs, the client sends the commit only when it really needs the
pages to be cleaned, and then we have the latency of the server
write before it responds that they are clean. Hence commits can take
a long time to complete and mark pages clean on the client side.
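
As a rough back-of-the-envelope illustration (the 700 MB figure is the
dirty limit Jan mentions; the 100 MB/s server disk rate is purely an
assumption), the commit latency scales with how much data the server has
cached when the commit arrives:

```python
def commit_latency_seconds(cached_mb: float, disk_mb_per_s: float) -> float:
    """Time for the server to flush its cached writes before replying.

    Simplified model: ignores network round trips and any writeback the
    server has already started on its own.
    """
    if disk_mb_per_s <= 0:
        raise ValueError("disk rate must be positive")
    return cached_mb / disk_mb_per_s


# Commit sent only after ~700 MB has piled up on the server.
late_commit = commit_latency_seconds(700, 100)

# Commit sent after roughly one second worth of data at the same rate.
early_commit = commit_latency_seconds(100, 100)
```

Under those assumed numbers the late commit takes about 7 seconds and the
early one about 1 second, which is the difference the client sees before
any page can be marked clean.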

A solution that IRIX used for this problem was the concept of a
background commit. While doing writeback on an inode, if it sent
more than a certain threshold of data (typically in the range
of 0.5-2s worth of data) to the server without a commit being
issued, it would send an _asynchronous_ commit with the current dirty
range to the server. That way the server starts writing the data
before it hits dirty thresholds (i.e. prevents GBs of dirty data
being cached on the server so commit latency is kept low).

When the background commit completes the NFS client can then convert
pages in the commit range to clean. Hence we keep the number of
unstable pages under control without needing to wait for a certain
number of unstable pages to build up before commits are triggered.
This allows the process of writing dirty pages to also clean
unstable pages at roughly the same rate as the write rate, without
needing any magic thresholds to be configured....
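
The background commit idea can be sketched as a small simulation. This is
a toy model under stated assumptions, not the IRIX or Linux code: sizes are
in MB, time advances in ticks, a commit issued in one tick completes at the
start of the next, and every name below is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Toy per-inode writeback state; every name here is hypothetical."""
    commit_threshold: int     # MB sent before a background commit is issued
    unstable: int = 0         # MB written to the server but not committed
    since_commit: int = 0     # MB sent since the last commit was issued
    in_flight: list = field(default_factory=list)  # sizes of pending commits
    peak_unstable: int = 0    # highest unstable count observed

    def write_back(self, mb: int) -> None:
        # Pages become unstable once the server has acked the write.
        self.unstable += mb
        self.since_commit += mb
        self.peak_unstable = max(self.peak_unstable, self.unstable)
        if self.since_commit >= self.commit_threshold:
            # Asynchronous commit: record it and keep writing.
            self.in_flight.append(self.since_commit)
            self.since_commit = 0

    def commit_done(self) -> None:
        # The oldest pending commit completed: its range becomes clean.
        if self.in_flight:
            self.unstable -= self.in_flight.pop(0)


def peak_unstable(commit_threshold: int, chunk: int, ticks: int) -> int:
    """Write `chunk` MB per tick and return the peak unstable MB seen."""
    inode = Inode(commit_threshold)
    for _ in range(ticks):
        inode.commit_done()      # a commit issued last tick completes now
        inode.write_back(chunk)
    return inode.peak_unstable


background = peak_unstable(commit_threshold=100, chunk=50, ticks=40)
threshold_only = peak_unstable(commit_threshold=700, chunk=50, ticks=40)
```

With a commit every 100 MB the model never holds more than 100 MB
unstable, while waiting for 700 MB lets the full 700 MB build up first.
The real trigger would be expressed in seconds of data at the current
write rate rather than a fixed size; the fixed size here only keeps the
sketch short.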

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org