public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: tytso@mit.edu
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	xfs@oss.sgi.com
Subject: Re: [PATCH 3/4] writeback: pay attention to wbc->nr_to_write in write_cache_pages
Date: Sat, 24 Apr 2010 23:33:15 -0400	[thread overview]
Message-ID: <20100425033315.GC667@thunk.org> (raw)
In-Reply-To: <1271731314-5893-4-git-send-email-david@fromorbit.com>

On Tue, Apr 20, 2010 at 12:41:53PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> If a filesystem writes more than one page in ->writepage, write_cache_pages
> fails to notice this and continues to attempt writeback when wbc->nr_to_write
> has gone negative - this trace was captured from XFS:
> 
> 
>     wbc_writeback_start: towrt=1024
>     wbc_writepage: towrt=1024
>     wbc_writepage: towrt=0
>     wbc_writepage: towrt=-1
>     wbc_writepage: towrt=-5
>     wbc_writepage: towrt=-21
>     wbc_writepage: towrt=-85
> 
> This has adverse effects on filesystem writeback behaviour. write_cache_pages()
> needs to terminate after a certain number of pages are written, not after a
> certain number of calls to ->writepage are made. Make it observe the current
> value of wbc->nr_to_write and treat a value of <= 0 as though it is a either a
> termination condition or a trigger to reset to MAX_WRITEḆACK_PAGES for data
> integrity syncs.

Be careful here.  If you are going to write more pages than what the
writeback code has requested (the stupid no more than 1024 pages
restriction in the writeback code before it jumps to start writing
some other inode), you actually need to let the returned
wbc->nr_to_write go negative, so that wb_writeback() knows how many
pages it has written.

In other words, the writeback code assumes that 

  <orignal value of nr_to_write> - <returned wbc->nr_to_write>

is

  <number of pages actually written>

If you don't let wbc->nr_to_write go negative, the writeback code will
be confused about how many pages were _actually_ written, and the
writeback code ends up writing too much.  See commit 2faf2e1.

All of this is a crock of course.  The file system shouldn't be
second-guessing the writeback code.  Instead the writeback code should
be adaptively measuring how long it takes to were written out N pages
to a particular block device, and then decide what's the appropriate
setting for nr_to_write.  What makes sense for a USB stick, or a 4200
RPM laptop drive, may not make sense for a massive RAID array....

But since we don't have that, both XFS and ext4 have workarounds for
brain-damaged writeback behaviour.  (I did some testing, and even for
standard laptop drives the cap of 1024 pages is just Way Too Small;
that limit was set something like a decade ago, and everyone has been
afraid to change it, even though disks have gotten a wee bit faster
since those days.)

    	   	      	       	     	 - Ted

  parent reply	other threads:[~2010-04-25  3:33 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-20  2:41 [PATCH 0/4] writeback: tracing and wbc->nr_to_write fixes Dave Chinner
2010-04-20  2:41 ` [PATCH 1/4] writeback: initial tracing support Dave Chinner
2010-05-21 15:06   ` Christoph Hellwig
2010-04-20  2:41 ` [PATCH 2/4] writeback: Add tracing to balance_dirty_pages Dave Chinner
2010-04-20  2:41 ` [PATCH 3/4] writeback: pay attention to wbc->nr_to_write in write_cache_pages Dave Chinner
2010-04-22 19:07   ` Jan Kara
2010-04-25  3:33   ` tytso [this message]
2010-04-26  1:49     ` Dave Chinner
2010-04-26  2:43       ` tytso
2010-04-26  2:45         ` tytso
2010-04-27  3:30         ` Dave Chinner
2010-04-29 21:39   ` Andrew Morton
2010-04-30  6:01     ` Aneesh Kumar K. V
2010-04-30 19:43       ` Andrew Morton
2010-05-01 19:47         ` tytso
2010-04-20  2:41 ` [PATCH 4/4] xfs: remove nr_to_write writeback windup Dave Chinner
2010-04-22 19:09   ` Jan Kara
2010-04-26  0:46     ` Dave Chinner
2010-04-20  3:40 ` [PATCH 5/4] writeback: limit write_cache_pages integrity scanning to current EOF Dave Chinner
2010-04-20 23:28   ` Jamie Lokier
2010-04-20 23:31     ` Dave Chinner
2010-04-22 19:13   ` Jan Kara
2010-04-20 12:02 ` [PATCH 0/4] writeback: tracing and wbc->nr_to_write fixes Richard Kennedy
2010-04-20 23:29   ` Dave Chinner
2010-05-21 15:05 ` Christoph Hellwig
2010-05-22  0:09   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100425033315.GC667@thunk.org \
    --to=tytso@mit.edu \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox