Re: [PATCH] Prevent large file writeback starvation

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@osdl.org>
To: David Chinner <dgc@sgi.com>
Cc: dgc@sgi.com, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] Prevent large file writeback starvation
Date: Sun, 5 Feb 2006 22:22:15 -0800	[thread overview]
Message-ID: <20060205222215.313f30a9.akpm@osdl.org> (raw)
In-Reply-To: <20060206054815.GJ43335175@melbourne.sgi.com>

David Chinner <dgc@sgi.com> wrote:
>
> > >From a quick peek, this code:
> > 
> > 			if (wbc->for_kupdate) {
> > 				/*
> > 				 * For the kupdate function we leave the inode
> > 				 * at the head of sb_dirty so it will get more
> > 				 * writeout as soon as the queue becomes
> > 				 * uncongested.
> > 				 */
> > 				inode->i_state |= I_DIRTY_PAGES;
> > 				list_move_tail(&inode->i_list, &sb->s_dirty);
> > 
> > 
> > isn't working right any more.
> 
> If the intent is to continue writing it back until fully
> sync'd, then shouldn't we be moving that to the tail of I/O list so
> we don't have to iterate over the dirty list again before we try to
> write another chunk out?

Only if dirtied_when has expired.  Until that's true I think it's right to
move onto other (potentially expired) inodes.

Your patch leaves these inodes on s_io, actually.

> > > 
> > > It appears that it is intended to handle congested devices. The thing
> > > is, 1024 pages on writeback is not enough to congest a single disk,
> > > let alone a RAID box 10 or 100 times faster than a single disk.
> > > Hence we're stopping writeback long before we congest the device.
> > 
> > I think the comment is misleading.  The writeout pass can terminate because
> > wbc->nr_to_write was satisfied, as well as for queue congestion.
> 
> Exactly my point and what the patch addresses - it allows writeback on
> that inode to continue from where it left off if the device was not
> congested.

But what will it do to other inodes?  Say, ones which have expired?  This
inode could take many minutes to write out if it's all fragmented.

s_dirty is supposed to be kept in dirtied_when order, btw.

> > I suspect what's happened here is that someone other than pdflush has tried
> > to do some writeback and didn't set for_kupdate, so we ended up resetting
> > dirtied_when.
> 
> If it's not wb_kupdate that is trying to write it back, and we have little
> memory pressure, and we completed writing the file long ago, then what behaves
> exactly like wb_kupdate for hours on end apart from wb_kupdate?

Don't know.   I'm not sure that we exactly know what's going on yet?

The list_move_tail is supposed to put the inode at the *head* of s_dirty. 
So it's the first one which gets encountered on the next pdflush pass.

And I guess that's working OK.  Except we only write 4MB of it each five
seconds.  Is that the case?

If so, why would that happen?  Take a look at wb_kupdate().  It's supposed
to work *continuously* on the inodes until writeback_inodes() failed to
write back enough pages.  It takes this as an indication that there's no
more work to do at this time.

It'd be interesting to take a look at what's happening in wb_kupdate().

> > > Therefore, lets only move the inode back onto the dirty list if the device
> > > really is congested. Patch against 2.6.15-rc2 below.
> > 
> > This'll break something else, I bet :(
> 
> Wonderful. What needs testing to indicate something else hasn't broken?

Hard.

> Does anyone have any regression tests for this code?

No.

next prev parent reply	other threads:[~2006-02-06  6:22 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-06  4:00 [PATCH] Prevent large file writeback starvation David Chinner
2006-02-06  4:27 ` Andrew Morton
2006-02-06  5:48   ` David Chinner
2006-02-06  6:22     ` Andrew Morton [this message]
2006-02-06  6:36       ` Andrew Morton
2006-02-06 11:57         ` David Chinner
2006-02-06 11:55       ` David Chinner
2006-02-06 23:14         ` Andrew Morton
2006-02-07  0:34           ` David Chinner
2006-02-07  1:04             ` Andrew Morton
2006-02-07  1:31               ` David Chinner
2006-02-07  5:27                 ` Andrew Morton
2006-02-07  7:42                   ` David Chinner
2006-02-07 22:51                     ` Andrew Morton
2006-02-07  7:49           ` David Chinner
2006-02-06 14:36   ` Mark Lord
2006-02-06 14:39     ` Mark Lord
2006-02-06 20:11       ` Andrew Morton
2006-02-13 13:59         ` dirty pages (Was: Re: [PATCH] Prevent large file writeback starvation) Johannes Stezenbach
2006-02-13 20:08           ` Andrew Morton
2006-02-13 22:48             ` Johannes Stezenbach
2006-02-13 23:04               ` Andrew Morton
2006-02-13 23:31                 ` Johannes Stezenbach
2006-02-13 23:52             ` Mark Lord
2006-02-14  0:50               ` Mark Lord
2006-02-14 16:32               ` Mark Lord
2006-04-11 12:42           ` Alexander Bergolth
2006-03-20 22:40         ` [PATCH] Prevent large file writeback starvation Alexander Bergolth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060205222215.313f30a9.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=dgc@sgi.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).