Re: [PATCH] Prevent large file writeback starvation

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@osdl.org>
To: David Chinner <dgc@sgi.com>
Cc: dgc@sgi.com, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] Prevent large file writeback starvation
Date: Sun, 5 Feb 2006 22:22:15 -0800	[thread overview]
Message-ID: <20060205222215.313f30a9.akpm@osdl.org> (raw)
In-Reply-To: <20060206054815.GJ43335175@melbourne.sgi.com>

David Chinner <dgc@sgi.com> wrote:
>
> > >From a quick peek, this code:
> > 
> > 			if (wbc->for_kupdate) {
> > 				/*
> > 				 * For the kupdate function we leave the inode
> > 				 * at the head of sb_dirty so it will get more
> > 				 * writeout as soon as the queue becomes
> > 				 * uncongested.
> > 				 */
> > 				inode->i_state |= I_DIRTY_PAGES;
> > 				list_move_tail(&inode->i_list, &sb->s_dirty);
> > 
> > 
> > isn't working right any more.
> 
> If the intent is to continue writing it back until fully
> sync'd, then shouldn't we be moving that to the tail of I/O list so
> we don't have to iterate over the dirty list again before we try to
> write another chunk out?

Only if dirtied_when has expired.  Until that's true I think it's right to
move onto other (potentially expired) inodes.

Your patch leaves these inodes on s_io, actually.

> > > 
> > > It appears that it is intended to handle congested devices. The thing
> > > is, 1024 pages on writeback is not enough to congest a single disk,
> > > let alone a RAID box 10 or 100 times faster than a single disk.
> > > Hence we're stopping writeback long before we congest the device.
> > 
> > I think the comment is misleading.  The writeout pass can terminate because
> > wbc->nr_to_write was satisfied, as well as for queue congestion.
> 
> Exactly my point and what the patch addresses - it allows writeback on
> that inode to continue from where it left off if the device was not
> congested.

But what will it do to other inodes?  Say, ones which have expired?  This
inode could take many minutes to write out if it's all fragmented.

s_dirty is supposed to be kept in dirtied_when order, btw.

> > I suspect what's happened here is that someone other than pdflush has tried
> > to do some writeback and didn't set for_kupdate, so we ended up resetting
> > dirtied_when.
> 
> If it's not wb_kupdate that is trying to write it back, and we have little
> memory pressure, and we completed writing the file long ago, then what behaves
> exactly like wb_kupdate for hours on end apart from wb_kupdate?

Don't know.   I'm not sure that we exactly know what's going on yet?

The list_move_tail is supposed to put the inode at the *head* of s_dirty. 
So it's the first one which gets encountered on the next pdflush pass.

And I guess that's working OK.  Except we only write 4MB of it each five
seconds.  Is that the case?

If so, why would that happen?  Take a look at wb_kupdate().  It's supposed
to work *continuously* on the inodes until writeback_inodes() failed to
write back enough pages.  It takes this as an indication that there's no
more work to do at this time.

It'd be interesting to take a look at what's happening in wb_kupdate().

> > > Therefore, lets only move the inode back onto the dirty list if the device
> > > really is congested. Patch against 2.6.15-rc2 below.
> > 
> > This'll break something else, I bet :(
> 
> Wonderful. What needs testing to indicate something else hasn't broken?

Hard.

> Does anyone have any regression tests for this code?

No.

next prev parent reply	other threads:[~2006-02-06  6:22 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-06  4:00 [PATCH] Prevent large file writeback starvation David Chinner
2006-02-06  4:27 ` Andrew Morton
2006-02-06  5:48   ` David Chinner
2006-02-06  6:22     ` Andrew Morton [this message]
2006-02-06  6:36       ` Andrew Morton
2006-02-06 11:57         ` David Chinner
2006-02-06 11:55       ` David Chinner
2006-02-06 23:14         ` Andrew Morton
2006-02-07  0:34           ` David Chinner
2006-02-07  1:04             ` Andrew Morton
2006-02-07  1:31               ` David Chinner
2006-02-07  5:27                 ` Andrew Morton
2006-02-07  7:42                   ` David Chinner
2006-02-07 22:51                     ` Andrew Morton
2006-02-07  7:49           ` David Chinner
2006-02-06 14:36   ` Mark Lord
2006-02-06 14:39     ` Mark Lord
2006-02-06 15:53       ` Several Hangs on diferent Hardwares and diferent kernels Pedro Alves
2006-02-06 20:11       ` [PATCH] Prevent large file writeback starvation Andrew Morton
2006-02-13 13:59         ` dirty pages (Was: Re: [PATCH] Prevent large file writeback starvation) Johannes Stezenbach
2006-02-13 20:08           ` Andrew Morton
2006-02-13 22:48             ` Johannes Stezenbach
2006-02-13 23:04               ` Andrew Morton
2006-02-13 23:31                 ` Johannes Stezenbach
2006-02-13 23:52             ` Mark Lord
2006-02-14  0:50               ` Mark Lord
2006-02-14 16:32               ` Mark Lord
2006-04-11 12:42           ` Alexander Bergolth
2006-03-20 22:40         ` [PATCH] Prevent large file writeback starvation Alexander Bergolth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060205222215.313f30a9.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=dgc@sgi.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.