From: Fengguang Wu <wfg@mail.ustc.edu.cn>
To: Michael Rubin <mrubin@google.com>
Cc: a.p.zijlstra@chello.nl, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] Converting writeback linked lists to a tree based data structure
Date: Thu, 17 Jan 2008 17:41:41 +0800
Message-ID: <400562938.07583@ustc.edu.cn>
Message-ID: <E1JFRFm-00011Q-0q@localhost.localdomain>
In-Reply-To: <20080115080921.70E3810653@localhost>

On Tue, Jan 15, 2008 at 12:09:21AM -0800, Michael Rubin wrote:
> 1) Adding a datastructure to guarantee fairness when writing
>    inodes to disk and simplify the code base.
> 
>    When inodes are parked for writeback they are parked in the
>    flush_tree. The flush tree is a data structure based on an rb tree.

The main benefit of the rbtree is possibly better support for future policies.
Can you give an example?

(grumble:
Apart from the benefit of flexibility, I don't think it makes things
simpler, nor does the introduction of an rbtree automatically fix bugs.
Bugs can only be avoided by a good understanding of all possible cases.)

The trickiest writeback issue is probably starvation prevention
between
        - small/large files
        - new/old files
        - superblocks
Some kind of limit should be applied to each. They used to be:
        - requeue to s_more_io whenever MAX_WRITEBACK_PAGES is reached
          this preempts big files
        - refill s_io only if it is drained
          this prevents promotion of big/old files
        - return from sync_sb_inodes() after one go of s_io
          (todo: don't restart from the first superblock in writeback_inodes())
          this prevents busy superblock from starving others
          and ensures fairness between superblocks
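
The three old limits listed above can be illustrated with a toy model.
This is only a sketch, not kernel code: wb_inode, wb_queue, q_push(),
q_pop() and toy_sync_one_pass() are made-up names, the two FIFOs merely
stand in for s_io and s_more_io, and the quota value of 1024 is an
assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WRITEBACK_PAGES 1024  /* per-inode quota per pass (assumed value) */
#define MAX_INODES 16

/* Hypothetical stand-in for a dirty inode: only a dirty page count. */
struct wb_inode {
	int id;
	long dirty_pages;
};

/* Fixed-size FIFO standing in for the s_io / s_more_io lists. */
struct wb_queue {
	struct wb_inode *item[MAX_INODES];
	size_t head, len;
};

static void q_push(struct wb_queue *q, struct wb_inode *ino)
{
	assert(q->len < MAX_INODES);
	q->item[(q->head + q->len) % MAX_INODES] = ino;
	q->len++;
}

static struct wb_inode *q_pop(struct wb_queue *q)
{
	struct wb_inode *ino;

	assert(q->len > 0);
	ino = q->item[q->head];
	q->head = (q->head + 1) % MAX_INODES;
	q->len--;
	return ino;
}

/* One go over io; returns the number of pages "written". */
static long toy_sync_one_pass(struct wb_queue *io, struct wb_queue *more_io)
{
	long written = 0;
	size_t n;

	/* Refill io only if it is drained. */
	if (io->len == 0)
		while (more_io->len > 0)
			q_push(io, q_pop(more_io));

	/* Visit each queued inode once, then return to the caller. */
	for (n = io->len; n > 0; n--) {
		struct wb_inode *ino = q_pop(io);
		long chunk = ino->dirty_pages < MAX_WRITEBACK_PAGES ?
			     ino->dirty_pages : MAX_WRITEBACK_PAGES;

		ino->dirty_pages -= chunk;
		written += chunk;

		/* Quota reached with pages left: requeue to more_io. */
		if (ino->dirty_pages > 0)
			q_push(more_io, ino);
	}
	return written;
}
```

In this model a 5000-page file queued ahead of a 10-page file gets
only its quota in the first pass, so the small file is fully written
in that same pass, and the big file continues from more_io once io
has drained.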
 
Michael, could you sort out and document the new starvation prevention schemes?

>    Duplicate keys are handled by making a list in the tree for each key
>    value. The order of how we choose the next inode to flush is decided
>    by two fields. First the earliest dirtied_when value. If there are
>    duplicate dirtied_when values then the earliest i_flush_gen value
>    determines who gets flushed next.
> 
>    The flush tree organizes the dirtied_when keys with the rb_tree. Any
>    inodes with a duplicate dirtied_when value are link listed together. This
>    link list is sorted by the inode's i_flush_gen. When both the
>    dirtied_when and the i_flush_gen are identical the order in the
>    linked list determines the order we flush the inodes.

So i_flush_gen is introduced to help restart from the last inode?
Well, it's not as simple as list_heads.
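
For reference, the ordering described in the quoted text can be
sketched as a two-level comparison. This is a minimal sketch under
stated assumptions, not the patch's code: ft_inode, flush_cmp(),
flush_insert() and flush_next() are hypothetical names, a sorted
singly linked list is used in place of the rb tree purely to show the
resulting order, and jiffies wraparound of dirtied_when is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode parked for writeback. */
struct ft_inode {
	unsigned long dirtied_when;  /* primary key (wraparound ignored here) */
	unsigned long i_flush_gen;   /* tie-breaker for equal dirtied_when */
	int id;
	struct ft_inode *next;
};

/* <0 if a is flushed before b, >0 if after, 0 if both keys are equal. */
static int flush_cmp(const struct ft_inode *a, const struct ft_inode *b)
{
	if (a->dirtied_when != b->dirtied_when)
		return a->dirtied_when < b->dirtied_when ? -1 : 1;
	if (a->i_flush_gen != b->i_flush_gen)
		return a->i_flush_gen < b->i_flush_gen ? -1 : 1;
	return 0;
}

/*
 * Sorted insert. Entries that compare equal are skipped over, so
 * inodes with identical keys keep their insertion order.
 */
static void flush_insert(struct ft_inode **head, struct ft_inode *ino)
{
	struct ft_inode **pos = head;

	assert(ino != NULL);
	while (*pos && flush_cmp(*pos, ino) <= 0)
		pos = &(*pos)->next;
	ino->next = *pos;
	*pos = ino;
}

/* Remove and return the next inode to flush, or NULL if none is left. */
static struct ft_inode *flush_next(struct ft_inode **head)
{
	struct ft_inode *ino = *head;

	if (ino) {
		*head = ino->next;
		ino->next = NULL;
	}
	return ino;
}
```

A real rb tree would make the insert O(log n) in the number of
distinct dirtied_when values, but the flush order would be the same
as the one this list produces.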

> 2) Added an inode flag to allow inodes to be marked so that they
>    are never written back to disk.
> 
>    The motivation behind this change is several fold. The first is
>    to ensure fairness in the writeback algorithm. The second is to

What do you mean by fairness? Why can't I_WRITEBACK_NEVER go into a
separate, standalone patch?

>    deal with a bug where the writing to large files concurrently
>    to smaller ones creates a situation where writeback cannot
>    keep up with traffic and memory balloons until we hit the
>    threshold watermark. This can result in surprisingly long latency
>    with respect to disk traffic. This latency can take minutes. The
>    flush tree fixes this issue and fixes several other minor issues
>    with fairness also.

More details about the fixes, please?
