From: Fengguang Wu <wfg@mail.ustc.edu.cn>
To: Chris Mason <chris.mason@oracle.com>
Cc: Andrew Morton <akpm@osdl.org>, Ken Chen <kenchen@google.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [PATCH 0/6] writeback time order/delay fixes take 3
Date: Fri, 24 Aug 2007 21:24:58 +0800
Message-ID: <387961898.15210@ustc.edu.cn>
Message-ID: <20070824132458.GC7933@mail.ustc.edu.cn>
In-Reply-To: <20070822084201.2c4eceb6@think.oraclecorp.com>

On Wed, Aug 22, 2007 at 08:42:01AM -0400, Chris Mason wrote:
> > My vague idea is to
> > - keep the s_io/s_more_io as a FIFO/cyclic writeback dispatching
> > queue.
> > - convert s_dirty to some radix-tree/rbtree based data structure.
> > It would have dual functions: delayed-writeback and
> > clustered-writeback.
> > clustered-writeback:
> > - Use inode number as clue of locality, hence the key for the sorted
> > tree.
> > - Drain some more s_dirty inodes into s_io on every kupdate wakeup,
> > but do it in the ascending order of inode number instead of
> > ->dirtied_when.
> >
> > delayed-writeback:
> > - Make sure that a full scan of the s_dirty tree takes <=30s, i.e.
> > dirty_expire_interval.
>
> I think we should assume a full scan of s_dirty is impossible in the
> presence of concurrent writers. We want to be able to pick a start
> time (right now) and find all the inodes older than that start time.
> New things will come in while we're scanning. But perhaps that's what
> you're saying...
Yeah, I was thinking about elevators :)
Or call it sweeping based on an address hint (the inode number).
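
To make the elevator analogy concrete, here is a minimal sketch of such a sweep. It is only an illustration, not kernel code: the DirtySweep class, its method names, and the batch parameter are hypothetical, and a sorted Python list stands in for the radix-tree/rbtree discussed above. Dirty inodes are kept sorted by inode number, and each drain continues upward from where the previous one stopped, wrapping around at most once.

```python
import bisect


class DirtySweep:
    """Toy stand-in for an s_dirty tree keyed by inode number (hypothetical)."""

    def __init__(self):
        self.inos = []    # dirty inode numbers, kept sorted
        self.cursor = 0   # lowest inode number the sweep will consider next

    def mark_dirty(self, ino):
        # Insert in sorted position; an inode that is already dirty is not duplicated.
        i = bisect.bisect_left(self.inos, ino)
        if i == len(self.inos) or self.inos[i] != ino:
            self.inos.insert(i, ino)

    def drain(self, batch):
        """Remove and return up to `batch` inodes in ascending inode-number
        order, starting at the cursor and wrapping around at most once."""
        out = []
        wrapped = False
        while len(out) < batch and self.inos:
            i = bisect.bisect_left(self.inos, self.cursor)
            if i == len(self.inos):
                # Reached the top of the address space: restart from the bottom.
                if wrapped:
                    break
                wrapped = True
                self.cursor = 0
                continue
            ino = self.inos.pop(i)
            out.append(ino)
            self.cursor = ino + 1
        return out


sweep = DirtySweep()
for ino in (40, 7, 23, 90, 15):
    sweep.mark_dirty(ino)

first = sweep.drain(3)    # lowest three inode numbers, in ascending order
sweep.mark_dirty(5)       # a concurrent writer dirties an inode behind the cursor
second = sweep.drain(3)   # continues upward, then wraps to pick up inode 5
```

The inode dirtied behind the cursor is not picked up until the sweep wraps, which is the behaviour one would expect from an elevator: new arrivals do not pull the head backwards.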
> At any rate, we've got two types of lists now. One keeps track of age
> and the other two keep track of what is currently being written. I
> would try two things:
>
> 1) s_dirty stays a list for FIFO. s_io becomes a radix tree that
> indexes by inode number (or some arbitrary field the FS can set in the
> inode). Radix tree tags are used to indicate which things in s_io are
> already in progress or are pending (hand waving because I'm not sure
> exactly).
>
> inodes are pulled off s_dirty and the corresponding slot in s_io is
> tagged to indicate IO has started. Any nearby inodes in s_io are also
> sent down.
>
> 2) s_dirty and s_io both become radix trees. s_dirty is indexed by a
> sequence number that corresponds to age. It is treated as a big
> circular indexed list that can wrap around over time. Radix tree tags
> are used both on s_dirty and s_io to flag which inodes are in progress.
Converting s_io to a radix tree would be pointless, because the inodes
on s_io will normally be sent to the block layer elevators at the same
time.

Also, s_dirty holds 30 seconds' worth of inodes, while s_io holds only
5 seconds' worth. The more inodes, the better the chance of good
clustering. That's the general rule.

s_dirty is the right place to do address-clustering.

As for the dirty_expire_interval parameter on dirty age,
we can apply a simple rule: do one full scan/sweep over the
fs-address-space every 30s, syncing all inodes encountered
except those dirtied less than 5s ago. With that rule,
any inode will get synced 5-35 seconds after being dirtied.
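
The 5-35 second bound can be checked with a small model. This is a sketch under simplifying assumptions that are mine, not part of the proposal: the sweep is taken to pass a given inode's position exactly once every 30 seconds, and the names SWEEP_PERIOD, MIN_AGE and age_at_sync are hypothetical.

```python
SWEEP_PERIOD = 30  # seconds between two passes of the sweep over the same position
MIN_AGE = 5        # inodes dirtied more recently than this are spared by the sweep


def age_at_sync(dirtied_at, first_visit=0):
    """Return the inode's age (in seconds) when a sweep first syncs it.

    The sweep is assumed to visit the inode's position at first_visit,
    first_visit + SWEEP_PERIOD, first_visit + 2 * SWEEP_PERIOD, and so on.
    """
    visit = first_visit
    # Skip visits that come before the inode was dirtied or while it is too young.
    while visit - dirtied_at < MIN_AGE:
        visit += SWEEP_PERIOD
    return visit - dirtied_at


# Dirty an inode at every whole second over several sweep periods.
ages = [age_at_sync(t) for t in range(300)]
youngest, oldest = min(ages), max(ages)
```

In this model an inode dirtied exactly 5 seconds before the sweep arrives is synced at age 5, while one dirtied slightly later is spared and waits for the next pass, so its age approaches but never reaches 35 seconds.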
-fengguang