From: Mike Waychison <mikew@google.com>
To: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michael Rubin <mrubin@google.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] Converting writeback linked lists to a tree based data structure
Date: Thu, 17 Jan 2008 23:36:46 -0800
Message-ID: <4790570E.80709@google.com>
In-Reply-To: <400474447.19383@ustc.edu.cn>

Fengguang Wu wrote:
> On Tue, Jan 15, 2008 at 09:51:49PM -0800, Andrew Morton wrote:
>> On Wed, 16 Jan 2008 12:55:07 +0800 Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
>>
>>> On Tue, Jan 15, 2008 at 08:42:36PM -0800, Andrew Morton wrote:
>>>> On Wed, 16 Jan 2008 12:25:53 +0800 Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
>>>>
>>>>> list_heads are OK if we use them for one and only function.
>>>> Not really. They're inappropriate when you wish to remember your
>>>> position in the list while you dropped the lock (as we must do in
>>>> writeback).
>>>>
>>>> A data structure which permits us to iterate across the search key rather
>>>> than across the actual storage locations is more appropriate.
>>> I totally agree with you. What I mean is to first do the split of
>>> functions - into three: ordering, starvation prevention, and blockade
>>> waiting.
>> Does "ordering" here refer to ordering by time-of-first-dirty?
>
> Ordering by dirtied_when or i_ino, either is OK.
>
>> What is "blockade waiting"?
>
> Some inodes/pages cannot be synced now for some reason and should be
> retried after a while.
>
>>> Then to do better ordering by adopting radix tree(or rbtree
>>> if radix tree is not enough),
>> ordering of what?
>
> Switch from time to location.
>
Given the way LBAs are laid out on disk and the fact that rotational
latency is a large factor in moving a drive head between locations, any
attempt to do a C-SCAN pass is pretty much useless. Any volume
management that sits between the fs and the actual storage complicates
this further.

A nice feature to have longer term is for the write_inodes paths for
background flushing to understand storage congestion _through_ any
volume management. This would allow us to back off background flushing
on a per-spindle basis (when using drives, of course) and avoid write
congestion in both the io scheduler and the drives' write caches. I
believe, though I don't have hard evidence, that those caches get
congested today, knocking the drive into fifo behavior in firmware.

A data structure that allows us to keep dirtied_when values consistent
across back-offs and blocking would let us develop the background
writeout paths further to get to this point (though exposing this
congestion information will require more work deeper in the stack).
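
To make the point about keeping dirtied_when stable concrete, here is a
minimal userspace sketch. It is not code from the patch under
discussion: struct dirty_inode, struct wb_key, next_after() and
writeback_pass() are hypothetical names, the linear scan stands in for
a real tree lookup (rbtree or radix tree), and the lock is only
indicated by comments. The pass remembers its position as a
(dirtied_when, ino) key rather than as a pointer into the storage, and
a blocked inode is skipped without its dirtied_when being rewritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dirty inode; not the kernel's struct inode. */
struct dirty_inode {
    uint64_t dirtied_when; /* time of first dirtying, assumed > 0; never rewritten here */
    uint64_t ino;          /* tie-breaker so that keys are unique */
    int blocked;           /* nonzero: cannot be written right now */
    int written;           /* nonzero: already cleaned */
};

/* Search key: the iteration cursor is one of these, not a list position. */
struct wb_key {
    uint64_t when;
    uint64_t ino;
};

static int key_cmp(struct wb_key a, struct wb_key b)
{
    if (a.when != b.when)
        return a.when < b.when ? -1 : 1;
    if (a.ino != b.ino)
        return a.ino < b.ino ? -1 : 1;
    return 0;
}

/*
 * Return the unwritten inode with the smallest key strictly greater than
 * cursor, or NULL. The linear scan is a placeholder for a tree lookup; the
 * result does not depend on the order in which the inodes are stored.
 */
static struct dirty_inode *next_after(struct dirty_inode *set, size_t n,
                                      struct wb_key cursor)
{
    struct dirty_inode *best = NULL;

    for (size_t i = 0; i < n; i++) {
        struct wb_key k = { set[i].dirtied_when, set[i].ino };

        if (set[i].written || key_cmp(k, cursor) <= 0)
            continue;
        if (!best ||
            key_cmp(k, (struct wb_key){ best->dirtied_when, best->ino }) < 0)
            best = &set[i];
    }
    return best;
}

/*
 * One background pass in dirtied_when order. Records the ino of each inode
 * written into order[] (capacity n) and returns how many were written.
 */
size_t writeback_pass(struct dirty_inode *set, size_t n, uint64_t *order)
{
    struct wb_key cursor = { 0, 0 };
    size_t count = 0;

    assert(order != NULL);
    for (;;) {
        /* lock would be taken here */
        struct dirty_inode *inode = next_after(set, n, cursor);

        if (!inode)
            break;
        /* Remember the position as a key, so dropping the lock is safe. */
        cursor = (struct wb_key){ inode->dirtied_when, inode->ino };
        /* lock would be dropped here, before doing any I/O */

        if (inode->blocked)
            continue; /* back off; dirtied_when is left untouched */
        inode->written = 1;
        order[count++] = inode->ino;
    }
    return count;
}
```

Because a blocked inode keeps its original key, a later pass picks it
up at its true age instead of treating it as freshly dirtied.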
>>> and lastly get rid of the list_heads to
>>> avoid locking. Does it sound like a good path?
>> I'd have thought that replacing list_heads with another data structure
>> would be a single commit.
>
> That would be easy. s_more_io and s_more_io_wait can all be converted
> to radix trees.