From: Mike Waychison <mikew@google.com>
To: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michael Rubin <mrubin@google.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] Converting writeback linked lists to a tree based data structure
Date: Thu, 17 Jan 2008 23:36:46 -0800
Message-ID: <4790570E.80709@google.com>
In-Reply-To: <400474447.19383@ustc.edu.cn>

Fengguang Wu wrote:
> On Tue, Jan 15, 2008 at 09:51:49PM -0800, Andrew Morton wrote:
>> On Wed, 16 Jan 2008 12:55:07 +0800 Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
>>
>>> On Tue, Jan 15, 2008 at 08:42:36PM -0800, Andrew Morton wrote:
>>>> On Wed, 16 Jan 2008 12:25:53 +0800 Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
>>>>
>>>>> list_heads are OK if we use them for one and only function.
>>>> Not really.  They're inappropriate when you wish to remember your
>>>> position in the list while you dropped the lock (as we must do in
>>>> writeback).
>>>>
>>>> A data structure which permits us to iterate across the search key rather
>>>> than across the actual storage locations is more appropriate.
>>> I totally agree with you. What I mean is to first do the split of
>>> functions - into three: ordering, starvation prevention, and blockade
>>> waiting.
>> Does "ordering" here refer to ordering by time-of-first-dirty?
> 
> Ordering by dirtied_when or i_ino, either is OK.
> 
>> What is "blockade waiting"?
> 
> Some inodes/pages cannot be synced now for some reason and should be
> retried after a while.
> 
>>> Then to do better ordering by adopting radix tree (or rbtree
>>> if radix tree is not enough),
>> ordering of what?
> 
> Switch from time to location.
> 

Given the way LBAs are laid out on disk, and the fact that rotational
latency is a large factor whenever the drive head changes location, any
attempt to do a C-SCAN pass is pretty much useless.  Further
complicating this is any volume management that sits between the fs and
the actual storage.
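
For readers unfamiliar with the term: a C-SCAN pass services requests
in ascending address order from the current head position, then wraps
around to the lowest pending address. A minimal Python sketch follows.
The function name and the sample values are illustrative and not from
this thread, and the sketch assumes that LBA order reflects physical
position on the platter, which is exactly the assumption questioned
above.

```python
from bisect import bisect_left


def cscan_order(pending_lbas, head_lba):
    """Return the service order for one C-SCAN pass (illustrative only).

    Requests at or above the head position are served in ascending LBA
    order; the head then wraps to the lowest pending LBA and the rest
    are served, again in ascending order.
    """
    ordered = sorted(pending_lbas)
    split = bisect_left(ordered, head_lba)
    return ordered[split:] + ordered[:split]


# Hypothetical pending requests with the head at LBA 50.
print(cscan_order([95, 180, 34, 119, 11, 123, 62, 64], 50))
```

The ordering only helps if nearby LBAs are cheap to reach one after
another; rotational latency and any remapping done by volume
management both weaken that premise.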

A nice feature to have longer term is for the write_inodes paths used
for background flushing to understand storage congestion _through_ any
volume management. This would allow us to back off background flushing
on a per-spindle basis (when using drives, of course) and avoid write
congestion in both the io scheduler and the drive's write caches, which
I believe (though I don't have hard evidence) get congested today,
knocking the drive's firmware into FIFO behavior.

A data structure that keeps dirtied_when values consistent across
back-offs and blocking allows us to further develop the background
writeout paths to get to this point (though exposing this congestion
information will require more work deeper in the stack).
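
To make the idea concrete, here is a rough Python sketch of a keyed
index that a writeback pass can walk by search key instead of by list
position. It is not the patch under discussion: the class DirtyIndex,
the helper writeback_pass and the (dirtied_when, ino) key are all
hypothetical, and a sorted list stands in for the tree. The point it
tries to show is that the walker only remembers the last key it
visited, so it can drop the lock and resume, and an inode that is
blocked keeps its original dirtied_when.

```python
import bisect


class DirtyIndex:
    """Hypothetical index of dirty inodes ordered by (dirtied_when, ino)."""

    def __init__(self):
        self._keys = []  # sorted list of (dirtied_when, ino) tuples

    def mark_dirty(self, ino, dirtied_when):
        bisect.insort(self._keys, (dirtied_when, ino))

    def remove(self, ino, dirtied_when):
        i = bisect.bisect_left(self._keys, (dirtied_when, ino))
        if i < len(self._keys) and self._keys[i] == (dirtied_when, ino):
            del self._keys[i]

    def next_after(self, key):
        """Return the smallest key strictly greater than `key`, or None."""
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None


def writeback_pass(index, write_inode):
    """Walk the index in key order, resuming by key after each inode.

    `write_inode(ino)` returns True if the inode was written and False
    if it is blocked and must be retried later.
    """
    cursor = (float("-inf"), -1)  # sorts before every real key
    written = []
    while True:
        key = index.next_after(cursor)
        if key is None:
            break
        cursor = key  # remember the key, not a storage location
        dirtied_when, ino = key
        # A real implementation would drop the lock around this call;
        # the index may change, but the cursor key stays meaningful.
        if write_inode(ino):
            index.remove(ino, dirtied_when)
            written.append(ino)
        # A blocked inode stays in the index with dirtied_when unchanged.
    return written
```

A later pass starts again from the lowest key, so an inode that was
skipped is retried in its original age order rather than being
requeued at the tail.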

>>> and lastly get rid of the list_heads to
>>> avoid locking. Does it sound like a good path?
>> I'd have thought that replacing list_heads with another data structure
>> would be a single commit.
> 
> That would be easy. s_more_io and s_more_io_wait can all be converted
> to radix trees.
> 



