Re: [patch] Converting writeback linked lists to a tree based data structure

From: Fengguang Wu <wfg@mail.ustc.edu.cn>
To: Michael Rubin <mrubin@google.com>
Cc: a.p.zijlstra@chello.nl, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] Converting writeback linked lists to a tree based data structure
Date: Fri, 18 Jan 2008 17:32:03 +0800
Message-ID: <400651538.20437@ustc.edu.cn>
Message-ID: <E1JFnZz-00015z-Vq@localhost.localdomain>
In-Reply-To: <532480950801172243i21341a02s983a9e59b182c53e@mail.gmail.com>

On Thu, Jan 17, 2008 at 10:43:15PM -0800, Michael Rubin wrote:
> On Jan 17, 2008 8:56 PM, Fengguang Wu <wfg@mail.ustc.edu.cn> wrote:
> > On Thu, Jan 17, 2008 at 01:07:05PM -0800, Michael Rubin wrote:
> > Suppose we want to grant a longer expiration window for temp files,
> > adding a new list named s_dirty_tmpfile would be a handy solution.
> 
> By tmp, do you mean files that eventually get written to

Yes, they are disk-based and can be synced.

> disk? If not, I would just use WRITEBACK_NEVER. If so, I am not sure
> that feature is worth making a special case. It seems like the
> location-based ideas may be more useful.

I'm not interested in WRITEBACK_NEVER or location-based writeback
for now :-)

> > > >         - refill s_io iff it is drained
> > > >           this prevents promotion of big/old files
> > >
> > > Once a big file gets its first do_writepages, it is moved behind the
> > > other smaller files via i_flushed_when. And the same in reverse for
> > > big vs old.
> >
> > You mean i_flush_gen?
> 
> Yeah sorry. It was once called i_flush_when. (sheepish)
> 
> > No, sync_sb_inodes() will abort on every
> > MAX_WRITEBACK_PAGES, and s_flush_gen will be updated accordingly.
> > Hence the sync will restart from big/old files.
> 
> If I understand you correctly I am not sure I agree. Here is what I
> think happens in the patch:
> 
> 1) pull big inode off of flush tree
> 2) sync big inode
> 3) Hit MAX_WRITEBACK_PAGES
> 4) Re-insert big inode (without modifying the dirtied_when)
> 5) update the i_flush_gen on big inode and re-insert behind small
> inodes we have not synced yet.
> 
> In a subsequent sync_sb_inodes() call we end up retrieving the small
> inode we had not serviced yet.

Yes, exactly. And then it will continue to sync the big one again.
It will never move forward to the next dirtied_when until it has
exhausted the inodes in the current list (the one with the oldest
dirtied_when).
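The five steps above, and the fact that a newer dirtied_when is never reached before the current one drains, can be modelled in a few lines. This is a hypothetical user-space simplification, not the patch itself: an array of nodes assumed to be sorted by dirtied_when stands in for the rbtree walk, each node's FIFO stands in for its cyclic list_head, and the per-node and per-inode `flush_gen` counters play the role of the s_flush_gen/i_flush_gen "where it ends" marker. All names and the 4-page chunk size are assumptions for illustration.

```c
#include <assert.h>

#define MODEL_MAX_WRITEBACK_PAGES 4	/* pages written per visit (model value) */
#define MODEL_MAX_INODES 8		/* capacity of one node's queue */

struct model_inode {
	int id;
	int pages_left;		/* dirty pages still to write */
	unsigned flush_gen;	/* round in which this inode is next due */
};

/* One "tree node": every inode sharing one dirtied_when, cycled FIFO. */
struct model_node {
	unsigned long dirtied_when;
	unsigned flush_gen;	/* the round currently being served */
	struct model_inode *queue[MODEL_MAX_INODES];
	int head, count;	/* circular-buffer position and length */
};

static void node_push(struct model_node *n, struct model_inode *inode)
{
	assert(n->count < MODEL_MAX_INODES);
	n->queue[(n->head + n->count) % MODEL_MAX_INODES] = inode;
	n->count++;
}

static struct model_inode *node_pop(struct model_node *n)
{
	struct model_inode *inode;

	assert(n->count > 0);
	inode = n->queue[n->head];
	n->head = (n->head + 1) % MODEL_MAX_INODES;
	n->count--;
	return inode;
}

/*
 * Serve nodes oldest-first (nodes[] must be sorted by dirtied_when).
 * Each visit writes at most one chunk; an inode with pages left is due
 * next round and goes behind the inodes not yet served this round.
 * Records the id of each visited inode in order[]; returns the visit count.
 */
static int model_writeback(struct model_node *nodes, int nr_nodes,
			   int *order, int max_visits)
{
	int visits = 0;

	for (int i = 0; i < nr_nodes; i++) {
		struct model_node *n = &nodes[i];

		/* The next dirtied_when is not touched until this node drains. */
		while (n->count > 0 && visits < max_visits) {
			struct model_inode *inode = n->queue[n->head];
			int chunk;

			/* Head is due in a later round: this round has ended. */
			if (inode->flush_gen != n->flush_gen) {
				n->flush_gen++;
				continue;
			}
			node_pop(n);
			chunk = inode->pages_left < MODEL_MAX_WRITEBACK_PAGES ?
				inode->pages_left : MODEL_MAX_WRITEBACK_PAGES;
			inode->pages_left -= chunk;
			order[visits++] = inode->id;

			/* Still dirty: requeue it for the next round. */
			if (inode->pages_left > 0) {
				inode->flush_gen = n->flush_gen + 1;
				node_push(n, inode);
			}
		}
	}
	return visits;
}
```

With a 10-page inode (id 1) queued ahead of a 3-page inode (id 2) in the oldest node and a 1-page inode (id 3) in a newer node, the visit order is 1, 2, 1, 1, 3: the big inode yields to the small one after its first chunk, and the newer node is served only after the oldest one is empty.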

> > > >         - return from sync_sb_inodes() after one go of s_io
> > >
> > > I am not sure how this limit helps things out. Is this for superblock
> > > starvation? Can you elaborate?
> >
> > We should have a way to move on to the next superblock even if new
> > dirty inodes or pages are emerging fast in this superblock. Filling
> > and draining s_io only once and then aborting achieves that.
> 
> Got it.
> 
> > s_io is a stable and bounded working set for one pass over a superblock.
> 
> Is this necessary with MAX_WRITEBACK_PAGES? It feels like a double limit.

We need a limit and a continuation scheme at each level. It was so hard
to sort them out that I'm really reluctant to restart all that fuss.
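The "limit at each level" can be pictured as nested bounds: MAX_WRITEBACK_PAGES caps each inode visit, s_io is refilled only once it has been drained and is walked at most once per call, and a global page budget ends the whole sweep, so every superblock gets its turn. This is a hypothetical user-space model: the queue type, the chunk size of 4, the helper names, and the simplification that partially written inodes go back onto s_dirty are all assumptions; the real sync_sb_inodes() also deals with expiry, locking and inode state, which are omitted here.

```c
#include <assert.h>

#define MODEL_MAX_WRITEBACK_PAGES 4	/* per-inode chunk per visit (model) */
#define MODEL_QUEUE_LEN 16

/* FIFO of per-inode dirty page counts (hypothetical stand-in for a list). */
struct model_queue {
	int pages[MODEL_QUEUE_LEN];
	int count;
};

struct model_sb {
	struct model_queue s_dirty;	/* newly dirtied or requeued inodes */
	struct model_queue s_io;	/* bounded working set for one pass */
};

static void queue_push(struct model_queue *q, int pages)
{
	assert(q->count < MODEL_QUEUE_LEN);
	q->pages[q->count++] = pages;
}

static int queue_pop(struct model_queue *q)
{
	int pages;

	assert(q->count > 0);
	pages = q->pages[0];
	for (int i = 1; i < q->count; i++)
		q->pages[i - 1] = q->pages[i];
	q->count--;
	return pages;
}

/* One call = at most one pass over s_io, then return to the caller. */
static void model_sync_sb(struct model_sb *sb, long *nr_to_write)
{
	int pass_len;

	/* Refill s_io only once it has been fully drained. */
	if (sb->s_io.count == 0)
		while (sb->s_dirty.count > 0)
			queue_push(&sb->s_io, queue_pop(&sb->s_dirty));

	/* Requeued inodes land on s_dirty, so this pass stays bounded. */
	pass_len = sb->s_io.count;
	for (int i = 0; i < pass_len && *nr_to_write > 0; i++) {
		int pages = queue_pop(&sb->s_io);
		int chunk = pages < MODEL_MAX_WRITEBACK_PAGES ?
			    pages : MODEL_MAX_WRITEBACK_PAGES;

		if (chunk > *nr_to_write)
			chunk = (int)*nr_to_write;
		pages -= chunk;
		*nr_to_write -= chunk;
		if (pages > 0)
			queue_push(&sb->s_dirty, pages);	/* more next time */
	}
}

/* Give every superblock its turn until the global budget runs out. */
static void model_writeback(struct model_sb *sbs, int nr_sbs,
			    long *nr_to_write)
{
	for (int i = 0; i < nr_sbs && *nr_to_write > 0; i++)
		model_sync_sb(&sbs[i], nr_to_write);
}
```

Starting from a 10-page and a 2-page inode on the first superblock, a 3-page inode on the second, and a budget of 100 pages, one sweep writes 9 pages: the second superblock is serviced even though 6 pages of the big inode are still pending on the first.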

> > Basically you make one list_head in each rbtree node.
> > That list_head is recycled cyclically, and is analogous to the
> > old-fashioned s_dirty. We need to know 'where we are' and 'where it ends'.
> > So an extra indicator, i_flush_gen, must be introduced. It's awkward.
> > We are simply repeating the problem of the aged list_heads.
> 
> To me they both feel a little awkward. I feel like the original
> problem in 2.6.23 led to a lot of examination which is bringing new
> possibilities to light.
> 
> BTW the issue that started me on this whole path (starving large
> files) was still present in 2.6.23-rc8 but now looks fixed in
> 2.6.24-rc3.
> Still no idea about your changes in 2.6.24-rc6-mm1. I have given up
> trying to get that thing to boot.

Hehe, I guess the bug is still there in 2.6.24-rc3, but it should be
gone in the latest patchset.
