From: Changman Lee <cm224.lee@samsung.com>
To: Chao Yu <chao2.yu@samsung.com>
Cc: 'Jaegeuk Kim' <jaegeuk@kernel.org>,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
Date: Wed, 07 Jan 2015 20:16:34 +0900	[thread overview]
Message-ID: <20150107111634.GA4720@lcm> (raw)
In-Reply-To: <001701d027cd$5c8d08b0$15a71a10$@samsung.com>

Hi Chao,

On Sun, Jan 04, 2015 at 11:19:28AM +0800, Chao Yu wrote:
> Hi Changman,
> 
> Sorry for replying late!
> 
> > -----Original Message-----
> > From: Changman Lee [mailto:cm224.lee@samsung.com]
> > Sent: Tuesday, December 30, 2014 8:32 AM
> > To: Jaegeuk Kim
> > Cc: Chao Yu; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org
> > Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
> > 
> > Hi all,
> > 
> > On Mon, Dec 29, 2014 at 01:23:00PM -0800, Jaegeuk Kim wrote:
> > > Hi Chao,
> > >
> > > On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:
> > >
> > > [snip]
> > >
> > > Nice draft. :)
> > >
> > > >
> > > > Please see the draft below.
> > > >
> > > > 1) Extent management:
> > > > If we use global management, managing all extents from different inodes
> > > > in the sbi, we will face serious lock contention when we concurrently
> > > > access extents belonging to different inodes; the loss may outweigh the
> > > > gain.
> > >
> > > Agreed.
> > >
> > > > So we choose local management for extents, which means all extents are
> > > > managed by the inode itself, to avoid the above lock contention. Additionally,
> > > > we manage all extents globally by linking all inodes into a global lru list
> > > > for the extent cache shrinker.
> > > > Approach:
> > > > 	a) build an extent tree/rwlock/lru list/extent count in each inode.
> > > > 		*extent tree: links all extents in an rb-tree;
> > > > 		*rwlock: protects fields when accessing the extent cache concurrently;
> > > > 		*lru list: sorts all extents in access-time order;
> > > > 		*extent count: records the total count of extents in the cache.
> > > > 	b) use an lru shrink list in the sbi to manage all inodes which have
> > > > 	cached extents.
> > > > 		*the inode will be added or repositioned in this global list whenever
> > > > 		an extent of this inode is accessed.
> > > > 		*use a spinlock to protect this shrink list.
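To make this part of the draft concrete, here is a minimal sketch of the
per-inode structures described in a) and b). All names are assumptions for
illustration only, not actual f2fs code:

#include <linux/types.h>
#include <linux/rbtree.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/atomic.h>

struct extent_info {
	unsigned int fofs;		/* start file offset of the extent */
	u32 blk;			/* start block address */
	unsigned int len;		/* length in blocks */
};

struct extent_node {
	struct rb_node rb_node;		/* linked in the inode's rb-tree */
	struct list_head list;		/* linked in the inode's lru list */
	struct extent_info ei;
};

struct extent_tree {			/* embedded in f2fs_inode_info */
	struct rb_root root;		/* rb-tree of extent_node, keyed by fofs */
	struct list_head lru;		/* per-inode lru, head is coldest */
	rwlock_t lock;			/* protects root/lru/count */
	atomic_t count;			/* total number of cached extents */
	struct list_head shrink_link;	/* links the inode into sbi's shrink list */
};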
> > >
> > > 1. How about adding a data structure with the inode number instead of
> > > referring to an inode pointer?
> > >
> > > 2. How about managing extent entries globally and setting an upper bound on
> > > the number of extent entries instead of limiting them per inode?
> > > (The rb-tree will handle many extents per inode.)
> > >
> > > 3. We need to set a minimum length for extent cache candidates
> > > (e.g., 64).
> > >
> > 
> > Agreed.
> > 
> > > So, for example,
> > > struct ino_entry_for_extents {
> > > 	nid_t ino;			/* inode number */
> > > 	struct rb_root root;		/* rb-tree of extent_entry objects */
> > > 	rwlock_t lock;
> > > };
> > >
> > > struct extent_entry {
> > > 	block_t blkaddr;
> > > 	unsigned int len;
> > > 	struct rb_node rb_node;
> > > 	struct list_head list;
> > > };
> > >
> > > Something like this.
> > >
> > > [A, B, C, ... are extent entry]
> > >
> > > The sbi has
> > > 1. an extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
> > > 2. radix_tree:  ino_entry_for_extents (#10) has D, B in rb-tree
> > >               ` ino_entry_for_extents (#11) has A, C in rb-tree
> > >               ` ino_entry_for_extents (#12) has F    in rb-tree
> > >               ` ino_entry_for_extents (#13) has G, E in rb-tree
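Jaegeuk's structs above need only a few sbi-level fields to express the whole
scheme. A rough sketch of the sbi side (all field names are hypothetical, not
existing f2fs_sb_info members):

#include <linux/radix-tree.h>

/* fields to add to struct f2fs_sb_info: */
struct radix_tree_root extent_ino_root;	/* ino -> ino_entry_for_extents */
struct list_head extent_list;		/* global list: head = LRU, tail = MRU */
spinlock_t extent_lock;			/* protects extent_list */
atomic_t total_ext;			/* total cached extents, for the global bound */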
> > >
> > > In f2fs_update_extent_cache and __get_data_block for #10,
> > >   ino_entry_for_extents (#10) is found, and D or B is updated.
> > >   Then, updated entries are moved to MRU.
> > >
> > > In f2fs_evict_inode for #11, A and C are moved to LRU.
> > > But, if this inode is unlinked, all of A, C, and ino_entry_for_extents (#11)
> > > should be released.
> > >
> > > In f2fs_balance_fs_bg, some LRU extents are released according to the amount
> > > of consumed memory. Then, it frees any ino_entry_for_extents that has no extents.
> > >
> > > IMO, we don't need to consider readahead for this, since get_data_block will
> > > be called by VFS readahead.
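The update path described above could be sketched roughly like this
(lookup_extent() is a hypothetical rb-tree search, and merge/split handling
is elided; only the lock order and the MRU move are the point here):

static void f2fs_update_extent_cache_sketch(struct f2fs_sb_info *sbi,
			nid_t ino, unsigned int fofs, block_t blkaddr)
{
	struct ino_entry_for_extents *ie;
	struct extent_entry *ee;

	ie = radix_tree_lookup(&sbi->extent_ino_root, ino);
	if (!ie)
		return;			/* or allocate and insert a new entry */

	write_lock(&ie->lock);
	ee = lookup_extent(ie, fofs);	/* hypothetical rb-tree search */
	if (ee) {
		ee->blkaddr = blkaddr;	/* real code would merge or split here */

		/* touched entries move to the MRU end of the global list */
		spin_lock(&sbi->extent_lock);
		list_move_tail(&ee->list, &sbi->extent_list);
		spin_unlock(&sbi->extent_lock);
	}
	write_unlock(&ie->lock);
}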
> > >
> > > Furthermore, we need to think about whether LRU is really best or not.
> > > IMO, the extent cache aims to improve second-access speed, rather than initial
> > > cold misses. So, maybe MRU or another algorithm would be better.
> > >
> > 
> > Right. It's very complicated to judge which is better.
> > In the read or write path, extents could be created every time. At that point,
> > if we set an upper bound, we have to decide which existing extent to evict in
> > favor of the new ones.
> > On update, one extent could be split into 3, which requires 3 insertions and
> > 1 deletion. So if updates happen frequently, we could give up extent management
> > for some ranges.
> > And we could borrow ideas from VM management, for example active/inactive lists
> > with second-chance promotion, or batching insertions/deletions.
> > 
> > It suddenly occurred to me that 'simple is best'.
> > Let's think about better ideas together.
> 
> Yeah, how about using the opposite policy to the page cache's?
> 
> for example:
> node pages A,B,C,D are in the page cache;
> extents a,b,c,d are in the extent cache;
> extent a is built from page A, ..., d is built from page D.
> page cache: LRU A -> B -> C -> D MRU
> extent cache: LRU a -> b -> c -> d MRU
> 
> If we use
> 1) the same LRU order: the cache pairs A-a, B-b, ... may be reclaimed at the
> same time under OOM.
> 2) the opposite order: maybe A,B in the page cache and d,c in the extent cache
> will be reclaimed, but we can still hit the whole range through the combination
> of page cache and extent cache.
> 
> So with approach 2) we can increase the total coverage of page cache + extent cache.

Good idea. :)
Once we decide which is better between a global lru and a per-inode lru,
f2fs read performance is expected to improve even further.
I'll wait for your patch.
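To make the idea concrete, shrinking the extent cache from the opposite end to
page reclaim might look like the sketch below. shrink_extent_cache() and the
sbi fields are hypothetical, and freeing each victim from its owner's rb-tree
is elided:

/*
 * Sketch of idea 2): the VM reclaims the page cache from its LRU
 * end, so the extent cache gives up victims from its MRU end,
 * keeping the surviving halves of the two caches complementary.
 */
static unsigned long shrink_extent_cache(struct f2fs_sb_info *sbi,
					 unsigned long nr)
{
	unsigned long freed = 0;

	spin_lock(&sbi->extent_lock);
	while (freed < nr && !list_empty(&sbi->extent_list)) {
		struct extent_entry *ee;

		/* take from the tail (MRU), opposite to page reclaim */
		ee = list_last_entry(&sbi->extent_list,
				     struct extent_entry, list);
		list_del(&ee->list);
		/* rb_erase() from the owner's tree and kfree() elided */
		freed++;
	}
	spin_unlock(&sbi->extent_lock);
	return freed;
}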

Thanks,

> 
> > 
> > > Thanks,
> > >
> > > >
> > > > 2) Limitation:
> > > > In one inode, as we split or add extents in the extent cache during read/write,
> > > > the number of extents grows, so memory and CPU overhead increase.
> > > > To control this overhead, we set an upper bound on the total number of extents
> > > > in each inode. This bound is a global configuration visible to all inodes, and
> > > > will be exported to sysfs so it can be tuned to the user's requirements. The
> > > > default is 8.
> > > >
> > 
> > Chao,
> > It's better to control the number of extents globally rather than limiting
> > extents per inode, as Jaegeuk said, to reduce extent management overhead.
> 
> It's OK.
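Whichever scope wins, the bound itself could start life as a simple module
parameter before growing a proper f2fs sysfs attribute. max_extents is a
hypothetical name, not an existing f2fs knob:

static unsigned int max_extents = 8;	/* default bound from the draft */
module_param(max_extents, uint, 0644);	/* /sys/module/f2fs/parameters/max_extents */
MODULE_PARM_DESC(max_extents, "upper bound on cached extents");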
> 
> > 
> > > > 3) Shrinker:
> > > > There are two shrink paths:
> > > > 	a) one is triggered when the extent count exceeds the upper bound of the
> > > > 	inode's extent cache. We try to release extents from the head of the
> > > > 	inode's extent lru list until the count equals the upper bound (see the
> > > > 	helper sketched below). This operation could be in f2fs_update_extent_cache().
> > > > 	b) the other is triggered when memory utilization exceeds a threshold; we
> > > > 	take inodes from the head of the global lru list(s) and release a fixed
> > > > 	number of extents (by default: 64) from each inode, one by one.
> > > > 	This operation could be in f2fs_balance_fs_bg().
> > > >
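Path a) might reduce to a small helper like the one below, using the per-inode
extent_tree sketched earlier (all names remain assumptions; <linux/slab.h>
provides kfree()):

static void trim_inode_extents(struct extent_tree *et, int bound)
{
	write_lock(&et->lock);
	while (atomic_read(&et->count) > bound && !list_empty(&et->lru)) {
		struct extent_node *en;

		/* the head of the per-inode lru is the coldest extent */
		en = list_first_entry(&et->lru, struct extent_node, list);
		list_del(&en->list);
		rb_erase(&en->rb_node, &et->root);
		kfree(en);
		atomic_dec(&et->count);
	}
	write_unlock(&et->lock);
}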
> > 
> > Let's consider using register_shrinker(), which is called by the VM when
> > memory pressure happens.
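That suggestion maps onto the standard API roughly as below. This is a sketch
only: the shrinker is assumed to be embedded in f2fs_sb_info so container_of()
can recover sbi, and shrink_extent_cache() is the hypothetical helper sketched
earlier:

#include <linux/shrinker.h>

static unsigned long f2fs_extents_count(struct shrinker *shrink,
					struct shrink_control *sc)
{
	struct f2fs_sb_info *sbi =
		container_of(shrink, struct f2fs_sb_info, extent_shrinker);

	return atomic_read(&sbi->total_ext);
}

static unsigned long f2fs_extents_scan(struct shrinker *shrink,
				       struct shrink_control *sc)
{
	struct f2fs_sb_info *sbi =
		container_of(shrink, struct f2fs_sb_info, extent_shrinker);

	return shrink_extent_cache(sbi, sc->nr_to_scan);
}

static int f2fs_register_extent_shrinker(struct f2fs_sb_info *sbi)
{
	/* called at mount time */
	sbi->extent_shrinker.count_objects = f2fs_extents_count;
	sbi->extent_shrinker.scan_objects = f2fs_extents_scan;
	sbi->extent_shrinker.seeks = DEFAULT_SEEKS;
	return register_shrinker(&sbi->extent_shrinker);
}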
> 
> Great, thanks for the reminder! :-)
> 
> Thanks,
> Yu
> 
> > 
> > > > 4) Revalidation:
> > > > In ->evict(), the extent cache is released. In order to reuse an inode's
> > > > extent cache on reopen for a high hit ratio, a proper way is to keep the
> > > > cached extent tree in a hash table (could be a radix tree), then revalidate
> > > > the extent tree and restore it to the inode on reopen.
> > > > Besides, a global lru list is needed here for the shrinker.
> > > >
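Section 4) could be prototyped with a plain radix tree along the lines below.
Every name is hypothetical (extent_tree_root, the extent_tree field in
f2fs_inode_info, free_extent_tree()), and revalidation against the on-disk
inode is elided:

/* in ->evict(): park the tree instead of freeing it */
static void detach_extent_tree(struct f2fs_sb_info *sbi, struct inode *inode)
{
	struct extent_tree *et = F2FS_I(inode)->extent_tree; /* hypothetical field */

	if (radix_tree_insert(&sbi->extent_tree_root, inode->i_ino, et))
		free_extent_tree(et);	/* insertion failed: just drop the tree */
}

/* on reopen (iget): try to recover a parked tree */
static void attach_extent_tree(struct f2fs_sb_info *sbi, struct inode *inode)
{
	struct extent_tree *et;

	et = radix_tree_delete(&sbi->extent_tree_root, inode->i_ino);
	if (et)		/* revalidation against i_size etc. elided */
		F2FS_I(inode)->extent_tree = et;
}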
> > > > 5) Readahead:
> > > > Expand extent cache by readaheading when we call get_dnode_of_data in non-emergency
> > > > path. Note, ra can affect lock contention for both ->rwlock in extent cache and
> > > > ->lock of global shrink list.
> > > >

Thread overview: 24+ messages
2014-12-19 10:49 [RFC PATCH] f2fs: add extent cache base on rb-tree Chao Yu
2014-12-22  2:03 ` Changman Lee
2014-12-22  7:10   ` Chao Yu
2014-12-22 23:16     ` Jaegeuk Kim
2014-12-23  3:01       ` Chao Yu
2014-12-23  7:36         ` Jaegeuk Kim
2014-12-23  8:45           ` Changman Lee
2014-12-24  8:01           ` Chao Yu
2014-12-25  8:35             ` Jaegeuk Kim
2014-12-29  7:19               ` Chao Yu
2014-12-29 21:23                 ` Jaegeuk Kim
2014-12-30  0:32                   ` Changman Lee
2015-01-04  3:19                     ` Chao Yu
2015-01-07 11:16                       ` Changman Lee [this message]
2015-01-12  7:06                         ` Chao Yu
2014-12-30 10:10                   ` Chao Yu
2014-12-31  8:26                     ` Jaegeuk Kim
2015-01-04  8:24                       ` Chao Yu
2015-01-06 23:00                         ` Jaegeuk Kim
2015-01-12  7:04                           ` Chao Yu
2014-12-23  4:30     ` Changman Lee
2014-12-23  9:35       ` Chao Yu
2014-12-23 19:24         ` Jaegeuk Kim
2014-12-22  9:06   ` Chao Yu
