RE: [RFC PATCH] f2fs: add extent cache base on rb-tree

linux-f2fs-devel.lists.sourceforge.net archive mirror
 help / color / mirror / Atom feed

From: Chao Yu <chao2.yu@samsung.com>
To: 'Jaegeuk Kim' <jaegeuk@kernel.org>
Cc: 'Changman Lee' <cm224.lee@samsung.com>,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: RE: [RFC PATCH] f2fs: add extent cache base on rb-tree
Date: Tue, 30 Dec 2014 18:10:21 +0800	[thread overview]
Message-ID: <004b01d02418$f77db490$e6791db0$@samsung.com> (raw)
In-Reply-To: <20141229212300.GB29975@jaegeuk-mac02.mot.com>

Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Tuesday, December 30, 2014 5:23 AM
> To: Chao Yu
> Cc: 'Changman Lee'; linux-f2fs-devel@lists.sourceforge.net; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
> 
> Hi Chao,
> 
> On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:
> 
> [snip]
> 
> Nice draft. :)

Thanks! :)

> 
> >
> > Please see the draft below.
> >
> > 1) Extent management:
> > If we use global management that managing all extents which are from different
> > inodes in sbi, we will face with serious lock contention when we access these
> > extents belong to different inodes concurrently, the loss may outweights the
> > gain.
> 
> Agreed.
> 
> > So we choose a local management for extent which means all extents are
> > managed by inode itself to avoid above lock contention. Addtionlly, we manage
> > all extents globally by linking all inode into a global lru list for extent
> > cache shrinker.
> > Approach:
> > 	a) build extent tree/rwlock/lru list/extent count in each inode.
> > 		*extent tree: link all extent in rb-tree;
> > 		*rwlock: protect fields when accessing extent cache concurrently;
> > 		*lru list: sort all extents in accessing time order;
> > 		*extent count: record total count of extents in cache.
> > 	b) use lru shrink list in sbi to manage all inode which cached extents.
> > 		*inode will be added or repostioned in this global list whenever
> > 		extent is being access in this inode.
> > 		*use spinlock to protect this shrink list.
> 
> 1. How about adding a data structure with inode number instead of referring
> inode pointer?
> 
> 2. How about managing extent entries globally and setting an upper bound to
> the number of extent entries instead of limiting them per each inode?
> (The rb-tree will handle many extents per inode.)

[assumption]
Approach A: global lru list;
Approach B: lru list per inode.

I have considered Approach A before writing the draft, although Approach A could
provide us higher ratio hit due to global lru, it may make lock contention more
intensively and has more memory overhead descripted below. So I choose more
conservative Approach B.
1) as extent_entry will split in Cow, we may lock extent_list more times when
move these separated extent entries in extent_list, unless we have batch mode
interface.
2) lock overhead between shrinker and lookuper and updater.
e.g. extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
(#1) has A, C, E, G in rb-tree
(#2) has B, D, F in rb-tree
shrinker op: lock list -> trylock #1 -> unlock list -> free A -> unlock #1
	lock list -> trylock #2 -> unlock list -> free B -> unlock #2
	lock list -> trylock #1 -> unlock list -> free C -> unlock #1
	...
*trylock/unlock* for all free extent entries belong to one inode will be separated
to lots of parts, resulting in more contention.
3) it has more memory overhead in each extent entry:
Approach A: need add inode number for backref and list_head *
Approach B: need add list_head *

Or maybe it's not a problem because we can afford all these above.

Anyway, I'm not against.

> 
> 3. It needs to set a minimum length for the candidate of extent cache.
>  (e.g., 64)

Is this used for shrinker? Can you please explain more about this minimum length?

> 
> So, for example,
> struct ino_entry_for_extents {
> 	inode number;
> 	rb_tree for extent_entry objects;
> 	rwlock;
> };
> 
> struct extent_entry {
> 	blkaddr, len;
> 	list_head *;

We should add one other field 'inode number' here, as through it we can get
rb-tree root from ino_entry_for_extents for rb_erase when we free the extent
entry in shrinker.

> };
> 
> Something like this.
> 
> [A, B, C, ... are extent entry]
> 
> The sbi has
> 1. an extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
> 2. radix_tree:  ino_entry_for_extents (#10) has D, B in rb-tree
>               ` ino_entry_for_extents (#11) has A, C in rb-tree
>               ` ino_entry_for_extents (#12) has F    in rb-tree
>               ` ino_entry_for_extents (#13) has G, E in rb-tree
> 
> In f2fs_update_extent_cache and __get_data_block for #10,
>   ino_entry_for_extents (#10) was founded and updated D or B.
>   Then, updated entries are moved to MRU.
> 
> In f2fs_evict_inode for #11, A and C are moved to LRU.
> But, if this inode is unlinked, all the A, C, and ino_entry_for_extens (#11)
> should be released.
> 
> In f2fs_balance_fs_bg, some LRU extents are released according to the amount
> of consumed memory. Then, it frees any ino_entry_for_extents having no extent.
> 
> IMO, we don't need to consider readahead for this, since get_data_block will
> be called by VFS readahead.

In get_data_block we can cache only one blkaddr once after get_dnode_of_data
returned, it seems less efficient. So why not readahead selectively more
contiguous valid blkaddr in get_dnode_of_data once?

> 
> Furthermore, we need to think about whether LRU is really best or not.
> IMO, the extent cache aims to improve second access speed, rather than initial
> cold misses. So, maybe MRU or another algorithms would be better.

Agree, let's rethink about this. :)

Thanks,

> 
> Thanks,
> 
> >
> > 2) Limitation:
> > In one inode, as we split or add extent in extent cache when read/write, extent
> > number will enlarge, so memory and CPU overhead will increase.
> > In order to control the overhead of memory and CPU, we try to set a upper bound
> > number to limit total extent number in each inode, This number is global
> > configuration which is visable to all inode. This number will be exported to
> > sysfs for configuring according to requirement of user. By default, designed
> > number is 8.
> >
> > 3) Shrinker:
> > There are two shrink paths:
> > 	a) one is triggered when extent count has exceed the upper bound of
> > 	inode's extent cache. We will try to release extent(s) from head of
> > 	inode's inner extent lru list until extent count is equal to upper bound.
> > 	This operation could be in f2fs_update_extent_cache().
> > 	b) the other one is triggered when memory util exceed threshold, we try
> > 	get inode from head of global lru list(s), and release extent(s) with
> > 	fixed number (by default: 64 extents) in inode one by one.
> > 	This operation could be in f2fs_balance_fs_bg().
> >
> > 4) Revalidation:
> > In ->evict(), extent cache will be released, in order to reuse extent cache
> > of inode when reopen for high hit ratio, a proper way is to add cached extent
> > tree into a hash table (could be radix tree), revalidate extent tree and recover
> > it to inode when reopen.
> > Besides, a global lru list is needed here for shrinker.
> >
> > 5) Readahead:
> > Expand extent cache by readaheading when we call get_dnode_of_data in non-emergency
> > path. Note, ra can affect lock contention for both ->rwlock in extent cache and
> > ->lock of global shrink list.
> >

next prev parent reply	other threads:[~2014-12-30 10:10 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-19 10:49 [RFC PATCH] f2fs: add extent cache base on rb-tree Chao Yu
2014-12-22  2:03 ` Changman Lee
2014-12-22  7:10   ` Chao Yu
2014-12-22 23:16     ` Jaegeuk Kim
2014-12-23  3:01       ` Chao Yu
2014-12-23  7:36         ` Jaegeuk Kim
2014-12-23  8:45           ` Changman Lee
2014-12-24  8:01           ` Chao Yu
2014-12-25  8:35             ` Jaegeuk Kim
2014-12-29  7:19               ` Chao Yu
2014-12-29 21:23                 ` Jaegeuk Kim
2014-12-30  0:32                   ` Changman Lee
2015-01-04  3:19                     ` Chao Yu
2015-01-07 11:16                       ` Changman Lee
2015-01-12  7:06                         ` Chao Yu
2014-12-30 10:10                   ` Chao Yu [this message]
2014-12-31  8:26                     ` Jaegeuk Kim
2015-01-04  8:24                       ` Chao Yu
2015-01-06 23:00                         ` Jaegeuk Kim
2015-01-12  7:04                           ` Chao Yu
2014-12-23  4:30     ` Changman Lee
2014-12-23  9:35       ` Chao Yu
2014-12-23 19:24         ` Jaegeuk Kim
2014-12-22  9:06   ` Chao Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='004b01d02418$f77db490$e6791db0$@samsung.com' \
    --to=chao2.yu@samsung.com \
    --cc=cm224.lee@samsung.com \
    --cc=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).