linux-ext4.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Zheng Liu <gnehzuil.liu@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org, Zheng Liu <wenqing.lz@taobao.com>
Subject: Re: [RFC][PATCH 3/9 v1] ext4: add physical block and status member into extent status tree
Date: Wed, 2 Jan 2013 12:22:55 +0100
Message-ID: <20130102112255.GA30633@quack.suse.cz>
In-Reply-To: <20130101051607.GB7546@gmail.com>

On Tue 01-01-13 13:16:07, Zheng Liu wrote:
> On Mon, Dec 31, 2012 at 10:49:52PM +0100, Jan Kara wrote:
> > On Mon 24-12-12 15:55:36, Zheng Liu wrote:
> > > From: Zheng Liu <wenqing.lz@taobao.com>
> > > 
> > > es_pblk records the physical block that the extent maps to on disk.  es_status
> > > records the status of the extent.  Three statuses are defined: written,
> > > unwritten and delayed.
> >   So this means one extent is 48 bytes on 64-bit architectures. If I'm a
> > nasty user and create an artificially fragmented file (by allocating every
> > second block), the extent tree takes 6 MB per GB of file. That's quite a bit
> > and I think you need to provide a way for the kernel to reclaim extent
> > structures...
> 
> Indeed, when a file is heavily fragmented, the status tree will occupy
> a lot of memory.  That is why it is loaded on demand.  When I implemented
> it, there were two ways to load the status tree.  One is loading it
> on demand, and the other is loading the complete extent tree in
> ext4_alloc_inode().  In the end I chose the former because it reduces
> memory pressure most of the time.  But it has a disadvantage: the
> status tree can't be fully trusted because it doesn't track the
> complete status of the extent tree on disk.
  Not reading the whole extent tree in ext4_alloc_inode() is a good start
but it's not the whole solution IMHO. It saves us from unnecessarily reading
extents, but if someone reads the whole filesystem (like
grep -R "foo" /) you will still end up with all extents cached, and that
will make ext4 inodes pretty heavy in memory. Inode reclaim will
eventually release these inodes, including their cached extents, but it is
usually more beneficial to keep the inode itself cached than more of its
extents, so being able to strip cached extents without releasing the inode
itself would be good.

> I will provide a way to reclaim extent structures from the status tree.
> My current idea is that we can reclaim all extents in WRITTEN/UNWRITTEN
> status, because we always need DELAYED extents in fiemap,
> seek_data/hole and bigalloc code.  Furthermore, as you said in
> another mail, unwritten extents that are about to be converted to
> written must not be reclaimed either.
> 
> Another question is when these extents should be reclaimed.  Currently,
> when clear_inode() is called, the whole status tree is reclaimed.  Maybe
> a switch in sysfs is an option.  Any thoughts?
  The natural way to handle the shrinking is to use the 'shrinker' framework.
In this case, we could register a shrinker for shrinking extents. Just keeping
an LRU of extents would increase the size of the extent structure by 2
pointers, which I think is too much, and I'm not yet sure how to choose
extents for reclaim in some other way. I will think about it...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


Thread overview: 22+ messages
2012-12-24  7:55 [RFC][PATCH 0/9 v1] ext4: extent status tree implementation (step2) Zheng Liu
2012-12-24  7:55 ` [RFC][PATCH 1/9 v1] ext4: fixup metadata reserve block warning when bigalloc and delalloc are enabled Zheng Liu
2012-12-24  7:55 ` [RFC][PATCH 2/9 v1] ext4: refine extent status tree Zheng Liu
2012-12-24  7:55 ` [RFC][PATCH 3/9 v1] ext4: add physical block and status member into " Zheng Liu
2012-12-31 21:49   ` Jan Kara
2013-01-01  5:16     ` Zheng Liu
2013-01-02 11:22       ` Jan Kara [this message]
2013-01-05  2:44         ` Zheng Liu
2013-01-08  1:27           ` Dave Chinner
2013-01-08  2:25             ` Zheng Liu
2012-12-24  7:55 ` [RFC][PATCH 4/9 v1] ext4: adjust interfaces of " Zheng Liu
2012-12-24  7:55 ` [RFC][PATCH 5/9 v1] ext4: track all extent status in " Zheng Liu
2012-12-24  7:55 ` [RFC][PATCH 6/9 v1] ext4: lookup block mapping " Zheng Liu
2012-12-24  7:55 ` [RFC][PATCH 7/9 v1] ext4: add a new convert function to convert an unwritten extent " Zheng Liu
2012-12-24  7:55 ` [RFC][PATCH 8/9 v1] ext4: refine unwritten extent conversion Zheng Liu
2012-12-31 16:36   ` Jan Kara
2012-12-31 17:04     ` Jan Kara
2012-12-31 21:58   ` Jan Kara
2013-01-01  5:24     ` Zheng Liu
2013-01-03 10:56       ` Jan Kara
2013-01-04  4:26         ` Zheng Liu
2012-12-24  7:55 ` [RFC][PATCH 9/9 v1] ext4: set dioread_nolock by default for extent-based files Zheng Liu