From: Zheng Liu <gnehzuil.liu@gmail.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 5/5] ext4: add new ioctl EXT4_IOC_PRECACHE_EXTENTS
Date: Thu, 18 Jul 2013 21:06:11 +0800 [thread overview]
Message-ID: <20130718130611.GB14274@gmail.com> (raw)
In-Reply-To: <20130718025025.GA30405@thunk.org>
Hi Ted,
Thanks for your explanation. I can always learn something from your
reply. :-)
On Wed, Jul 17, 2013 at 10:50:25PM -0400, Theodore Ts'o wrote:
> On Thu, Jul 18, 2013 at 09:19:41AM +0800, Zheng Liu wrote:
> >
> > If I understand correctly, we don't want to reclaim from an inode with
> > EXT4_STATE_EXT_PRECACHED flag when __ratelimit() returns 0, right?
>
> No, the intent of the code was to make sure we don't trigger the
> warning too often, in case the system is under massive memory
> pressure. In the original implementation of this ioctl which we used
> at Google (with an extent cache that was much less functional than the
> extent status tree we now have upstream), the extents were pinned in
> memory permanently, until the inode is evicted from memory.
>
> I thought about doing this, since normally the cached extents will
> take less memory than the extent tree in the buffer cache (especially
> in any sane setup where the large tablespace, etc., files are are
> fallocated in advance and are largely contiguous). But for upstream,
> I was concerned that someone might deliberately create lots of
> fragmented files, and then call the precache ioctl on all of them.
Yes, at least for a internet company we can control everything, but for
upstream the kernel might run under some weird environments. The lesson
from this is that I need to think deeply for non-internet applications,
and make a better design. Now I fully agree with you about this
implementation. Meanwhile the patch looks good to me.
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
- Zheng
>
> So what I did was to change the sort function such that the shrinker
> would put those files at the end of the list. And although it's not
> in the patch that I've sent out, I've since changed it so that if the
> head of the list is an precached inode, and it's been more than 5
> seconds, we force a resort of the list.
>
> That way if we are under heavy memory pressure, we will eventually get
> rid of the precached extents --- but under normal circumstnaces, we
> try very hard not to, at least via the es_shrinker. (If the inode
> gets closed, and then eventually the inode gets evicted, then of
> course we'll drop all of the precached extents.)
>
> So the ratelimited warning is so we can know if this has happened,
> since it's probably a sign that something bad has happened. Either a
> process ran wild trying to precache too many extents, or the system
> was under far more memory pressure, which is probably something that
> needs to be fixed by changing some configuration parameter or by
> tweaking the load balancer.
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2013-07-18 12:47 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-16 15:17 [PATCH 0/5 v2] add extent status tree caching Theodore Ts'o
2013-07-16 15:17 ` [PATCH 1/5] ext4: refactor code to read the extent tree block Theodore Ts'o
2013-07-16 15:18 ` [PATCH 2/5] ext4: print the block number of invalid extent tree blocks Theodore Ts'o
2013-07-18 0:56 ` Zheng Liu
2013-07-16 15:18 ` [PATCH 3/5] ext4: use unsigned int for es_status values Theodore Ts'o
2013-07-16 15:18 ` [PATCH 4/5] ext4: cache all of an extent tree's leaf block upon reading Theodore Ts'o
2013-07-16 15:18 ` [PATCH 5/5] ext4: add new ioctl EXT4_IOC_PRECACHE_EXTENTS Theodore Ts'o
2013-07-18 1:19 ` Zheng Liu
2013-07-18 2:50 ` Theodore Ts'o
2013-07-18 13:06 ` Zheng Liu [this message]
2013-07-18 15:21 ` Theodore Ts'o
2013-07-18 18:35 ` [PATCH 0/5 v2] add extent status tree caching Eric Sandeen
2013-07-18 18:53 ` Theodore Ts'o
2013-07-19 0:56 ` Eric Sandeen
2013-07-19 2:59 ` Theodore Ts'o
2013-07-19 3:33 ` Dave Chinner
2013-07-19 14:22 ` Jeff Moyer
2013-07-19 16:19 ` Theodore Ts'o
2013-07-22 1:38 ` Dave Chinner
2013-07-22 2:17 ` Zheng Liu
2013-07-22 10:02 ` Dave Chinner
2013-07-22 12:57 ` Zheng Liu
2013-07-30 3:08 ` Dave Chinner
2013-08-04 1:27 ` Theodore Ts'o
2013-08-13 3:10 ` Dave Chinner
2013-08-13 3:21 ` Eric Sandeen
2013-08-13 13:04 ` Theodore Ts'o
2013-08-16 3:21 ` Dave Chinner
2013-08-16 14:39 ` Theodore Ts'o
2013-07-18 23:54 ` Zheng Liu
2013-07-19 0:07 ` Theodore Ts'o
2013-07-19 1:03 ` Zheng Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130718130611.GB14274@gmail.com \
--to=gnehzuil.liu@gmail.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.