From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>, Zheng Liu <gnehzuil.liu@gmail.com>,
linux-ext4@vger.kernel.org,
Andreas Dilger <adilger.kernel@dilger.ca>,
Zheng Liu <wenqing.lz@taobao.com>
Subject: Re: [PATCH v3 4/6] ext4: change lru to round-robin in extent status tree shrinker
Date: Wed, 3 Sep 2014 17:31:22 +0200 [thread overview]
Message-ID: <20140903153122.GA17066@quack.suse.cz> (raw)
In-Reply-To: <20140903033738.GB2504@thunk.org>
On Tue 02-09-14 23:37:38, Ted Tso wrote:
> On Wed, Aug 27, 2014 at 05:01:21PM +0200, Jan Kara wrote:
> > On Thu 07-08-14 11:35:51, Zheng Liu wrote:
> > This comment is not directly related to this patch but looking into the
> > code made me think about it. It seems ugly to call __es_shrink() from
> > internals of ext4_es_insert_extent(). Also thinking about locking
> > implications makes me shudder a bit and finally this may make the pressure
> > on the extent cache artificially bigger because MM subsystem is not aware
> > of the shrinking you do here. I would prefer to leave shrinking on
> > the slab subsystem itself.
>
> If we fail, the allocation we only try to free at most one extent, so
> I don't think it's going to make the slab system that confused; it's
> the equivalent of freeing an entry and then using allocating it again.
>
> > Now GFP_ATOMIC allocation we use for extent cache makes it hard for the
> > slab subsystem and actually we could fairly easily use GFP_NOFS. We can just
> > allocate the structure before grabbing i_es_lock with GFP_NOFS allocation and
> > in case we don't need the structure, we can just free it again. It may
> > introduce some overhead from unnecessary alloc/free but things get simpler
> > that way (no need for that locked_ei argument for __es_shrink(), no need
> > for internal calls to __es_shrink() from within the filesystem).
>
> The tricky bit is that even __es_remove_extent() can require a memory
> allocation, and in the worst case, it's possible that
> ext4_es_insert_extent() can require *two* allocations. For example,
> if you start with a single large extent, and then need to insert a
> subregion with a different set of flags into the already existing
> extent, thus resulting in three extents where you started with one.
Right, I didn't realize that.
> And in some cases, no allocation is required at all....
>
> One thing that can help is that so long as we haven't done something
> critical, such as erase a delalloc region, we always release the write
> lock and retry the allocation with GFP_NOFS, and the try the operation
> again.
Yeah, maybe we could use mempools for this. It should make the code less
clumsy.
> So we may need to think a bit about what's the best way to improve
> this, although it is separate topic from making the shrinker be less
> heavyweight.
Agreed, it's a separate topic.
> > Nothing seems to prevent reclaim from freeing the inode after we drop
> > s_es_lock. So we could use freed memory. I don't think we want to pin the
> > inode here by grabbing a refcount since we don't want to deal with iput()
> > in the shrinker (that could mean having to delete the inode from shrinker
> > context). But what we could do it to grab ei->i_es_lock before dropping
> > s_es_lock. Since ext4_es_remove_extent() called from ext4_clear_inode()
> > always grabs i_es_lock, we are protected from inode being freed while we
> > hold that lock. But please add comments about this both to the
> > __es_shrink() and ext4_es_remove_extent().
>
> Something like this should work, yes?
Yes, this should work. I would just add a comment to
ext4_es_remove_extent() about the fact that ext4_clear_inode() requires
grabbing i_es_lock so that we don't do some clever optimization in future
and break these lifetime rules...
Also one question:
> - if (ei == locked_ei || !write_trylock(&ei->i_es_lock)) {
> - nr_skipped++;
> - spin_lock(&sbi->s_es_lock);
> __ext4_es_list_add(sbi, ei);
> + if (spin_is_contended(&sbi->s_es_lock)) {
> + spin_unlock(&sbi->s_es_lock);
> + spin_lock(&sbi->s_es_lock);
> + }
Why not cond_resched_lock(&sbi->s_es_lock)?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2014-09-03 15:31 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-07 3:35 [PATCH v3 0/6] ext4: extents status tree shrinker improvement Zheng Liu
2014-08-07 3:35 ` [PATCH v3 1/6] ext4: improve extents status tree trace point Zheng Liu
2014-09-02 2:25 ` Theodore Ts'o
2014-08-07 3:35 ` [PATCH v3 2/6] ext4: track extent status tree shrinker delay statictics Zheng Liu
2014-08-27 13:26 ` Jan Kara
2014-09-04 12:10 ` Zheng Liu
2014-09-04 15:49 ` Theodore Ts'o
2014-08-07 3:35 ` [PATCH v3 3/6] ext4: cache extent hole in extent status tree for ext4_da_map_blocks() Zheng Liu
2014-08-27 13:55 ` Jan Kara
2014-09-04 13:05 ` Zheng Liu
2014-09-02 2:43 ` Theodore Ts'o
2014-09-04 13:04 ` Zheng Liu
2014-09-04 15:54 ` Theodore Ts'o
2014-08-07 3:35 ` [PATCH v3 4/6] ext4: change lru to round-robin in extent status tree shrinker Zheng Liu
2014-08-27 15:01 ` Jan Kara
2014-09-03 3:37 ` Theodore Ts'o
2014-09-03 15:31 ` Jan Kara [this message]
2014-09-03 20:00 ` Theodore Ts'o
2014-09-03 22:14 ` Jan Kara
2014-09-03 22:38 ` Theodore Ts'o
[not found] ` <20140904071553.GA26930@quack.suse.cz>
2014-09-04 15:44 ` Theodore Ts'o
2014-09-08 15:47 ` Jan Kara
2014-08-07 3:35 ` [PATCH v3 5/6] ext4: use a list to track all reclaimable objects for extent status tree Zheng Liu
2014-08-27 15:13 ` Jan Kara
2014-09-03 3:44 ` Theodore Ts'o
2014-08-07 3:35 ` [PATCH v3 6/6] ext4: use a garbage collection algorithm to manage object Zheng Liu
2014-08-27 15:24 ` Jan Kara
2014-10-20 14:48 ` [PATCH v3 0/6] ext4: extents status tree shrinker improvement Theodore Ts'o
2014-10-21 10:22 ` Jan Kara
2014-10-21 15:58 ` 刘峥(文卿)
2014-11-03 16:10 ` Jan Kara
2014-11-07 2:38 ` Zheng Liu
2014-11-13 23:40 ` Theodore Ts'o
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140903153122.GA17066@quack.suse.cz \
--to=jack@suse.cz \
--cc=adilger.kernel@dilger.ca \
--cc=gnehzuil.liu@gmail.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=wenqing.lz@taobao.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.