Re: [PATCH v3 4/6] ext4: change lru to round-robin in extent status tree shrinker

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>, Zheng Liu <gnehzuil.liu@gmail.com>,
	linux-ext4@vger.kernel.org,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Zheng Liu <wenqing.lz@taobao.com>
Subject: Re: [PATCH v3 4/6] ext4: change lru to round-robin in extent status tree shrinker
Date: Wed, 3 Sep 2014 17:31:22 +0200	[thread overview]
Message-ID: <20140903153122.GA17066@quack.suse.cz> (raw)
In-Reply-To: <20140903033738.GB2504@thunk.org>

On Tue 02-09-14 23:37:38, Ted Tso wrote:
> On Wed, Aug 27, 2014 at 05:01:21PM +0200, Jan Kara wrote:
> > On Thu 07-08-14 11:35:51, Zheng Liu wrote:
> >   This comment is not directly related to this patch but looking into the
> > code made me think about it. It seems ugly to call __es_shrink() from
> > internals of ext4_es_insert_extent(). Also thinking about locking
> > implications makes me shudder a bit and finally this may make the pressure
> > on the extent cache artificially bigger because MM subsystem is not aware
> > of the shrinking you do here. I would prefer to leave shrinking on
> > the slab subsystem itself.
> 
> If we fail, the allocation we only try to free at most one extent, so
> I don't think it's going to make the slab system that confused; it's
> the equivalent of freeing an entry and then using allocating it again.
> 
> > Now GFP_ATOMIC allocation we use for extent cache makes it hard for the
> > slab subsystem and actually we could fairly easily use GFP_NOFS. We can just
> > allocate the structure before grabbing i_es_lock with GFP_NOFS allocation and
> > in case we don't need the structure, we can just free it again. It may
> > introduce some overhead from unnecessary alloc/free but things get simpler
> > that way (no need for that locked_ei argument for __es_shrink(), no need
> > for internal calls to __es_shrink() from within the filesystem).
> 
> The tricky bit is that even __es_remove_extent() can require a memory
> allocation, and in the worst case, it's possible that
> ext4_es_insert_extent() can require *two* allocations.  For example,
> if you start with a single large extent, and then need to insert a
> subregion with a different set of flags into the already existing
> extent, thus resulting in three extents where you started with one.
  Right, I didn't realize that.

> And in some cases, no allocation is required at all....
> 
> One thing that can help is that so long as we haven't done something
> critical, such as erase a delalloc region, we always release the write
> lock and retry the allocation with GFP_NOFS, and the try the operation
> again.
  Yeah, maybe we could use mempools for this. It should make the code less
clumsy.

> So we may need to think a bit about what's the best way to improve
> this, although it is separate topic from making the shrinker be less
> heavyweight.
  Agreed, it's a separate topic.

> >   Nothing seems to prevent reclaim from freeing the inode after we drop
> > s_es_lock. So we could use freed memory. I don't think we want to pin the
> > inode here by grabbing a refcount since we don't want to deal with iput()
> > in the shrinker (that could mean having to delete the inode from shrinker
> > context). But what we could do it to grab ei->i_es_lock before dropping
> > s_es_lock. Since ext4_es_remove_extent() called from ext4_clear_inode()
> > always grabs i_es_lock, we are protected from inode being freed while we
> > hold that lock. But please add comments about this both to the
> > __es_shrink() and ext4_es_remove_extent().
> 
> Something like this should work, yes?
  Yes, this should work. I would just add a comment to
ext4_es_remove_extent() about the fact that ext4_clear_inode() requires
grabbing i_es_lock so that we don't do some clever optimization in future
and break these lifetime rules...

Also one question:

> -		if (ei == locked_ei || !write_trylock(&ei->i_es_lock)) {
> -			nr_skipped++;
> -			spin_lock(&sbi->s_es_lock);
>  			__ext4_es_list_add(sbi, ei);
> +			if (spin_is_contended(&sbi->s_es_lock)) {
> +				spin_unlock(&sbi->s_es_lock);
> +				spin_lock(&sbi->s_es_lock);
> +			}
  Why not cond_resched_lock(&sbi->s_es_lock)?


								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

next prev parent reply	other threads:[~2014-09-03 15:31 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-07  3:35 [PATCH v3 0/6] ext4: extents status tree shrinker improvement Zheng Liu
2014-08-07  3:35 ` [PATCH v3 1/6] ext4: improve extents status tree trace point Zheng Liu
2014-09-02  2:25   ` Theodore Ts'o
2014-08-07  3:35 ` [PATCH v3 2/6] ext4: track extent status tree shrinker delay statictics Zheng Liu
2014-08-27 13:26   ` Jan Kara
2014-09-04 12:10     ` Zheng Liu
2014-09-04 15:49       ` Theodore Ts'o
2014-08-07  3:35 ` [PATCH v3 3/6] ext4: cache extent hole in extent status tree for ext4_da_map_blocks() Zheng Liu
2014-08-27 13:55   ` Jan Kara
2014-09-04 13:05     ` Zheng Liu
2014-09-02  2:43   ` Theodore Ts'o
2014-09-04 13:04     ` Zheng Liu
2014-09-04 15:54       ` Theodore Ts'o
2014-08-07  3:35 ` [PATCH v3 4/6] ext4: change lru to round-robin in extent status tree shrinker Zheng Liu
2014-08-27 15:01   ` Jan Kara
2014-09-03  3:37     ` Theodore Ts'o
2014-09-03 15:31       ` Jan Kara [this message]
2014-09-03 20:00         ` Theodore Ts'o
2014-09-03 22:14           ` Jan Kara
2014-09-03 22:38             ` Theodore Ts'o
     [not found]               ` <20140904071553.GA26930@quack.suse.cz>
2014-09-04 15:44                 ` Theodore Ts'o
2014-09-08 15:47                   ` Jan Kara
2014-08-07  3:35 ` [PATCH v3 5/6] ext4: use a list to track all reclaimable objects for extent status tree Zheng Liu
2014-08-27 15:13   ` Jan Kara
2014-09-03  3:44     ` Theodore Ts'o
2014-08-07  3:35 ` [PATCH v3 6/6] ext4: use a garbage collection algorithm to manage object Zheng Liu
2014-08-27 15:24   ` Jan Kara
2014-10-20 14:48 ` [PATCH v3 0/6] ext4: extents status tree shrinker improvement Theodore Ts'o
2014-10-21 10:22   ` Jan Kara
2014-10-21 15:58     ` 刘峥(文卿)
2014-11-03 16:10       ` Jan Kara
2014-11-07  2:38         ` Zheng Liu
2014-11-13 23:40         ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140903153122.GA17066@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=adilger.kernel@dilger.ca \
    --cc=gnehzuil.liu@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=wenqing.lz@taobao.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).