From: Nick Piggin <npiggin@kernel.dk>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [patch] mm: vmscan implement per-zone shrinkers
Date: Tue, 16 Nov 2010 18:43:35 +1100
Message-ID: <20101116074335.GA3460@amd> (raw)
In-Reply-To: <20101114182614.BEE5.A69D9226@jp.fujitsu.com>

On Sun, Nov 14, 2010 at 07:07:17PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > Hi,
> > 
> > I'm doing some work that requires per-zone shrinkers, and I'd like to get
> > the vmscan part signed off and merged by interested mm people, please.
> > 
> > [And before anybody else kindly suggests per-node shrinkers, please go
> > back and read all the discussion about this first.]
> 
> The vmscan part looks good to me. However, I hope the fs folks review it too,
> though I'm not sure who would be best.
> 
> BTW, I have some nitpick comments; see below.

Thanks for the review, it's very helpful.


> > +	void (*shrink_zone)(struct shrinker *shrink,
> > +		struct zone *zone, unsigned long scanned,
> > +		unsigned long total, unsigned long global,
> > +		unsigned long flags, gfp_t gfp_mask);
> > +
> 
> shrink_zone is slightly grep unfriendly. Can you consider shrink_slab_zone() 
> or something else?

Yes, that's true. However, I want to move away from the term "slab"
shrinker: it seems to confuse people (of course, the shrinker can shrink
memory from any allocator, not just slab).

shrink_cache_zone()?


> > +void shrinker_add_scan(unsigned long *dst,
> > +			unsigned long scanned, unsigned long total,
> > +			unsigned long objects, unsigned int ratio)
> >  {
> > -	struct shrinker *shrinker;
> > -	unsigned long ret = 0;
> > +	unsigned long long delta;
> >  
> > -	if (scanned == 0)
> > -		scanned = SWAP_CLUSTER_MAX;
> > +	delta = (unsigned long long)scanned * objects;
> > +	delta *= SHRINK_FACTOR;
> > +	do_div(delta, total + 1);
> 
> > +	delta *= SHRINK_FACTOR; /* ratio is also in SHRINK_FACTOR units */
> > +	do_div(delta, ratio + 1);
> 
> Introducing a tiny macro would be better than the comment.
> 
> >  
> > -	if (!down_read_trylock(&shrinker_rwsem))
> > -		return 1;	/* Assume we'll be able to shrink next time */
> > +	/*
> > +	 * Avoid risking looping forever due to too large nr value:
> > +	 * never try to free more than twice the estimate number of
> > +	 * freeable entries.
> > +	 */
> > +	*dst += delta;
> > +
> > +	if (*dst / SHRINK_FACTOR > objects)
> > +		*dst = objects * SHRINK_FACTOR;
> 
> objects * SHRINK_FACTOR appears twice in this function.
> Calculating "objects = obj * SHRINK_FACTOR" first would improve
> code readability slightly.
 
I wasn't quite sure what you meant by this comment and the one above.
Could you illustrate what your preferred code would look like?
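To check my understanding, is it something like this? (A quick userspace
sketch: do_div() is replaced by plain division, and both SHRINK_FACTOR's
value and the SHRINK_DIV_RATIO name are made up here, not actual kernel API.)

```c
#include <assert.h>

/* Fixed-point scale; the real value in the patch is an assumption here. */
#define SHRINK_FACTOR	128UL

/* Hypothetical "tiny macro" replacing the comment: divide a value by a
 * ratio that is itself expressed in SHRINK_FACTOR units. */
#define SHRINK_DIV_RATIO(x, ratio) \
	(((x) * SHRINK_FACTOR) / ((unsigned long long)(ratio) + 1))

static void shrinker_add_scan(unsigned long *dst,
			      unsigned long scanned, unsigned long total,
			      unsigned long objects, unsigned int ratio)
{
	unsigned long long delta;
	/* hoist the value that appeared twice in the original */
	unsigned long long max_scan = (unsigned long long)objects * SHRINK_FACTOR;

	delta = (unsigned long long)scanned * objects * SHRINK_FACTOR;
	delta /= total + 1;
	delta = SHRINK_DIV_RATIO(delta, ratio);

	*dst += delta;
	/* never accumulate more than the estimated number of freeable entries */
	if (*dst > max_scan)
		*dst = max_scan;
}
```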


> > +unsigned long shrinker_do_scan(unsigned long *dst, unsigned long batch)
> 
> The name seems a bit misleading: shrinker_do_scan() does NOT scan,
> it only does batch adjustment.

True. shrinker_get_batch_nr() or similar?

 
> > +{
> > +	unsigned long nr = ACCESS_ONCE(*dst);
> 
> Dumb question: why is this ACCESS_ONCE() necessary?
> 
> 
> > +	if (nr < batch * SHRINK_FACTOR)
> > +		return 0;
> > +	*dst = nr - batch * SHRINK_FACTOR;
> > +	return batch;

It should have a comment: *dst can be accessed without a lock.
However, if nr were reloaded from memory between the two expressions
and *dst changed during that time, we could end up with a negative
result in *dst.
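Concretely, something like this (a userspace sketch: ACCESS_ONCE is
modelled with a GCC-style volatile cast as in the kernel, and the
SHRINK_FACTOR value is made up here):

```c
#include <assert.h>

#define SHRINK_FACTOR	128UL
/* userspace stand-in for the kernel's ACCESS_ONCE() */
#define ACCESS_ONCE(x)	(*(volatile __typeof__(x) *)&(x))

static unsigned long shrinker_do_scan(unsigned long *dst, unsigned long batch)
{
	/*
	 * *dst can be updated concurrently without a lock, so read it
	 * exactly once: if the compiler reloaded *dst between the
	 * comparison and the subtraction, and another CPU lowered it
	 * in between, nr - batch * SHRINK_FACTOR could go negative
	 * (i.e. wrap to a huge unsigned value).
	 */
	unsigned long nr = ACCESS_ONCE(*dst);

	if (nr < batch * SHRINK_FACTOR)
		return 0;
	*dst = nr - batch * SHRINK_FACTOR;
	return batch;
}
```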

> {
> 	unsigned long nr = ACCESS_ONCE(*dst);
> 	batch *= SHRINK_FACTOR;
> 
> 	if (nr < batch)
> 		return 0;
> 	*dst = nr - batch;
> 	return batch;
> }
> 
> is slightly cleaner. However, it's unclear why the dst and batch arguments
> need to have different units (i.e. why can't the caller do batch * FACTOR?).

OK, I'll take it into consideration. I guess I didn't want the caller
to have to care too much about the fixed point.


> > +	list_for_each_entry(shrinker, &shrinker_list, list) {
> > +		if (!shrinker->shrink_zone)
> > +			continue;
> > +		(*shrinker->shrink_zone)(shrinker, zone, scanned,
> > +					total, global, 0, gfp_mask);
> 
> flags argument is unused?

Yes, it is at the moment. I actually have a flag that I would like
to use (a close-to-OOM flag), so I've just added the placeholder for
now.

It may well be useful for other things in the future too.


> > @@ -1844,6 +1985,23 @@ static void shrink_zone(int priority, st
> >  	if (inactive_anon_is_low(zone, sc))
> >  		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
> >  
> > +	/*
> > +	 * Don't shrink slabs when reclaiming memory from
> > +	 * over limit cgroups
> > +	 */
> > +	if (sc->may_reclaim_slab) {
> > +		struct reclaim_state *reclaim_state = current->reclaim_state;
> > +
> > +		shrink_slab(zone, sc->nr_scanned - nr_scanned,
> 
> This calculation is doubtful. What does "sc->nr_scanned - nr_scanned" mean?
> I think plain nr_scanned would simply keep the old slab balancing behavior.

OK, good catch.
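If I read the patch right, nr_scanned snapshots sc->nr_scanned at the top
of shrink_zone(), so the subtraction passes only this invocation's delta,
while the old shrink_slab() balancing used the cumulative count. A toy
model of the difference (userspace, all names illustrative):

```c
#include <assert.h>

/* Toy model of the bookkeeping in question; names are illustrative. */
struct scan_control {
	unsigned long nr_scanned;	/* cumulative pages scanned */
};

/* simulate one reclaim pass scanning n pages */
static void scan_pages(struct scan_control *sc, unsigned long n)
{
	sc->nr_scanned += n;
}

/* the patch's expression: pages scanned since the snapshot taken at
 * the top of shrink_zone(), i.e. only this invocation's work */
static unsigned long scanned_delta(const struct scan_control *sc,
				   unsigned long snapshot)
{
	return sc->nr_scanned - snapshot;
}
```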


> > +		for_each_zone_zonelist(zone, z, zonelist,
> > +				gfp_zone(sc->gfp_mask)) {
> > +			if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
> > +				continue;
> >  
> > -			shrink_slab(sc->nr_scanned, sc->gfp_mask, lru_pages);
> > -			if (reclaim_state) {
> > -				sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> > -				reclaim_state->reclaimed_slab = 0;
> > -			}
> > +			lru_pages += zone_reclaimable_pages(zone);
> 
> Do we really need this doubtful cpuset hardwall filtering? Why do we
> need to change slab reclaim pressure when cpusets are used? In the old days
> we didn't have a per-zone slab shrinker, so we needed an artificial slab
> pressure boost to prevent false-positive OOM kills. But now we have one.

Yeah, I'm not completely sure. But we should be mindful that, until the
major caches are converted to LRU, we still have to care about the global
shrinker case too.


> However, if you strongly prefer to keep the old behavior at this time, I don't
> oppose it. We can change it later.

Yes I would prefer that, but I would welcome patches to improve things.


> > +		/*
> > +		 * lru_pages / 10  -- put a 10% pressure on the slab
> > +		 * which roughly corresponds to ZONE_RECLAIM_PRIORITY
> > +		 * scanning 1/16th of pagecache.
> > +		 *
> > +		 * Global slabs will be shrunk at a relatively more
> > +		 * aggressive rate because we don't calculate the
> > +		 * global lru size for speed. But they really should
> > +		 * be converted to per zone slabs if they are important
> > +		 */
> > +		shrink_slab(zone, lru_pages / 10, lru_pages, lru_pages,
> > +				gfp_mask);
> 
> Why don't you use sc.nr_scanned? It seems more straightforward.

Well, it may not be over the pagecache limit.

I agree the situation is pretty ugly here with all these magic
constants, but I didn't want to change too much in this patch.

Thanks,
Nick


Thread overview: 22+ messages
2010-11-09 12:32 [patch] mm: vmscan implement per-zone shrinkers Nick Piggin
2010-11-10  5:18 ` Dave Chinner
2010-11-10  6:32   ` Nick Piggin
2010-11-10  6:39     ` Nick Piggin
2010-11-10 11:05     ` Dave Chinner
2010-11-11  0:23       ` Nick Piggin
2010-11-11  5:21         ` Nick Piggin
2010-11-14 10:07 ` KOSAKI Motohiro
2010-11-15  0:50   ` KOSAKI Motohiro
2010-11-16  7:47     ` Nick Piggin
2010-11-16  7:53       ` Anca Emanuel
2010-11-16  8:05         ` Figo.zhang
2010-11-16  8:20           ` Anca Emanuel
2010-11-16  8:22             ` Figo.zhang
2010-11-16  8:26               ` Anca Emanuel
2010-11-17  2:41                 ` Figo.zhang
2010-11-17  4:29                   ` Anca Emanuel
2010-11-17  5:21                     ` Figo.zhang
2010-11-23  7:19                       ` KOSAKI Motohiro
2010-11-16  8:26         ` Nick Piggin
2010-11-23  7:21       ` KOSAKI Motohiro
2010-11-16  7:43   ` Nick Piggin [this message]
