linux-fsdevel.vger.kernel.org archive mirror
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Glauber Costa <glommer@openvz.org>
Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	mgorman@suse.de, david@fromorbit.com, linux-mm@kvack.org,
	cgroups@vger.kernel.org, kamezawa.hiroyu@jp.fujitsu.com,
	mhocko@suze.cz, hannes@cmpxchg.org, hughd@google.com,
	gthelen@google.com, "Dave Chinner" <dchinner@redhat.com>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Kent Overstreet" <koverstreet@google.com>,
	"Arve Hjønnevåg" <arve@android.com>,
	"John Stultz" <john.stultz@linaro.org>,
	"David Rientjes" <rientjes@google.com>,
	"Jerome Glisse" <jglisse@redhat.com>,
	"Thomas Hellstrom" <thellstrom@vmware.com>
Subject: Re: [PATCH v11 20/25] drivers: convert shrinkers to new count/scan API
Date: Fri, 7 Jun 2013 10:10:27 -0400
Message-ID: <20130607141027.GH25649@phenom.dumpdata.com>
In-Reply-To: <1370550898-26711-21-git-send-email-glommer@openvz.org>

On Fri, Jun 07, 2013 at 12:34:53AM +0400, Glauber Costa wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Convert the driver shrinkers to the new API. Most changes are
> compile tested only because I either don't have the hardware or it's
> staging stuff.
> 
> FWIW, the md and android code is pretty good, but the rest of it
> makes me want to claw my eyes out.  The amount of broken code I just
> encountered is mind boggling.  I've added comments explaining what
> is broken, but I fear that some of the code would be best dealt with
> by being dragged behind the bike shed, burying in mud up to it's
> neck and then run over repeatedly with a blunt lawn mower.

The rest being i915, ttm, bcache, etc.?

> 
> Special mention goes to the zcache/zcache2 drivers. They can't
> co-exist in the build at the same time, they are under different
> menu options in menuconfig, they only show up when you've got the
> right set of mm subsystem options configured and so even compile
> testing is an exercise in pulling teeth.  And that doesn't even take
> into account the horrible, broken code...

Now that you have rebased it, do you still see issues here?

> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> index bd2a3b4..1746f30 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> @@ -377,28 +377,26 @@ out:
>  	return nr_free;
>  }
>  
> -/* Get good estimation how many pages are free in pools */
> -static int ttm_pool_get_num_unused_pages(void)
> -{
> -	unsigned i;
> -	int total = 0;
> -	for (i = 0; i < NUM_POOLS; ++i)
> -		total += _manager->pools[i].npages;
> -
> -	return total;
> -}
> -

I am unclear as to why you moved this.
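
For context, under the count/scan API the estimate that ttm_pool_get_num_unused_pages() used to supply as the ->shrink return value becomes the body of ->count_objects, while ->scan_objects frees pages and reports how many it freed. Below is a userspace model of that split, not kernel code: struct shrink_control, struct shrinker and the pool array are simplified stand-ins that only borrow names from the patch, and the long return type follows this version of the series.

```c
#include <assert.h>

/* Userspace model only: these types are simplified stand-ins that borrow
 * names from the patch, not the kernel's definitions. */
#define NUM_POOLS 4

struct shrink_control {
	unsigned long nr_to_scan;	/* how many objects reclaim asks for */
};

struct shrinker {
	long (*count_objects)(struct shrinker *, struct shrink_control *);
	long (*scan_objects)(struct shrinker *, struct shrink_control *);
};

struct pool {
	unsigned npages;		/* free pages cached in this pool */
};

static struct pool pools[NUM_POOLS];

/* ->count_objects: cheap estimate with no side effects; this is what the
 * old ttm_pool_get_num_unused_pages() helper computed. */
static long model_count(struct shrinker *s, struct shrink_control *sc)
{
	long count = 0;
	unsigned i;

	(void)s;
	(void)sc;
	for (i = 0; i < NUM_POOLS; ++i)
		count += pools[i].npages;
	return count;
}

/* ->scan_objects: free up to nr_to_scan pages and return how many went. */
static long model_scan(struct shrinker *s, struct shrink_control *sc)
{
	unsigned long want = sc->nr_to_scan;
	long freed = 0;
	unsigned i;

	(void)s;
	for (i = 0; i < NUM_POOLS && want > 0; ++i) {
		unsigned n = pools[i].npages < want ? pools[i].npages
						    : (unsigned)want;
		pools[i].npages -= n;
		want -= n;
		freed += n;
	}
	return freed;
}

static struct shrinker model_shrinker = {
	.count_objects = model_count,
	.scan_objects = model_scan,
};
```

In the model, count_objects never frees anything, which mirrors the reason for the split: the old ->shrink callback overloaded nr_to_scan == 0 as "just report the cache size".
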
>  /**
>   * Callback for mm to request pool to reduce number of page held.
> + *
> + * XXX: (dchinner) Deadlock warning!
> + *
> + * ttm_page_pool_free() does memory allocation using GFP_KERNEL.  that means

s/that/That/
> + * this can deadlock when called a sc->gfp_mask that is not equal to
> + * GFP_KERNEL.
> + *
> + * This code is crying out for a shrinker per pool....

It iterates over different pools.


ttm_page_pool_free() could use GFP_ATOMIC to guard against the deadlock, I
think?
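
To illustrate the other option implied by Dave's comment, here is a userspace sketch that checks sc->gfp_mask before calling into code that allocates with GFP_KERNEL, and backs off otherwise. The MODEL_GFP_* bits, MODEL_SHRINK_STOP, struct model_shrink_control and model_pool_free() are hypothetical stand-ins, not the kernel's values or helpers.

```c
#include <assert.h>

/* Hypothetical flag bits standing in for the kernel's __GFP_* flags;
 * the numeric values are made up for this model. */
#define MODEL_GFP_WAIT		0x1u
#define MODEL_GFP_IO		0x2u
#define MODEL_GFP_FS		0x4u
#define MODEL_GFP_KERNEL	(MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS		(MODEL_GFP_WAIT | MODEL_GFP_IO)
#define MODEL_GFP_ATOMIC	0x0u

#define MODEL_SHRINK_STOP	(-1L)	/* "cannot make progress here" */

struct model_shrink_control {
	unsigned gfp_mask;		/* allocation context of the reclaimer */
	unsigned long nr_to_scan;
};

/* Stand-in for a pool-free helper that would allocate with GFP_KERNEL;
 * here it just drops up to nr pages from *pool_pages. */
static long model_pool_free(unsigned long *pool_pages, unsigned long nr)
{
	unsigned long n = *pool_pages < nr ? *pool_pages : nr;

	*pool_pages -= n;
	return (long)n;
}

/* Only do the work when the reclaim context could itself have made a
 * GFP_KERNEL allocation; otherwise back off without touching the pool. */
static long gated_scan(struct model_shrink_control *sc,
		       unsigned long *pool_pages)
{
	if ((sc->gfp_mask & MODEL_GFP_KERNEL) != MODEL_GFP_KERNEL)
		return MODEL_SHRINK_STOP;
	return model_pool_free(pool_pages, sc->nr_to_scan);
}
```

Switching the internal allocation to GFP_ATOMIC instead would let the shrinker make progress from any context, at the cost of an allocation that can fail under memory pressure and therefore needs a fallback path.
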

>   */
> -static int ttm_pool_mm_shrink(struct shrinker *shrink,
> -			      struct shrink_control *sc)
> +static long
> +ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  {
>  	static atomic_t start_pool = ATOMIC_INIT(0);
>  	unsigned i;
>  	unsigned pool_offset = atomic_add_return(1, &start_pool);
>  	struct ttm_page_pool *pool;
>  	int shrink_pages = sc->nr_to_scan;
> +	long freed = 0;
>  
>  	pool_offset = pool_offset % NUM_POOLS;
>  	/* select start pool in round robin fashion */
> @@ -408,14 +406,28 @@ static int ttm_pool_mm_shrink(struct shrinker *shrink,
>  			break;
>  		pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
>  		shrink_pages = ttm_page_pool_free(pool, nr_free);
> +		freed += nr_free - shrink_pages;
>  	}
> -	/* return estimated number of unused pages in pool */
> -	return ttm_pool_get_num_unused_pages();
> +	return freed;
> +}
> +
> +
> +static long
> +ttm_pool_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> +	unsigned i;
> +	long count = 0;
> +
> +	for (i = 0; i < NUM_POOLS; ++i)
> +		count += _manager->pools[i].npages;
> +
> +	return count;
>  }
>  
>  static void ttm_pool_mm_shrink_init(struct ttm_pool_manager *manager)
>  {
> -	manager->mm_shrink.shrink = &ttm_pool_mm_shrink;
> +	manager->mm_shrink.count_objects = &ttm_pool_shrink_count;
> +	manager->mm_shrink.scan_objects = &ttm_pool_shrink_scan;
>  	manager->mm_shrink.seeks = 1;
>  	register_shrinker(&manager->mm_shrink);
>  }
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> index b8b3943..dc009f1 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> @@ -918,19 +918,6 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(ttm_dma_populate);
>  
> -/* Get good estimation how many pages are free in pools */
> -static int ttm_dma_pool_get_num_unused_pages(void)
> -{
> -	struct device_pools *p;
> -	unsigned total = 0;
> -
> -	mutex_lock(&_manager->lock);
> -	list_for_each_entry(p, &_manager->pools, pools)
> -		total += p->pool->npages_free;
> -	mutex_unlock(&_manager->lock);
> -	return total;
> -}
> -
>  /* Put all pages in pages list to correct pool to wait for reuse */
>  void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
>  {
> @@ -1002,18 +989,29 @@ EXPORT_SYMBOL_GPL(ttm_dma_unpopulate);
>  
>  /**
>   * Callback for mm to request pool to reduce number of page held.
> + *
> + * XXX: (dchinner) Deadlock warning!
> + *
> + * ttm_dma_page_pool_free() does GFP_KERNEL memory allocation, and so attention
> + * needs to be paid to sc->gfp_mask to determine if this can be done or not.
> + * GFP_KERNEL memory allocation in a GFP_ATOMIC reclaim context woul dbe really
> + * bad.

s/woul dbe/would be/
> + *
> + * I'm getting sadder as I hear more pathetical whimpers about needing per-pool
> + * shrinkers

Where are these whimpers coming from?

>   */
> -static int ttm_dma_pool_mm_shrink(struct shrinker *shrink,
> -				  struct shrink_control *sc)
> +static long
> +ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  {
>  	static atomic_t start_pool = ATOMIC_INIT(0);
>  	unsigned idx = 0;
>  	unsigned pool_offset = atomic_add_return(1, &start_pool);
>  	unsigned shrink_pages = sc->nr_to_scan;
>  	struct device_pools *p;
> +	long freed = 0;
>  
>  	if (list_empty(&_manager->pools))
> -		return 0;
> +		return SHRINK_STOP;
>  
>  	mutex_lock(&_manager->lock);
>  	pool_offset = pool_offset % _manager->npools;
> @@ -1029,18 +1027,33 @@ static int ttm_dma_pool_mm_shrink(struct shrinker *shrink,
>  			continue;
>  		nr_free = shrink_pages;
>  		shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free);
> +		freed += nr_free - shrink_pages;
> +
>  		pr_debug("%s: (%s:%d) Asked to shrink %d, have %d more to go\n",
>  			 p->pool->dev_name, p->pool->name, current->pid,
>  			 nr_free, shrink_pages);
>  	}
>  	mutex_unlock(&_manager->lock);
> -	/* return estimated number of unused pages in pool */
> -	return ttm_dma_pool_get_num_unused_pages();
> +	return freed;
> +}

That code looks good.
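
A userspace model of the accounting in the two scan loops: each pool is handed the still-outstanding part of the request, and freed accumulates nr_free minus what the pool reports as still outstanding (the patch treats the return value of ttm_page_pool_free()/ttm_dma_page_pool_free() that way). The pool array and model_pool_free() are simplified stand-ins, and the start offset is passed in rather than taken from the static atomic counter.

```c
#include <assert.h>

#define NUM_POOLS 3

/* Simplified pool: only the number of cached free pages. */
struct model_pool {
	unsigned npages;
};

static struct model_pool pools[NUM_POOLS];

/* Stand-in for the pool-free helper: frees up to nr_free pages and
 * returns how many of the requested pages are still outstanding. */
static unsigned model_pool_free(struct model_pool *pool, unsigned nr_free)
{
	unsigned n = pool->npages < nr_free ? pool->npages : nr_free;

	pool->npages -= n;
	return nr_free - n;
}

/* Same shape as the patched scan loop: start at a rotating offset, pass
 * the leftover request on to the next pool, and accumulate
 * freed += requested - still_outstanding. */
static long model_shrink_scan(unsigned nr_to_scan, unsigned start)
{
	unsigned shrink_pages = nr_to_scan;
	long freed = 0;
	unsigned i;

	for (i = 0; i < NUM_POOLS; ++i) {
		unsigned nr_free = shrink_pages;
		struct model_pool *pool;

		if (shrink_pages == 0)
			break;
		pool = &pools[(i + start) % NUM_POOLS];
		shrink_pages = model_pool_free(pool, nr_free);
		freed += nr_free - shrink_pages;
	}
	return freed;
}
```

When the pools together hold less than nr_to_scan, freed simply reports what was actually released, which is what the new ->scan_objects contract asks for.
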
> +
> +static long
> +ttm_dma_pool_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> +	struct device_pools *p;
> +	long count = 0;
> +
> +	mutex_lock(&_manager->lock);
> +	list_for_each_entry(p, &_manager->pools, pools)
> +		count += p->pool->npages_free;
> +	mutex_unlock(&_manager->lock);
> +	return count;
>  }

But this needn't be moved? Or is it because you would like the code to
be in "one section"?

If so, please use the same style for functions as the rest of the file
has.

>  
>  static void ttm_dma_pool_mm_shrink_init(struct ttm_pool_manager *manager)
>  {
> -	manager->mm_shrink.shrink = &ttm_dma_pool_mm_shrink;
> +	manager->mm_shrink.count_objects = &ttm_dma_pool_shrink_count;
> +	manager->mm_shrink.scan_objects = &ttm_dma_pool_shrink_scan;
>  	manager->mm_shrink.seeks = 1;
>  	register_shrinker(&manager->mm_shrink);
>  }

.. snip..
> diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
> index dcceed2..4ade8e3 100644
> --- a/drivers/staging/zcache/zcache-main.c
> +++ b/drivers/staging/zcache/zcache-main.c
> @@ -1140,23 +1140,19 @@ static bool zcache_freeze;
>   * pageframes in use.  FIXME POLICY: Probably the writeback should only occur
>   * if the eviction doesn't free enough pages.
>   */
> -static int shrink_zcache_memory(struct shrinker *shrink,
> -				struct shrink_control *sc)
> +static long scan_zcache_memory(struct shrinker *shrink,
> +			       struct shrink_control *sc)
>  {
>  	static bool in_progress;
> -	int ret = -1;
> -	int nr = sc->nr_to_scan;
>  	int nr_evict = 0;
>  	int nr_writeback = 0;
>  	struct page *page;
>  	int  file_pageframes_inuse, anon_pageframes_inuse;
> -
> -	if (nr <= 0)
> -		goto skip_evict;
> +	long freed = 0;
>  
>  	/* don't allow more than one eviction thread at a time */
>  	if (in_progress)
> -		goto skip_evict;
> +		return 0;
>  
>  	in_progress = true;
>  
> @@ -1176,6 +1172,7 @@ static int shrink_zcache_memory(struct shrinker *shrink,
>  		if (page == NULL)
>  			break;
>  		zcache_free_page(page);
> +		freed++;
>  	}
>  
>  	zcache_last_active_anon_pageframes =
> @@ -1192,13 +1189,22 @@ static int shrink_zcache_memory(struct shrinker *shrink,
>  #ifdef CONFIG_ZCACHE_WRITEBACK
>  		int writeback_ret;
>  		writeback_ret = zcache_frontswap_writeback();
> -		if (writeback_ret == -ENOMEM)
> +		if (writeback_ret != -ENOMEM)
> +			freed++;
> +		else
>  #endif
>  			break;
>  	}
>  	in_progress = false;
>  
> -skip_evict:
> +	return freed;
> +}
> +
> +static long count_zcache_memory(struct shrinker *shrink,
> +				struct shrink_control *sc)
> +{
> +	int ret = -1;
> +
>  	/* resample: has changed, but maybe not all the way yet */
>  	zcache_last_active_file_pageframes =
>  		global_page_state(NR_LRU_BASE + LRU_ACTIVE_FILE);
> @@ -1212,7 +1218,8 @@ skip_evict:
>  }
>  
>  static struct shrinker zcache_shrinker = {
> -	.shrink = shrink_zcache_memory,
> +	.scan_objects = scan_zcache_memory,
> +	.count_objects = count_zcache_memory,
>  	.seeks = DEFAULT_SEEKS,
>  };
>  

That looks OK, but I think it needs an Ack from Greg KH as well?
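
One detail in the zcache hunk is easy to misread: the #ifdef CONFIG_ZCACHE_WRITEBACK block ends between else and break, so with the option enabled a successful writeback bumps freed and -ENOMEM ends the loop, while with it disabled the visible part of the loop body reduces to a bare break. The model below reproduces that control flow in userspace; model_frontswap_writeback() and its budget are hypothetical stand-ins for zcache_frontswap_writeback(), the for-loop header is not taken from the patch, and, like the patch, it counts each successful writeback as one freed object.

```c
#include <assert.h>
#include <errno.h>

/* Enable the writeback branch for this model; without this define the
 * loop body below collapses to a bare "break". */
#define CONFIG_ZCACHE_WRITEBACK 1

/* Hypothetical stand-in for zcache_frontswap_writeback(): succeeds while
 * the budget lasts, then reports -ENOMEM. */
static int writeback_budget;

static int model_frontswap_writeback(void)
{
	if (writeback_budget <= 0)
		return -ENOMEM;
	writeback_budget--;
	return 0;
}

/* Same if/else split across the #ifdef as in the patched loop. */
static long model_writeback_loop(int nr_writeback)
{
	long freed = 0;

	for (; nr_writeback > 0; nr_writeback--) {
#ifdef CONFIG_ZCACHE_WRITEBACK
		int writeback_ret;

		writeback_ret = model_frontswap_writeback();
		if (writeback_ret != -ENOMEM)
			freed++;
		else
#endif
			break;
	}
	return freed;
}
```
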



Thread overview: 41+ messages
2013-06-06 20:34 [PATCH v11 00/25] shrinkers rework: per-numa, generic lists, etc Glauber Costa
2013-06-06 20:34 ` [PATCH v11 02/25] super: fix calculation of shrinkable objects for small numbers Glauber Costa
2013-06-06 20:34   ` [PATCH v11 01/25] fs: bump inode and dentry counters to long Glauber Costa
2013-06-06 20:34   ` [PATCH v11 03/25] dcache: convert dentry_stat.nr_unused to per-cpu counters Glauber Costa
2013-06-06 20:34   ` [PATCH v11 04/25] dentry: move to per-sb LRU locks Glauber Costa
2013-06-06 20:34   ` [PATCH v11 05/25] dcache: remove dentries from LRU before putting on dispose list Glauber Costa
2013-06-06 20:34   ` [PATCH v11 06/25] mm: new shrinker API Glauber Costa
2013-06-06 20:34   ` [PATCH v11 07/25] shrinker: convert superblock shrinkers to new API Glauber Costa
2013-06-06 20:34   ` [PATCH v11 08/25] list: add a new LRU list type Glauber Costa
2013-06-06 20:34   ` [PATCH v11 09/25] inode: convert inode lru list to generic lru list code Glauber Costa
2013-06-06 20:34   ` [PATCH v11 10/25] dcache: convert to use new lru list infrastructure Glauber Costa
2013-06-06 20:34   ` [PATCH v11 11/25] list_lru: per-node " Glauber Costa
2013-06-06 20:34   ` [PATCH v11 12/25] list_lru: per-node API Glauber Costa
2013-06-06 20:34   ` [PATCH v11 13/25] shrinker: add node awareness Glauber Costa
2013-06-06 20:34   ` [PATCH v11 14/25] vmscan: per-node deferred work Glauber Costa
2013-06-06 20:34   ` [PATCH v11 15/25] fs: convert inode and dentry shrinking to be node aware Glauber Costa
2013-06-06 20:34   ` [PATCH v11 16/25] xfs: convert buftarg LRU to generic code Glauber Costa
2013-06-06 20:34   ` [PATCH v11 17/25] xfs: rework buffer dispose list tracking Glauber Costa
2013-06-06 20:34   ` [PATCH v11 18/25] xfs: convert dquot cache lru to list_lru Glauber Costa
2013-06-06 20:34   ` [PATCH v11 21/25] i915: bail out earlier when shrinker cannot acquire mutex Glauber Costa
2013-06-06 20:34   ` [PATCH v11 23/25] hugepage: convert huge zero page shrinker to new shrinker API Glauber Costa
2013-06-06 20:34   ` [PATCH v11 24/25] shrinker: Kill old ->shrink API Glauber Costa
2013-06-06 20:34   ` [PATCH v11 25/25] list_lru: dynamically adjust node arrays Glauber Costa
2013-06-18  9:42     ` Li Zhong
2013-06-19  7:31       ` Glauber Costa
2013-06-19  9:12         ` Li Zhong
2013-06-19 13:29           ` Glauber Costa
2013-06-19 17:14             ` Andrew Morton
2013-06-20  0:50               ` Li Zhong
2013-06-20  1:35             ` Li Zhong
2013-06-20  2:37     ` Dave Chinner
2013-06-06 21:15   ` [PATCH v11 00/25] shrinkers rework: per-numa, generic lists, etc Andrew Morton
2013-06-07  6:11     ` Glauber Costa
2013-06-07  7:08         ` Glauber Costa
2013-06-07  8:04     ` Glauber Costa
2013-06-06 20:34 ` [PATCH v11 19/25] fs: convert fs shrinkers to new scan/count API Glauber Costa
2013-06-06 20:34 ` [PATCH v11 20/25] drivers: convert shrinkers to new count/scan API Glauber Costa
2013-06-07 14:10   ` Konrad Rzeszutek Wilk [this message]
2013-06-09 12:02     ` Glauber Costa
2013-06-06 20:34 ` [PATCH v11 22/25] shrinker: convert remaining shrinkers to " Glauber Costa
2013-06-06 22:31   ` Andrew Morton
