From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Glauber Costa <glommer@openvz.org>
Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
mgorman@suse.de, david@fromorbit.com, linux-mm@kvack.org,
cgroups@vger.kernel.org, kamezawa.hiroyu@jp.fujitsu.com,
mhocko@suze.cz, hannes@cmpxchg.org, hughd@google.com,
gthelen@google.com, "Dave Chinner" <dchinner@redhat.com>,
"Daniel Vetter" <daniel.vetter@ffwll.ch>,
"Kent Overstreet" <koverstreet@google.com>,
"Arve Hjønnevåg" <arve@android.com>,
"John Stultz" <john.stultz@linaro.org>,
"David Rientjes" <rientjes@google.com>,
"Jerome Glisse" <jglisse@redhat.com>,
"Thomas Hellstrom" <thellstrom@vmware.com>
Subject: Re: [PATCH v11 20/25] drivers: convert shrinkers to new count/scan API
Date: Fri, 7 Jun 2013 10:10:27 -0400 [thread overview]
Message-ID: <20130607141027.GH25649@phenom.dumpdata.com> (raw)
In-Reply-To: <1370550898-26711-21-git-send-email-glommer@openvz.org>
On Fri, Jun 07, 2013 at 12:34:53AM +0400, Glauber Costa wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Convert the driver shrinkers to the new API. Most changes are
> compile tested only because I either don't have the hardware or it's
> staging stuff.
>
> FWIW, the md and android code is pretty good, but the rest of it
> makes me want to claw my eyes out. The amount of broken code I just
> encountered is mind boggling. I've added comments explaining what
> is broken, but I fear that some of the code would be best dealt with
> by being dragged behind the bike shed, burying in mud up to it's
> neck and then run over repeatedly with a blunt lawn mower.
The rest being i915, ttm, bcache- etc ?
>
> Special mention goes to the zcache/zcache2 drivers. They can't
> co-exist in the build at the same time, they are under different
> menu options in menuconfig, they only show up when you've got the
> right set of mm subsystem options configured and so even compile
> testing is an exercise in pulling teeth. And that doesn't even take
> into account the horrible, broken code...
Now that you have rebased it, did you still see issues here.
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> index bd2a3b4..1746f30 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> @@ -377,28 +377,26 @@ out:
> return nr_free;
> }
>
> -/* Get good estimation how many pages are free in pools */
> -static int ttm_pool_get_num_unused_pages(void)
> -{
> - unsigned i;
> - int total = 0;
> - for (i = 0; i < NUM_POOLS; ++i)
> - total += _manager->pools[i].npages;
> -
> - return total;
> -}
> -
I am unclear as of why you move this.
> /**
> * Callback for mm to request pool to reduce number of page held.
> + *
> + * XXX: (dchinner) Deadlock warning!
> + *
> + * ttm_page_pool_free() does memory allocation using GFP_KERNEL. that means
That
> + * this can deadlock when called a sc->gfp_mask that is not equal to
> + * GFP_KERNEL.
> + *
> + * This code is crying out for a shrinker per pool....
It iterates over different pools.
The ttm_page_pool_free() could use GFP_ATOMIC to guard against the dead-lock
I think?
> */
> -static int ttm_pool_mm_shrink(struct shrinker *shrink,
> - struct shrink_control *sc)
> +static long
> +ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> {
> static atomic_t start_pool = ATOMIC_INIT(0);
> unsigned i;
> unsigned pool_offset = atomic_add_return(1, &start_pool);
> struct ttm_page_pool *pool;
> int shrink_pages = sc->nr_to_scan;
> + long freed = 0;
>
> pool_offset = pool_offset % NUM_POOLS;
> /* select start pool in round robin fashion */
> @@ -408,14 +406,28 @@ static int ttm_pool_mm_shrink(struct shrinker *shrink,
> break;
> pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
> shrink_pages = ttm_page_pool_free(pool, nr_free);
> + freed += nr_free - shrink_pages;
> }
> - /* return estimated number of unused pages in pool */
> - return ttm_pool_get_num_unused_pages();
> + return freed;
> +}
> +
> +
> +static long
> +ttm_pool_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> + unsigned i;
> + long count = 0;
> +
> + for (i = 0; i < NUM_POOLS; ++i)
> + count += _manager->pools[i].npages;
> +
> + return count;
> }
>
> static void ttm_pool_mm_shrink_init(struct ttm_pool_manager *manager)
> {
> - manager->mm_shrink.shrink = &ttm_pool_mm_shrink;
> + manager->mm_shrink.count_objects = &ttm_pool_shrink_count;
> + manager->mm_shrink.scan_objects = &ttm_pool_shrink_scan;
> manager->mm_shrink.seeks = 1;
> register_shrinker(&manager->mm_shrink);
> }
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> index b8b3943..dc009f1 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> @@ -918,19 +918,6 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev)
> }
> EXPORT_SYMBOL_GPL(ttm_dma_populate);
>
> -/* Get good estimation how many pages are free in pools */
> -static int ttm_dma_pool_get_num_unused_pages(void)
> -{
> - struct device_pools *p;
> - unsigned total = 0;
> -
> - mutex_lock(&_manager->lock);
> - list_for_each_entry(p, &_manager->pools, pools)
> - total += p->pool->npages_free;
> - mutex_unlock(&_manager->lock);
> - return total;
> -}
> -
> /* Put all pages in pages list to correct pool to wait for reuse */
> void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
> {
> @@ -1002,18 +989,29 @@ EXPORT_SYMBOL_GPL(ttm_dma_unpopulate);
>
> /**
> * Callback for mm to request pool to reduce number of page held.
> + *
> + * XXX: (dchinner) Deadlock warning!
> + *
> + * ttm_dma_page_pool_free() does GFP_KERNEL memory allocation, and so attention
> + * needs to be paid to sc->gfp_mask to determine if this can be done or not.
> + * GFP_KERNEL memory allocation in a GFP_ATOMIC reclaim context woul dbe really
> + * bad.
would be.
> + *
> + * I'm getting sadder as I hear more pathetical whimpers about needing per-pool
> + * shrinkers
Were are these whimpers coming from?
> */
> -static int ttm_dma_pool_mm_shrink(struct shrinker *shrink,
> - struct shrink_control *sc)
> +static long
> +ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> {
> static atomic_t start_pool = ATOMIC_INIT(0);
> unsigned idx = 0;
> unsigned pool_offset = atomic_add_return(1, &start_pool);
> unsigned shrink_pages = sc->nr_to_scan;
> struct device_pools *p;
> + long freed = 0;
>
> if (list_empty(&_manager->pools))
> - return 0;
> + return SHRINK_STOP;
>
> mutex_lock(&_manager->lock);
> pool_offset = pool_offset % _manager->npools;
> @@ -1029,18 +1027,33 @@ static int ttm_dma_pool_mm_shrink(struct shrinker *shrink,
> continue;
> nr_free = shrink_pages;
> shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free);
> + freed += nr_free - shrink_pages;
> +
> pr_debug("%s: (%s:%d) Asked to shrink %d, have %d more to go\n",
> p->pool->dev_name, p->pool->name, current->pid,
> nr_free, shrink_pages);
> }
> mutex_unlock(&_manager->lock);
> - /* return estimated number of unused pages in pool */
> - return ttm_dma_pool_get_num_unused_pages();
> + return freed;
> +}
That code looks good.
> +
> +static long
> +ttm_dma_pool_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> + struct device_pools *p;
> + long count = 0;
> +
> + mutex_lock(&_manager->lock);
> + list_for_each_entry(p, &_manager->pools, pools)
> + count += p->pool->npages_free;
> + mutex_unlock(&_manager->lock);
> + return count;
> }
But this needn't to be moved? Or is it b/c you would like the code to
be in "one section" ?
If so, please use the same style for functions as the rest of the file
has.
>
> static void ttm_dma_pool_mm_shrink_init(struct ttm_pool_manager *manager)
> {
> - manager->mm_shrink.shrink = &ttm_dma_pool_mm_shrink;
> + manager->mm_shrink.count_objects = &ttm_dma_pool_shrink_count;
> + manager->mm_shrink.scan_objects = &ttm_dma_pool_shrink_scan;
> manager->mm_shrink.seeks = 1;
> register_shrinker(&manager->mm_shrink);
> }
.. snip..
> diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
> index dcceed2..4ade8e3 100644
> --- a/drivers/staging/zcache/zcache-main.c
> +++ b/drivers/staging/zcache/zcache-main.c
> @@ -1140,23 +1140,19 @@ static bool zcache_freeze;
> * pageframes in use. FIXME POLICY: Probably the writeback should only occur
> * if the eviction doesn't free enough pages.
> */
> -static int shrink_zcache_memory(struct shrinker *shrink,
> - struct shrink_control *sc)
> +static long scan_zcache_memory(struct shrinker *shrink,
> + struct shrink_control *sc)
> {
> static bool in_progress;
> - int ret = -1;
> - int nr = sc->nr_to_scan;
> int nr_evict = 0;
> int nr_writeback = 0;
> struct page *page;
> int file_pageframes_inuse, anon_pageframes_inuse;
> -
> - if (nr <= 0)
> - goto skip_evict;
> + long freed = 0;
>
> /* don't allow more than one eviction thread at a time */
> if (in_progress)
> - goto skip_evict;
> + return 0;
>
> in_progress = true;
>
> @@ -1176,6 +1172,7 @@ static int shrink_zcache_memory(struct shrinker *shrink,
> if (page == NULL)
> break;
> zcache_free_page(page);
> + freed++;
> }
>
> zcache_last_active_anon_pageframes =
> @@ -1192,13 +1189,22 @@ static int shrink_zcache_memory(struct shrinker *shrink,
> #ifdef CONFIG_ZCACHE_WRITEBACK
> int writeback_ret;
> writeback_ret = zcache_frontswap_writeback();
> - if (writeback_ret == -ENOMEM)
> + if (writeback_ret != -ENOMEM)
> + freed++;
> + else
> #endif
> break;
> }
> in_progress = false;
>
> -skip_evict:
> + return freed;
> +}
> +
> +static long count_zcache_memory(struct shrinker *shrink,
> + struct shrink_control *sc)
> +{
> + int ret = -1;
> +
> /* resample: has changed, but maybe not all the way yet */
> zcache_last_active_file_pageframes =
> global_page_state(NR_LRU_BASE + LRU_ACTIVE_FILE);
> @@ -1212,7 +1218,8 @@ skip_evict:
> }
>
> static struct shrinker zcache_shrinker = {
> - .shrink = shrink_zcache_memory,
> + .scan_objects = scan_zcache_memory,
> + .count_objects = count_zcache_memory,
> .seeks = DEFAULT_SEEKS,
> };
>
That looks OK, but I think it needs an Ack from Greg KH as well?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-06-07 14:10 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-06 20:34 [PATCH v11 00/25] shrinkers rework: per-numa, generic lists, etc Glauber Costa
2013-06-06 20:34 ` [PATCH v11 02/25] super: fix calculation of shrinkable objects for small numbers Glauber Costa
[not found] ` <1370550898-26711-1-git-send-email-glommer-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2013-06-06 20:34 ` [PATCH v11 01/25] fs: bump inode and dentry counters to long Glauber Costa
2013-06-06 20:34 ` [PATCH v11 03/25] dcache: convert dentry_stat.nr_unused to per-cpu counters Glauber Costa
2013-06-06 20:34 ` [PATCH v11 04/25] dentry: move to per-sb LRU locks Glauber Costa
2013-06-06 20:34 ` [PATCH v11 05/25] dcache: remove dentries from LRU before putting on dispose list Glauber Costa
2013-06-06 20:34 ` [PATCH v11 06/25] mm: new shrinker API Glauber Costa
2013-06-06 20:34 ` [PATCH v11 07/25] shrinker: convert superblock shrinkers to new API Glauber Costa
2013-06-06 20:34 ` [PATCH v11 08/25] list: add a new LRU list type Glauber Costa
2013-06-06 20:34 ` [PATCH v11 09/25] inode: convert inode lru list to generic lru list code Glauber Costa
2013-06-06 20:34 ` [PATCH v11 10/25] dcache: convert to use new lru list infrastructure Glauber Costa
2013-06-06 20:34 ` [PATCH v11 11/25] list_lru: per-node " Glauber Costa
2013-06-06 20:34 ` [PATCH v11 12/25] list_lru: per-node API Glauber Costa
2013-06-06 20:34 ` [PATCH v11 13/25] shrinker: add node awareness Glauber Costa
2013-06-06 20:34 ` [PATCH v11 14/25] vmscan: per-node deferred work Glauber Costa
2013-06-06 20:34 ` [PATCH v11 15/25] fs: convert inode and dentry shrinking to be node aware Glauber Costa
2013-06-06 20:34 ` [PATCH v11 16/25] xfs: convert buftarg LRU to generic code Glauber Costa
2013-06-06 20:34 ` [PATCH v11 17/25] xfs: rework buffer dispose list tracking Glauber Costa
2013-06-06 20:34 ` [PATCH v11 18/25] xfs: convert dquot cache lru to list_lru Glauber Costa
2013-06-06 20:34 ` [PATCH v11 21/25] i915: bail out earlier when shrinker cannot acquire mutex Glauber Costa
2013-06-06 20:34 ` [PATCH v11 23/25] hugepage: convert huge zero page shrinker to new shrinker API Glauber Costa
2013-06-06 20:34 ` [PATCH v11 24/25] shrinker: Kill old ->shrink API Glauber Costa
2013-06-06 20:34 ` [PATCH v11 25/25] list_lru: dynamically adjust node arrays Glauber Costa
2013-06-18 9:42 ` Li Zhong
2013-06-19 7:31 ` Glauber Costa
2013-06-19 9:12 ` Li Zhong
2013-06-19 13:29 ` Glauber Costa
2013-06-19 17:14 ` Andrew Morton
2013-06-20 0:50 ` Li Zhong
2013-06-20 1:35 ` Li Zhong
2013-06-20 2:37 ` Dave Chinner
2013-06-06 21:15 ` [PATCH v11 00/25] shrinkers rework: per-numa, generic lists, etc Andrew Morton
2013-06-07 6:11 ` Glauber Costa
[not found] ` <51B1797D.3010209-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2013-06-07 7:08 ` Glauber Costa
2013-06-07 8:04 ` Glauber Costa
2013-06-06 20:34 ` [PATCH v11 19/25] fs: convert fs shrinkers to new scan/count API Glauber Costa
2013-06-06 20:34 ` [PATCH v11 20/25] drivers: convert shrinkers to new count/scan API Glauber Costa
2013-06-07 14:10 ` Konrad Rzeszutek Wilk [this message]
2013-06-09 12:02 ` Glauber Costa
2013-06-06 20:34 ` [PATCH v11 22/25] shrinker: convert remaining shrinkers to " Glauber Costa
2013-06-06 22:31 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130607141027.GH25649@phenom.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=arve@android.com \
--cc=cgroups@vger.kernel.org \
--cc=daniel.vetter@ffwll.ch \
--cc=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=glommer@openvz.org \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=jglisse@redhat.com \
--cc=john.stultz@linaro.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=koverstreet@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suze.cz \
--cc=rientjes@google.com \
--cc=thellstrom@vmware.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).