From: Dave Chinner <david@fromorbit.com>
To: Glauber Costa <glommer@parallels.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
containers@lists.linux-foundation.org,
Pavel Emelyanov <xemul@parallels.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>, Nick Piggin <npiggin@kernel.dk>,
Andrea Arcangeli <aarcange@redhat.com>,
Rik van Riel <riel@redhat.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
James Bottomley <JBottomley@Parallels.com>,
Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH v3 1/4] factor out single-shrinker code
Date: Mon, 15 Aug 2011 16:43:59 +1000
Message-ID: <20110815064359.GD26978@dastard>
In-Reply-To: <1313334832-1150-2-git-send-email-glommer@parallels.com>

On Sun, Aug 14, 2011 at 07:13:49PM +0400, Glauber Costa wrote:
> While shrinking our caches, vmscan.c passes through all
> registered shrinkers, trying to free objects as it goes.
>
> We would like to do that individually for some caches,
> like the dcache, when certain conditions apply (for
> example, when we reach a soon-to-exist maximum allowed size).
>
> To avoid rewriting the same logic in more than one place,
> this patch factors out the shrink logic into shrink_one_shrinker(),
> which we can call from other places in the kernel.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Dave Chinner <david@fromorbit.com>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  include/linux/shrinker.h |    6 ++
>  mm/vmscan.c              |  185 ++++++++++++++++++++++++----------------------
>  2 files changed, 104 insertions(+), 87 deletions(-)
>
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 790651b..c5db650 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -39,4 +39,10 @@ struct shrinker {
> #define DEFAULT_SEEKS 2 /* A good number if you don't know better. */
> extern void register_shrinker(struct shrinker *);
> extern void unregister_shrinker(struct shrinker *);
> +
> +unsigned long shrink_one_shrinker(struct shrinker *shrinker,
> +                                  struct shrink_control *shrink,
> +                                  unsigned long nr_pages_scanned,
> +                                  unsigned long lru_pages);
> +
> #endif
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7ef6912..50dfc61 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -211,6 +211,102 @@ static inline int do_shrinker_shrink(struct shrinker *shrinker,
> }
>
> #define SHRINK_BATCH 128
> +unsigned long shrink_one_shrinker(struct shrinker *shrinker,
> +                                  struct shrink_control *shrink,
> +                                  unsigned long nr_pages_scanned,
> +                                  unsigned long lru_pages)
> +{

This isn't the right place to cut the code apart. The act of
shrinking a cache is separate from the act of calculating how much
to shrink. You're taking the vmscan interface and trying to hack
around it when what's needed is a direct call into a single
shrinker.
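
To illustrate the split I mean - a rough sketch only, where
shrink_slab_scan_count(), shrinker_scan_objects(), my_cache_shrinker
and nr_objects_over_limit are made-up names, not anything from the
patches I have queued:

        unsigned long total_scan, freed;

        /* shrink_slab() keeps the windup/proportioning maths as it is... */
        total_scan = shrink_slab_scan_count(shrinker, shrink,
                                            nr_pages_scanned, lru_pages);

        /* ...and only the batched scan becomes a callable helper. */
        freed = shrinker_scan_objects(shrinker, shrink, total_scan);

        /*
         * A direct user like a dcache limit then calls the same helper
         * with its own shrink_control and its own idea of how much to
         * scan, and never goes near the vmscan proportioning logic.
         */
        shrinker_scan_objects(&my_cache_shrinker, &sc, nr_objects_over_limit);

That keeps the "how much to scan" policy with the caller and the
mechanism of walking one shrinker in one place.
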
> +        unsigned long ret = 0;
> +        unsigned long long delta;
> +        unsigned long total_scan;
> +        unsigned long max_pass;
> +        int shrink_ret = 0;
> +        long nr;
> +        long new_nr;
> +        long batch_size = shrinker->batch ? shrinker->batch
> +                                          : SHRINK_BATCH;
> +
> +        /*
> +         * copy the current shrinker scan count into a local variable
> +         * and zero it so that other concurrent shrinker invocations
> +         * don't also do this scanning work.
> +         */
> +        do {
> +                nr = shrinker->nr;
> +        } while (cmpxchg(&shrinker->nr, nr, 0) != nr);
> +
> +        total_scan = nr;
> +        max_pass = do_shrinker_shrink(shrinker, shrink, 0);
> +        delta = (4 * nr_pages_scanned) / shrinker->seeks;
> +        delta *= max_pass;
> +        do_div(delta, lru_pages + 1);
> +        total_scan += delta;
> +        if (total_scan < 0) {
> +                printk(KERN_ERR "shrink_slab: %pF negative objects to "
> +                       "delete nr=%ld\n",
> +                       shrinker->shrink, total_scan);
> +                total_scan = max_pass;
> +        }
> +
> +        /*
> +         * We need to avoid excessive windup on filesystem shrinkers
> +         * due to large numbers of GFP_NOFS allocations causing the
> +         * shrinkers to return -1 all the time. This results in a large
> +         * nr being built up so when a shrink that can do some work
> +         * comes along it empties the entire cache due to nr >>>
> +         * max_pass. This is bad for sustaining a working set in
> +         * memory.
> +         *
> +         * Hence only allow the shrinker to scan the entire cache when
> +         * a large delta change is calculated directly.
> +         */
> +        if (delta < max_pass / 4)
> +                total_scan = min(total_scan, max_pass / 2);
> +
> +        /*
> +         * Avoid risking looping forever due to too large nr value:
> +         * never try to free more than twice the estimate number of
> +         * freeable entries.
> +         */
> +        if (total_scan > max_pass * 2)
> +                total_scan = max_pass * 2;
> +
> +        trace_mm_shrink_slab_start(shrinker, shrink, nr,
> +                                   nr_pages_scanned, lru_pages,
> +                                   max_pass, delta, total_scan);
> +
This bit from here:
> +        while (total_scan >= batch_size) {
> +                int nr_before;
> +
> +                nr_before = do_shrinker_shrink(shrinker, shrink, 0);
> +                shrink_ret = do_shrinker_shrink(shrinker, shrink,
> +                                                batch_size);
> +                if (shrink_ret == -1)
> +                        break;
> +                if (shrink_ret < nr_before)
> +                        ret += nr_before - shrink_ret;
> +                count_vm_events(SLABS_SCANNED, batch_size);
> +                total_scan -= batch_size;
> +
> +                cond_resched();
> +        }

to here is the only common code. It takes a locked shrinker and an
already set up shrink_control structure. As it is, I've got a set of
patches in progress that fix the garbage shrinker API and abstract
this section of the code correctly for external users. I'll try to
get them finished for review later this week.
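
FWIW, the common helper really is just that loop. A very rough sketch
- shrinker_scan_objects() is an illustrative name, not the API in my
series - would look something like:

/*
 * Sketch only: scan one shrinker in batches until we have done
 * nr_to_scan worth of work or the shrinker tells us to stop. The
 * caller has already worked out nr_to_scan and set up *sc.
 */
static unsigned long shrinker_scan_objects(struct shrinker *shrinker,
                                           struct shrink_control *sc,
                                           unsigned long nr_to_scan)
{
        long batch_size = shrinker->batch ? shrinker->batch : SHRINK_BATCH;
        unsigned long freed = 0;

        while (nr_to_scan >= batch_size) {
                int nr_before = do_shrinker_shrink(shrinker, sc, 0);
                int shrink_ret = do_shrinker_shrink(shrinker, sc, batch_size);

                if (shrink_ret == -1)
                        break;
                if (shrink_ret < nr_before)
                        freed += nr_before - shrink_ret;
                count_vm_events(SLABS_SCANNED, batch_size);
                nr_to_scan -= batch_size;

                cond_resched();
        }
        return freed;
}

The scan count calculation - and with it the windup damping - stays
with the caller, which is exactly what something like a per-sb dcache
limit needs.
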
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com