From: Paul Gortmaker <paul.gortmaker@windriver.com>
To: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Steven Rostedt <rostedt@goodmis.org>,
LKML <linux-kernel@vger.kernel.org>,
RT <linux-rt-users@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Clark Williams <clark@redhat.com>,
Pekka Enberg <penberg@kernel.org>
Subject: Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
Date: Mon, 1 Apr 2013 12:06:03 -0400
Message-ID: <5159B06B.30007@windriver.com>
In-Reply-To: <0000013dc63a9086-7d10c4a8-748c-4e19-829a-856d8d42c8eb-000000@email.amazonses.com>
On 13-04-01 11:32 AM, Christoph Lameter wrote:
> On Thu, 28 Mar 2013, Paul Gortmaker wrote:
>
>>> Index: linux/init/Kconfig
>>> ===================================================================
>>> --- linux.orig/init/Kconfig 2013-03-28 12:14:26.958358688 -0500
>>> +++ linux/init/Kconfig 2013-03-28 12:19:46.275866639 -0500
>>> @@ -1514,6 +1514,14 @@ config SLOB
>>>
>>> endchoice
>>>
>>> +config SLUB_CPU_PARTIAL
>>> + depends on SLUB
>>> + bool "SLUB per cpu partial cache"
>>> + help
>>> + Per cpu partial caches accellerate freeing of objects at the
>>> + price of more indeterminism in the latency of the free.
>>> + Typically one would choose no for a realtime system.
>>
>> Is "batch" a better description than "accelerate"? Something like
>
> It's not batching but a cache that is going to be mainly used for new
> allocations on the same processor.
OK. In that case, a minor nit: since it is user-facing text, we should
probably drop one of the "l"s from "accellerate".
>
>> Per cpu partial caches allow batch freeing of objects to maximize
>> throughput. However, this can increase the length of time spent
>> holding key locks, which can cause latency spikes that hurt
>> responsiveness. Select yes unless you are tuning for a realtime
>> oriented system.
>>
>> Also, I believe this will cause a behaviour change for people who
>> just run "make oldconfig", since there is no default line. It used
>> to be unconditionally on, but now I think it will be off by default
>> if people just mindlessly hold down the Enter key.
>
> Ok.
>>
>> For RT, we'll want default N if RT_FULL (RT_BASE?) but for mainline,
>> I expect you'll want default Y in order to be consistent with previous
>> behaviour?
>
> I was not sure exactly how to handle that one yet for realtime. So do I
> need two different patches?
I don't think you need to worry about realtime. I would guess that once
the patch exists in mainline, Steve will probably cherry-pick it onto
3.6.11.x-stable, and then he'd likely add a one-line RT-specific follow-on
patch to disable it for RT. Similarly for 3.4.x, etc.
>
>> I've not built/booted yet, but I'll follow up if I see anything else when
>> doing that.
>
> Here is an updated patch. I will also send an updated fixup patch.
I'll give these some local testing today.
Thanks,
Paul.
--
>
>
> Subject: slub: Make cpu partial slab support configurable V2
>
> CPU partial support can introduce a level of indeterminism that is not
> wanted in certain contexts (such as a realtime kernel). Make it configurable.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/include/linux/slub_def.h
> ===================================================================
> --- linux.orig/include/linux/slub_def.h 2013-04-01 10:27:05.908964674 -0500
> +++ linux/include/linux/slub_def.h 2013-04-01 10:27:19.905178531 -0500
> @@ -47,7 +47,9 @@ struct kmem_cache_cpu {
> void **freelist; /* Pointer to next available object */
> unsigned long tid; /* Globally unique transaction id */
> struct page *page; /* The slab from which we are allocating */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> struct page *partial; /* Partially allocated frozen slabs */
> +#endif
> #ifdef CONFIG_SLUB_STATS
> unsigned stat[NR_SLUB_STAT_ITEMS];
> #endif
> @@ -84,7 +86,9 @@ struct kmem_cache {
> int size; /* The size of an object including meta data */
> int object_size; /* The size of an object without meta data */
> int offset; /* Free pointer offset. */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> int cpu_partial; /* Number of per cpu partial objects to keep around */
> +#endif
> struct kmem_cache_order_objects oo;
>
> /* Allocation and freeing of slabs */
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c 2013-04-01 10:27:05.908964674 -0500
> +++ linux/mm/slub.c 2013-04-01 10:27:19.905178531 -0500
> @@ -1531,7 +1531,9 @@ static inline void *acquire_slab(struct
> return freelist;
> }
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
> +#endif
> static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
>
> /*
> @@ -1570,10 +1572,20 @@ static void *get_partial_node(struct kme
> object = t;
> available = page->objects - (unsigned long)page->lru.next;
> } else {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> available = put_cpu_partial(s, page, 0);
> stat(s, CPU_PARTIAL_NODE);
> +#else
> + BUG();
> +#endif
> }
> - if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
> + if (kmem_cache_debug(s) ||
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> + available > s->cpu_partial / 2
> +#else
> + available > 0
> +#endif
> + )
> break;
>
> }
> @@ -1874,6 +1886,7 @@ redo:
> }
> }
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> /*
> * Unfreeze all the cpu partial slabs.
> *
> @@ -1989,6 +2002,7 @@ static int put_cpu_partial(struct kmem_c
> } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
> return pobjects;
> }
> +#endif
>
> static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
> {
> @@ -2013,7 +2027,9 @@ static inline void __flush_cpu_slab(stru
> if (c->page)
> flush_slab(s, c);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> unfreeze_partials(s, c);
> +#endif
> }
> }
>
> @@ -2029,7 +2045,11 @@ static bool has_cpu_slab(int cpu, void *
> struct kmem_cache *s = info;
> struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> return c->page || c->partial;
> +#else
> + return c->page;
> +#endif
> }
>
> static void flush_all(struct kmem_cache *s)
> @@ -2225,7 +2245,10 @@ static void *__slab_alloc(struct kmem_ca
> page = c->page;
> if (!page)
> goto new_slab;
> +
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> redo:
> +#endif
>
> if (unlikely(!node_match(page, node))) {
> stat(s, ALLOC_NODE_MISMATCH);
> @@ -2278,6 +2301,7 @@ load_freelist:
>
> new_slab:
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> if (c->partial) {
> page = c->page = c->partial;
> c->partial = page->next;
> @@ -2285,6 +2309,7 @@ new_slab:
> c->freelist = NULL;
> goto redo;
> }
> +#endif
>
> freelist = new_slab_objects(s, gfpflags, node, &c);
>
> @@ -2491,6 +2516,7 @@ static void __slab_free(struct kmem_cach
> new.inuse--;
> if ((!new.inuse || !prior) && !was_frozen) {
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> if (!kmem_cache_debug(s) && !prior)
>
> /*
> @@ -2499,7 +2525,9 @@ static void __slab_free(struct kmem_cach
> */
> new.frozen = 1;
>
> - else { /* Needs to be taken off a list */
> + else
> +#endif
> + { /* Needs to be taken off a list */
>
> n = get_node(s, page_to_nid(page));
> /*
> @@ -2521,6 +2549,7 @@ static void __slab_free(struct kmem_cach
> "__slab_free"));
>
> if (likely(!n)) {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>
> /*
> * If we just froze the page then put it onto the
> @@ -2530,6 +2559,7 @@ static void __slab_free(struct kmem_cach
> put_cpu_partial(s, page, 1);
> stat(s, CPU_PARTIAL_FREE);
> }
> +#endif
> /*
> * The list lock was not taken therefore no list
> * activity can be necessary.
> @@ -3036,7 +3066,7 @@ static int kmem_cache_open(struct kmem_c
> * list to avoid pounding the page allocator excessively.
> */
> set_min_partial(s, ilog2(s->size) / 2);
> -
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> /*
> * cpu_partial determined the maximum number of objects kept in the
> * per cpu partial lists of a processor.
> @@ -3064,6 +3094,7 @@ static int kmem_cache_open(struct kmem_c
> s->cpu_partial = 13;
> else
> s->cpu_partial = 30;
> +#endif
>
> #ifdef CONFIG_NUMA
> s->remote_node_defrag_ratio = 1000;
> @@ -4424,13 +4455,14 @@ static ssize_t show_slab_objects(struct
> total += x;
> nodes[node] += x;
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> page = ACCESS_ONCE(c->partial);
> if (page) {
> x = page->pobjects;
> total += x;
> nodes[node] += x;
> }
> -
> +#endif
> per_cpu[node]++;
> }
> }
> @@ -4583,6 +4615,7 @@ static ssize_t min_partial_store(struct
> }
> SLAB_ATTR(min_partial);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
> {
> return sprintf(buf, "%u\n", s->cpu_partial);
> @@ -4605,6 +4638,7 @@ static ssize_t cpu_partial_store(struct
> return length;
> }
> SLAB_ATTR(cpu_partial);
> +#endif
>
> static ssize_t ctor_show(struct kmem_cache *s, char *buf)
> {
> @@ -4644,6 +4678,7 @@ static ssize_t objects_partial_show(stru
> }
> SLAB_ATTR_RO(objects_partial);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
> {
> int objects = 0;
> @@ -4674,6 +4709,7 @@ static ssize_t slabs_cpu_partial_show(st
> return len + sprintf(buf + len, "\n");
> }
> SLAB_ATTR_RO(slabs_cpu_partial);
> +#endif
>
> static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
> {
> @@ -4997,11 +5033,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
> STAT_ATTR(ORDER_FALLBACK, order_fallback);
> STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
> STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
> STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
> STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
> STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
> #endif
> +#endif
>
> static struct attribute *slab_attrs[] = {
> &slab_size_attr.attr,
> @@ -5009,7 +5047,9 @@ static struct attribute *slab_attrs[] =
> &objs_per_slab_attr.attr,
> &order_attr.attr,
> &min_partial_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> &cpu_partial_attr.attr,
> +#endif
> &objects_attr.attr,
> &objects_partial_attr.attr,
> &partial_attr.attr,
> @@ -5022,7 +5062,9 @@ static struct attribute *slab_attrs[] =
> &destroy_by_rcu_attr.attr,
> &shrink_attr.attr,
> &reserved_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> &slabs_cpu_partial_attr.attr,
> +#endif
> #ifdef CONFIG_SLUB_DEBUG
> &total_objects_attr.attr,
> &slabs_attr.attr,
> @@ -5064,11 +5106,13 @@ static struct attribute *slab_attrs[] =
> &order_fallback_attr.attr,
> &cmpxchg_double_fail_attr.attr,
> &cmpxchg_double_cpu_fail_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> &cpu_partial_alloc_attr.attr,
> &cpu_partial_free_attr.attr,
> &cpu_partial_node_attr.attr,
> &cpu_partial_drain_attr.attr,
> #endif
> +#endif
> #ifdef CONFIG_FAILSLAB
> &failslab_attr.attr,
> #endif
> Index: linux/init/Kconfig
> ===================================================================
> --- linux.orig/init/Kconfig 2013-04-01 10:27:05.908964674 -0500
> +++ linux/init/Kconfig 2013-04-01 10:31:46.497863625 -0500
> @@ -1514,6 +1514,17 @@ config SLOB
>
> endchoice
>
> +config SLUB_CPU_PARTIAL
> + default y
> + depends on SLUB
> + bool "SLUB per cpu partial cache"
> + help
> + Per cpu partial caches accelerate object allocation and freeing
> + that is local to a processor at the price of more indeterminism
> + in the latency of the free. On overflow these caches will be cleared,
> + which requires taking locks that may cause latency spikes.
> + Typically one would choose no for a realtime system.
> +
> config MMAP_ALLOW_UNINITIALIZED
> bool "Allow mmapped anonymous memory to be uninitialized"
> depends on EXPERT && !MMU
>