Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.

linux-rt-users.vger.kernel.org archive mirror

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Christoph Lameter <cl@linux.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	LKML <linux-kernel@vger.kernel.org>,
	RT <linux-rt-users@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Clark Williams <clark@redhat.com>,
	Pekka Enberg <penberg@kernel.org>
Subject: Re: [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code.
Date: Tue, 2 Apr 2013 10:37:21 +0900
Message-ID: <20130402013721.GB16699@lge.com>
In-Reply-To: <0000013dc63a9086-7d10c4a8-748c-4e19-829a-856d8d42c8eb-000000@email.amazonses.com>

Hello, Christoph.

On Mon, Apr 01, 2013 at 03:32:43PM +0000, Christoph Lameter wrote:
> On Thu, 28 Mar 2013, Paul Gortmaker wrote:
> 
> > > Index: linux/init/Kconfig
> > > ===================================================================
> > > --- linux.orig/init/Kconfig     2013-03-28 12:14:26.958358688 -0500
> > > +++ linux/init/Kconfig  2013-03-28 12:19:46.275866639 -0500
> > > @@ -1514,6 +1514,14 @@ config SLOB
> > >
> > >  endchoice
> > >
> > > +config SLUB_CPU_PARTIAL
> > > +       depends on SLUB
> > > +       bool "SLUB per cpu partial cache"
> > > +       help
> > > +         Per cpu partial caches accelerate freeing of objects at the
> > > +         price of more indeterminism in the latency of the free.
> > > +         Typically one would choose no for a realtime system.
> >
> > Is "batch" a better description than "accelerate" ?  Something like
> 
> It's not batching but a cache that is going to be mainly used for new
> allocations on the same processor.
> 
> >      Per cpu partial caches allow batch freeing of objects to maximize
> >      throughput.  However, this can increase the length of time spent
> >      holding key locks, which can increase latency spikes with respect
> >      to responsiveness.  Select yes unless you are tuning for a realtime
> >      oriented system.
> >
> > Also, I believe this will cause a behaviour change for people who
> > just run "make oldconfig" -- since there is no default line.  Meaning
> > that it used to be unconditionally on, but now I think it will be off
> > by default, if people just mindlessly hold down Enter key.
> 
> Ok.
> >
> > For RT, we'll want default N if RT_FULL (RT_BASE?) but for mainline,
> > I expect you'll want default Y in order to be consistent with previous
> > behaviour?
> 
> I was not sure exactly how to handle that one yet for realtime. So I need
> two different patches?
> 
> > I've not built/booted yet, but I'll follow up if I see anything else in doing
> > that.
> 
> Here is an updated patch. I will also send an updated fixup patch.
> 
> 
> Subject: slub: Make cpu partial slab support configurable V2
> 
> cpu partial support can introduce a level of indeterminism that is not wanted
> in certain contexts (like a realtime kernel). Make it configurable.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/include/linux/slub_def.h
> ===================================================================
> --- linux.orig/include/linux/slub_def.h	2013-04-01 10:27:05.908964674 -0500
> +++ linux/include/linux/slub_def.h	2013-04-01 10:27:19.905178531 -0500
> @@ -47,7 +47,9 @@ struct kmem_cache_cpu {
>  	void **freelist;	/* Pointer to next available object */
>  	unsigned long tid;	/* Globally unique transaction id */
>  	struct page *page;	/* The slab from which we are allocating */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	struct page *partial;	/* Partially allocated frozen slabs */
> +#endif
>  #ifdef CONFIG_SLUB_STATS
>  	unsigned stat[NR_SLUB_STAT_ITEMS];
>  #endif
> @@ -84,7 +86,9 @@ struct kmem_cache {
>  	int size;		/* The size of an object including meta data */
>  	int object_size;	/* The size of an object without meta data */
>  	int offset;		/* Free pointer offset. */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	int cpu_partial;	/* Number of per cpu partial objects to keep around */
> +#endif
>  	struct kmem_cache_order_objects oo;
> 
>  	/* Allocation and freeing of slabs */

When !CONFIG_SLUB_CPU_PARTIAL, do we need to remove these variables?
If we keep them, the code can stay simpler and easier to maintain.
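A minimal user-space sketch of what keeping the fields buys, assuming a
hypothetical MODEL_CPU_PARTIAL macro in place of CONFIG_SLUB_CPU_PARTIAL and
heavily trimmed stand-ins for struct kmem_cache and struct kmem_cache_cpu
(not the real slub_def.h layouts). If partial simply stays NULL and
cpu_partial stays 0 when the feature is off, has_cpu_slab() can keep a single
"return c->page || c->partial;" with no #ifdef:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_SLUB_CPU_PARTIAL: 1 = on, 0 = off. */
#ifndef MODEL_CPU_PARTIAL
#define MODEL_CPU_PARTIAL 0
#endif

/* Trimmed-down models; only the fields needed here are kept. */
struct page {
	struct page *next;
};

struct kmem_cache_cpu {
	void **freelist;
	struct page *page;
	struct page *partial;	/* always present; stays NULL when the feature is off */
};

struct kmem_cache {
	int size;
	int cpu_partial;	/* always present; 0 when the feature is off */
};

/* Only the "small object" case (30) of the quoted kmem_cache_open() sizing. */
static void model_cache_open(struct kmem_cache *s, int size)
{
	s->size = size;
	s->cpu_partial = MODEL_CPU_PARTIAL ? 30 : 0;
}

/* Same expression for both configurations: with the feature off nothing
 * ever stores into c->partial, so this reduces to "return c->page;". */
static bool model_has_cpu_slab(const struct kmem_cache_cpu *c)
{
	assert(MODEL_CPU_PARTIAL || c->partial == NULL);
	return c->page != NULL || c->partial != NULL;
}
```

The cost is one unused pointer in each per-cpu structure and one unused int
per cache in the disabled configuration, in exchange for fewer #ifdef blocks.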

> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2013-04-01 10:27:05.908964674 -0500
> +++ linux/mm/slub.c	2013-04-01 10:27:19.905178531 -0500
> @@ -1531,7 +1531,9 @@ static inline void *acquire_slab(struct
>  	return freelist;
>  }
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
> +#endif
>  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
> 
>  /*
> @@ -1570,10 +1572,20 @@ static void *get_partial_node(struct kme
>  			object = t;
>  			available =  page->objects - (unsigned long)page->lru.next;
>  		} else {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			available = put_cpu_partial(s, page, 0);
>  			stat(s, CPU_PARTIAL_NODE);
> +#else
> +			BUG();
> +#endif
>  		}
> -		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
> +		if (kmem_cache_debug(s) ||
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> +			available > s->cpu_partial / 2
> +#else
> +			available > 0
> +#endif
> +			)
>  			break;
> 
>  	}

How about introducing a wrapper function, cpu_partial_enabled(),
similar to kmem_cache_debug()?

static inline int cpu_partial_enabled(struct kmem_cache *s)
{
	return kmem_cache_debug(s) || blablabla;
}

As you already know, when kmem_cache_debug() is enabled,
cpu_partial is kept at zero. How about reusing this property
to implement !CONFIG_SLUB_CPU_PARTIAL?
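A sketch of how that could look, again as a user-space model with
hypothetical names: model_cpu_partial_enabled() is only one guess at what
cpu_partial_enabled() might return, and MODEL_CPU_PARTIAL stands in for
CONFIG_SLUB_CPU_PARTIAL. Because cpu_partial is 0 in the disabled case,
"available > s->cpu_partial / 2" becomes "available > 0", which is exactly
the #else branch the patch adds to get_partial_node():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_SLUB_CPU_PARTIAL: 1 = on, 0 = off. */
#ifndef MODEL_CPU_PARTIAL
#define MODEL_CPU_PARTIAL 0
#endif

struct kmem_cache {
	bool debug;		/* models kmem_cache_debug(s) */
	int cpu_partial;	/* 0 when debugging or when the feature is off */
};

static bool model_kmem_cache_debug(const struct kmem_cache *s)
{
	return s->debug;
}

/* Hypothetical wrapper in the spirit of cpu_partial_enabled(): false when
 * the feature is compiled out or when debugging forces cpu_partial to 0. */
static bool model_cpu_partial_enabled(const struct kmem_cache *s)
{
	return MODEL_CPU_PARTIAL && !model_kmem_cache_debug(s);
}

/* get_partial_node() break test written once, with no #ifdef. */
static bool model_enough_available(const struct kmem_cache *s, int available)
{
	assert(available >= 0);
	return model_kmem_cache_debug(s) || available > s->cpu_partial / 2;
}

/* The #else branch of the quoted patch, kept only for comparison. */
static bool model_enough_available_off(const struct kmem_cache *s, int available)
{
	return model_kmem_cache_debug(s) || available > 0;
}
```

Under these assumptions only the put_cpu_partial() call itself would still
need a guard (for example the wrapper); the break condition needs none.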

Thanks.

> @@ -1874,6 +1886,7 @@ redo:
>  	}
>  }
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  /*
>   * Unfreeze all the cpu partial slabs.
>   *
> @@ -1989,6 +2002,7 @@ static int put_cpu_partial(struct kmem_c
>  	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
>  	return pobjects;
>  }
> +#endif
> 
>  static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
>  {
> @@ -2013,7 +2027,9 @@ static inline void __flush_cpu_slab(stru
>  		if (c->page)
>  			flush_slab(s, c);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  		unfreeze_partials(s, c);
> +#endif
>  	}
>  }
> 
> @@ -2029,7 +2045,11 @@ static bool has_cpu_slab(int cpu, void *
>  	struct kmem_cache *s = info;
>  	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	return c->page || c->partial;
> +#else
> +	return c->page;
> +#endif
>  }
> 
>  static void flush_all(struct kmem_cache *s)
> @@ -2225,7 +2245,10 @@ static void *__slab_alloc(struct kmem_ca
>  	page = c->page;
>  	if (!page)
>  		goto new_slab;
> +
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  redo:
> +#endif
> 
>  	if (unlikely(!node_match(page, node))) {
>  		stat(s, ALLOC_NODE_MISMATCH);
> @@ -2278,6 +2301,7 @@ load_freelist:
> 
>  new_slab:
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	if (c->partial) {
>  		page = c->page = c->partial;
>  		c->partial = page->next;
> @@ -2285,6 +2309,7 @@ new_slab:
>  		c->freelist = NULL;
>  		goto redo;
>  	}
> +#endif
> 
>  	freelist = new_slab_objects(s, gfpflags, node, &c);
> 
> @@ -2491,6 +2516,7 @@ static void __slab_free(struct kmem_cach
>  		new.inuse--;
>  		if ((!new.inuse || !prior) && !was_frozen) {
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			if (!kmem_cache_debug(s) && !prior)
> 
>  				/*
> @@ -2499,7 +2525,9 @@ static void __slab_free(struct kmem_cach
>  				 */
>  				new.frozen = 1;
> 
> -			else { /* Needs to be taken off a list */
> +			else
> +#endif
> +			{ /* Needs to be taken off a list */
> 
>  	                        n = get_node(s, page_to_nid(page));
>  				/*
> @@ -2521,6 +2549,7 @@ static void __slab_free(struct kmem_cach
>  		"__slab_free"));
> 
>  	if (likely(!n)) {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> 
>  		/*
>  		 * If we just froze the page then put it onto the
> @@ -2530,6 +2559,7 @@ static void __slab_free(struct kmem_cach
>  			put_cpu_partial(s, page, 1);
>  			stat(s, CPU_PARTIAL_FREE);
>  		}
> +#endif
>  		/*
>  		 * The list lock was not taken therefore no list
>  		 * activity can be necessary.
> @@ -3036,7 +3066,7 @@ static int kmem_cache_open(struct kmem_c
>  	 * list to avoid pounding the page allocator excessively.
>  	 */
>  	set_min_partial(s, ilog2(s->size) / 2);
> -
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	/*
>  	 * cpu_partial determined the maximum number of objects kept in the
>  	 * per cpu partial lists of a processor.
> @@ -3064,6 +3094,7 @@ static int kmem_cache_open(struct kmem_c
>  		s->cpu_partial = 13;
>  	else
>  		s->cpu_partial = 30;
> +#endif
> 
>  #ifdef CONFIG_NUMA
>  	s->remote_node_defrag_ratio = 1000;
> @@ -4424,13 +4455,14 @@ static ssize_t show_slab_objects(struct
>  			total += x;
>  			nodes[node] += x;
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			page = ACCESS_ONCE(c->partial);
>  			if (page) {
>  				x = page->pobjects;
>  				total += x;
>  				nodes[node] += x;
>  			}
> -
> +#endif
>  			per_cpu[node]++;
>  		}
>  	}
> @@ -4583,6 +4615,7 @@ static ssize_t min_partial_store(struct
>  }
>  SLAB_ATTR(min_partial);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%u\n", s->cpu_partial);
> @@ -4605,6 +4638,7 @@ static ssize_t cpu_partial_store(struct
>  	return length;
>  }
>  SLAB_ATTR(cpu_partial);
> +#endif
> 
>  static ssize_t ctor_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4644,6 +4678,7 @@ static ssize_t objects_partial_show(stru
>  }
>  SLAB_ATTR_RO(objects_partial);
> 
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	int objects = 0;
> @@ -4674,6 +4709,7 @@ static ssize_t slabs_cpu_partial_show(st
>  	return len + sprintf(buf + len, "\n");
>  }
>  SLAB_ATTR_RO(slabs_cpu_partial);
> +#endif
> 
>  static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4997,11 +5033,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
>  STAT_ATTR(ORDER_FALLBACK, order_fallback);
>  STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
>  STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
>  STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
>  STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
>  STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
>  #endif
> +#endif
> 
>  static struct attribute *slab_attrs[] = {
>  	&slab_size_attr.attr,
> @@ -5009,7 +5047,9 @@ static struct attribute *slab_attrs[] =
>  	&objs_per_slab_attr.attr,
>  	&order_attr.attr,
>  	&min_partial_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&cpu_partial_attr.attr,
> +#endif
>  	&objects_attr.attr,
>  	&objects_partial_attr.attr,
>  	&partial_attr.attr,
> @@ -5022,7 +5062,9 @@ static struct attribute *slab_attrs[] =
>  	&destroy_by_rcu_attr.attr,
>  	&shrink_attr.attr,
>  	&reserved_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&slabs_cpu_partial_attr.attr,
> +#endif
>  #ifdef CONFIG_SLUB_DEBUG
>  	&total_objects_attr.attr,
>  	&slabs_attr.attr,
> @@ -5064,11 +5106,13 @@ static struct attribute *slab_attrs[] =
>  	&order_fallback_attr.attr,
>  	&cmpxchg_double_fail_attr.attr,
>  	&cmpxchg_double_cpu_fail_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&cpu_partial_alloc_attr.attr,
>  	&cpu_partial_free_attr.attr,
>  	&cpu_partial_node_attr.attr,
>  	&cpu_partial_drain_attr.attr,
>  #endif
> +#endif
>  #ifdef CONFIG_FAILSLAB
>  	&failslab_attr.attr,
>  #endif
> Index: linux/init/Kconfig
> ===================================================================
> --- linux.orig/init/Kconfig	2013-04-01 10:27:05.908964674 -0500
> +++ linux/init/Kconfig	2013-04-01 10:31:46.497863625 -0500
> @@ -1514,6 +1514,17 @@ config SLOB
> 
>  endchoice
> 
> +config SLUB_CPU_PARTIAL
> +	default y
> +	depends on SLUB
> +	bool "SLUB per cpu partial cache"
> +	help
> +	  Per cpu partial caches accelerate object allocation and freeing
> +	  that is local to a processor at the price of more indeterminism
> +	  in the latency of the free. On overflow these caches will be cleared,
> +	  which requires taking locks that may cause latency spikes.
> +	  Typically one would choose no for a realtime system.
> +
>  config MMAP_ALLOW_UNINITIALIZED
>  	bool "Allow mmapped anonymous memory to be uninitialized"
>  	depends on EXPERT && !MMU
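To make the overflow behaviour described in the help text concrete, here is a
rough user-space model of the per cpu partial list. It only loosely follows
put_cpu_partial() with drain set and unfreeze_partials(): the model_* names,
the struct layouts and the single free object per slab are assumptions, and
the real code's locking, interrupt disabling, node lists and discarding of
empty slabs are left out. It shows the cost pattern: most frees are a cheap
push, but once the list holds more than cpu_partial free objects, one free
drains the whole list in a single pass.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified slab: only what the per cpu partial bookkeeping needs. */
struct model_slab {
	struct model_slab *next;
	int free_objects;	/* objects - inuse when the slab was added */
};

/* Simplified per cpu state: list head plus running totals (the kernel
 * keeps these totals in the head page's pobjects/pages fields). */
struct model_cpu {
	struct model_slab *partial;
	int pobjects;
	int pages;
};

/* Stand-in for unfreeze_partials(): moves every slab off the list and
 * reports how many were handled in that one pass. */
static int model_unfreeze_partials(struct model_cpu *c)
{
	int moved = 0;

	while (c->partial != NULL) {
		struct model_slab *slab = c->partial;

		c->partial = slab->next;
		slab->next = NULL;
		moved++;
	}
	c->pobjects = 0;
	c->pages = 0;
	return moved;
}

/* Simplified put_cpu_partial(s, page, 1): if the list already holds more
 * than cpu_partial free objects, drain all of it first, then push the new
 * slab. Returns the number of slabs drained by this call. */
static int model_put_cpu_partial(struct model_cpu *c, struct model_slab *slab,
				 int cpu_partial)
{
	int drained = 0;

	if (c->partial != NULL && c->pobjects > cpu_partial)
		drained = model_unfreeze_partials(c);

	assert(slab->free_objects > 0);
	slab->next = c->partial;
	c->partial = slab;
	c->pobjects += slab->free_objects;
	c->pages++;
	return drained;
}
```

With cpu_partial = 30 and one free object per slab, the first 31 pushes drain
nothing and the 32nd drains all 31 slabs at once. In the kernel that pass
takes the node list_lock with interrupts off, which is the kind of section
that shows up as a latency spike on a realtime kernel.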


Thread overview: 48+ messages
2013-03-21 22:55 [RT LATENCY] 249 microsecond latency caused by slub's unfreeze_partials() code Steven Rostedt
2013-03-22 15:41 ` Christoph Lameter
2013-03-23  3:51   ` Steven Rostedt
2013-03-25 14:34     ` Christoph Lameter
2013-03-25 15:57       ` Steven Rostedt
2013-03-25 16:13         ` Steven Rostedt
2013-03-25 17:51           ` Christoph Lameter
2013-03-25 18:03             ` Steven Rostedt
2013-03-25 18:27               ` Christoph Lameter
2013-03-25 18:32                 ` Steven Rostedt
2013-03-27  2:59                   ` Joonsoo Kim
2013-03-27  3:30                     ` Steven Rostedt
2013-03-27  6:13                       ` Joonsoo Kim
2013-03-28 17:29                         ` Christoph Lameter
2013-04-08 12:25                           ` Steven Rostedt
     [not found]                         ` <alpine.DEB.2.02.1303281227520.16200@gentwo.org>
2013-03-28 17:30                           ` Christoph Lameter
2013-03-29  2:43                             ` Paul Gortmaker
2013-04-01 15:32                               ` Christoph Lameter
2013-04-01 16:06                                 ` Paul Gortmaker
2013-04-02  0:07                                   ` Paul Gortmaker
2013-04-01 21:46                                 ` Paul Gortmaker
2013-04-02  1:37                                 ` Joonsoo Kim [this message]
     [not found]                               ` <alpine.DEB.2.02.1304011025550.12690@gentwo.org>
2013-04-01 15:33                                 ` Christoph Lameter
2013-04-02  0:42                                   ` Joonsoo Kim
2013-04-02  6:48                                     ` Pekka Enberg
2013-04-02 19:25                                     ` Christoph Lameter
2013-04-04  0:58                                       ` Joonsoo Kim
2013-04-04 13:53                                         ` Christoph Lameter
2013-04-05  2:05                                           ` Joonsoo Kim
2013-04-05 14:35                                             ` Christoph Lameter
2013-04-08 12:32                                     ` Steven Rostedt
2013-04-10  6:31                                       ` Pekka Enberg
2013-04-10  7:31                                         ` Joonsoo Kim
2013-04-10 14:00                                           ` Christoph Lameter
2013-04-10 14:09                                             ` Steven Rostedt
2013-04-11 16:05                             ` Steven Rostedt
2013-04-11 16:42                               ` Christoph Lameter
2013-04-12  6:48                                 ` Pekka Enberg
2013-05-28 14:39                             ` Steven Rostedt
2013-05-28 16:22                               ` Christoph Lameter
     [not found]                               ` <alpine.DEB.2.02.1305281121420.1627@gentwo.org>
2013-05-28 18:24                                 ` Christoph Lameter
2013-06-03 15:28                                   ` JoonSoo Kim
2013-06-03 19:20                                     ` Christoph Lameter
2013-06-04 22:21                                       ` JoonSoo Kim
2013-06-05 14:06                                         ` Christoph Lameter
2013-06-05 14:09                                         ` Christoph Lameter
2013-06-03 20:41                                     ` Christoph Lameter
2013-03-26 16:52       ` Steven Rostedt
