From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pd0-f177.google.com (mail-pd0-f177.google.com [209.85.192.177])
	by kanga.kvack.org (Postfix) with ESMTP id B64216B0080
	for ; Wed, 8 Apr 2015 18:53:06 -0400 (EDT)
Received: by pdea3 with SMTP id a3so130610462pde.3
	for ; Wed, 08 Apr 2015 15:53:06 -0700 (PDT)
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. [140.211.169.12])
	by mx.google.com with ESMTPS id hn5si7629388pac.180.2015.04.08.15.53.05
	for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 08 Apr 2015 15:53:05 -0700 (PDT)
Date: Wed, 8 Apr 2015 15:53:04 -0700
From: Andrew Morton
Subject: Re: slub bulk alloc: Extract objects from the per cpu slab
Message-Id: <20150408155304.4480f11f16b60f09879c350d@linux-foundation.org>
In-Reply-To:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Lameter
Cc: brouer@redhat.com, Joonsoo Kim, Pekka Enberg, David Rientjes, linux-mm@kvack.org

On Wed, 8 Apr 2015 13:13:29 -0500 (CDT) Christoph Lameter wrote:

> First piece: acceleration of retrieval of per cpu objects
>
> If we are allocating lots of objects then it is advantageous to
> disable interrupts and avoid the this_cpu_cmpxchg() operation to
> get these objects faster.  Note that we cannot do the fast operation
> if debugging is enabled.

Why can't we do it if debugging is enabled?

> Note also that the requirement of having interrupts disabled
> avoids having to do processor flag operations.
>
> Allocate as many objects as possible in the fast way and then fall
> back to the generic implementation for the rest of the objects.

Seems sane.  What's the expected success rate of the initial bulk
allocation attempt?

> --- linux.orig/mm/slub.c
> +++ linux/mm/slub.c
> @@ -2761,7 +2761,32 @@ EXPORT_SYMBOL(kmem_cache_free_bulk);
>  bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>  			void **p)
>  {
> -	return kmem_cache_alloc_bulk(s, flags, size, p);
> +	if (!kmem_cache_debug(s)) {
> +		struct kmem_cache_cpu *c;
> +
> +		/* Drain objects in the per cpu slab */
> +		local_irq_disable();
> +		c = this_cpu_ptr(s->cpu_slab);
> +
> +		while (size) {
> +			void *object = c->freelist;
> +
> +			if (!object)
> +				break;
> +
> +			c->freelist = get_freepointer(s, object);
> +			*p++ = object;
> +			size--;
> +
> +			if (unlikely(flags & __GFP_ZERO))
> +				memset(object, 0, s->object_size);
> +		}
> +		c->tid = next_tid(c->tid);
> +
> +		local_irq_enable();
> +	}
> +
> +	return __kmem_cache_alloc_bulk(s, flags, size, p);

This kmem_cache_cpu.tid logic is a bit opaque.  The low-level operations
seem reasonably well documented, but I couldn't find anything which tells
me how it all actually works - what is "disambiguation during cmpxchg"
and how do we achieve it?

I'm in two minds about putting
slab-infrastructure-for-bulk-object-allocation-and-freeing-v3.patch and
slub-bulk-alloc-extract-objects-from-the-per-cpu-slab.patch into 4.1.
They're standalone (ie: no in-kernel callers!) and hence harmless, and
merging them will make Jesper's life a bit easier.

But otoh they are unproven and have no in-kernel callers, so formally
they shouldn't be merged yet.  I suppose we can throw them away again
if things don't work out.
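
Going back to the tid question: my rough reading of the existing lockless
fastpath - and this is a paraphrase from memory, not the real mm/slub.c
code, so treat it as a sketch that may well be wrong - is something like:

/*
 * Paraphrased sketch of the regular lockless fastpath (not the actual
 * mm/slub.c code; node matching and the slow path are omitted).
 */
static void *fastpath_alloc_sketch(struct kmem_cache *s)
{
	struct kmem_cache_cpu *c;
	unsigned long tid;
	void *object, *next;

redo:
	/* Sample the per-cpu state together with its transaction id. */
	tid = this_cpu_read(s->cpu_slab->tid);
	c = raw_cpu_ptr(s->cpu_slab);

	object = c->freelist;
	if (!object)
		return NULL;	/* the real code falls back to __slab_alloc() */
	next = get_freepointer(s, object);

	/*
	 * Succeeds only if both freelist and tid still hold the sampled
	 * values.  Any alloc/free on this cpu in the meantime bumped tid
	 * via next_tid(), and migration to another cpu gives a different
	 * tid as well, so in either case the cmpxchg fails and we retry.
	 */
	if (!this_cpu_cmpxchg_double(s->cpu_slab->freelist, s->cpu_slab->tid,
				     object, tid,
				     next, next_tid(tid)))
		goto redo;

	return object;
}

If that is roughly right, the "disambiguation" would be the freelist+tid
pair: the same freelist pointer can legitimately reappear after a free,
but the tid will have moved on, so the cmpxchg can tell the two states
apart.  It would be nice to have that spelled out in a comment somewhere.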
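
As for the lack of in-kernel callers: a prospective user would presumably
end up looking something like the sketch below.  Only the
kmem_cache_alloc_bulk()/kmem_cache_free_bulk() signatures come from the
patches; the cache, object type and batch size are invented for
illustration.

/* Hypothetical caller - nothing like this exists in-tree yet. */
#include <linux/slab.h>

#define FOO_BATCH 16

struct foo {
	unsigned long id;
	void *data;
};

static struct kmem_cache *foo_cachep;	/* created with kmem_cache_create() */

static int foo_fill_pool(void *objs[FOO_BATCH])
{
	/* One call replaces FOO_BATCH separate kmem_cache_alloc() calls. */
	if (!kmem_cache_alloc_bulk(foo_cachep, GFP_KERNEL, FOO_BATCH, objs))
		return -ENOMEM;
	return 0;
}

static void foo_drain_pool(void *objs[FOO_BATCH])
{
	/* Matching bulk free for the whole batch. */
	kmem_cache_free_bulk(foo_cachep, FOO_BATCH, objs);
}

Whether that shape is actually what Jesper's code wants is of course the
real test.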