Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: js1304@gmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	brouer@redhat.com
Subject: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache
Date: Tue, 12 Apr 2016 09:24:34 +0200
Message-ID: <20160412092434.0929a04c@redhat.com>
In-Reply-To: <1460436666-20462-12-git-send-email-iamjoonsoo.kim@lge.com>

On Tue, 12 Apr 2016 13:51:06 +0900
js1304@gmail.com wrote:

> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> To check whther free objects exist or not precisely, we need to grab a
           ^^^^^^    
(spelling)
> lock.  But, accuracy isn't that important because race window would be
> even small and if there is too much free object, cache reaper would reap
> it.  So, this patch makes the check for free object exisistence not to
                                                      ^^^^^^^^^^^
(spelling)

> hold a lock.  This will reduce lock contention in heavily allocation case.
> 
> Note that until now, n->shared can be freed during the processing by
> writing slabinfo, but, with some trick in this patch, we can access it
> freely within interrupt disabled period.
> 
> Below is the result of concurrent allocation/free in slab allocation
> benchmark made by Christoph a long time ago.  I make the output simpler.
> The number shows cycle count during alloc/free respectively so less is
> better.

I cannot figure out which of Christoph's tests you are using.  And I
even have a copy of his test here:
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_test.c

I think you need to describe the test a bit better...

After looking at the output on my own system for a long time, I guess
you are showing results from the "Concurrent allocs" test.  In that
case, it would be relevant to state how many CPUs your system has.

It would also be relevant to mention that N=10000, and perhaps what
that means: e.g. all CPUs do N=10000 allocs concurrently, then
synchronize before doing N frees concurrently.
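
To make the two phases concrete, here is a minimal userspace sketch of
that structure.  It is not Christoph's kernel test: the function name,
the default parameters and the use of bytearray as the "object" are all
assumptions made for illustration.  Python threads are serialized by the
interpreter lock, so the timings say nothing about allocator lock
contention; the sketch only shows the alloc phase, the synchronization
point and the free phase.

```python
import threading
import time


def run_concurrent_alloc_free(num_workers=4, n=10000, size=256):
    """Hypothetical helper: every worker allocates n buffers, waits at a
    barrier, then frees them.  Returns one (alloc_ns, free_ns) pair per
    worker, each averaged per object."""
    barrier = threading.Barrier(num_workers)
    results = [None] * num_workers

    def worker(idx):
        barrier.wait()                      # start all workers together
        t0 = time.perf_counter_ns()
        objs = [bytearray(size) for _ in range(n)]   # phase 1: N allocs
        t1 = time.perf_counter_ns()

        barrier.wait()                      # synchronize before freeing

        t2 = time.perf_counter_ns()
        while objs:
            objs.pop()                      # phase 2: drop each buffer
        t3 = time.perf_counter_ns()
        results[idx] = ((t1 - t0) / n, (t3 - t2) / n)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for alloc_ns, free_ns in run_concurrent_alloc_free():
        print(f"alloc={alloc_ns:.0f}ns free={free_ns:.0f}ns per object")
```

The second barrier is the point of the sketch: no worker starts freeing
until every worker has finished its N allocations.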

> * Before
> Kmalloc N*alloc N*free(32): Average=248/966
> Kmalloc N*alloc N*free(64): Average=261/949
> Kmalloc N*alloc N*free(128): Average=314/1016
> Kmalloc N*alloc N*free(256): Average=741/1061
> Kmalloc N*alloc N*free(512): Average=1246/1152
> Kmalloc N*alloc N*free(1024): Average=2437/1259
> Kmalloc N*alloc N*free(2048): Average=4980/1800
> Kmalloc N*alloc N*free(4096): Average=9000/2078
> 
> * After
> Kmalloc N*alloc N*free(32): Average=344/792
> Kmalloc N*alloc N*free(64): Average=347/882
> Kmalloc N*alloc N*free(128): Average=390/959
> Kmalloc N*alloc N*free(256): Average=393/1067
> Kmalloc N*alloc N*free(512): Average=683/1229
> Kmalloc N*alloc N*free(1024): Average=1295/1325
> Kmalloc N*alloc N*free(2048): Average=2513/1664
> Kmalloc N*alloc N*free(4096): Average=4742/2172
> 
> It shows that allocation performance decreases for the object size up to
> 128 and it may be due to extra checks in cache_alloc_refill().  But, with
> considering improvement of free performance, net result looks the same.
> Result for other size class looks very promising, roughly, 50% performance
> improvement.

Super nice performance boost.  The numbers on my system are
significantly smaller, but this is a before/after test and the absolute
numbers are not that important.

Oh, maybe this was because I ran the test with SLUB... recompiling with
SLAB... and the results are comparable to your numbers (on my 8-core
i7-4790K CPU @ 4.00GHz).
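
For reference, the relative change per size class can be computed
directly from the quoted Before/After averages.  The sketch below uses
only those quoted numbers; the helper names are mine, and treating
alloc + free as the "net" cost is an assumption about what "net result"
means in the changelog.

```python
# Quoted averages as (alloc_cycles, free_cycles), keyed by object size.
BEFORE = {32: (248, 966), 64: (261, 949), 128: (314, 1016),
          256: (741, 1061), 512: (1246, 1152), 1024: (2437, 1259),
          2048: (4980, 1800), 4096: (9000, 2078)}
AFTER = {32: (344, 792), 64: (347, 882), 128: (390, 959),
         256: (393, 1067), 512: (683, 1229), 1024: (1295, 1325),
         2048: (2513, 1664), 4096: (4742, 2172)}


def percent_change(old, new):
    """Relative change in percent; negative means fewer cycles."""
    return 100.0 * (new - old) / old


def summarize(before, after):
    """Return (size, alloc_change, free_change, net_change) per size."""
    rows = []
    for size in sorted(before):
        a0, f0 = before[size]
        a1, f1 = after[size]
        rows.append((size,
                     percent_change(a0, a1),
                     percent_change(f0, f1),
                     percent_change(a0 + f0, a1 + f1)))
    return rows


if __name__ == "__main__":
    for size, alloc, free, net in summarize(BEFORE, AFTER):
        print(f"{size:5d}: alloc {alloc:+6.1f}%  free {free:+6.1f}%  "
              f"net {net:+6.1f}%")
```

With these inputs the alloc cost rises for sizes up to 128 and drops by
roughly 45 to 50 percent for 256 and above, which matches the summary in
the changelog.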

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
