From: Marcelo Tosatti <mtosatti@redhat.com>
To: Vlastimil Babka <vbabka@suse.com>
Cc: Michal Hocko <mhocko@suse.com>,
Leonardo Bras <leobras.c@gmail.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Vlastimil Babka <vbabka@suse.cz>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
Frederic Weisbecker <fweisbecker@suse.de>
Subject: Re: [PATCH 0/4] Introduce QPW for per-cpu operations
Date: Thu, 26 Feb 2026 15:24:29 -0300
Message-ID: <aaCP3V64INRZiZUH@tpad>
In-Reply-To: <1fd2efef-888b-4d3c-9c72-bdb2d594336f@suse.com>
On Mon, Feb 23, 2026 at 07:09:47PM +0100, Vlastimil Babka wrote:
> On 2/20/26 17:55, Marcelo Tosatti wrote:
> >
> > #include <linux/module.h>
> > #include <linux/kernel.h>
> > #include <linux/slab.h>
> > #include <linux/timex.h>
> > #include <linux/preempt.h>
> > #include <linux/irqflags.h>
> > #include <linux/vmalloc.h>
> >
> > MODULE_LICENSE("GPL");
> > MODULE_AUTHOR("Gemini AI");
> > MODULE_DESCRIPTION("A simple kmalloc performance benchmark");
> >
> > static int size = 64; // Default allocation size in bytes
> > module_param(size, int, 0644);
> >
> > static int iterations = 1000000; // Default number of iterations
> > module_param(iterations, int, 0644);
> >
> > static int __init kmalloc_bench_init(void) {
> > void **ptrs;
> > cycles_t start, end;
> > uint64_t total_cycles;
> > int i;
> > pr_info("kmalloc_bench: Starting test (size=%d, iterations=%d)\n", size, iterations);
> >
> > // Allocate an array to store pointers to avoid immediate kfree-reuse optimization
> > ptrs = vmalloc(sizeof(void *) * iterations);
> > if (!ptrs) {
> > pr_err("kmalloc_bench: Failed to allocate pointer array\n");
> > return -ENOMEM;
> > }
> >
> > preempt_disable();
> > start = get_cycles();
> >
> > for (i = 0; i < iterations; i++) {
> > ptrs[i] = kmalloc(size, GFP_ATOMIC);
> > }
> >
> > end = get_cycles();
> >
> > total_cycles = end - start;
> > preempt_enable();
>
> While preempt_disable() simplifies things, it can misrepresent the cost of
> preempt_disable() that's part of the locking - that will become nested and
> then the nested preempt_disable() is typically cheaper, etc.
>
> Also the way it kmallocs all iterations and then kfree all iterations may
> skew the probabilities of fastpaths, cache hotness etc.
>
> When introducing sheaves I had a similar microbenchmark, but with
> different amounts of inner-loop iterations, no outer preempt_disable(),
> and a linear vs. randomized array. See:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/commit/?h=slub-percpu-sheaves-v6-benchmarking&id=04028eeffba18a4f821a7194bc9d14f7488bd7d9
>
> (at this point the SLUB_HAS_SHEAVES parts should be removed and the
> kmem_cache_print_stats() stuff also shouldn't be interesting for QPW
> evaluation).
Hi Vlastimil,

There is a problem: the numbers vary significantly across runs
(same kernel, idle system, isolated CPU).

SLUB_HAS_SHEAVES is not defined in my build. I just copied slub_kunit.c
from slub-percpu-sheaves-v6-benchmarking to current tip (and dropped the
call to kmem_cache_print_stats()).
1st run:
[ 635.059928] average (excl. iter 0): 56571797
[ 635.235206] average (excl. iter 0): 58329901
[ 635.409957] average (excl. iter 0): 57459678
[ 635.585128] average (excl. iter 0): 58268333
[ 635.767325] average (excl. iter 0): 60063837
[ 635.944534] average (excl. iter 0): 58912817
[ 636.154503] average (excl. iter 0): 68992131
[ 636.362533] average (excl. iter 0): 69030629
[ 636.536737] average (excl. iter 0): 56545622
[ 636.704314] average (excl. iter 0): 55536407
[ 636.879097] average (excl. iter 0): 57397803
[ 637.051157] average (excl. iter 0): 57021907
[ 637.296352] average (excl. iter 0): 81582815
[ 637.539810] average (excl. iter 0): 81126686
2nd run:
[ 662.824688] average (excl. iter 0): 56833529
[ 662.996742] average (excl. iter 0): 57145388
[ 663.167063] average (excl. iter 0): 55828870
[ 663.339814] average (excl. iter 0): 57505312
[ 663.514563] average (excl. iter 0): 57374528
[ 663.690328] average (excl. iter 0): 57282062
[ 663.896128] average (excl. iter 0): 68097440
[ 664.103029] average (excl. iter 0): 69263914
[ 664.276497] average (excl. iter 0): 57073271
[ 664.442210] average (excl. iter 0): 54895879
[ 664.617186] average (excl. iter 0): 56972700
[ 664.787353] average (excl. iter 0): 56457173
[ 665.028944] average (excl. iter 0): 80339269
[ 665.268597] average (excl. iter 0): 80371907
3rd run:
[ 716.278750] average (excl. iter 0): 54191777
[ 716.442014] average (excl. iter 0): 54151132
[ 716.605254] average (excl. iter 0): 53148722
[ 716.766461] average (excl. iter 0): 53204894
[ 716.933339] average (excl. iter 0): 54719251
[ 717.098761] average (excl. iter 0): 54922923
[ 717.296178] average (excl. iter 0): 65351864
[ 717.491440] average (excl. iter 0): 65264027
[ 717.660778] average (excl. iter 0): 54370768
[ 717.823625] average (excl. iter 0): 54137410
[ 717.988983] average (excl. iter 0): 54222488
[ 718.152716] average (excl. iter 0): 54339019
[ 718.387978] average (excl. iter 0): 78249026
[ 718.619598] average (excl. iter 0): 77746198
Increasing the total parameter from 10^6 to 10^7 does not help:
1st run:
[ 1074.601686] average (excl. iter 0): 650711901
[ 1076.450880] average (excl. iter 0): 633014260
[ 1078.363300] average (excl. iter 0): 660440649
[ 1080.266134] average (excl. iter 0): 652695083
[ 1082.117007] average (excl. iter 0): 635632144
[ 1084.009277] average (excl. iter 0): 654270513
[ 1086.286343] average (excl. iter 0): 790520038
[ 1088.512516] average (excl. iter 0): 768071705
[ 1090.448161] average (excl. iter 0): 664564330
[ 1092.349683] average (excl. iter 0): 659016349
[ 1094.274099] average (excl. iter 0): 662388982
[ 1096.172362] average (excl. iter 0): 647972747
[ 1098.753304] average (excl. iter 0): 887576313
[ 1101.339897] average (excl. iter 0): 885102019
2nd run:
[ 1120.186284] average (excl. iter 0): 615756734
[ 1122.019323] average (excl. iter 0): 623846524
[ 1123.885801] average (excl. iter 0): 639124895
[ 1125.693617] average (excl. iter 0): 623667563
[ 1127.588515] average (excl. iter 0): 646441510
[ 1129.410285] average (excl. iter 0): 628291996
[ 1131.542157] average (excl. iter 0): 728497604
[ 1133.698744] average (excl. iter 0): 743717953
[ 1135.514112] average (excl. iter 0): 616621660
[ 1137.306874] average (excl. iter 0): 615863807
[ 1139.110637] average (excl. iter 0): 616425899
[ 1140.948769] average (excl. iter 0): 638115570
[ 1143.426557] average (excl. iter 0): 847799304
[ 1145.914827] average (excl. iter 0): 861180802
I will switch back to the simple test (it's pretty obvious from the
patch itself that if qpw=0 the overhead should be zero, and it is).
Its numbers are more stable across runs.