Date: Wed, 29 Nov 2023 12:16:17 -0800 (PST)
From: "Christoph Lameter (Ampere)" <cl@linux.com>
To: Vlastimil Babka
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Matthew Wilcox,
    "Liam R. Howlett", Andrew Morton, Roman Gushchin,
    Hyeonggon Yoo <42.hyeyoo@gmail.com>, Alexander Potapenko,
    Marco Elver, Dmitry Vyukov, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org,
    kasan-dev@googlegroups.com
Howlett" , Andrew Morton , Roman Gushchin , Hyeonggon Yoo <42.hyeyoo@gmail.com>, Alexander Potapenko , Marco Elver , Dmitry Vyukov , linux-mm@kvack.org, linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org, kasan-dev@googlegroups.com Subject: Re: [PATCH RFC v3 0/9] SLUB percpu array caches and maple tree nodes In-Reply-To: <20231129-slub-percpu-caches-v3-0-6bcf536772bc@suse.cz> Message-ID: References: <20231129-slub-percpu-caches-v3-0-6bcf536772bc@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed X-Stat-Signature: unscr4ubhruw6ksyhtk4uj8jkckkpop6 X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: C28B9A002A X-Rspam-User: X-HE-Tag: 1701288978-887398 X-HE-Meta: U2FsdGVkX19+oMP4fSSJmauwxO7fFzdf1CSEVStdRrTGldrvXItrxXIAmKeToDPZv8PG4xbDSab7FCOvydf19mbK9bUkIIr133Fz/7ia+g9IpdXgbZp0k5Ta41Lod2JbpPmxEbhFKgW8kfLi/NnyO033qHm9CtWnmoI//YAlnVM6k+CVf6ovFGs9OC15cPsEFT0z54BVGSlPui4SC2cYFICueEFLm0bJ+g+7a5LMPyucff/0oApAYw2tSj2DgWc5qh96DigYRADE5UI/aGODUnMFrvROhlOp6V/RozzOdbg4Iz/zA7XY2AMudheFWUf03ZflKP4bZkPg0bSZ5Zkc7L0RQGZ8aIqa0U6TiO19YGla2eap2adHebOS/1mI5k0bLX+oadwmIG/a7qa2aKdhQ3WUq9VYmvt2N0V8+vHmBiTmtIKzQ5JZyKYdDhl231gISQQnWCsRgMUMrlF/dl427+rVHOIWgJKFlvj3KS6ICZBipeLmLcRymYHxIt9rFPt6sCOdSMcECXXUBzgPbbTKDkaesFSyZ+TDl59CcXMGx7+kQY0HZ8rcd+LBz1ydBUgkRMYsTCh3iSnT7em1X8LUDhitlmMU+1ohvfsYKTnkJagr70DPwnaZ8Dk1hJ9yPc1/FY65No6pf9uaNgCSznYJQCHz86TJI75g2TXTiJ9j0NpmPUv5rzRCyH+7AyU4PFy2hVlw9J4qdy+oYZmNJndkbx9cK02W1pH+kNYAKGnvl5gnl/QdpzXjC7fI/KCIDakNnE2bCnPK0GkhqGZ35IDES7WlevORSKfjMXtLC17JVlGeNLjSqHr6mf1jJcLZ809LCeCp+1Yt6dl7ayhJY++pHqaKaFXSEVcAYk19aNw6KJvmqwSG3dUCgZHPSFPwgMSPJKPuFxHAN5nQMUnEgyBlf+j/wSQIwglIw3tSf+BaiMz4e6WMoOOYpE5nvNRywHm9fbM2xxANehuCUvPRbam QMcjHdSZ F6QTbpMqepznzISCmt9UTISljtWM97OByR+EAnXpT6bkWeySpt/53RcqJA4gKfgAgA0+JqI0BLqwY4b2iqIgqM3Fdn8H2xL8k3AUljUcAzvluQpOyWoTfXF/fMGM3ATJaLaZpbOTaPC65gFhHaSt0mLdsO5EYFoBitQPlkGDOFSOM2gzqN0uySev+Q8+Qu++oCEl8hyifvbX8Zkdlzb2uCGFx3g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, 29 Nov 2023, Vlastimil Babka wrote: > At LSF/MM I've mentioned that I see several use cases for introducing > opt-in percpu arrays for caching alloc/free objects in SLUB. This is my > first exploration of this idea, speficially for the use case of maple > tree nodes. The assumptions are: Hohumm... So we are not really removing SLAB but merging SLAB features into SLUB. In addition to per cpu slabs, we now have per cpu queues. > - percpu arrays will be faster thank bulk alloc/free which needs > relatively long freelists to work well. Especially in the freeing case > we need the nodes to come from the same slab (or small set of those) Percpu arrays require the code to handle individual objects. Handling freelists in partial SLABS means that numerous objects can be handled at once by handling the pointer to the list of objects. In order to make the SLUB in page freelists work better you need to have larger freelist and that comes with larger page sizes. I.e. boot with slub_min_order=5 or so to increase performance. Also this means increasing TLB pressure. The in page freelists of SLUB cause objects from the same page be served. The SLAB queueing approach results in objects being mixed from any address and thus neighboring objects may require more TLB entries. > - preallocation for the worst case of needed nodes for a tree operation > that can't reclaim due to locks is wasteful. 
> - preallocation for the worst case of needed nodes for a tree operation
> that can't reclaim due to locks is wasteful. We could instead expect
> that most of the time percpu arrays would satisfy the constrained
> allocations, and in the rare cases they do not we can dip into
> GFP_ATOMIC reserves temporarily. So instead of preallocation, just
> prefill the arrays.

The partial percpu slabs could already do the same.

> - NUMA locality of the nodes is not a concern as the nodes of a
> process's VMA tree end up all over the place anyway.

NUMA locality is already controlled by the user through the node
specification for per-cpu slabs. All objects coming from the same
in-page freelist of SLUB have the same NUMA locality, which simplifies
things.

If you were to consider NUMA locality for the percpu array, you would be
back to my beloved alien caches. We were not able to avoid those when we
tuned SLAB for maximum performance.

> Patch 5 adds the per-cpu array caches support. Locking is stolen from
> Mel's recent page allocator's pcplists implementation so it can avoid
> disabling IRQs and just disable preemption, but the trylocks can fail
> in rare situations - in most cases the locks are uncontended so the
> locking should be cheap.

Ok, the locking is new, but the design follows basic SLAB queue
handling.
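I.e. something like this pattern, I assume (a rough sketch with made-up
names, not the actual patch): preemption is disabled via the per-cpu
accessor, the per-cpu lock is only trylocked, and a trylock failure just
sends the caller down the regular slab paths.

/* Sketch of the pcplists-style locking pattern (hypothetical names). */
#include <linux/percpu.h>
#include <linux/spinlock.h>

struct percpu_objs {
	spinlock_t lock;		/* only ever taken with trylock here */
	unsigned int count;
	void *objects[32];
};

static DEFINE_PER_CPU(struct percpu_objs, pcp_objs) = {
	.lock = __SPIN_LOCK_UNLOCKED(pcp_objs.lock),
};

static void *pcp_alloc_fast(void)
{
	struct percpu_objs *pcp;
	void *obj = NULL;

	pcp = get_cpu_ptr(&pcp_objs);	/* disables preemption, not IRQs */
	if (spin_trylock(&pcp->lock)) {
		if (pcp->count)
			obj = pcp->objects[--pcp->count];
		spin_unlock(&pcp->lock);
	}
	put_cpu_ptr(&pcp_objs);

	/* NULL: trylock failed or the array was empty - the caller falls
	 * back to the regular (slower) slab allocation path.
	 */
	return obj;
}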