From: Suren Baghdasaryan <surenb@google.com>
Date: Fri, 18 Aug 2023 09:44:00 -0700
Subject: Re: [RFC v2 0/7] SLUB percpu array caches and maple tree nodes
To: Vlastimil Babka
Cc: "Liam R. Howlett", Matthew Wilcox, Christoph Lameter, David Rientjes,
 Pekka Enberg, Joonsoo Kim, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 Roman Gushchin, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 patches@lists.linux.dev
In-Reply-To: <20230810163627.6206-9-vbabka@suse.cz>
References: <20230810163627.6206-9-vbabka@suse.cz>

On Thu, Aug 10, 2023 at 9:36 AM Vlastimil Babka wrote:
>
> Also in git [1].
> Changes since v1 [2]:
>
> - fix a few bugs
> - SLAB marked as BROKEN so bots don't complain about missing functions
> - incorporate Liam's patches, which allow getting rid of preallocations
>   in mas_preallocate() completely. This has reduced the allocation stats
>   further, with the whole series.
>
> More notes wrt v1 RFC feedback:
>
> - locking is still done as in v1, as it allows remote draining, which
>   should be added before this is suitable for merging
> - there's currently no bulk freeing/refill of the percpu array, which
>   will eventually be added, but I expect most perf gain for the maple
>   tree use case to come from the avoided preallocations anyway
>
> ----
>
> At LSF/MM I mentioned that I see several use cases for introducing
> opt-in percpu arrays for caching alloc/free objects in SLUB. This is my
> first exploration of this idea, specifically for the use case of maple
> tree nodes. We brainstormed this use case on IRC last week with
> Liam and Matthew and this is how I understood the requirements:
>
> - percpu arrays will be faster than bulk alloc/free, which needs
>   relatively long freelists to work well. Especially in the freeing case
>   we need the nodes to come from the same slab (or a small set of slabs)
>
> - preallocating for the worst case of nodes needed by a tree operation
>   that can't reclaim due to locks is wasteful. We could instead expect
>   that most of the time the percpu arrays would satisfy the constrained
>   allocations, and in the rare cases they don't we can dip into
>   GFP_ATOMIC reserves temporarily. Instead of preallocating, just prefill
>   the arrays.
>
> - NUMA locality is not a concern as the nodes of a process's VMA tree
>   end up all over the place anyway.
>
> So this RFC patchset adds such a percpu array in Patch 2. Locking is
> stolen from Mel's recent page allocator pcplists implementation so it
> can avoid disabling IRQs and just disable preemption, but the trylocks
> can fail in rare situations.
>
> Then the maple tree is modified in patches 3-6 to benefit from this. This
> is done in a very crude way as I'm not so familiar with the code.
>
> I've briefly tested this with a virtme VM boot, checking the stats from
> CONFIG_SLUB_STATS in sysfs.
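
To make the idea above concrete, here is a rough, purely illustrative
sketch of what such a percpu-array alloc fast path could look like. The
names (struct slub_percpu_array, the kmem_cache ->cpu_array field,
pca_alloc_cached()) are assumptions for the sketch, not the actual
mm/slub.c changes in the series; only the locking scheme follows the
description above (preemption disabled instead of IRQs, a trylock that
may fail, fallback to the regular slab paths):

#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Assumed layout of the opt-in percpu object array. */
struct slub_percpu_array {
	spinlock_t lock;
	unsigned int count;
	void *objects[];
};

static void *pca_alloc_cached(struct kmem_cache *s, gfp_t gfp)
{
	/*
	 * s->cpu_array is an assumed __percpu field on struct kmem_cache.
	 * get_cpu_ptr() disables preemption only; IRQs stay enabled.
	 */
	struct slub_percpu_array *pca = get_cpu_ptr(s->cpu_array);
	void *object = NULL;

	/* Trylock, so a rare conflict simply counts as a miss. */
	if (spin_trylock(&pca->lock)) {
		if (pca->count)
			object = pca->objects[--pca->count];
		spin_unlock(&pca->lock);
	}
	put_cpu_ptr(s->cpu_array);

	/* Array empty or contended: fall back to the normal SLUB path. */
	if (!object)
		object = kmem_cache_alloc(s, gfp);

	return object;
}
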
>
> Patch 2:
>
> slub changes implemented, including the new counters alloc_cpu_cache
> and free_cpu_cache, but the maple tree doesn't use them yet
>
> (none):/sys/kernel/slab/maple_node # grep . alloc_cpu_cache alloc_*path free_cpu_cache free_*path | cut -d' ' -f1
> alloc_cpu_cache:0
> alloc_fastpath:54842
> alloc_slowpath:8142
> free_cpu_cache:0
> free_fastpath:32336
> free_slowpath:23484
>
> Patch 3:
>
> maple node cache creates a percpu array with 32 entries,
> nothing else changed
>
> -> some allocs/frees satisfied by the array
>
> alloc_cpu_cache:11956
> alloc_fastpath:40675
> alloc_slowpath:7308
> free_cpu_cache:12082
> free_fastpath:23617
> free_slowpath:17956
>
> Patch 4:
>
> maple tree node bulk alloc/free converted to a loop of normal allocs to
> use the percpu array more, because bulk alloc bypasses it
>
> -> majority of allocs/frees now satisfied by the percpu array
>
> alloc_cpu_cache:54673
> alloc_fastpath:4491
> alloc_slowpath:737
> free_cpu_cache:54759
> free_fastpath:332
> free_slowpath:4723
>
> Patch 5+6:
>
> mas_preallocate() just prefills the percpu array, doesn't preallocate anything
> mas_store_prealloc() gains a retry loop with mas_nomem(mas, GFP_ATOMIC | __GFP_NOFAIL)
>
> -> major drop in allocs/frees
> (the prefills are included in the accounting)
>
> alloc_cpu_cache:15036
> alloc_fastpath:4651
> alloc_slowpath:656
> free_cpu_cache:15102
> free_fastpath:299
> free_slowpath:4835
>
> It would be interesting to see how it affects the workloads that saw
> regressions from the maple tree introduction, as the slab operations
> were suspected to be a major factor and now they should be both reduced
> and made cheaper.

Hi Vlastimil,

I backported your patchset to 6.1 and tested it on Android with my mmap
stress test (mmap a file-backed page, read-fault it, unmap, all in a
tight loop). The performance of such tests is important for Android
because that's what happens during application launch, and app launch
time is an important metric for us. I recorded a 1.8% performance
improvement with this test.

Thanks,
Suren.

>
> Liam R. Howlett (2):
>   maple_tree: Remove MA_STATE_PREALLOC
>   tools: Add SLUB percpu array functions for testing
>
> Vlastimil Babka (5):
>   mm, slub: fix bulk alloc and free stats
>   mm, slub: add opt-in slub_percpu_array
>   maple_tree: use slub percpu array
>   maple_tree: avoid bulk alloc/free to use percpu array more
>   maple_tree: replace preallocation with slub percpu array prefill
>
>  include/linux/slab.h                    |   4 +
>  include/linux/slub_def.h                |  10 ++
>  lib/maple_tree.c                        |  60 ++++---
>  mm/Kconfig                              |   1 +
>  mm/slub.c                               | 221 +++++++++++++++++++++++-
>  tools/include/linux/slab.h              |   4 +
>  tools/testing/radix-tree/linux.c        |  14 ++
>  tools/testing/radix-tree/linux/kernel.h |   1 +
>  8 files changed, 286 insertions(+), 29 deletions(-)
>
> --
> 2.41.0
>
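
For reference, the stress test mentioned above is conceptually just a
tight map/read-fault/unmap loop over a file-backed page. A minimal
standalone sketch is below; the file path and iteration count are
placeholders, not the actual Android test harness:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	int fd = open("/tmp/testfile", O_RDONLY);	/* placeholder path */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (long i = 0; i < 1000000; i++) {	/* placeholder iteration count */
		/* Map one file-backed page... */
		volatile char *p = mmap(NULL, pagesz, PROT_READ, MAP_PRIVATE, fd, 0);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/* ...read it to trigger the fault... */
		(void)p[0];
		/* ...and unmap, churning VMAs (and thus maple tree nodes). */
		munmap((void *)p, pagesz);
	}

	close(fd);
	return 0;
}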