From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date: Sat, 22 Jul 2023 00:39:58 +0900
Subject: Re: [PATCH] [RFC PATCH v2]mm/slub: Optimize slub memory usage
To: Binder Makin
Cc: Feng Tang, "Sang, Oliver", Jay Patel, "oe-lkp@lists.linux.dev", lkp,
    "linux-mm@kvack.org", "Huang, Ying", "Yin, Fengwei", "cl@linux.com",
    "penberg@kernel.org", "rientjes@google.com", "iamjoonsoo.kim@lge.com",
    "akpm@linux-foundation.org", "vbabka@suse.cz", "aneesh.kumar@linux.ibm.com",
    "tsahu@linux.ibm.com", "piyushs@linux.ibm.com"
References: <20230628095740.589893-1-jaypatel@linux.ibm.com>
    <202307172140.3b34825a-oliver.sang@intel.com>

On Fri, Jul 21, 2023 at 11:50 PM Binder Makin wrote:
>
> Quick run with hackbench and unixbench on large intel, amd, and arm machines
> Patch was applied to 6.1.38
>
> hackbench
> Intel performance -2.9% - +1.57%   SReclaim -3.2%    SUnreclaim -2.4%
> Amd   performance -28% - +7.58%    SReclaim +21.31   SUnreclaim +20.72
> ARM   performance -0.6 - +1.6%     SReclaim +24%     SUnreclaim +70%
>
> unixbench
> Intel performance -1.4 - +1.59%    SReclaim -1.65%   SUnreclaim -1.59%
> Amd   performance -1.9% - +1.05%   SReclaim -3.1%    SUnreclaim -0.81%
> ARM   performance -0.09% - +0.54%  SReclaim -1.05%   SUnreclaim -2.03%
>
> AMD Hackbench
> 28% drop on hackbench_thread_pipes_234

Hi Binder,
Thank you for measuring!!

Can you please provide more information?
Is the baseline 6.1.38, and does the other kernel have one or both of the
patches applied on top of the baseline? (optimizing slub memory usage v2,
and not allocating high order slabs from remote nodes)

The 28% drop on AMD is quite huge, and the overall memory usage increased a lot.

Does the AMD machine have 2 sockets?
Did remote node allocations increase or decrease? (`numastat` should show this)

Can you get some profiles indicating increased list_lock contention?
(or the change in the values reported by `slabinfo skbuff_head_cache` when
the kernel is built with CONFIG_SLUB_STATS?)

> On Thu, Jul 20, 2023 at 11:08 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> >
> > On Thu, Jul 20, 2023 at 11:16 PM Feng Tang wrote:
> > >
> > > Hi Hyeonggon,
> > >
> > > On Thu, Jul 20, 2023 at 08:59:56PM +0800, Hyeonggon Yoo wrote:
> > > > On Thu, Jul 20, 2023 at 12:01 PM Oliver Sang wrote:
> > > > >
> > > > > hi, Hyeonggon Yoo,
> > > > >
> > > > > On Tue, Jul 18, 2023 at 03:43:16PM +0900, Hyeonggon Yoo wrote:
> > > > > > On Mon, Jul 17, 2023 at 10:41 PM kernel test robot
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > kernel test robot noticed a -12.5% regression of hackbench.throughput on:
> > > > > > >
> > > > > > >
> > > > > > > commit: a0fd217e6d6fbd23e91f8796787b621e7d576088 ("[PATCH] [RFC PATCH v2]mm/slub: Optimize slub memory usage")
> > > > > > > url: https://github.com/intel-lab-lkp/linux/commits/Jay-Patel/mm-slub-Optimize-slub-memory-usage/20230628-180050
> > > > > > > base: git://git.kernel.org/cgit/linux/kernel/git/vbabka/slab.git for-next
> > > > > > > patch link: https://lore.kernel.org/all/20230628095740.589893-1-jaypatel@linux.ibm.com/
> > > > > > > patch subject: [PATCH] [RFC PATCH v2]mm/slub: Optimize slub memory usage
> > > > > > >
> > > > > > > testcase: hackbench
> > > > > > > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > > > > > > parameters:
> > > > > > >
> > > > > > >         nr_threads: 100%
> > > > > > >         iterations: 4
> > > > > > >         mode: process
> > > > > > >         ipc: socket
> > > > > > >         cpufreq_governor: performance
> > > > > > >
> > > > > > >
> > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > > > the same patch/commit), kindly add following tags
> > > > > > > | Reported-by: kernel test robot
> > > > > > > | Closes: https://lore.kernel.org/oe-lkp/202307172140.3b34825a-oliver.sang@intel.com
> > > > > > >
> > > > > > >
> > > > > > > Details are as below:
> > > > > > > -------------------------------------------------------------------------------------------------->
> > > > > > >
> > > > > > >
> > > > > > > To reproduce:
> > > > > > >
> > > > > > >         git clone https://github.com/intel/lkp-tests.git
> > > > > > >         cd lkp-tests
> > > > > > >         sudo bin/lkp install job.yaml           # job file is attached in this email
> > > > > > >         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> > > > > > >         sudo bin/lkp run generated-yaml-file
> > > > > > >
> > > > > > >         # if come across any failure that blocks the test,
> > > > > > >         # please remove ~/.lkp and /lkp dir to run from a clean state.
> > > > > > >
> > > > > > > =========================================================================================
> > > > > > > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> > > > > > >   gcc-12/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp2/hackbench
> > > > > > >
> > > > > > > commit:
> > > > > > >   7bc162d5cc ("Merge branches 'slab/for-6.5/prandom', 'slab/for-6.5/slab_no_merge' and 'slab/for-6.5/slab-deprecate' into slab/for-next")
> > > > > > >   a0fd217e6d ("mm/slub: Optimize slub memory usage")
> > > > > > >
> > > > > > > 7bc162d5cc4de5c3 a0fd217e6d6fbd23e91f8796787
> > > > > > > ---------------- ---------------------------
> > > > > > >          %stddev     %change         %stddev
> > > > > > >              \          |                \
> > > > > > >     222503 ± 86%    +108.7%     464342 ± 58%  numa-meminfo.node1.Active
> > > > > > >     222459 ± 86%    +108.7%     464294 ± 58%  numa-meminfo.node1.Active(anon)
> > > > > > >      55573 ± 85%    +108.0%     115619 ± 58%  numa-vmstat.node1.nr_active_anon
> > > > > > >      55573 ± 85%    +108.0%     115618 ± 58%  numa-vmstat.node1.nr_zone_active_anon
> > > > > >
> > > > > > I'm quite baffled while reading this.
> > > > > > How did changing slab order calculation double the number of active anon pages?
> > > > > > I doubt two experiments were performed on the same settings.
> > > > >
> > > > > let me introduce our test process.
> > > > >
> > > > > we make sure the tests upon commit and its parent have exact same environment
> > > > > except the kernel difference, and we also make sure the config to build the
> > > > > commit and its parent are identical.
> > > > >
> > > > > we run tests for one commit at least 6 times to make sure the data is stable.
> > > > >
> > > > > such like for this case, we rebuild the commit and its parent's kernel, the
> > > > > config is attached FYI.
> > > >
> > > > Hello Oliver,
> > > >
> > > > Thank you for confirming the testing environment is totally fine.
> > > > And I'm sorry, I didn't mean to imply that your tests were bad.
> > > >
> > > > It was more like "oh, the data totally doesn't make sense to me"
> > > > and I blamed the tests rather than my poor understanding of the data ;)
> > > >
> > > > Anyway,
> > > > as the data shows a repeatable regression,
> > > > let's think more about the possible scenario:
> > > >
> > > > I can't stop thinking that the patch must've affected the system's
> > > > reclamation behavior in some way.
> > > > (I think more active anon pages with a similar total number of anon
> > > > pages implies the kernel scanned more pages)
> > > >
> > > > It might be because kswapd was more frequently woken up (possible if
> > > > skbs were allocated with GFP_ATOMIC)
> > > > But the data provided is not enough to support this argument.
> > > >
> > > > >       2.43 ±  7%      +4.5        6.90 ± 11%  perf-profile.children.cycles-pp.get_partial_node
> > > > >       3.23 ±  5%      +4.5        7.77 ±  9%  perf-profile.children.cycles-pp.___slab_alloc
> > > > >       7.51 ±  2%      +4.6       12.11 ±  5%  perf-profile.children.cycles-pp.kmalloc_reserve
> > > > >       6.94 ±  2%      +4.7       11.62 ±  6%  perf-profile.children.cycles-pp.__kmalloc_node_track_caller
> > > > >       6.46 ±  2%      +4.8       11.22 ±  6%  perf-profile.children.cycles-pp.__kmem_cache_alloc_node
> > > > >       8.48 ±  4%      +7.9       16.42 ±  8%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> > > > >       6.12 ±  6%      +8.6       14.74 ±  9%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> > > >
> > > > And these increased cycles in the SLUB slowpath imply that the actual
> > > > number of objects available in the per cpu partial list has decreased,
> > > > possibly because of inaccuracy in the heuristic?
> > > > (because of the assumption that slabs cached per CPU are half-filled,
> > > > and that the slabs' order is s->oo)
> > >
> > > From the patch:
> > >
> > >  static unsigned int slub_max_order =
> > > -       IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : PAGE_ALLOC_COSTLY_ORDER;
> > > +       IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : 2;
> > >
> > > Could this be related? It reduces the order for some slab caches,
> > > so each per-cpu slab will have fewer objects, which makes the contention
> > > for the per-node spinlock 'list_lock' more severe when the slab allocation
> > > is under pressure from many concurrent threads.
> >
> > hackbench uses skbuff_head_cache intensively. So we need to check if
> > skbuff_head_cache's order was increased or decreased. On my desktop
> > skbuff_head_cache's order is 1 and I roughly guessed it was increased
> > (but it's still worth checking in the testing env).
> >
> > But decreased slab order does not necessarily mean a decreased number
> > of cached objects per CPU, because when oo_order(s->oo) is smaller,
> > it caches more slabs in the per cpu slab list.
> >
> > I think the more problematic situation is when oo_order(s->oo) is higher,
> > because the heuristic in SLUB assumes that each slab has an order of
> > oo_order(s->oo) and that it's half-filled. If it allocates slabs with an
> > order lower than oo_order(s->oo), the number of cached objects per CPU
> > decreases drastically due to the inaccurate assumption.
> >
> > So yeah, a decreased number of cached objects per CPU could be the cause
> > of the regression due to the heuristic.
> >
> > And I have another theory: it allocated high order slabs from a remote node
> > even if there are slabs with lower order in the local node.
> >
> > ofc we need further experiments, but I think both improving the
> > accuracy of the heuristic and avoiding allocating high order slabs from
> > remote nodes would make SLUB more robust.
> >
> > > I don't have direct data to back it up, but I can try some experiments.
> >
> > Thank you for taking the time to experiment!
> >
> > Thanks,
> > Hyeonggon
> >
> > > > > then retest on this test machine:
> > > > > 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> >
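
To make the heuristic discussed in the quoted thread a bit more concrete, here is a minimal userspace sketch in C. It is not kernel code. The helper names, the 4 KiB page size, the 256-byte object size and the target of 52 cached objects are all illustrative assumptions, and per-slab metadata is ignored. It follows the half-full assumption described above (slabs kept per CPU = ceil(2 * target objects / objects per slab at oo_order(s->oo))) and then computes how many objects are really cached when the slabs were allocated at a different order.

```c
#include <assert.h>

/* Assumption for illustration only: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Objects that fit in one slab of the given page order (metadata ignored). */
static unsigned int objs_per_slab(unsigned int order, unsigned int obj_size)
{
	assert(obj_size > 0);
	return (SKETCH_PAGE_SIZE << order) / obj_size;
}

/*
 * Number of slabs kept on the per-CPU partial list, assuming every slab
 * has oo_objs objects and is half-full: ceil(2 * target_objs / oo_objs).
 */
static unsigned int cpu_partial_slabs(unsigned int target_objs,
				      unsigned int oo_objs)
{
	assert(oo_objs > 0);
	return (target_objs * 2 + oo_objs - 1) / oo_objs;
}

/*
 * Objects really cached when nr_slabs half-full slabs were allocated at
 * actual_order, which may be lower than the order the heuristic assumed.
 */
static unsigned int cached_objs(unsigned int nr_slabs,
				unsigned int actual_order,
				unsigned int obj_size)
{
	return nr_slabs * objs_per_slab(actual_order, obj_size) / 2;
}
```

With these assumed numbers, an oo order of 3 gives 128 objects per slab, so cpu_partial_slabs(52, 128) keeps 1 slab and the heuristic expects 64 cached objects. If that one slab was actually allocated at order 0, cached_objs(1, 0, 256) is only 8. With an oo order of 1 the heuristic keeps 4 slabs and cached_objs(4, 1, 256) is still 64, which matches the point that a lower slab order by itself does not have to reduce the number of cached objects.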