Date: Thu, 11 Jan 2024 20:25:06 +1100
From: Dave Chinner
To: "Uladzislau Rezki (Sony)"
Cc: linux-mm@kvack.org, Andrew Morton, LKML, Baoquan He, Lorenzo Stoakes, Christoph Hellwig, Matthew Wilcox, "Liam R . Howlett", "Paul E .
 McKenney", Joel Fernandes, Oleksiy Avramchenko
Subject: Re: [PATCH v3 10/11] mm: vmalloc: Set nr_nodes based on CPUs in a system
References: <20240102184633.748113-1-urezki@gmail.com> <20240102184633.748113-11-urezki@gmail.com>
In-Reply-To: <20240102184633.748113-11-urezki@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 02, 2024 at 07:46:32PM +0100, Uladzislau Rezki (Sony) wrote:
> A number of nodes which are used in the alloc/free paths is
> set based on num_possible_cpus() in a system. Please note a
> high limit threshold though is fixed and corresponds to 128
> nodes.

Large CPU count machines are NUMA machines. All of the allocation
and reclaim is NUMA node based, i.e. a pgdat per NUMA node.

Shrinkers are also able to be run in a NUMA aware mode so that
per-node structures can be reclaimed similar to how per-node LRU
lists are scanned for reclaim.

Hence I'm left to wonder if it would be better to have a vmalloc
area per pgdat (or sub-node cluster) rather than just base the
number on CPU count and then have an arbitrary maximum number when
we get to 128 CPU cores. We can have 128 CPU cores in a single
socket these days, so not being able to scale the vmalloc areas
beyond a single socket seems like a bit of a limitation.

Scaling out the vmalloc areas in a NUMA aware fashion allows the
shrinker to be run in NUMA aware mode, which gets rid of the need
for the global shrinker to loop over every single vmap area in every
shrinker invocation. Only the vm areas on the node that has a memory
shortage need to be scanned and reclaimed; there is no need to
reclaim everything globally when a single node runs out of memory.
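
To put a rough shape on that scanning cost, here is a minimal
userspace sketch. It is a model only, not kernel code and not taken
from the patch series: struct area_model, struct node_model,
scan_global() and scan_node() are all hypothetical names, and
"reclaimable" simply stands in for an area that holds space which
could be freed. In the kernel, a NUMA aware shrinker sets
SHRINKER_NUMA_AWARE and is told which node to scan via the nid field
of struct shrink_control; the sketch assumes the areas have already
been split into per-node sets so that such a lookup is cheap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one vmap area; only tracks whether it can be reclaimed. */
struct area_model {
	int reclaimable;
};

/* Hypothetical model of the set of areas kept in one NUMA node's memory. */
struct node_model {
	const struct area_model *areas;
	size_t nr_areas;
};

/* Walk one node's areas, counting how many were visited and how many could be reclaimed. */
static size_t scan_one(const struct node_model *node, size_t *visited)
{
	size_t reclaimed = 0;

	for (size_t i = 0; i < node->nr_areas; i++) {
		(*visited)++;
		if (node->areas[i].reclaimable)
			reclaimed++;
	}
	return reclaimed;
}

/* Global behaviour: every invocation walks every area on every node. */
static size_t scan_global(const struct node_model *nodes, size_t nr_nodes,
			  size_t *visited)
{
	size_t reclaimed = 0;

	*visited = 0;
	for (size_t nid = 0; nid < nr_nodes; nid++)
		reclaimed += scan_one(&nodes[nid], visited);
	return reclaimed;
}

/* NUMA aware behaviour: only the node that is short of memory is walked. */
static size_t scan_node(const struct node_model *nodes, size_t nr_nodes,
			size_t nid, size_t *visited)
{
	assert(nid < nr_nodes);
	*visited = 0;
	return scan_one(&nodes[nid], visited);
}
```

In this model the work done by scan_global() grows with the total
number of areas in the machine, while scan_node() only grows with
the number of areas on the node under pressure.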

Yes, this may not give quite as good microbenchmark scalability
results, but being able to locate each vm area in node-local memory
and have operations on them largely isolated to node-local tasks and
vmalloc area reclaim will work much better on large multi-socket
NUMA machines.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com