From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF3C0249E5; Thu, 9 Jan 2025 16:03:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736438609; cv=none; b=nXUv/8ucwwpNibtbUlL0yBEeYcGPJVjMczVX3CJPZIrh4ep6Qu1Ac2/vrrBftA3xIZ4/2li4CnRoC+XZMr7Vkf6OPo+6i57LvrEYe8LxKLSqcJVHKnTLaMHIKZjtEuvMAmfpD46vhN7GhrB3krv0zNmLcrRQpT2hCc77t44CvT4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736438609; c=relaxed/simple; bh=MeRpIz07IO3DB6Ldz84dGTdMhtVBjunWM8RIYG1CZDs=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=OgJSv4oeHOZfPaXQKXB8tyEk3ZjOk+b7XAsLa6vxWmNJQGoMMPSVdvSi6ZpMzPwqLRKFBSeLDgGVe1U+CZy3nnJkLbsXUyUckmCH+oOrlmsJCevmjRvafnyf/HHPSMLXUscGNp3oJ7h8zujlWs5rBX0aH8AVZ9yq7zhmgT9/TQA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=g2Pocp/D; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="g2Pocp/D" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C1E3DC4CED3; Thu, 9 Jan 2025 16:03:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1736438609; bh=MeRpIz07IO3DB6Ldz84dGTdMhtVBjunWM8RIYG1CZDs=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=g2Pocp/D7ATfHqlN++yZr2YQ2pFNHk7+D97yBWrGkQNdjriYrXijXjq7eEfqNxPWU gBgac0p5Y3kmvE9xVndfsd2BNgR8Gj0BT0Yf9Y7TxCQHuptOdxjDOvWbLglrYFoX++ n2P3KIbUzwWALrY1OnRLxTOjnhmsjpl0Tqg1NzOSzLAjS6jMJNJJEl7DTA4aBDdwZB INnyhsIX+ILceH+MkpOvenfwzYhx5JbLNeNeVBjG/Wu6Y+2w4xYm8r3g1wt3meY0yx oT8Wc7Cj7M5GSxbVfAOq50JHRWdskC4/iA2wYdAL8GFPC+VrgHUaCADEiMiFfOgqMR /4vk9pZuv3bTA== Date: Thu, 9 Jan 2025 13:03:26 -0300 From: Arnaldo Carvalho de Melo To: Alan Maguire Cc: Ihor Solodrai , dwarves@vger.kernel.org, eddyz87@gmail.com, andrii@kernel.org, mykolal@fb.com, bpf@vger.kernel.org Subject: Re: [PATCH dwarves] dwarves: set cu->obstack chunk size to 128Kb Message-ID: References: <20241221030445.33907-1-ihor.solodrai@pm.me> <92a6a095-3a49-4204-af49-643f2db1e3a9@oracle.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Wed, Jan 08, 2025 at 05:22:56PM +0000, Alan Maguire wrote: > On 08/01/2025 16:38, Ihor Solodrai wrote: > > On Wednesday, January 8th, 2025 at 5:55 AM, Alan Maguire wrote: > > > >> > >> > >> On 21/12/2024 03:04, Ihor Solodrai wrote: > >> > >>> In dwarf_loader with growing nr_jobs the wall-clock time of BTF > >>> encoding starts worsening after a certain point [1]. > >>> > >>> While some overhead of additional threads is expected, it's not > >>> supposed to be noticeable unless nr_jobs is set to an unreasonably big > >>> value. > >>> > >>> It turns out when there are "too many" threads decoding DWARF, they > >>> start competing for memory allocation: significant number of cycles is > >>> spent in osq_lock - in the depth of malloc called within > >>> cu__zalloc. Which suggests that many threads are trying to allocate > >>> memory at the same time. > >>> > >>> See an example on a perf flamegraph for run with -j240 [2]. This is > >>> 12-core machine, so the effect is small. On machines with more cores > >>> this problem is worse. > >>> > >>> Increasing the chunk size of obstacks associated with CUs helps to > >>> reduce the performance penalty caused by this race condition. > >> > >> > >> Is this because starting with a larger obstack size means we don't have > >> to keep reallocating as the obstack grows? > > > > Yes. Bigger obstack size leads to lower number of malloc calls. The > > mallocs tend to happen at the same time between threads in the case of > > DWARF decoding. > > > > Curiously, setting a higher obstack chunk size (like 1Mb), does not > > improve the overall wall-clock time, and can even make it worse. > > This happens because the kernel takes a different code path to allocate > > bigger chunks of memory. And also most CUs are not big (at least in case > > of vmlinux), so a bigger chunk size probably increases wasted memory. > > > > 128Kb seems to be close to a sweet spot for the vmlinux. > > The default is 4Kb. > > > > Thanks for the additional details! > > Reviewed-by: Alan Maguire I'm adding these details and your reviewed-by tag to that cset. Thanks! - Arnaldo