Date: Sun, 23 Nov 2025 21:45:20 +0000
From: Matthew Wilcox
To: Mateusz Guzik
Cc: oleg@redhat.com, brauner@kernel.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [PATCH 0/3] further damage-control lack of clone scalability
References: <20251123063054.3502938-1-mjguzik@gmail.com>

On Sun, Nov 23, 2025 at 05:39:16PM +0100, Mateusz Guzik wrote:
> I have some recollection we talked about this on irc long time ago.
>
> It is my *suspicion* this would be best served with a sparse bitmap +
> a hash table.

Maybe!  I've heard other people speculate that would be a better data
structure.  I know we switched away from a hash table for the page
cache, but that has a different usage pattern where it's common to go
from page N to page N+1, N+2, ...  Other than ps, I don't think we
often have that pattern for PIDs.

> Such a solution was already present, but it got replaced by
> 95846ecf9dac5089 ("pid: replace pid bitmap implementation with IDR
> API").
>
> Commit message cites the following bench results:
> The following are the stats for ps, pstree and calling readdir on /proc
> for 10,000 processes.
>
> ps:
>		With IDR API	With bitmap
> real		0m1.479s	0m2.319s
> user		0m0.070s	0m0.060s
> sys		0m0.289s	0m0.516s
>
> pstree:
>		With IDR API	With bitmap
> real		0m1.024s	0m1.794s
> user		0m0.348s	0m0.612s
> sys		0m0.184s	0m0.264s
>
> proc:
>		With IDR API	With bitmap
> real		0m0.059s	0m0.074s
> user		0m0.000s	0m0.004s
> sys		0m0.016s	0m0.016s
>
> Impact on clone was not benchmarked afaics.

It shouldn't be too much effort for you to check out 95846ecf9dac5089
and 95846ecf9dac5089^ to run your benchmark on both?  That would seem
like the cheapest way of assessing the performance of hash+bitmap vs
IDR.

> Regardless, in order to give whatever replacement a fair perf eval
> against idr, at least the following 2 bits need to get sorted out:
> - the self-induced repeat locking of pidmap_lock
> - high cost of kmalloc (to my understanding waiting for sheaves4all)

The nice thing about XArray (compared to IDR) is that there's no
requirement to preallocate.  Only 1.6% of xa_alloc() calls result in
calling into slab.  The downside is that this means the XArray needs
to know where its lock is (ie xa_lock) so that it can drop the lock
in order to allocate without using GFP_ATOMIC.

At one point I kind of had a plan to create a multi-xarray where you
had multiple xarrays that shared a single lock.  Or maybe this
sharding is exactly what's needed; I haven't really analysed the pid
locking to see what's needed.