From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2895730BF4E for ; Mon, 1 Dec 2025 17:54:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764611688; cv=none; b=tUDPaeUyjuCs62hsmuKN7KuhH94bv9Z91NxI5Y4VvVwVMXpZdqXCDTJxBu+EJMAT1cpMH6cSwOA6Erp5e04DzuAcRRhOM7+6/ABPbxqcPOkTEZ15AZvpFHfVGKmeDQvwdslg7Hu3bZ2TN+pr8mUrmg2pSqcqnV7kPkmK+YOFu6Q= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764611688; c=relaxed/simple; bh=UApK1oU1YHg2wF5jC/yjyZJfjfFdpdXm/w6+nATxEnY=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=U3sd1hAsYXaW4rFhYTne8raiWpX2v80oJkBHIa/XJ8uyi/rLxh3s1tR3rEglKD9jxJrkSsoyUGzBUEVVOLIJkT259kGN4qhMMxlhZSOToYLBFyir+aWWA94+0+EjibMqC3U6LVHAApz3K8KgeqcMzUw5HV+++tjEbKyz3W77HK4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=H8QJosOo; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="H8QJosOo" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6D7D3C4CEF1; Mon, 1 Dec 2025 17:54:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1764611688; bh=UApK1oU1YHg2wF5jC/yjyZJfjfFdpdXm/w6+nATxEnY=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=H8QJosOoqSwNtPI/AMeMgWUQPgo61YnBMfGBtRsRyUn8/Ya9QbfdNTbMaaUK0qr1P SYcipLwMK7zcSrjTOxwnK+R4gf5xYIYuRXH/VKqndt3CS9iAdcKFUKZ4I3gTvrrWnt 5u8PHpZDtIRL9uldjH3XaimZSOVR0tbbvifOgiNjseuG7MzncbVx0fJH9TyNNrrv+R HXfO7hL1ua4W1TIci0UyUnbwLYp+TXK6bxTCjfzHUtY4P7FOVzv61v0JFParUFj1vc X2W8Oq48Z/DEfOzgEVkgbQwYpouLgAqbUUWcfqhwIjDHvTEgJFnA0qzGWEPzL5mwZx dqPTqvbUXUodA== Date: Mon, 1 Dec 2025 18:54:38 +0100 From: Ingo Molnar To: Ian Rogers Cc: linux-perf-users@vger.kernel.org, Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Jiri Olsa Subject: Re: [PATCH] Fix (well, work around) perf stat --null segfault Message-ID: References: Precedence: bulk X-Mailing-List: linux-perf-users@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: * Ian Rogers wrote: > Agreed on the trainwreck :-) So NULL is a valid CPU map value. It > means no CPUs, but magically no CPUs can also be converted in the code > to become the "any" CPU value of -1. So iterating the perf_cpu_map of > NULL will yield a single CPU of -1 :-( This behavior existed before my > time working on the code, I've tried to make the expectation in the > using code clearer, does the user want the magic "any" CPU behavior? > "any" CPU is distinct from every/all CPUs which would be something > like CPUs 0-127, and then does "all" include online and offline CPUs? > This is something that's been discussed and I think things can be > better, I'd rather the code was explicit whenever terms like "all", > "online" and "any" were being used - it is very easy to confuse "all" > and "any". I've been trying to adjust the code and make it more > sensible with changes like: > https://lore.kernel.org/all/20240202234057.2085863-1-irogers@google.com/ > but I agree with you that things are a mess here. At least we > shouldn't be leaking memory (due to the reference count checking) and > we shouldn't confuse indices and CPU numbers (due to the introduction > of a wrapper type), problems that were pervasive and broke things like > uncore counters frequently before. > > Fwiw, part of me thinks we should change the perf_cpu_map to be a > cpu_set_t, the "any" case could be an additional boolean in the > struct. Perhaps we can just auto change "all" to "any" as is somewhat > done here: > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n92 > Then perf_cpu_map can be a cpu_set_t and I think life will be clearer. Yeah, so when I looked at the data structure a few minutes ago: /** * A sized, reference counted, sorted array of integers representing CPU * numbers. This is commonly used to capture which CPUs a PMU is associated * with. The indices into the cpumap are frequently used as they avoid having * gaps if CPU numbers were used. For events associated with a pid, rather than * a CPU, a single dummy map with an entry of -1 is used. */ DECLARE_RC_STRUCT(perf_cpu_map) { refcount_t refcnt; /** Length of the map array. */ int nr; /** The CPU values. */ struct perf_cpu map[]; }; My first thought was "Why isn't this a bitmask?". :-) The thing is, even with 8192 CPUs, the mask size is 1K. That's still not catastrophic compared to the nightmare of managing a [8192] array... And I would keep such core data structures as simple as possible: a single continuous bitmask. No sparse support, no NULL tricks. Just an always-present bitmask, and maybe an any/all modifier flag if the ABIs strongly suggest so. The closer a data structure is to the core concepts of a tool, the more important its robust KISS properties are - at almost any runtime cost. :) > Fwiw, even on the kernel side this is borked as I tried to explain in > this RFC with 0 follow ups: > https://lore.kernel.org/lkml/20250716223924.825772-1-irogers@google.com/ So that's a neat patch and kudos on the ASCII graphics. :-) I'd suggest a re-send. > Imagine the case of an event opened with 1 CPU and a TID. The filter > will mean the counter only counts on the 1 CPU but the enabled and > running time will mean the counter will be scaled (on your system) 128 > times! > All of this matters when you consider different events opened in the > same evlist and there being combined perf_cpu_maps like the evlist's > all_cpus value. > > - nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1; > > + if (evsel_list->core.all_cpus) > > + nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1; > > + else > > + nr = 0; > > So I think the better fix is to make max handle a NULL perf_cpu_map. > That will yield "any" CPU of -1 ( :-( ) and then the -1 + 1 will give > an empty aggregation map. > > I'll write this up, add the test and see if I can spot similar issues > wrt NULL in the vicinity in the perf_cpu_map code. Thank you! In the short run I'd prefer whichever fix can restore perf stat --null functionality the fastest. ;-) Thanks, Ingo