Date: Mon, 5 May 2025 23:43:38 -0700
From: Namhyung Kim
To: Dmitry Vyukov
Cc: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang, Jiri Olsa,
    Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
    linux-perf-users@vger.kernel.org, Andi Kleen
Subject: Re: [RFC/PATCH] perf report: Support latency profiling in system-wide mode
References: <20250503003620.45072-1-namhyung@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 06, 2025 at 07:55:25AM +0200, Dmitry Vyukov wrote:
> On Tue, 6 May 2025 at 07:30, Namhyung Kim wrote:
> >
> > Hello,
> >
> > On Mon, May 05, 2025 at 10:08:17AM +0200, Dmitry Vyukov wrote:
> > > On Sat, 3 May 2025 at 02:36, Namhyung Kim wrote:
> > > >
> > > > When it profiles a target process (and its children), it's
> > > > straight-forward to track parallelism using sched-switch info. The
> > > > parallelism is kept at machine level in this case.
> > > >
> > > > But when it profiles multiple processes like in the system-wide mode,
> > > > it might not be clear how to apply the (machine-level) parallelism to
> > > > different tasks. That's why it disabled the latency profiling for
> > > > system-wide mode.
> > > >
> > > > But it should be able to track parallelism in each process and it'd
> > > > be useful to profile latency issues in multi-threaded programs. So
> > > > this patch tries to enable it.
> > > >
> > > > However using sched-switch info can be a problem since it may emit a
> > > > lot more data, with more chances of losing data when perf cannot keep
> > > > up with it.
> > > >
> > > > Instead, it can maintain the current process for each CPU when it sees
> > > > samples.
> > >
> > > Interesting.
> > >
> > > Few questions:
> > > 1. Do we always see a CPU sample when a CPU becomes idle? Otherwise we
> > > will think that the last thread runs on that CPU for arbitrarily long,
> > > when it's actually not.
> >
> > No, it's not guaranteed to have a sample for idle tasks. So right, it
> > can mis-calculate the parallelism for the last task. If we can emit
> > sched-switches only when it goes to the idle task, it'd be accurate.
>
> Then I think the profile can be significantly off if the system wasn't
> ~100% loaded, right?

Yep, it can be.

>
> > > 2. If yes, can we also lose that "terminating" event when a CPU becomes
> > > idle? If yes, then it looks equivalent to missing a context switch
> > > event.
> >
> > I'm not sure what you are asking. When it loses some records because the
> > buffer is full, it'll see the task of the last sample on each CPU.
> > Maybe we want to reset the current task after PERF_RECORD_LOST.
>
> This probably does not matter much if the answer to question 1 is No.
>
> But what I was asking is the following:
>
> let's say we have samples:
> Sample 1 for Pid 42 on Cpu 10
> Sample 2 for idle task on Cpu 10
> ... no samples for some time on Cpu 10 ...
>
> When we process sample 2, we decrement the counter for running tasks
> for Pid 42, right?
> Now if sample 2 is lost, then we don't do the decrement and the accounting
> becomes off.
> In a sense this is equivalent to the problem of losing a context switch event.

Right. But I think it's hard to be correct once it loses something.

> >
> > > 3. Does this mode kick in even for non system-wide profiles (collected
> > > w/o context switch events)? If yes, do we properly understand when a
> > > thread stops running for such profiles? How do we do that? There won't
> > > be samples for idle/other tasks.
> >
> > For non system-wide profiles, the problem is that it cannot know when
> > the current task is scheduled out so that it can decrease the count of
> > parallelism.
> > So this approach cannot work and sched-switch info is
> > required.
>
> Where does the patch check that this mode is used only for system-wide profiles?
> Is it that PERF_SAMPLE_CPU is present only for system-wide profiles?

Basically yes, but you can use --sample-cpu to add it.

In util/evsel.c::evsel__config():

	if (target__has_cpu(&opts->target) || opts->sample_cpu)
		evsel__set_sample_bit(evsel, CPU);

Thanks,
Namhyung
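
For reference, the per-CPU accounting discussed in this thread could be sketched roughly as below. This is not the code from the patch: the struct, the function names, the fixed MAX_CPUS/MAX_PIDS limits and the use of pid 0 for the idle task are all assumptions made for illustration. It only shows the idea of remembering the current process for each CPU, adjusting a per-process running count when a sample shows a different process on that CPU, and forgetting everything after a PERF_RECORD_LOST.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed limits, only to keep the sketch self-contained. */
#define MAX_CPUS  64
#define MAX_PIDS  1024
#define NO_TASK   (-1)   /* nothing known to be running on the CPU */
#define IDLE_PID  0      /* assumption: the idle task shows up as pid 0 */

struct par_tracker {
	int cpu_pid[MAX_CPUS];   /* process last seen in a sample on each CPU */
	int running[MAX_PIDS];   /* CPUs currently attributed to each process */
};

static void par_tracker_init(struct par_tracker *t)
{
	memset(t->running, 0, sizeof(t->running));
	for (int i = 0; i < MAX_CPUS; i++)
		t->cpu_pid[i] = NO_TASK;
}

/* Account one sample; returns the parallelism of the sampled process. */
static int par_tracker_sample(struct par_tracker *t, int cpu, int pid)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	assert(pid >= 0 && pid < MAX_PIDS);

	int prev = t->cpu_pid[cpu];

	if (prev != pid) {
		/* The previous process left this CPU at some unknown earlier time. */
		if (prev != NO_TASK && prev != IDLE_PID)
			t->running[prev]--;
		if (pid != IDLE_PID)
			t->running[pid]++;
		t->cpu_pid[cpu] = pid;
	}
	return pid == IDLE_PID ? 0 : t->running[pid];
}

/* After lost records nothing is known about any CPU, so start over. */
static void par_tracker_lost(struct par_tracker *t)
{
	par_tracker_init(t);
}
```

With the samples from the example above, par_tracker_sample(&t, 10, 42) attributes Cpu 10 to Pid 42, and a later idle sample on Cpu 10 drops the count again. If that idle sample never arrives (or is lost), Pid 42 keeps being counted as running on Cpu 10, which is exactly the inaccuracy discussed for systems that are not fully loaded.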