Date: Thu, 16 Jan 2025 15:33:00 -1000
From: Tejun Heo
To: Changwoo Min
Cc: void@manifault.com, arighi@nvidia.com, kernel-dev@igalia.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/7] sched_ext: Implement event counter infrastructure and add an event
References: <20250116151543.80163-1-changwoo@igalia.com>
	<20250116151543.80163-2-changwoo@igalia.com>
In-Reply-To: <20250116151543.80163-2-changwoo@igalia.com>

Hello,

On Fri, Jan 17, 2025 at 12:15:37AM +0900, Changwoo Min wrote:
> +/*
> + * Collection of event counters.
> + */
> +struct scx_event_stat {

If we keep this struct, maybe name it scx_event_stats?

> +	/*
> +	 * If ops.select_cpu() returns a CPU which can't be used by the task,
> +	 * the core scheduler code silently picks a fallback CPU.
> +	 */
> +	u64		INVAL_SELECT_CPU;

Let's do $COMPONENT_$EVENT, so SELECT_CPU_INVALID (or maybe something more
specific than invalid, like disallowed or fallback?).

> +#define SCX_EVENT_IDX(e)	(offsetof(struct scx_event_stat, e)/sizeof(u64))
> +#define SCX_EVENT_END_IDX()	(sizeof(struct scx_event_stat)/sizeof(u64))
> +#define SCX_EVENT_DEFINE(e)	SCX_EVENT_##e = SCX_EVENT_IDX(e)
> +
> +/*
> + * Types of event counters.
> + */
> +enum scx_event_kind {
> +	SCX_EVENT_BEGIN = 0,

I think 0 BEGIN value can be implicit.

> +	SCX_EVENT_DEFINE(INVAL_SELECT_CPU),
> +	SCX_EVENT_END = SCX_EVENT_END_IDX(),

SCX_NR_EVENTS? Also, we don't really need the macro to count the enums.

> +};

This feels a bit odd to me. Why does it need both the struct fields and
indices? Can we do either one of those?
> +static const char *scx_event_stat_str[] = {
> +	[SCX_EVENT_INVAL_SELECT_CPU]	= "invalid_select_cpu",
> +};

and hopefully derive this through # macro arg expansion?

> +/*
> + * The event counter is organized by a per-CPU variable to minimize the
> + * accounting overhead without synchronization. A system-wide view on the
> + * event counter is constructed when requested by scx_bpf_get_event_stat().
> + */
> +static DEFINE_PER_CPU(struct scx_event_stat, event_stats);

May be better to include "cpu" in the variable name.

> +/**
> + * scx_add_event - Increase an event counter for 'name' by 'cnt'
> + * @name: an event name defined in struct scx_event_stat
> + * @cnt: the number of the event occured
> + */
> +#define scx_add_event(name, cnt) ({					\
> +	struct scx_event_stat *__e;					\
> +	__e = get_cpu_ptr(&event_stats);				\
> +	WRITE_ONCE(__e->name, __e->name + (cnt));			\
> +	put_cpu_ptr(&event_stats);					\

this_cpu_add()?

> @@ -3607,8 +3667,10 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
>  		*ddsp_taskp = NULL;
>  		if (ops_cpu_valid(cpu, "from ops.select_cpu()"))
>  			return cpu;
> -		else
> +		else {
> +			scx_add_event(INVAL_SELECT_CPU, 1);
>  			return prev_cpu;
> +		}

formatting:

	if () {
	} else {
	}

Also, I'm not sure this is a useful event to count. False ops_cpu_valid()
indicates that the returned CPU is not even possible and the scheduler is
ejected right away. What's more interesting is
kernel/sched/core.c::select_task_rq() tripping on !is_cpu_allowed() and
falling back using select_fallback_rq(). We can either hook into core.c or
just compare the ops.select_cpu() picked CPU against the CPU the task ends
up on in enqueue_task_scx().

> @@ -5053,6 +5115,15 @@ static void scx_ops_disable_workfn(struct kthread_work *work)
>  		scx_rq_clock_invalidate(rq);
>  	}
>  
> +	/*
> +	 * Clear event counters so the next scx scheduler always gets
> +	 * fresh event counter values.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		struct scx_event_stat *e = per_cpu_ptr(&event_stats, cpu);
> +
> +		memset(e, 0, sizeof(*e));
> +	}
> +

Wouldn't we want to keep these counters intact on ejection so that the
scheduler's ejection path can capture the counters if necessary? Resetting
on load probably is better.

Thanks.

-- 
tejun