Date: Thu, 16 Jan 2025 15:33:00 -1000
From: Tejun Heo
To: Changwoo Min
Cc: void@manifault.com, arighi@nvidia.com, kernel-dev@igalia.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/7] sched_ext: Implement event counter infrastructure and add an event
References: <20250116151543.80163-1-changwoo@igalia.com>
	<20250116151543.80163-2-changwoo@igalia.com>
In-Reply-To: <20250116151543.80163-2-changwoo@igalia.com>

Hello,

On Fri, Jan 17, 2025 at 12:15:37AM +0900, Changwoo Min wrote:
> +/*
> + * Collection of event counters.
> + */
> +struct scx_event_stat {

If we keep this struct, maybe name it scx_event_stats?

> +	/*
> +	 * If ops.select_cpu() returns a CPU which can't be used by the task,
> +	 * the core scheduler code silently picks a fallback CPU.
> +	 */
> +	u64		INVAL_SELECT_CPU;

Let's do $COMPONENT_$EVENT, so SELECT_CPU_INVALID (or maybe something more
specific than invalid, like disallowed or fallback?).

> +#define SCX_EVENT_IDX(e)	(offsetof(struct scx_event_stat, e)/sizeof(u64))
> +#define SCX_EVENT_END_IDX()	(sizeof(struct scx_event_stat)/sizeof(u64))
> +#define SCX_EVENT_DEFINE(e)	SCX_EVENT_##e = SCX_EVENT_IDX(e)
> +
> +/*
> + * Types of event counters.
> + */
> +enum scx_event_kind {
> +	SCX_EVENT_BEGIN = 0,

I think 0 BEGIN value can be implicit.

> +	SCX_EVENT_DEFINE(INVAL_SELECT_CPU),
> +	SCX_EVENT_END = SCX_EVENT_END_IDX(),

SCX_NR_EVENTS? Also, we don't really need the macro to count the enums.

> +};

This feels a bit odd to me. Why does it need both the struct fields and
indices? Can we do either one of those?
> +static const char *scx_event_stat_str[] = {
> +	[SCX_EVENT_INVAL_SELECT_CPU]	= "invalid_select_cpu",
> +};

and hopefully derive this through # macro arg expansion?

> +/*
> + * The event counter is organized by a per-CPU variable to minimize the
> + * accounting overhead without synchronization. A system-wide view on the
> + * event counter is constructed when requested by scx_bpf_get_event_stat().
> + */
> +static DEFINE_PER_CPU(struct scx_event_stat, event_stats);

May be better to include "cpu" in the variable name.

> +/**
> + * scx_add_event - Increase an event counter for 'name' by 'cnt'
> + * @name: an event name defined in struct scx_event_stat
> + * @cnt: the number of the event occured
> + */
> +#define scx_add_event(name, cnt) ({					\
> +	struct scx_event_stat *__e;					\
> +	__e = get_cpu_ptr(&event_stats);				\
> +	WRITE_ONCE(__e->name, __e->name + (cnt));			\
> +	put_cpu_ptr(&event_stats);					\

this_cpu_add()?

> @@ -3607,8 +3667,10 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
>  		*ddsp_taskp = NULL;
>  		if (ops_cpu_valid(cpu, "from ops.select_cpu()"))
>  			return cpu;
> -		else
> +		else {
> +			scx_add_event(INVAL_SELECT_CPU, 1);
>  			return prev_cpu;
> +		}

formatting:

	if () {
	} else {
	}

Also, I'm not sure this is a useful event to count. False ops_cpu_valid()
indicates that the returned CPU is not even possible and the scheduler is
ejected right away. What's more interesting is
kernel/sched/core.c::select_task_rq() tripping on !is_cpu_allowed() and
falling back using select_fallback_rq(). We can either hook into core.c or
just compare the ops.select_cpu() picked CPU against the CPU the task ends
up on in enqueue_task_scx().

> @@ -5053,6 +5115,15 @@ static void scx_ops_disable_workfn(struct kthread_work *work)
>  		scx_rq_clock_invalidate(rq);
>  	}
>  
> +	/*
> +	 * Clear event counters so the next scx scheduler always gets
> +	 * fresh event counter values.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		struct scx_event_stat *e = per_cpu_ptr(&event_stats, cpu);
> +
> +		memset(e, 0, sizeof(*e));
> +	}
> +

Wouldn't we want to keep these counters intact on ejection so that the
scheduler's ejection path can capture the counters if necessary? Resetting
on load probably is better.

Thanks.

-- 
tejun