From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id 7F24EC3271E
	for <linux-arm-kernel@archiver.kernel.org>; Mon,  8 Jul 2024 14:01:09 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help
	:List-Post:List-Archive:List-Unsubscribe:List-Id:In-Reply-To:Content-Type:
	MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To:
	Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date:
	Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner;
	bh=SOmprW56uPaYj9dbNZ3GJESFv78jvPRrI4cX7x0EoeY=; b=MAJMcLp5/+tEGs7POxILaDK/+D
	HJHJuCh6jRX7lYgelRdTLblJ3o1e0lfOLF8wyA0ZSNPgJEI3IWwXE+lAfOWzd9KnecymX7i1fI4tC
	+PL8EtVu2jXmRgRDiT4yZ6nwR42Dy1PdAgXzbR22EczoKjRjw6rHMdhtZVf66cr7mhaCxECFssiT0
	D4OE2YsdcWbALOpdm7ezSz5LEhWDQh4wg7atKxpThgXl96vsEqs9ZZLZIyoipofBU+ZSEGLzuzVkk
	Bhd+qwbg+QPl2T433hIjxTRwGwNqcrngooZaMtv/FDt8C4+ayv4pAs5jYnisov+EyLLb6gH+jSl1d
	dL0odzVA==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux))
	id 1sQovD-0000000409W-21Co;
	Mon, 08 Jul 2024 14:00:51 +0000
Received: from foss.arm.com ([217.140.110.172])
	by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux))
	id 1sQouy-0000000405M-0bfL
	for linux-arm-kernel@lists.infradead.org;
	Mon, 08 Jul 2024 14:00:37 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E437F1042;
	Mon,  8 Jul 2024 07:00:59 -0700 (PDT)
Received: from pluto (usa-sjc-mx-foss1.foss.arm.com [172.31.20.19])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E32313F766;
	Mon,  8 Jul 2024 07:00:33 -0700 (PDT)
Date: Mon, 8 Jul 2024 15:00:31 +0100
From: Cristian Marussi <cristian.marussi@arm.com>
To: Luke Parkin <luke.parkin@arm.com>
Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	arm-scmi@vger.kernel.org, sudeep.holla@arm.com,
	cristian.marussi@arm.com
Subject: Re: [PATCH v2 3/4] firmware: arm_scmi: Track basic SCMI statistics
Message-ID: <Zovw_4MgumQo8npi@pluto>
References: <20240703143738.2007457-1-luke.parkin@arm.com>
 <20240703143738.2007457-4-luke.parkin@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20240703143738.2007457-4-luke.parkin@arm.com>
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20240708_070036_298810_AA65A9D1 
X-CRM114-Status: GOOD (  33.34  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On Wed, Jul 03, 2024 at 03:37:37PM +0100, Luke Parkin wrote:
> Add tracking of 5 initial statistics
> 
> Signed-off-by: Luke Parkin <luke.parkin@arm.com>

Missing

	---

> V1->V2
> Drop unneccesary atomic_set's
> Use new 'scmi_log_stats' to simplify incrementing of atomics
> Move scmi_log_stats to locations which mean no extra conditionals
> 	are needed
> ---
>  drivers/firmware/arm_scmi/driver.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
> index df3eb17cf439..937546397cf2 100644
> --- a/drivers/firmware/arm_scmi/driver.c
> +++ b/drivers/firmware/arm_scmi/driver.c
> @@ -1146,8 +1146,10 @@ static void scmi_handle_response(struct scmi_chan_info *cinfo,
>  	if (xfer->hdr.type == MSG_TYPE_DELAYED_RESP) {
>  		scmi_clear_channel(info, cinfo);
>  		complete(xfer->async_done);
> +		scmi_log_stats(&info->stats.dlyd_response_ok);
>  	} else {
>  		complete(&xfer->done);
> +		scmi_log_stats(&info->stats.response_ok);
>  	}
>  
>  	if (IS_ENABLED(CONFIG_ARM_SCMI_RAW_MODE_SUPPORT)) {

Up above here in scmi_handle_respnonse() I would take care to track a
few interesting stats about when reponses that went bad...if you look at
scmi_xfer_command_acquire() you will notice that a whole class of errors
there, and  spread all around the core, are handled by scmi_bad_message_trace():
please handle all of these errors based on the type of the enum scmi_bad_msg...

....which means, IOW, add a counter for each the enum scmi_bad_msg and handle
them in scmi_bad_message_trace(), so that they will get counted everywhere....
and... :P ... while doing this please consider as usual the fact that we will
like the log_stats call to vanish completely at compile time if STATS=n

...then please look also into handle_notification() and try to add a couple
of counters there too...

... we can discuss all of these new counters in the next review...

> @@ -1231,6 +1233,7 @@ static int scmi_wait_for_reply(struct device *dev, const struct scmi_desc *desc,
>  			       struct scmi_xfer *xfer, unsigned int timeout_ms)
>  {
>  	int ret = 0;
> +	struct scmi_info *info = handle_to_scmi_info(cinfo->handle);
>  
>  	if (xfer->hdr.poll_completion) {
>  		/*
> @@ -1251,13 +1254,12 @@ static int scmi_wait_for_reply(struct device *dev, const struct scmi_desc *desc,
>  					"timed out in resp(caller: %pS) - polling\n",
>  					(void *)_RET_IP_);
>  				ret = -ETIMEDOUT;
> +				scmi_log_stats(&info->stats.xfers_response_timeout);

I would rename this as 'xfer_response_polled_timeout' (so adding one more
stats to the structs and debugfs) in order to discern this timeout case
for the one below.

>  			}
>  		}
>  
>  		if (!ret) {
>  			unsigned long flags;
> -			struct scmi_info *info =
> -				handle_to_scmi_info(cinfo->handle);
>  

..in this same code-block, down below right before the trace_scmi_msg, I
would add a:

		scmi_log_stats(&info->stats.response_polled_ok);

...if not, you will end-up counting only the successfull reply on the IRQ
path, i.e. only the one that are served by handle_response.

>  			/*
>  			 * Do not fetch_response if an out-of-order delayed
> @@ -1291,6 +1293,7 @@ static int scmi_wait_for_reply(struct device *dev, const struct scmi_desc *desc,
>  			dev_err(dev, "timed out in resp(caller: %pS)\n",
>  				(void *)_RET_IP_);
>  			ret = -ETIMEDOUT;
> +			scmi_log_stats(&info->stats.xfers_response_timeout);

This is a different kind of timeout...on non-polled request, lets keep
it named as it is now...

>  		}
>  	}
>  
> @@ -1374,13 +1377,15 @@ static int do_xfer(const struct scmi_protocol_handle *ph,
>  	    !is_transport_polling_capable(info->desc)) {
>  		dev_warn_once(dev,
>  			      "Polling mode is not supported by transport.\n");
> +		scmi_log_stats(&info->stats.sent_fail);

Same here ... you have different reason for failures in the following,
so you could try to count them separately...

	sent_fail_polling_unsupported

>  		return -EINVAL;
>  	}
>  
>  	cinfo = idr_find(&info->tx_idr, pi->proto->id);
> -	if (unlikely(!cinfo))
> +	if (unlikely(!cinfo)) {
> +		scmi_log_stats(&info->stats.sent_fail);

	sent_fail_channel_not_found

>  		return -EINVAL;
> -
> +	}
>  	/* True ONLY if also supported by transport. */
>  	if (is_polling_enabled(cinfo, info->desc))
>  		xfer->hdr.poll_completion = true;
> @@ -1412,6 +1417,7 @@ static int do_xfer(const struct scmi_protocol_handle *ph,
>  	ret = info->desc->ops->send_message(cinfo, xfer);
>  	if (ret < 0) {
>  		dev_dbg(dev, "Failed to send message %d\n", ret);
> +		scmi_log_stats(&info->stats.sent_fail);

this is the basic, original send failure...it is a generic failure at the transport layer,
so lets keep it named like it is now...

>  		return ret;
>  	}
>  

...here you can just place your (instead of the one down below)

	scmi_log_stats(&info->stats.sent_ok);

...becasue now, right here, you are already sure that the message has been sent ...

> @@ -1420,8 +1426,12 @@ static int do_xfer(const struct scmi_protocol_handle *ph,
>  			    xfer->hdr.status, xfer->tx.buf, xfer->tx.len);
>  
>  	ret = scmi_wait_for_message_response(cinfo, xfer);
> -	if (!ret && xfer->hdr.status)
> +	if (!ret && xfer->hdr.status) {
>  		ret = scmi_to_linux_errno(xfer->hdr.status);
> +		scmi_log_stats(&info->stats.sent_fail);

...this is NOT a sent_fail...you DID send successfully the message AND received
successfully a response (ret==0)  BUT there was an error of some kind at the protocol
layer...i.e. the server sent you an error code in the reply...so I would count here
instead something named like

	stats.protocol_errs

> +	} else {
> +		scmi_log_stats(&info->stats.sent_ok);

and this is really late to be counted as sent, so it would be clearer to
count it above (where I mentiones), so please drop the else.

Thanks,
Cristian