From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id EDD90C3DA64 for ; Thu, 1 Aug 2024 23:08:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=cTzwkxXMjCnXxGPm8SS63wasgARHwPjwYcFLo4i11LY=; b=PLxfG5VrCld7few1BWTl41LalH i+Mep/m/pulXjDJaGiLzoyEp0XSSBKuJkCefJptsOUFVmN2fsK0oCDWfUjdIfaA6/Q0hvKBXb0IYy 1AVvVJING7Y3UttZDpkfggakI8+ZkSOzD/1Hu+j2PJnBzoNIBpWAE0HPZk07zIiHO04quGDG6JpUy 3eIuN0FLO5VnqN47ZmFg7ERzEKi3iQl446+XEcdkinK51eRGGHtCyVYeClF+mUNz23d50f6gnBLQ/ wFjpnFoMxKL6cn5hnO6i6OUC0p1y0LocQall8zPBxYHmhGXWGuCpqi1Vpdp+Amz5cF/r/cAz7LLlk VDm4iU4w==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sZetu-00000006z1A-1UOr; Thu, 01 Aug 2024 23:08:02 +0000 Received: from mail-ed1-x533.google.com ([2a00:1450:4864:20::533]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1sZetO-00000006ytu-3Kch for linux-arm-kernel@lists.infradead.org; Thu, 01 Aug 2024 23:07:32 +0000 Received: by mail-ed1-x533.google.com with SMTP id 4fb4d7f45d1cf-5af51684d52so8581863a12.1 for ; Thu, 01 Aug 2024 16:07:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1722553649; x=1723158449; darn=lists.infradead.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=cTzwkxXMjCnXxGPm8SS63wasgARHwPjwYcFLo4i11LY=; b=RhOG/FMEMXebmbJzlkfARcTLV7WBNeMB9DjOr5qhiTo4uoCEACn54sVlm8uG86nONF QpDstDhRiHAQQEynrWsfl0Rc9LpmVkJ1GlKBv0NEpabTWGhyIiNGRuXTk5l4qlSCUbFk BX/0Eolupuoy8UFmln3ALxSZzZkVbJgfcNoQG8Lu5Anz9x0l3LixnO8eKD5lJWBaaDYf 78GG6s/VigZuX6flBS8wJXK3abQ6dHVtJohafPFxI4L04VK8MsmsMiEXAyyD9Na8Xm3f Kc4pgr0czbtvMxhjaU+vkH6cYw+TA1+Oh+Y33ImW/aomsi5ZKHbnOyHCDFO0hruBRNy8 cLsw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722553649; x=1723158449; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=cTzwkxXMjCnXxGPm8SS63wasgARHwPjwYcFLo4i11LY=; b=qUBBv03PXugYnFegOHnOXLA8rPOmyvLqUlABnmmFSWXMfuqAGHjdwJCZypJXfhUFAJ innHDYyqzYRdbzWgnr0otPezIdh1Ty6tpz/pJ72YsGcm+u6Gow85ysStYDovhHmmjGo8 B8H1+YdSm8mbOLbcuXbO6sMgJqksbBBYpIkgR2rwFWbBKaCG5fESawswSw45VbOZn59q LJIWEj1bIDIWdAw/SNjDnwjuPDUlK2n+MiBsSiX8Ek+A7XFwUH99hG50/Sl8VKBb0Ep8 p24H9laKFEEpYD8ijfjOVF4CNqgAEEXi9kMZ+XBub3sh8EM8Kk+hF2GTIvCUO0Sfy+ox PRdA== X-Forwarded-Encrypted: i=1; AJvYcCXgHsUCf5IlIOY1x5Gs5SIDQAOlEjca0KPA/skWKFsCdp9ToAMK3Im5lcipew8rw+YlMYeV9PZ+UIe9J3LPw6BtpDKRq/WxLw8w0mYKrqGyXSKqHoE= X-Gm-Message-State: AOJu0YwkfTnQwFtAvj8OlC1/Bhc4yWNMnCNsKEevVcHSJeNMByQTIGYY x46kqXdibX8/sRlaN0hDNSoF+KeL8t+oenpcFGqlz5xQNulQYeAn X-Google-Smtp-Source: AGHT+IFv2aAGzMN11Iigqz0ZSdZXSz3c8So7Yv+NTGX6LRfrYtmGi031rLkNH9XJUmYnnVE03dMx+g== X-Received: by 2002:a17:906:7956:b0:a7a:b8f1:fd69 with SMTP id a640c23a62f3a-a7dc4e4ae17mr140198966b.18.1722553648600; Thu, 01 Aug 2024 16:07:28 -0700 (PDT) Received: from skbuf ([188.25.135.70]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a7dc9c0b6besm30094266b.52.2024.08.01.16.07.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 01 Aug 2024 16:07:27 -0700 (PDT) Date: Fri, 2 Aug 2024 02:07:25 +0300 From: Vladimir Oltean To: Furong Xu <0x1207@gmail.com> Cc: Andrew Lunn , "David S. Miller" , Alexandre Torgue , Jose Abreu , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Maxime Coquelin , Joao Pinto , netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, xfr@outlook.com, rock.xu@nio.com Subject: Re: [PATCH net-next v1 2/5] net: stmmac: support fp parameter of tc-mqprio Message-ID: <20240801230725.nllk7n3veqwplfpo@skbuf> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240801_160730_866923_58B40EAE X-CRM114-Status: GOOD ( 32.83 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Wed, Jul 31, 2024 at 06:43:13PM +0800, Furong Xu wrote: > tc-mqprio can select whether traffic classes are express or preemptible. > > Tested on DWMAC CORE 5.10a > > Signed-off-by: Furong Xu <0x1207@gmail.com> > --- > .../net/ethernet/stmicro/stmmac/dwmac4_core.c | 2 + > drivers/net/ethernet/stmicro/stmmac/dwmac5.c | 12 ++++ > drivers/net/ethernet/stmicro/stmmac/dwmac5.h | 2 + > drivers/net/ethernet/stmicro/stmmac/hwif.h | 8 +++ > .../net/ethernet/stmicro/stmmac/stmmac_main.c | 2 + > .../net/ethernet/stmicro/stmmac/stmmac_tc.c | 61 +++++++++++++++++++ > 6 files changed, 87 insertions(+) > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c > index 5d132bada3fe..068859284691 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c > @@ -655,3 +655,15 @@ void dwmac5_fpe_set_add_frag_size(void __iomem *ioaddr, u32 add_frag_size) > > writel(value, ioaddr + MTL_FPE_CTRL_STS); > } > + > +void dwmac5_fpe_set_preemptible_tcs(void __iomem *ioaddr, unsigned long tcs) > +{ > + u32 value; > + > + value = readl(ioaddr + MTL_FPE_CTRL_STS); > + > + value &= ~PEC; > + value |= FIELD_PREP(PEC, tcs); > + > + writel(value, ioaddr + MTL_FPE_CTRL_STS); > +} Watch out here. I think the MTL_FPE_CTRL_STS[PEC] field is per TXQ, but input from user space is per TC. There's a difference between the 2, that I'll try to clarify below. But even ignoring the case of multiple TXQs per TC, there's also the case of reverse TC:TXQ mappings: "num_tc 4 map 0 1 2 3 queues 1@3 1@2 1@1 1@0". When the user then says he wants TC 0 to be preemptible, he really means TXQ 3. That's how this should be treated. > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > index 9b1cf81c50ea..a5e3316bc410 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > @@ -6256,6 +6256,8 @@ static int stmmac_setup_tc(struct net_device *ndev, enum tc_setup_type type, > switch (type) { > case TC_QUERY_CAPS: > return stmmac_tc_query_caps(priv, priv, type_data); > + case TC_SETUP_QDISC_MQPRIO: > + return stmmac_tc_setup_mqprio(priv, priv, type_data); > case TC_SETUP_BLOCK: > return flow_block_cb_setup_simple(type_data, > &stmmac_block_cb_list, > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c > index 996f2bcd07a2..494fe2f68300 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c > @@ -1198,6 +1198,13 @@ static int tc_query_caps(struct stmmac_priv *priv, > struct tc_query_caps_base *base) > { > switch (base->type) { > + case TC_SETUP_QDISC_MQPRIO: { > + struct tc_mqprio_caps *caps = base->caps; > + > + caps->validate_queue_counts = true; > + > + return 0; > + } > case TC_SETUP_QDISC_TAPRIO: { > struct tc_taprio_caps *caps = base->caps; > > @@ -1214,6 +1221,59 @@ static int tc_query_caps(struct stmmac_priv *priv, > } > } > > +static void stmmac_reset_tc_mqprio(struct net_device *ndev) > +{ > + struct stmmac_priv *priv = netdev_priv(ndev); > + > + netdev_reset_tc(ndev); > + netif_set_real_num_tx_queues(ndev, priv->plat->tx_queues_to_use); > + > + stmmac_fpe_set_preemptible_tcs(priv, priv->ioaddr, 0); > +} > + > +static int tc_setup_mqprio(struct stmmac_priv *priv, > + struct tc_mqprio_qopt_offload *mqprio) > +{ > + struct tc_mqprio_qopt *qopt = &mqprio->qopt; > + struct net_device *ndev = priv->dev; > + int num_stack_tx_queues = 0; > + int num_tc = qopt->num_tc; > + int offset, count; > + int tc, err; > + > + if (!num_tc) { > + stmmac_reset_tc_mqprio(ndev); > + return 0; > + } > + > + err = netdev_set_num_tc(ndev, num_tc); > + if (err) > + return err; > + > + for (tc = 0; tc < num_tc; tc++) { > + offset = qopt->offset[tc]; > + count = qopt->count[tc]; > + num_stack_tx_queues += count; > + > + err = netdev_set_tc_queue(ndev, tc, count, offset); > + if (err) > + goto err_reset_tc; > + } We might have a problem here, and I don't know if I'm well enough equipped with DWMAC knowledge to help you solve it. The way I understand mqprio is that it groups TX queues into traffic classes. All traffic classes are in strict priority relative to each other (TC 0 having the smallest priority). If multiple TX queues go to the same traffic class, it is expected that the NIC performs round robin dequeuing out of them. Then there's a prio_tc_map[], which maps skb->priority values to traffic classes. On xmit, netdev_pick_tx() chooses a random TX queue out of those assigned to the computed traffic class for the packet. This skb_tx_hash() is the software enqueue counterpart to what the NIC is expected to do in terms of scheduling. Much of this is said in newer versions of "man tc-mqprio". https://man7.org/linux/man-pages/man8/tc-mqprio.8.html Where I was trying to get is that you aren't programming the TC to TXQ mapping to hardware in any way, and you are accepting any mapping that the user requests. This isn't okay. I believe that the DWMAC TX scheduling algorithm is strict priority by default, with plat->tx_sched_algorithm being configurable through device tree properties to other values. Then, individual TX queues have the "snps,priority" device tree property for configuring their scheduling priority. All of that can go out of sync with what the user thinks he configures through tc-mqprio, and badly. Consider "num_tc 2 queues 3@0 2@3". The stack will think that TXQs 0, 1, 2 have one priority, and TXQs 3 and 4 another. But in reality, each TXQ (say by default) has a priority equal to its index. netdev_pick_tx() will think it's okay, for a packet belonging to TC0, to select a TXQ based on hashing between indices 0, 1, 2. But in hardware, the packets will get sent through TXQs with different priorities, based on nothing more than pure chance. That is a disaster, and especially noticeable when the mqprio mapping is applied through taprio, and there is a Qbv schedule on the port. Some packets will be scheduled on the right time slots, and some won't. So the idea here is that you'll either have to experiment with reprogramming the scheduling algorithm and TXQ priorities on mqprio offload, or refuse offloading anything that doesn't match what the hardware was pre-programmed with (through device tree, likely). > + > + err = netif_set_real_num_tx_queues(ndev, num_stack_tx_queues); > + if (err) > + goto err_reset_tc; > + > + stmmac_fpe_set_preemptible_tcs(priv, priv->ioaddr, mqprio->preemptible_tcs); > + > + return 0; > + > +err_reset_tc: > + stmmac_reset_tc_mqprio(ndev); > + > + return err; > +}