Date: Wed, 6 Dec 2023 11:06:05 -0800
From: Stanislav Fomichev
To: Magnus Karlsson
Cc: Willem de Bruijn, Florian Bezdeka, yoong.siang.song@intel.com, Jesper Dangaard Brouer, davem@davemloft.net, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet, Bjorn Topel, magnus.karlsson@intel.com, maciej.fijalkowski@intel.com, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin, Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Alexandre Torgue, Jose Abreu, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, bpf@vger.kernel.org, xdp-hints@xdp-project.net, linux-stm32@st-md-mailman.stormreply.com, linux-arm-kernel@lists.infradead.org, linux-kselftest@vger.kernel.org
Subject: Re: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
References: <20231203165129.1740512-1-yoong.siang.song@intel.com> <20231203165129.1740512-3-yoong.siang.song@intel.com> <43b01013-e78b-417e-b169-91909c7309b1@kernel.org> <656de830e8d70_2e983e294ca@willemb.c.googlers.com.notmuch> <5a0faf8cc9ec3ab0d5082c66b909c582c8f1eae6.camel@siemens.com> <656f66023f7bd_3dd6422942a@willemb.c.googlers.com.notmuch>
X-Mailing-List: netdev@vger.kernel.org

On 12/06, Magnus Karlsson wrote:
> On Tue, 5 Dec 2023 at 20:39, Stanislav Fomichev wrote:
> >
> > On 12/05, Willem de Bruijn wrote:
> > > Stanislav Fomichev wrote:
> > > > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka wrote:
> > > > >
> > > > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > > > Jesper Dangaard Brouer wrote:
> > > > > > > >
> > > > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > > > copy via XDP Tx metadata framework.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Song Yoong Siang
> > > > > > > > > ---
> > > > > > > > >  drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
> > > > > > > >
> > > > > > > > As requested before, I think we need to see another driver implementing
> > > > > > > > this.
> > > > > > > >
> > > > > > > > I propose driver igc and chip i225.
> > > > > >
> > > > > > Sure. I will include igc patches in the next version.
> > > > > >
> > > > > > > >
> > > > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > > > into the future[1] is handled code-wise. One suggestion is to add a
> > > > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > > > mentions/documents these different hardware limitations. It is natural
> > > > > > > > that different types of hardware have limitations. This is a
> > > > > > > > close-to-hardware-level abstraction/API, and IMHO as long as we document
> > > > > > > > the limitations we can expose this API without too many limitations for
> > > > > > > > more capable hardware.
> > > > > >
> > > > > > Sure. I will try to add hardware limitations in the documentation.
> > > > > >
> > > > > > >
> > > > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > > > cannot be programmed.
> > > > > > >
> > > > > >
> > > > > > In the current design, xsk_tx_metadata_request() has no return value,
> > > > > > so the user won't know if their request failed.
> > > > > > It is complex to inform the user which request is failing.
> > > > > > Therefore, IMHO, it is good that we let the driver handle the error silently.
> > > > > >
> > > > >
> > > > > If the programmed value is invalid, the packet will be "dropped" / will
> > > > > never make it to the wire, right?
> > >
> > > Programmable behavior is to either drop or cap to some boundary
> > > value, such as the farthest programmable time in the future: the
> > > horizon. In fq:
> > >
> > >         /* Check if packet timestamp is too far in the future. */
> > >         if (fq_packet_beyond_horizon(skb, q, now)) {
> > >                 if (q->horizon_drop) {
> > >                         q->stat_horizon_drops++;
> > >                         return qdisc_drop(skb, sch, to_free);
> > >                 }
> > >                 q->stat_horizon_caps++;
> > >                 skb->tstamp = now + q->horizon;
> > >         }
> > >         fq_skb_cb(skb)->time_to_send = skb->tstamp;
> > >
> > > Drop is the more obviously correct mode.
> > >
> > > Programming with a clock source that the driver does not support will
> > > then be a persistent failure.
> > >
> > > Preferably, this driver capability can be queried beforehand (rather
> > > than only through reading error counters afterwards).
> > >
> > > Perhaps it should not be a driver task to convert from possibly
> > > multiple clock sources to the device native clock. Right now, we do
> > > use per-device timecounters for this, implemented in the driver.
> > >
> > > As for which clocks are relevant: for PTP, I suppose the device PHC,
> > > converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
> >
> > Do we need to expose some generic netdev netlink APIs to query/adjust
> > NIC clock sources (or maybe there is something existing already)?
> > Then userspace can be responsible for syncing/converting the
> > timestamps to the internal NIC clocks.
> > +1 to trying to avoid doing this in the drivers.
> >
> > > > > That is clearly a situation that the user should be informed about. For
> > > > > RT systems this normally means that something is really wrong regarding
> > > > > timing / cycle overflow. Such systems have to react to that situation.
> > > >
> > > > In general, af_xdp is a bit lacking in this 'notify the user that they
> > > > somehow messed up' area :-(
> > > > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > > > will not give any visible signal back (besides the driver potentially
> > > > spilling something into dmesg as it was in the mlx case).
> > > > We can probably start with having some counters for these events?
> > >
> > > This is because the AF_XDP completion queue descriptor format is only
> > > a u64 address?
> >
> > Yeah. XDP_COPY mode has the descriptor validation which is exported via
> > recvmsg errno, but the zerocopy path seems to be too deep in the stack
> > to report something back. And there is no place, as you mention,
> > in the completion ring to report the status.
> >
> > > Could error conditions be reported on tx completion in the metadata,
> > > using xsk_tx_metadata_complete?
> >
> > That would be one way to do it, yes. But then the error reporting depends
> > on the metadata opt-in. Having a separate ring to export the errors,
> > or having a v2 tx-completions layout with an extra 'status' field, would
> > also work.
>
> There are error counters for the non-metadata and offloading cases
> above that can be retrieved with the XDP_STATISTICS getsockopt().
> From if_xdp.h:
>
> struct xdp_statistics {
>         __u64 rx_dropped; /* Dropped for other reasons */
>         __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
>         __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
>         __u64 rx_ring_full; /* Dropped due to rx ring being full */
>         __u64 rx_fill_ring_empty_descs; /* Failed to retrieve item from fill ring */
>         __u64 tx_ring_empty_descs; /* Failed to retrieve item from tx ring */
> };
>
> However, these are aggregate statistics and do not say anything about
> which packet caused the event. That works well for things that are
> programming bugs that should not occur (such as rx_invalid_descs and
> tx_invalid_descs) and require the programmer to debug and fix his or
> her program, but it does not work for requests that might fail even
> though the program is correct and that need to be handled on a
> packet-by-packet basis. So something needs to be added for that, as
> you both say.
>
> I would prefer to avoid a v2 completion descriptor format or
> another ring that needs to be checked all the time, so if we could
> live with providing the error status in the metadata field of the
> packet at completion time, that would be good. Though having the error
> status in the completion ring would be faster, as that cache line is
> hot, while the metadata section of the packet is likely not at
> completion time. So that speaks for a v2 completion ring format. Just
> thinking out loud here.

In this case, maybe adding tx_over_horizon_dropped to XDP_STATISTICS is
all we need here? We can have some new API to query this horizon per netdev.