From: Stephen Hemminger
Subject: Re: RFC: i40e xmit path HW limitation
Date: Thu, 30 Jul 2015 09:17:53 -0700
Message-ID: <20150730091753.1af6cc67@urahara>
In-Reply-To: <55BA3B5D.4020402@cloudius-systems.com>
References: <55BA3B5D.4020402@cloudius-systems.com>
To: Vlad Zolotarov
Cc: "dev@dpdk.org"

On Thu, 30 Jul 2015 17:57:33 +0300
Vlad Zolotarov wrote:

> Hi, Konstantin, Helin,
> there is a documented limitation of the XL710 controllers (i40e driver)
> which is not handled in any way by the DPDK driver.
> From the datasheet, chapter 8.4.1:
>
> "• A single transmit packet may span up to 8 buffers (up to 8 data
> descriptors per packet, including both the header and payload buffers).
> • The total number of data descriptors for the whole TSO (explained
> later on in this chapter) is unlimited as long as each segment within
> the TSO obeys the previous rule (up to 8 data descriptors per segment
> for both the TSO header and the segment payload buffers)."
>
> This means that, for instance, a long cluster with small fragments has
> to be linearized before it may be placed on the HW ring.
> In more standard environments like Linux or FreeBSD drivers the
> solution is straightforward - call skb_linearize()/m_collapse()
> respectively.
> In a non-conformist environment like DPDK life is not that easy - there
> is no easy way to collapse the cluster into a linear buffer from inside
> the device driver, since the device driver doesn't allocate memory in
> the fast path and uses only the user-allocated pools.
>
> Here are two proposals for a solution:
>
> 1. We may provide a callback that would return TRUE to the user if a
>    given cluster has to be linearized; it should always be called
>    before rte_eth_tx_burst(). Alternatively it may be called from
>    inside rte_eth_tx_burst(), and rte_eth_tx_burst() is changed to
>    return an error code when one of the clusters it is given has to be
>    linearized.
> 2. Another option is to allocate a mempool in the driver with the
>    elements consuming a single page each (standard 2KB buffers would
>    do). The number of elements in the pool should be the Tx ring length
>    multiplied by "64KB / (linear data length of a buffer in the pool
>    above)". Here I use 64KB as the maximum packet length and do not
>    take into account esoteric things like "Giant" TSO mentioned in the
>    spec above. Then we may actually go and linearize the cluster if
>    needed on top of the buffers from the pool above, post the buffer
>    from that mempool on the HW ring, link the original cluster to the
>    new cluster (using the private data) and release it when the send is
>    done.

Or just silently drop heavily scattered packets (and increment oerrors)
with a PMD_TX_LOG debug message.

I think a DPDK driver doesn't have to accept all possible mbufs and do
extra work. It seems reasonable to expect the caller to be well behaved
in this restricted ecosystem.
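
As a rough illustration of the check in proposal 1 above, here is a
minimal sketch covering only the simple non-TSO case, where the
datasheet rule reduces to "at most 8 data descriptors per packet".
The helper name and the XL710_TX_MAX_SEG constant are invented for
this sketch; a real TSO-aware check would have to validate every
MSS-sized window of the payload rather than the chain as a whole.

    #include <rte_mbuf.h>

    /* Datasheet limit quoted above: at most 8 data descriptors per
     * (non-TSO) packet.  The constant name is made up for this sketch. */
    #define XL710_TX_MAX_SEG 8

    /*
     * Return 1 if the mbuf chain cannot be posted as-is and has to be
     * linearized (or dropped) first.  Only the non-TSO case is covered;
     * a TSO packet would need a per-MSS window check instead.
     */
    static inline int
    xl710_tx_needs_linearize(const struct rte_mbuf *m)
    {
            if (m->ol_flags & PKT_TX_TCP_SEG)
                    return 0;       /* TSO: not handled by this sketch */
            return m->nb_segs > XL710_TX_MAX_SEG;
    }

An application would call this on each cluster before handing the burst
to rte_eth_tx_burst() and either coalesce or drop the offending chains.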
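
And a sketch of the drop-and-count alternative suggested in the reply,
reusing the helper above. The sketch_tx_queue structure and its
tx_dropped counter are stand-ins for whatever the PMD would really use
to feed the oerrors statistic; they are not the actual i40e structures.

    /* Invented queue state for this sketch only. */
    struct sketch_tx_queue {
            uint64_t tx_dropped;    /* would be reported as oerrors */
    };

    /*
     * Drop clusters that are too scattered for the HW and compact the
     * burst in place; returns the number of packets left to post.
     */
    static uint16_t
    drop_overscattered(struct sketch_tx_queue *txq,
                       struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
    {
            uint16_t i, kept = 0;

            for (i = 0; i < nb_pkts; i++) {
                    struct rte_mbuf *m = tx_pkts[i];

                    if (xl710_tx_needs_linearize(m)) {
                            /* too scattered for the HW: drop and account */
                            rte_pktmbuf_free(m);
                            txq->tx_dropped++;
                            continue;
                    }
                    tx_pkts[kept++] = m;
            }
            return kept;    /* packets that may actually hit the ring */
    }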