From: Avi Kivity
Subject: Re: RFC: i40e xmit path HW limitation
Date: Thu, 30 Jul 2015 19:20:22 +0300
Message-ID: <55BA4EC6.3030301@cloudius-systems.com>
References: <55BA3B5D.4020402@cloudius-systems.com> <20150730091753.1af6cc67@urahara>
In-Reply-To: <20150730091753.1af6cc67@urahara>
To: Stephen Hemminger, Vlad Zolotarov
Cc: "dev@dpdk.org"
List-Id: patches and discussions about DPDK

On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
> On Thu, 30 Jul 2015 17:57:33 +0300
> Vlad Zolotarov wrote:
>
>> Hi, Konstantin, Helin,
>> there is a documented limitation of xl710 controllers (i40e driver)
>> which is not handled in any way by the DPDK driver.
>> From the datasheet, chapter 8.4.1:
>>
>> "• A single transmit packet may span up to 8 buffers (up to 8 data
>> descriptors per packet, including both the header and payload buffers).
>> • The total number of data descriptors for the whole TSO (explained
>> later on in this chapter) is unlimited as long as each segment within
>> the TSO obeys the previous rule (up to 8 data descriptors per segment
>> for both the TSO header and the segment payload buffers)."
>>
>> This means that, for instance, a long cluster with small fragments has
>> to be linearized before it may be placed on the HW ring.
>> In more standard environments, like the Linux or FreeBSD drivers, the
>> solution is straightforward - call skb_linearize()/m_collapse(),
>> respectively.
>> In a non-conformist environment like DPDK, life is not that easy - there
>> is no easy way to collapse the cluster into a linear buffer from inside
>> the device driver, since the device driver doesn't allocate memory in
>> the fast path and uses only the user-allocated pools.
>>
>> Here are two proposals for a solution:
>>
>>  1. We may provide a callback that would return TRUE to the user if a
>>     given cluster has to be linearized, and it should always be called
>>     before rte_eth_tx_burst(). Alternatively it may be called from
>>     inside rte_eth_tx_burst(), and rte_eth_tx_burst() would be changed
>>     to return an error code for the case when one of the clusters it's
>>     given has to be linearized.
>>  2. Another option is to allocate a mempool in the driver with the
>>     elements consuming a single page each (standard 2KB buffers would
>>     do). The number of elements in the pool should be the Tx ring length
>>     multiplied by "64KB/(linear data length of the buffer in the pool
>>     above)". Here I use 64KB as the maximum packet length and am not
>>     taking into account esoteric things like the "Giant" TSO mentioned
>>     in the spec above. Then we may actually go and linearize the cluster
>>     if needed on top of the buffers from the pool above, post the buffer
>>     from that mempool on the HW ring, link the original cluster to the
>>     new cluster (using the private data) and release it when the send is
>>     done.
> Or just silently drop heavily scattered packets (and increment oerrors)
> with a PMD_TX_LOG debug message.
>
> I think a DPDK driver doesn't have to accept all possible mbufs and do
> extra work. It seems reasonable to expect the caller to be well behaved
> in this restricted ecosystem.

How can the caller know what's well behaved?  It's device dependent.
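
For concreteness, here is a rough sketch of what the check from proposal 1
could look like. The constant and the function name are made up for
illustration - this is not an existing DPDK or i40e PMD interface - and the
TSO branch is only a conservative heuristic for the per-segment rule quoted
from the datasheet:

/*
 * Illustrative sketch only: I40E_MAX_DESCS_PER_PKT and
 * i40e_needs_linearization() are invented names, not part of any
 * existing DPDK or i40e PMD API.
 */
#include <stdbool.h>
#include <rte_mbuf.h>

#define I40E_MAX_DESCS_PER_PKT 8   /* per packet / per TSO segment */

/* Return true if the mbuf chain would violate the 8-descriptor rule
 * and therefore has to be linearized before rte_eth_tx_burst(). */
static bool
i40e_needs_linearization(const struct rte_mbuf *m)
{
        /* Non-TSO: the whole packet must fit in 8 data descriptors. */
        if (!(m->ol_flags & PKT_TX_TCP_SEG))
                return m->nb_segs > I40E_MAX_DESCS_PER_PKT;

        /*
         * TSO: every MSS worth of payload (plus the header) must also
         * fit in 8 descriptors. An exact check would have to simulate
         * how the payload maps onto the buffers; this conservative
         * heuristic just requires every buffer to be large enough that
         * no segment can span too many of them, leaving slack for the
         * header buffer and for partially used buffers at the segment
         * boundaries.
         */
        uint16_t min_len = m->tso_segsz / (I40E_MAX_DESCS_PER_PKT - 3) + 1;
        const struct rte_mbuf *seg;

        for (seg = m; seg != NULL; seg = seg->next) {
                if (seg->data_len < min_len)
                        return true;
        }
        return false;
}

The caller (or rte_eth_tx_burst() itself) could then either linearize the
offending cluster or drop it and increment oerrors, as Stephen suggests.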