From mboxrd@z Thu Jan  1 00:00:00 1970
From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Subject: Re: [RFC] lib/librte_ether: consistent PMD batching
 behavior
Date: Mon, 23 Jan 2017 17:36:08 +0100
Message-ID: <20170123163607.GB3779@6wind.com>
References: <1484905876-60165-1-git-send-email-zhiyong.yang@intel.com>
 <d1ebbed6-61ea-da60-4e52-cc0b59b98163@solarflare.com>
 <2601191342CEEE43887BDE71AB9772583F108924@irsmsx105.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772583F108959@irsmsx105.ger.corp.intel.com>
 <20170120114822.GA106360@bricha3-MOBL3.ger.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
 Andrew Rybchenko <arybchenko@solarflare.com>,
 "Yang, Zhiyong" <zhiyong.yang@intel.com>, "dev@dpdk.org" <dev@dpdk.org>,
 "thomas.monjalon@6wind.com" <thomas.monjalon@6wind.com>
To: Bruce Richardson <bruce.richardson@intel.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mail-wm0-f50.google.com (mail-wm0-f50.google.com [74.125.82.50])
 by dpdk.org (Postfix) with ESMTP id 61E9B2A5B
 for <dev@dpdk.org>; Mon, 23 Jan 2017 17:36:16 +0100 (CET)
Received: by mail-wm0-f50.google.com with SMTP id c206so164555576wme.0
 for <dev@dpdk.org>; Mon, 23 Jan 2017 08:36:16 -0800 (PST)
Content-Disposition: inline
In-Reply-To: <20170120114822.GA106360@bricha3-MOBL3.ger.corp.intel.com>
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On Fri, Jan 20, 2017 at 11:48:22AM +0000, Bruce Richardson wrote:
> On Fri, Jan 20, 2017 at 11:24:40AM +0000, Ananyev, Konstantin wrote:
> > > 
> > > From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> > > Sent: Friday, January 20, 2017 10:26 AM
> > > To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org
> > > Cc: thomas.monjalon@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>
> > > Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
> > > 
> > > On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> > > The rte_eth_tx_burst() function in the file Rte_ethdev.h is invoked to
> > > transmit output packets on the output queue for DPDK applications as
> > > follows.
> > > 
> > > static inline uint16_t
> > > rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
> > >                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> > > 
> > > Note: The fourth parameter nb_pkts: The number of packets to transmit.
> > > The rte_eth_tx_burst() function returns the number of packets it actually
> > > sent. The return value equal to *nb_pkts* means that all packets have been
> > > sent, and this is likely to signify that other output packets could be
> > > immediately transmitted again. Applications that implement a "send as many
> > > packets to transmit as possible" policy can check this specific case and
> > > keep invoking the rte_eth_tx_burst() function until a value less than
> > > *nb_pkts* is returned.
> > > 
> > > When you call TX only once in rte_eth_tx_burst, you may get different
> > > behaviors from different PMDs. One problem that every DPDK user has to
> > > face is that they need to take the policy into consideration at the app-
> > > lication level when using any specific PMD to send the packets whether or
> > > not it is necessary, which brings usage complexities and makes DPDK users
> > > easily confused since they have to learn the details on TX function limit
> > > of specific PMDs and have to handle the different return value: the number
> > > of packets transmitted successfully for various PMDs. Some PMDs Tx func-
> > > tions have a limit of sending at most 32 packets for every invoking, some
> > > PMDs have another limit of at most 64 packets once, another ones have imp-
> > > lemented to send as many packets to transmit as possible, etc. This will
> > > easily cause wrong usage for DPDK users.
> > > 
> > > This patch proposes to implement the above policy in DPDK lib in order to
> > > simplify the application implementation and avoid the incorrect invoking
> > > as well. So, DPDK Users don't need to consider the implementation policy
> > > and to write duplicated code at the application level again when sending
> > > packets. In addition to it, the users don't need to know the difference of
> > > specific PMD TX and can transmit the arbitrary number of packets as they
> > > expect when invoking TX API rte_eth_tx_burst, then check the return value
> > > to get the number of packets actually sent.
> > > 
> > > How to implement the policy in DPDK lib? Two solutions are proposed below.
> > > 
> > > Solution 1:
> > > Implement the wrapper functions to remove some limits for each specific
> > > PMDs as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple do like that.
> > > 
> > > > IMHO, the solution is a bit better since it:
> > > > 1. Does not affect other PMDs at all
> > > > 2. Could be a bit faster for the PMDs which require it since has no indirect
> > > >    function call on each iteration
> > > > 3. No ABI change
> > 
> > I also would prefer solution number 1 for the reasons outlined by Andrew above.
> > Also, IMO current limitation for number of packets to TX in some Intel PMD TX routines
> > are sort of artificial:
> > - they are not caused by any real HW limitations
> > - avoiding them at PMD level shouldn't cause any performance or functional degradation.
> > So I don't see any good reason why instead of fixing these limitations in
> > our own PMDs we are trying to push them to the upper (rte_ethdev) layer.

For what it's worth, I agree with Konstantin. Wrappers should be as thin as
possible on top of PMD functions, they are not helpers. We could define a
set of higher level functions for this purpose though.

In the meantime, exposing and documenting PMD limitations seems safe enough.

We could assert that RX/TX burst requests larger than the size of the target
queue are unlikely to be fully met (i.e. PMDs usually do not check for
completions in the middle of a TX burst).

> > Konstantin
> > 
> The main advantage I see is that it should make it a bit easier for
> driver writers, since they have a tighter set of constraints to work
> with for packet RX and Tx. The routines only have to handle requests for
> packets in the range 0-N, rather than not having an upper bound on the
> request. It also then saves code duplicating with having multiple
> drivers having the same outer-loop code for handling arbitrarily large
> requests.
> 
> No big deal to me either way though.
> 
> /Bruce

Right but there is a cost in doing so, as unlikely() as the additional code
is. We should leave that choice to applications.

-- 
Adrien Mazarguil
6WIND