From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
Andrew Rybchenko <arybchenko@solarflare.com>,
"Yang, Zhiyong" <zhiyong.yang@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"thomas.monjalon@6wind.com" <thomas.monjalon@6wind.com>
Subject: Re: [RFC] lib/librte_ether: consistent PMD batching behavior
Date: Mon, 23 Jan 2017 17:36:08 +0100 [thread overview]
Message-ID: <20170123163607.GB3779@6wind.com> (raw)
In-Reply-To: <20170120114822.GA106360@bricha3-MOBL3.ger.corp.intel.com>
On Fri, Jan 20, 2017 at 11:48:22AM +0000, Bruce Richardson wrote:
> On Fri, Jan 20, 2017 at 11:24:40AM +0000, Ananyev, Konstantin wrote:
> > >
> > > From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> > > Sent: Friday, January 20, 2017 10:26 AM
> > > To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org
> > > Cc: thomas.monjalon@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>
> > > Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
> > >
> > > On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> > > The rte_eth_tx_burst() function in the file Rte_ethdev.h is invoked to
> > > transmit output packets on the output queue for DPDK applications as
> > > follows.
> > >
> > > static inline uint16_t
> > > rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
> > > struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> > >
> > > Note: The fourth parameter nb_pkts: The number of packets to transmit.
> > > The rte_eth_tx_burst() function returns the number of packets it actually
> > > sent. The return value equal to *nb_pkts* means that all packets have been
> > > sent, and this is likely to signify that other output packets could be
> > > immediately transmitted again. Applications that implement a "send as many
> > > packets to transmit as possible" policy can check this specific case and
> > > keep invoking the rte_eth_tx_burst() function until a value less than
> > > *nb_pkts* is returned.
> > >
> > > When you call TX only once in rte_eth_tx_burst, you may get different
> > > behaviors from different PMDs. One problem that every DPDK user has to
> > > face is that they need to take the policy into consideration at the app-
> > > lication level when using any specific PMD to send the packets whether or
> > > not it is necessary, which brings usage complexities and makes DPDK users
> > > easily confused since they have to learn the details on TX function limit
> > > of specific PMDs and have to handle the different return value: the number
> > > of packets transmitted successfully for various PMDs. Some PMDs Tx func-
> > > tions have a limit of sending at most 32 packets for every invoking, some
> > > PMDs have another limit of at most 64 packets once, another ones have imp-
> > > lemented to send as many packets to transmit as possible, etc. This will
> > > easily cause wrong usage for DPDK users.
> > >
> > > This patch proposes to implement the above policy in DPDK lib in order to
> > > simplify the application implementation and avoid the incorrect invoking
> > > as well. So, DPDK Users don't need to consider the implementation policy
> > > and to write duplicated code at the application level again when sending
> > > packets. In addition to it, the users don't need to know the difference of
> > > specific PMD TX and can transmit the arbitrary number of packets as they
> > > expect when invoking TX API rte_eth_tx_burst, then check the return value
> > > to get the number of packets actually sent.
> > >
> > > How to implement the policy in DPDK lib? Two solutions are proposed below.
> > >
> > > Solution 1:
> > > Implement the wrapper functions to remove some limits for each specific
> > > PMDs as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple do like that.
> > >
> > > > IMHO, the solution is a bit better since it:
> > > > 1. Does not affect other PMDs at all
> > > > 2. Could be a bit faster for the PMDs which require it since has no indirect
> > > > function call on each iteration
> > > > 3. No ABI change
> >
> > I also would prefer solution number 1 for the reasons outlined by Andrew above.
> > Also, IMO current limitation for number of packets to TX in some Intel PMD TX routines
> > are sort of artificial:
> > - they are not caused by any real HW limitations
> > - avoiding them at PMD level shouldn't cause any performance or functional degradation.
> > So I don't see any good reason why instead of fixing these limitations in
> > our own PMDs we are trying to push them to the upper (rte_ethdev) layer.
For what it's worth, I agree with Konstantin. Wrappers should be as thin as
possible on top of PMD functions, they are not helpers. We could define a
set of higher level functions for this purpose though.
In the meantime, exposing and documenting PMD limitations seems safe enough.
We could assert that RX/TX burst requests larger than the size of the target
queue are unlikely to be fully met (i.e. PMDs usually do not check for
completions in the middle of a TX burst).
> > Konstantin
> >
> The main advantage I see is that it should make it a bit easier for
> driver writers, since they have a tighter set of constraints to work
> with for packet RX and Tx. The routines only have to handle requests for
> packets in the range 0-N, rather than not having an upper bound on the
> request. It also then saves code duplicating with having multiple
> drivers having the same outer-loop code for handling arbitrarily large
> requests.
>
> No big deal to me either way though.
>
> /Bruce
Right but there is a cost in doing so, as unlikely() as the additional code
is. We should leave that choice to applications.
--
Adrien Mazarguil
6WIND
next prev parent reply other threads:[~2017-01-23 16:36 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-20 9:51 [RFC] lib/librte_ether: consistent PMD batching behavior Zhiyong Yang
2017-01-20 10:26 ` Andrew Rybchenko
[not found] ` <2601191342CEEE43887BDE71AB9772583F108924@irsmsx105.ger.corp.intel.com>
2017-01-20 11:24 ` Ananyev, Konstantin
2017-01-20 11:48 ` Bruce Richardson
2017-01-23 16:36 ` Adrien Mazarguil [this message]
2017-02-07 7:50 ` Yang, Zhiyong
2017-01-21 4:07 ` Yang, Zhiyong
2017-01-21 4:13 ` Yang, Zhiyong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170123163607.GB3779@6wind.com \
--to=adrien.mazarguil@6wind.com \
--cc=arybchenko@solarflare.com \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=konstantin.ananyev@intel.com \
--cc=thomas.monjalon@6wind.com \
--cc=zhiyong.yang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.