From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marc Sune Subject: Re: [RFC PATCH 0/4] pktdev Date: Mon, 20 Apr 2015 08:51:26 +0200 Message-ID: <5534A1EE.4060905@bisdn.de> References: <1428954274-26944-1-git-send-email-keith.wiles@intel.com> <1429283804-28087-1-git-send-email-bruce.richardson@intel.com> <553155C7.3000106@bisdn.de> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: quoted-printable To: "Wiles, Keith" , "dev-VfR2kkLFssw@public.gmane.org" Return-path: In-Reply-To: List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org Sender: "dev" On 17/04/15 21:50, Wiles, Keith wrote: > Hi Marc and Bruce, Hi Keith, Bruce, > > On 4/17/15, 1:49 PM, "Marc Sune" wrote: > >> >> On 17/04/15 17:16, Bruce Richardson wrote: >>> Hi all, >>> >>> to continue this discussion a bit more, here is my, slightly differen= t, >>> slant >>> on what a pktdev abstraction may look like. >>> >>> The primary objective I had in mind when drafting this is to provide = the >>> minimal abstraction that can be *easily* used as a common device >>> abstraction for >>> existing (and future) device types to be passed to dataplane code. Th= e >>> patchset >>> demonstrates this by defining a minimal interface for pktdev - since = I >>> firmly >>> believe the interface should be as small as possible - and then showi= ng >>> how that >>> common interface can be used to unify rings and ethdevs under a commo= n >>> API for the >>> datapath. I believe any attempt to unify things much beyond this to t= he >>> control >>> plane or setup phase is not worth doing - at least not initially - as= at >>> init time the code always needs to be aware of the underlying resourc= e >>> type in >>> order to configure it properly for dataplane use. >>> >>> The overall objective I look to achieve is illustrated by the final >>> patch in >>> the series, which is a sample app where the same code is used for all >>> cores, >>> irrespective of the underlying device type. >>> >>> To get to that point, patch 1 defines the minimal API - just RX and T= X. >>> The .c >>> file in the library is empty for simplicity, though I would see some >>> functionality moving there when/if it makes sense e.g. the callback >>> support >>> from ethdev, as is done in Keith's patchset. >>> Patch 2 then makes very minimal changes to ethdev to allow ethdevs to >>> be used >>> as pktdevs, and to make use of the pktdev functions when appropriate >>> Patch 3 was, for me, the key test for this implementation - how hard >>> was it to >>> make an rte_ring usable as a pktdev too. Two single-line functions fo= r >>> RX/TX >>> and a separate "converter" function proved to be all that was necessa= ry >>> here - >>> and I believe simpler solutions may be possible too, as the extra >>> structures >>> allocated on conversion could be merged into the rte_ring structure >>> itself and >>> initialized on ring creation if we prefer that option. It is >>> hoped/presumed that >>> wrapping other structures, such as KNI, may prove to be just as easil= y >>> done. >>> [Not attempted yet - left as an exercise for the reader :-)]. >>> >>> Now, in terms of pktdev vs ethdev, there is nothing in this proposal >>> that >>> cannot also be done using ethdev AFAIK. However, pktdev as outlined h= ere >>> should make the process far easier than trying to create a full PMD f= or >>> something. >>> All NIC specific functions, including things like stop/start, are >>> stripped out, >>> as they don't make sense for an rte_ring or other software objects. >>> Also, the other thing this provides is that we can move away from jus= t >>> using >>> port ids. Instead in the same way as we now reference >>> rings/mempools/KNIs etc >>> via pointer, we can do the same with ethernet ports as pktdevs on the >>> data path. >>> There was discussion previously on moving beyond 8-bit port ids. If w= e >>> look to >>> use ethdev as a common abstraction, I feel that change will soon have >>> to be made >>> causing a large amount of code churn. >> Hi Richard, >> >> First thank you both for taking the time to look at this. I did not no= t >> reply to Keith because you Richard summarized most of my concerns. >> >> I had a brief look to this second proposal. It is more aligned to what= I >> had in mind. But still I feel it is slightly too complicated. I don't >> like much the necessary (in your approach) MACRO-like pkt_dev_data >> struct. It is also slightly inconvenient that the user has to do: >> >> + struct rte_pkt_dev *in =3D rte_eth_get_dev(0); >> >> + struct rte_pkt_dev *out =3D rte_ring_get_dev( >> + rte_ring_create(name, 4096, rte_socket_id(), 0)); >> >> >> >> What about something like (~pseudo-code): >> >> rte_pkt_dev_data.h: >> >> enum rte_pkt_dev_type{ >> RTE_PKT_DEV_ETH, >> RTE_PKT_DEV_RING, >> RTE_PKT_DEV_KNI, >> //Keep adding as more PMDs are supported >> }; >> >> >> //This struct may be redundant if there is nothing more >> struct rte_pkt_dev_data{ >> enum rte_pkt_dev_type; >> //Placeholder, maybe we need more... >> }; >> >> //Make RX/TX pktdev APIs more readable, but not really needed >> typedef void pkt_dev_t; >> >> (In all PMDs and e.g. KNI and RINGs): >> >> struct rte_eth_dev { >> struct rte_pkt_dev_data pkt_dev;// >> <++++++++++++++++++++++++++++++++++ >> eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function= . */ >> eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit functio= n. >> */ >> struct rte_eth_dev_data *data; /**< Pointer to device data */ >> /... >> >> rte_pkt_dev.h: >> >> #include >> //Include PMD (and non-PMD) TX/RX headers... >> >> static inline uint16_t >> rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id, >> struct rte_mbuf **tx_pkts, uint16_t nb_pkts) >> { >> switch (((struct rte_pkt_dev_data*)dev)->type){ >> case RTE_PKT_DEV_ETH: >> struct rte_eth_dev* eth_dev =3D (struct >> rte_eth_dev*)pkt_dev; >> rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts)= ; >> break; >> case RTE_PKT_DEV_RING: >> //... >> } >> } >> >> //... > I do not see the functional different from my proposal other then a bit > more search/replace, which does effect some of the PMD=B9s. The changes= to > the PMD is just to use the macros as we now have a nested structure. I think what I am proposing is different. See below. > > The nesting and movement some structures and code to a common location = is > to provide a better way to add other devices. Moving code into a common > set of structures and APIs should only improve the code as we get more > devices added to DPDK. The basic flow and design is the same. In your proposal you are movig TX/RX queues as well as some shared stuff=20 and, most importantly, you are using function pointers to execute the=20 RX/TX routines. What I was proposing is to try to add the minimum common shared state in=20 order to properly demultiplex the RX/TX call and have a common set of=20 abstract calls (the pkt_dev type). In a way, I was proposing to=20 deliberately not have a shared struct rte_dev_data because I think the=20 internals of the "pkt_dev" can be very different across devices (e.g.=20 queues in kni vs eth port vs. crypto?). I treat the pkt_dev as a "black=20 box" that conforms to TX/RX API, leaving the developer of that device to=20 define its internal structures as it better suites the needs. I only use=20 each of the specific device type TX/RX APIs (external to us, pkt_dev=20 library) in rte_pkt_dev.h. This also simplifies the refactor required to=20 eventually integrate the rte_pkt_dev library and builds it "on top" of=20 the existing APIs. The other important difference with both, Bruce and your approach, and=20 mine is the use of function pointers for RX/TX. I don't use them, which=20 makes the entire abstracted TX/RX (including the final RX/TX routines=20 itself) functions be "inlinable". Btw, I forgot to add something basic in the previous pseudo-code. The=20 different types have to be conditionally compiled according to=20 compiled-in DPDK libs: rte_pkt_dev.h: #include //Eth devices #ifdef RTE_LIBRTE_ETHER #include #endif //KNI #ifdef RTE_LIBRTE_KNI #include #endif //... //Include PMD (and non-PMD) TX/RX headers... static inline uint16_t rte_pkt_tx_burst(pkt_dev_t* dev, uint16_t queue_id, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) { switch (((struct rte_pkt_dev_data*)dev)->type){ #ifdef RTE_LIBRTE_ETHER case RTE_PKT_DEV_ETH: struct rte_eth_dev* eth_dev =3D (struct rte_eth_dev*)pkt= _dev; rte_pkt_tx_burst(eth_dev, queue_id, tx_pkts, nb_pkts); break; #endif #ifdef RTE_LIBRTE_KNI case RTE_PKT_DEV_KNI: //... break; #endif default: //Corrupted type or unsupported (without compiled=20 support) //Ignore or fail(fatal error)? break; } } //... > > From rte_common_device.h (format screwed with cut/paste and not all of= the > code) > > > struct rte_dev_data { > char name[RTE_DEV_NAME_MAX_LEN]; /**< Unique identifier name */ > > void **rx_queues; /**< Array of pointers to RX queues. */ > void **tx_queues; /**< Array of pointers to TX queues. */ > > uint16_t nb_rx_queues; /**< Number of RX queues. */ > uint16_t nb_tx_queues; /**< Number of TX queues. */ > > uint16_t mtu; /**< Maximum Transmission Unit. */ > uint8_t unit_id; /**< Unit/Port ID for this instance */ > uint8_t _pad0; /* alignment filler */ > > void *dev_private; /**< PMD-specific private data */ > uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures.= */ > uint32_t min_rx_buf_size; /**< Common rx buffer size handled by = all > queues */ > }; > > struct rte_dev_info { > struct rte_pci_device *pci_dev; /**< Device PCI information. */ > const char *driver_name; /**< Device Driver name. */ > unsigned int if_index; /**< Index to bound host interface, o= r 0 > if none. */ > /* Use if_indextoname() to translate into an interface name. */ > unsigned int _pad0; > }; > > > > struct rte_dev { > dev_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive > function. */ > dev_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit > function. */ > void *data; /**< Pointer to device data */ > struct rte_dev_global *gbl_data; /**< Pointer to device global da= ta > */ > const struct rte_dev_drv *driver; /**< Driver for this device */ > void *dev_ops; /**< Functions exported by PMD */ > struct rte_pci_device *pci_dev; /**< PCI info. supplied by > probing */ > struct rte_dev_info *dev_info; /**< Pointer to the device info > structure */ > > /** User application callback for interrupts if present */ > struct rte_dev_cb_list link_intr_cbs; > /** > * User-supplied functions called from rx_burst to post-process > * received packets before passing them to the user > */ > struct rte_dev_rxtx_callback **post_rx_burst_cbs; > /** > * User-supplied functions called from tx_burst to pre-process > * received packets before passing them to the driver for transmission. > */ > struct rte_dev_rxtx_callback **pre_tx_burst_cbs; What is the indended use of these callbacks and what is the benefit? If=20 the user is supplying the callback, why not putting it directly after=20 the RX and before the TX call in user's code (anyway)? > enum rte_dev_type dev_type; /**< Flag indicating the device typ= e */ > uint8_t attached; /**< Flag indicating the port= is > attached */ > }; > > From rte_ethdev.h > > struct rte_eth_dev_info { > struct rte_dev_info di; /**< Common device information */ > > /* Rest of structure is for private device data */ > uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */ > uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */ > uint16_t max_rx_queues; /**< Maximum number of RX queues. */ > uint16_t max_tx_queues; /**< Maximum number of TX queues. */ > uint32_t max_mac_addrs; /**< Maximum number of MAC addresses. */ > uint32_t max_hash_mac_addrs; > /** Maximum number of hash MAC addresses for MTA and UTA. */ > uint16_t max_vfs; /**< Maximum number of VFs. */ > uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */ > uint32_t rx_offload_capa; /**< Device RX offload capabilities. */ > uint32_t tx_offload_capa; /**< Device TX offload capabilities. */ > uint16_t reta_size; > /**< Device redirection table size, the total number of entries. */ > /** Bit mask of RSS offloads, the bit offset also means flow type */ > uint64_t flow_type_rss_offloads; > struct rte_eth_rxconf default_rxconf; /**< Default RX configuration */ > struct rte_eth_txconf default_txconf; /**< Default TX configuration */ > uint16_t vmdq_queue_base; /**< First queue ID for VMDQ pools. */ > uint16_t vmdq_queue_num; /**< Queue number for VMDQ pools. */ > uint16_t vmdq_pool_base; /**< First ID of VMDQ pools. */ > }; > > struct rte_eth_dev_data { > struct rte_dev_data dd; /**< Common device data */ > > /* Rest of the structure is private data */ > struct rte_eth_dev_sriov sriov; /**< SRIOV data */ > > struct rte_eth_link dev_link; > /**< Link-level information & status */ > > struct rte_eth_conf dev_conf; /**< Configuration applied to device. = */ > struct ether_addr* mac_addrs; /**< Device Ethernet Link address. */ > uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR]; > /** bitmap array of associating Ethernet MAC addresses to pools */ > struct ether_addr* hash_mac_addrs; > > /** Device Ethernet MAC addresses of hash filtering. */ > uint8_t promiscuous : 1, /**< RX promiscuous mode ON(1) / OFF(0). *= / > scattered_rx : 1, /**< RX of scattered packets is ON(1) / OFF(0) */ > all_multicast: 1, /**< RX all multicast mode ON(1) / OFF(0). */ > dev_started : 1; /**< Device state: STARTED(1) / STOPPED(0). */ > }; > > In the case of rte_ethdev.h the rte_eth_dev turns into a macro '#define > rte_eth_dev rte_dev', in other devices they may have a rte_xyz_dev > structure rte_dev as the first member and then private members. Also la= ter > if we find something private to ethdev then a structure can be created. > > > Most of the code changes outside of these changes are because using a > macro to fix the code was easier. Rewriting the functions to use real > variables instead of casting void pointer could yield without the macro= s. > Not sure as the performance impact in rewriting some of these functions= to > eliminate the macros. > > This is the heart of what I am proposing and it just so happens to allo= w > Bruce=B9 ideas as well. I have another patch I have not sent that inclu= des > Bruce=B9s ideas using my suggestions. > > Being able to use a single API for all devices does have its benefits I= MO, > but I want to cleanup ethdev to allow for other devices to be added to = the > system. > > At some point (when the cryptodev is done) I want to add it to this > framework, plus add more APIs for crypto similar to Linux Crypto API or > something we can agree is a reasonable set of APIs for DPDK. I currently don't see the benefit of abstracting beyond the RX/TX=20 functions, and I see more cons than pros. Doing so we will also make=20 librte_pkt_dev sort of mandatory, even if you are not using it our=20 abstracted RX/TX functions at all. regards marc > > > Regards, > > ++Keith >> >> Maybe it is oversimplified? With this approach tough, we would barely >> touch the PMDs and the functionality can be built on top of the existi= ng >> PMDs. >> >> Thoughts? >> Marc >> >> >>> Bruce Richardson (4): >>> Add example pktdev implementation >>> Make ethdev explicitly a subclass of pktdev >>> add support for a ring to be a pktdev >>> example app showing pktdevs used in a chain >>> >>> config/common_bsdapp | 5 + >>> config/common_linuxapp | 5 + >>> examples/pktdev/Makefile | 57 +++++++++++ >>> examples/pktdev/basicfwd.c | 222 >>> +++++++++++++++++++++++++++++++++++++++++ >>> lib/Makefile | 1 + >>> lib/librte_ether/rte_ethdev.h | 26 ++--- >>> lib/librte_pktdev/Makefile | 56 +++++++++++ >>> lib/librte_pktdev/rte_pktdev.c | 35 +++++++ >>> lib/librte_pktdev/rte_pktdev.h | 144 ++++++++++++++++++++++++++ >>> lib/librte_ring/rte_ring.c | 41 ++++++++ >>> lib/librte_ring/rte_ring.h | 3 + >>> 11 files changed, 582 insertions(+), 13 deletions(-) >>> create mode 100644 examples/pktdev/Makefile >>> create mode 100644 examples/pktdev/basicfwd.c >>> create mode 100644 lib/librte_pktdev/Makefile >>> create mode 100644 lib/librte_pktdev/rte_pktdev.c >>> create mode 100644 lib/librte_pktdev/rte_pktdev.h >>>