DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] kni: use bulk functions to allocate and free mbufs
From: Ananyev, Konstantin @ 2017-01-11 18:25 UTC (permalink / raw)
  To: Yigit, Ferruh, Stephen Hemminger
  Cc: Sergey Vyazmitinov, olivier.matz@6wind.com, dev@dpdk.org
In-Reply-To: <63819ae1-f056-0ad7-b7dd-041fe1fe08fa@intel.com>



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, January 11, 2017 5:48 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Stephen Hemminger <stephen@networkplumber.org>
> Cc: Sergey Vyazmitinov <s.vyazmitinov@brain4net.com>; olivier.matz@6wind.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] kni: use bulk functions to allocate and free mbufs
> 
> On 1/11/2017 5:43 PM, Ananyev, Konstantin wrote:
> >
> >
> >> -----Original Message-----
> >> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >> Sent: Wednesday, January 11, 2017 5:36 PM
> >> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> >> Cc: Sergey Vyazmitinov <s.vyazmitinov@brain4net.com>; olivier.matz@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
> dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] kni: use bulk functions to allocate and free mbufs
> >>
> >> On Wed, 11 Jan 2017 17:28:21 +0000
> >> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> >>
> >>>> -----Original Message-----
> >>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> >>>> Sent: Wednesday, January 11, 2017 4:18 PM
> >>>> To: Sergey Vyazmitinov <s.vyazmitinov@brain4net.com>
> >>>> Cc: olivier.matz@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org
> >>>> Subject: Re: [dpdk-dev] [PATCH] kni: use bulk functions to allocate and free mbufs
> >>>>
> >>>> On Fri, 30 Dec 2016 04:50:16 +0700
> >>>> Sergey Vyazmitinov <s.vyazmitinov@brain4net.com> wrote:
> >>>>
> >>>>>  /**
> >>>>> + * Free n packets mbuf back into its original mempool.
> >>>>> + *
> >>>>> + * Free each mbuf, and all its segments in case of chained buffers. Each
> >>>>> + * segment is added back into its original mempool.
> >>>>> + *
> >>>>> + * @param mp
> >>>>> + *   The packets mempool.
> >>>>> + * @param mbufs
> >>>>> + *   The packets mbufs array to be freed.
> >>>>> + * @param n
> >>>>> + *   Number of packets.
> >>>>> + */
> >>>>> +static inline void rte_pktmbuf_free_bulk(struct rte_mempool *mp,
> >>>>> +		struct rte_mbuf **mbufs, unsigned n)
> >>>>> +{
> >>>>> +	struct rte_mbuf *mbuf, *m_next;
> >>>>> +	unsigned i;
> >>>>> +	for (i = 0; i < n; ++i) {
> >>>>> +		mbuf = mbufs[i];
> >>>>> +		__rte_mbuf_sanity_check(mbuf, 1);
> >>>>> +
> >>>>> +		mbuf = mbuf->next;
> >>>>> +		while (mbuf != NULL) {
> >>>>> +			m_next = mbuf->next;
> >>>>> +			rte_pktmbuf_free_seg(mbuf);
> >>>>> +			mbuf = m_next;
> >>>>> +		}
> >>>>> +	}
> >>>>> +	rte_mempool_put_bulk(mp, (void * const *)mbufs, n);
> >>>>> +}
> >>>>
> >>>> The mbufs may come from different pools. You need to handle that.
> >>>
> >>> I suppose both stituations are possible:
> >>> 1) user knows off-hand that all mbufs in the group are from the same mempool
> >>> 2) user can't guarantee that all mbufs in the group are from same mempool.
> >>>
> >>> As I understand that patch is for case 1) only.
> >>> For 2) it could be a separate function and separate patch.
> >>>
> >>> Konstantin
> >>>
> >>>
> >>
> >> Please don't make unnecessary assumptions in pursuit of minor optimizations.
> >
> > I don't suggest to make *any* assumptions.
> > What I am saying we  can have 2 functions for two different cases.
> 
> kni_free_mbufs() is static function. Even user knows if all mbufs are
> some same pool or not, can't pass this information to the free function.
> 
> Of course this information can be passed via new API, or as an update to
> exiting API, but I think it is better to update free function to cover
> both cases instead of getting this information from user.

I suppose misunderstanding came from the fact that kni_free_mbufs()
is modified to use rte_pktmbuf_free_bulk(mp, mbufs, n).
I am not talking about kni part of the patch
(to be honest I didn't pay much attention to it).
What I am saying there are many situations when user knows off-hand
that all  mbufs in that group are from the same mempool and such
function will be useful too.
BTW, for my own curiosity, how it could happen with KNI that 
kni_fifo_get() would return mbufs not from kni->pktmbuf_pool
(I am not really familiar with KNI and its use-cases)?
Konstantin

> 
> > Obviously we'll have to document it properly.
> > Konstantin
> >
> >> It is trivial to write a correct free bulk that handles pool changing.
> >> Also the free_seg could be bulked as well.

^ permalink raw reply

* Re: [PATCH] kni: use bulk functions to allocate and free mbufs
From: Ferruh Yigit @ 2017-01-11 18:41 UTC (permalink / raw)
  To: Ananyev, Konstantin, Stephen Hemminger
  Cc: Sergey Vyazmitinov, olivier.matz@6wind.com, dev@dpdk.org
In-Reply-To: <2601191342CEEE43887BDE71AB9772583F104048@irsmsx105.ger.corp.intel.com>

On 1/11/2017 6:25 PM, Ananyev, Konstantin wrote:
> 
> 
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Wednesday, January 11, 2017 5:48 PM
>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Stephen Hemminger <stephen@networkplumber.org>
>> Cc: Sergey Vyazmitinov <s.vyazmitinov@brain4net.com>; olivier.matz@6wind.com; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] kni: use bulk functions to allocate and free mbufs
>>
>> On 1/11/2017 5:43 PM, Ananyev, Konstantin wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>>> Sent: Wednesday, January 11, 2017 5:36 PM
>>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>>>> Cc: Sergey Vyazmitinov <s.vyazmitinov@brain4net.com>; olivier.matz@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
>> dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [PATCH] kni: use bulk functions to allocate and free mbufs
>>>>
>>>> On Wed, 11 Jan 2017 17:28:21 +0000
>>>> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
>>>>
>>>>>> -----Original Message-----
>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
>>>>>> Sent: Wednesday, January 11, 2017 4:18 PM
>>>>>> To: Sergey Vyazmitinov <s.vyazmitinov@brain4net.com>
>>>>>> Cc: olivier.matz@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org
>>>>>> Subject: Re: [dpdk-dev] [PATCH] kni: use bulk functions to allocate and free mbufs
>>>>>>
>>>>>> On Fri, 30 Dec 2016 04:50:16 +0700
>>>>>> Sergey Vyazmitinov <s.vyazmitinov@brain4net.com> wrote:
>>>>>>
>>>>>>>  /**
>>>>>>> + * Free n packets mbuf back into its original mempool.
>>>>>>> + *
>>>>>>> + * Free each mbuf, and all its segments in case of chained buffers. Each
>>>>>>> + * segment is added back into its original mempool.
>>>>>>> + *
>>>>>>> + * @param mp
>>>>>>> + *   The packets mempool.
>>>>>>> + * @param mbufs
>>>>>>> + *   The packets mbufs array to be freed.
>>>>>>> + * @param n
>>>>>>> + *   Number of packets.
>>>>>>> + */
>>>>>>> +static inline void rte_pktmbuf_free_bulk(struct rte_mempool *mp,
>>>>>>> +		struct rte_mbuf **mbufs, unsigned n)
>>>>>>> +{
>>>>>>> +	struct rte_mbuf *mbuf, *m_next;
>>>>>>> +	unsigned i;
>>>>>>> +	for (i = 0; i < n; ++i) {
>>>>>>> +		mbuf = mbufs[i];
>>>>>>> +		__rte_mbuf_sanity_check(mbuf, 1);
>>>>>>> +
>>>>>>> +		mbuf = mbuf->next;
>>>>>>> +		while (mbuf != NULL) {
>>>>>>> +			m_next = mbuf->next;
>>>>>>> +			rte_pktmbuf_free_seg(mbuf);
>>>>>>> +			mbuf = m_next;
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +	rte_mempool_put_bulk(mp, (void * const *)mbufs, n);
>>>>>>> +}
>>>>>>
>>>>>> The mbufs may come from different pools. You need to handle that.
>>>>>
>>>>> I suppose both stituations are possible:
>>>>> 1) user knows off-hand that all mbufs in the group are from the same mempool
>>>>> 2) user can't guarantee that all mbufs in the group are from same mempool.
>>>>>
>>>>> As I understand that patch is for case 1) only.
>>>>> For 2) it could be a separate function and separate patch.
>>>>>
>>>>> Konstantin
>>>>>
>>>>>
>>>>
>>>> Please don't make unnecessary assumptions in pursuit of minor optimizations.
>>>
>>> I don't suggest to make *any* assumptions.
>>> What I am saying we  can have 2 functions for two different cases.
>>
>> kni_free_mbufs() is static function. Even user knows if all mbufs are
>> some same pool or not, can't pass this information to the free function.
>>
>> Of course this information can be passed via new API, or as an update to
>> exiting API, but I think it is better to update free function to cover
>> both cases instead of getting this information from user.
> 
> I suppose misunderstanding came from the fact that kni_free_mbufs()
> is modified to use rte_pktmbuf_free_bulk(mp, mbufs, n).
> I am not talking about kni part of the patch
> (to be honest I didn't pay much attention to it).
> What I am saying there are many situations when user knows off-hand
> that all  mbufs in that group are from the same mempool and such
> function will be useful too.

> BTW, for my own curiosity, how it could happen with KNI that 
> kni_fifo_get() would return mbufs not from kni->pktmbuf_pool
> (I am not really familiar with KNI and its use-cases)?

It gets packets from free queue:
kni_fifo_get(kni->free_q, ...)

DPDK app may send a mbuf (from any pool, like another port's mempool) to
kernel, kernel puts buf back to kni->free_q when done with it.

> Konstantin
> 
>>
>>> Obviously we'll have to document it properly.
>>> Konstantin
>>>
>>>> It is trivial to write a correct free bulk that handles pool changing.
>>>> Also the free_seg could be bulked as well.
> 

^ permalink raw reply

* Re: [dpdk-users] Pktgen DPDK net bonding mode 4 802.3AD (LACP) not working with BIGIP trunk with LACP enabled
From: Wiles, Keith @ 2017-01-11 18:51 UTC (permalink / raw)
  To: Vincent Li; +Cc: users@dpdk.org, dev@dpdk.org
In-Reply-To: <CAK3+h2xzFvAaWFUrh_LyTgk7Sju_k2oxT_5Yg4ELo81NE3sUEA@mail.gmail.com>


> On Jan 11, 2017, at 12:08 PM, Vincent Li <vincent.mc.li@gmail.com> wrote:
> 
> Hi,
> 
> I have direct two cable connections between Pktgen DPDK box and BIGIP
> 
> this is a problem I am not sure if it Pktgen, or the DPDK net bonding mode
> 4 LACP, or the BIGIP trunk with LACP enabled issue, I am hoping someone
> here have more clue and give me some direction on how to trouble shoot
> this. I am suspecting there might some issue between DPDK net bonding
> 8023ad mode 4 and BIGIP trunk LACP mode. I turned
> on CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and
> defined RTE_LIBRTE_BOND_DEBUG_8023AD
> in drivers/net/bonding/rte_eth_bond_8023ad.c hoping getting more debug
> output in Pktgen, I could not see any debug output for 8023ad protocol.

I am not going to be much help as I do not and have not used BIGIP.

I assume you have the Pktgen that supports bonding mode 4. The previous Pktgen version would not call tx_burst() unless it had something to send, which the bonding driver requires tx_burst() with zero packets at least at a given interval. The latest versions do call tx_burst() with zero pkts at least within the needed timing.

> 
> 
> packet tx/rx works :
> 
> 1 Pktgen with DPDK net bonding mode 1, 2, 3, 5
> 2 BIGIP trunk with LACP disabled (trunk port status UP)
> 
> packet tx/rx failed:
> 
> 1, Pktgen with DPDK net bonding mode 4 802.3AD
> 2, BIGIP trunk with LACP enabled (trunk status is DOWN)
> 
> or
> 1, Pktgen with DPDK net bonding mode 4 802.3AD
> 2, BIGIP trunk with LACP disabled (trunk status is UP)
> 
> my end goal is to achieve 10G < througput < 20G with Pktgen DPDK net
> bonding with Intel 82599 to load test BIGIP, so far with bonding mode 1,
> 2,3, 5, I am unable to achieve throughput > 10G.

Regards,
Keith

^ permalink raw reply

* Re: [PATCH] kni: use bulk functions to allocate and free mbufs
From: Stephen Hemminger @ 2017-01-11 18:56 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Ananyev, Konstantin, Sergey Vyazmitinov, olivier.matz@6wind.com,
	dev@dpdk.org
In-Reply-To: <bf84e7aa-2cf0-9037-b27d-97f6e3d6fc5b@intel.com>

On Wed, 11 Jan 2017 18:41:28 +0000
Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 1/11/2017 6:25 PM, Ananyev, Konstantin wrote:
> > 
> >   
> >> -----Original Message-----
> >> From: Yigit, Ferruh
> >> Sent: Wednesday, January 11, 2017 5:48 PM
> >> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Stephen Hemminger <stephen@networkplumber.org>
> >> Cc: Sergey Vyazmitinov <s.vyazmitinov@brain4net.com>; olivier.matz@6wind.com; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] kni: use bulk functions to allocate and free mbufs
> >>
> >> On 1/11/2017 5:43 PM, Ananyev, Konstantin wrote:  
> >>>
> >>>  
> >>>> -----Original Message-----
> >>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >>>> Sent: Wednesday, January 11, 2017 5:36 PM
> >>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> >>>> Cc: Sergey Vyazmitinov <s.vyazmitinov@brain4net.com>; olivier.matz@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>;  
> >> dev@dpdk.org  
> >>>> Subject: Re: [dpdk-dev] [PATCH] kni: use bulk functions to allocate and free mbufs
> >>>>
> >>>> On Wed, 11 Jan 2017 17:28:21 +0000
> >>>> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> >>>>  
> >>>>>> -----Original Message-----
> >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> >>>>>> Sent: Wednesday, January 11, 2017 4:18 PM
> >>>>>> To: Sergey Vyazmitinov <s.vyazmitinov@brain4net.com>
> >>>>>> Cc: olivier.matz@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org
> >>>>>> Subject: Re: [dpdk-dev] [PATCH] kni: use bulk functions to allocate and free mbufs
> >>>>>>
> >>>>>> On Fri, 30 Dec 2016 04:50:16 +0700
> >>>>>> Sergey Vyazmitinov <s.vyazmitinov@brain4net.com> wrote:
> >>>>>>  
> >>>>>>>  /**
> >>>>>>> + * Free n packets mbuf back into its original mempool.
> >>>>>>> + *
> >>>>>>> + * Free each mbuf, and all its segments in case of chained buffers. Each
> >>>>>>> + * segment is added back into its original mempool.
> >>>>>>> + *
> >>>>>>> + * @param mp
> >>>>>>> + *   The packets mempool.
> >>>>>>> + * @param mbufs
> >>>>>>> + *   The packets mbufs array to be freed.
> >>>>>>> + * @param n
> >>>>>>> + *   Number of packets.
> >>>>>>> + */
> >>>>>>> +static inline void rte_pktmbuf_free_bulk(struct rte_mempool *mp,
> >>>>>>> +		struct rte_mbuf **mbufs, unsigned n)
> >>>>>>> +{
> >>>>>>> +	struct rte_mbuf *mbuf, *m_next;
> >>>>>>> +	unsigned i;
> >>>>>>> +	for (i = 0; i < n; ++i) {
> >>>>>>> +		mbuf = mbufs[i];
> >>>>>>> +		__rte_mbuf_sanity_check(mbuf, 1);
> >>>>>>> +
> >>>>>>> +		mbuf = mbuf->next;
> >>>>>>> +		while (mbuf != NULL) {
> >>>>>>> +			m_next = mbuf->next;
> >>>>>>> +			rte_pktmbuf_free_seg(mbuf);
> >>>>>>> +			mbuf = m_next;
> >>>>>>> +		}
> >>>>>>> +	}
> >>>>>>> +	rte_mempool_put_bulk(mp, (void * const *)mbufs, n);
> >>>>>>> +}  
> >>>>>>
> >>>>>> The mbufs may come from different pools. You need to handle that.  
> >>>>>
> >>>>> I suppose both stituations are possible:
> >>>>> 1) user knows off-hand that all mbufs in the group are from the same mempool
> >>>>> 2) user can't guarantee that all mbufs in the group are from same mempool.
> >>>>>
> >>>>> As I understand that patch is for case 1) only.
> >>>>> For 2) it could be a separate function and separate patch.
> >>>>>
> >>>>> Konstantin
> >>>>>
> >>>>>  
> >>>>
> >>>> Please don't make unnecessary assumptions in pursuit of minor optimizations.  
> >>>
> >>> I don't suggest to make *any* assumptions.
> >>> What I am saying we  can have 2 functions for two different cases.  
> >>
> >> kni_free_mbufs() is static function. Even user knows if all mbufs are
> >> some same pool or not, can't pass this information to the free function.
> >>
> >> Of course this information can be passed via new API, or as an update to
> >> exiting API, but I think it is better to update free function to cover
> >> both cases instead of getting this information from user.  
> > 
> > I suppose misunderstanding came from the fact that kni_free_mbufs()
> > is modified to use rte_pktmbuf_free_bulk(mp, mbufs, n).
> > I am not talking about kni part of the patch
> > (to be honest I didn't pay much attention to it).
> > What I am saying there are many situations when user knows off-hand
> > that all  mbufs in that group are from the same mempool and such
> > function will be useful too.  
> 
> > BTW, for my own curiosity, how it could happen with KNI that 
> > kni_fifo_get() would return mbufs not from kni->pktmbuf_pool
> > (I am not really familiar with KNI and its use-cases)?  
> 
> It gets packets from free queue:
> kni_fifo_get(kni->free_q, ...)
> 
> DPDK app may send a mbuf (from any pool, like another port's mempool) to
> kernel, kernel puts buf back to kni->free_q when done with it.
> 
> > Konstantin
> >   
> >>  
> >>> Obviously we'll have to document it properly.
> >>> Konstantin
> >>>  
> >>>> It is trivial to write a correct free bulk that handles pool changing.
> >>>> Also the free_seg could be bulked as well.  
> >   
> 

Please write generic code. Something like the following (untested).

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 4476d75379fd..b7a743ec5c87 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -306,6 +306,9 @@ extern "C" {
 /** Alignment constraint of mbuf private area. */
 #define RTE_MBUF_PRIV_ALIGN 8
 
+/** Maximum number of mbufs freed in bulk. */
+#define RTE_MBUF_BULK_FREE 64
+
 /**
  * Get the name of a RX offload flag
  *
@@ -1261,6 +1264,50 @@ static inline void rte_pktmbuf_free(struct rte_mbuf *m)
 }
 
 /**
+ * Free n packets mbuf back into its original mempool.
+ *
+ * Free each mbuf, and all its segments in case of chained buffers. Each
+ * segment is added back into its original mempool.
+ *
+ * @param mbufs
+ *   The packets mbufs array to be freed.
+ * @param n
+ *   Number of packets.
+ */
+static inline void rte_pktmbuf_free_bulk(struct rte_mbuf **mbufs,
+					 unsigned n)
+{
+	struct rte_mbuf *tofree[RTE_MBUF_BULK_FREE];
+	struct rte_mempool *mp;
+	unsigned i, count = 0;
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m, *m_next;
+
+		for (m = mbufs[i]; m; m = m_next) {
+			m_next = m->next;
+			
+			if (count > 0 &&
+			    (unlikely(m->pool != mp || count == RTE_MBUF_BULK_FREE))) {
+				rte_mempool_put_bulk(mp, tofree, count);
+				count = 0;
+			}
+
+			mp = m->pool;
+
+			if (likely(__rte_pktmbuf_prefree_seg(m))) {
+				m->next = NULL;
+				tofree[count++] = m;
+			}
+		}
+	}
+
+	if (likely(count > 0))
+		rte_mempool_put_bulk(mp, tofree, count);
+}
+
+	
+/**
  * Creates a "clone" of the given packet mbuf.
  *
  * Walks through all segments of the given packet mbuf, and for each of them:


This handles multiple pools and multi-segment and indirect mbufs.

^ permalink raw reply related

* Re: Coverity project created for dpdk-next-net tree
From: Ferruh Yigit @ 2017-01-11 18:57 UTC (permalink / raw)
  To: DPDK
In-Reply-To: <be82169c-6af3-4dff-1729-9e07c06d4c19@intel.com>

On 11/21/2016 11:29 PM, Ferruh Yigit wrote:
> Coverity project created for dpdk-next-net tree:
> https://scan.coverity.com/projects/dpdk-next-net
> 
> This can be useful to fix coverity issues before next-net merged into
> master branch.
> 
> Project is open for everyone to register and to get scan reports, there
> will be regular scans for next-net tree.
> 
> Specially driver maintainers, please consider registering to the project
> and checking your driver's defect status.
> I believe there are some false positives, it is appreciated if you can
> identify and mark them in coverity tool.
> 
> Thanks,
> ferruh
> 
> PS: As a reminder, a coverity project already exists for main dpdk repo:
> https://scan.coverity.com/projects/dpdk-data-plane-development-kit
> 

Reminder of the next-net coverity project.
Project kept up to date with new driver patches merged in.

Driver maintainers can check and fix issues introduced in next-net
before sub-tree merged into main tree.

Thanks,
ferruh

^ permalink raw reply

* Re: [dpdk-users] Pktgen DPDK net bonding mode 4 802.3AD (LACP) not working with BIGIP trunk with LACP enabled
From: Vincent Li @ 2017-01-11 19:08 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: users@dpdk.org, dev@dpdk.org
In-Reply-To: <57835675-A430-40D9-83EC-6B5ADE66D247@intel.com>

On Wed, Jan 11, 2017 at 10:51 AM, Wiles, Keith <keith.wiles@intel.com>
wrote:

>
> > On Jan 11, 2017, at 12:08 PM, Vincent Li <vincent.mc.li@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I have direct two cable connections between Pktgen DPDK box and BIGIP
> >
> > this is a problem I am not sure if it Pktgen, or the DPDK net bonding
> mode
> > 4 LACP, or the BIGIP trunk with LACP enabled issue, I am hoping someone
> > here have more clue and give me some direction on how to trouble shoot
> > this. I am suspecting there might some issue between DPDK net bonding
> > 8023ad mode 4 and BIGIP trunk LACP mode. I turned
> > on CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and
> > defined RTE_LIBRTE_BOND_DEBUG_8023AD
> > in drivers/net/bonding/rte_eth_bond_8023ad.c hoping getting more debug
> > output in Pktgen, I could not see any debug output for 8023ad protocol.
>
> I am not going to be much help as I do not and have not used BIGIP.
>
> I assume you have the Pktgen that supports bonding mode 4. The previous
> Pktgen version would not call tx_burst() unless it had something to send,
> which the bonding driver requires tx_burst() with zero packets at least at
> a given interval. The latest versions do call tx_burst() with zero pkts at
> least within the needed timing.
>
> Yes, I am using latest Pktgen with latest DPDK and run Pktgen like:

# ./app/app/x86_64-native-linuxapp-gcc/pktgen -c 0xff
 --vdev=net_bonding0,mode=4,xmit_policy=l34,slave=0000:04:00.1,slave=0000:04:00.0
-- -P -m [0:1-7].2

also I am not sure where net bonding  debug log goes as I added

#define RTE_LIBRTE_BOND_DEBUG_8023AD 1

in drivers/net/bonding/rte_eth_bond_8023ad.c and the code has following

#ifdef RTE_LIBRTE_BOND_DEBUG_8023AD
#define MODE4_DEBUG(fmt, ...) RTE_LOG(DEBUG, PMD, "%6u [Port %u: %s] " fmt,
\
                        bond_dbg_get_time_diff_ms(), slave_id, \
                        __func__, ##__VA_ARGS__)

^ permalink raw reply

* Re: [PATCH 3/3] driver: vHost support to free consumed buffers
From: Billy McFall @ 2017-01-11 19:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: thomas.monjalon, wenzhuo.lu, dev
In-Reply-To: <20161216082404.3377bfc6@xeon-e3>

This new API is attempting to address two scenarios:
1) Application wants to reuse existing mbuf to avoid a packet copy
(example: Flooding a packet to multiple ports). The application increments
the reference count of the packet and then polls new API until the reference
count for the given mbuf is decremented.
2) Application runs out of mbufs, or some application like pktgen finishes
a run and is preparing for an additional run, calls API to free consumed
packets so processing can continue.

With the current design, the application calls the new API, if rval >= 0,
assume mubfs are being freed and can call multiple times if need be (to
either get enough mbufs to continue or to get the specific one freed). If
rval < 0, take some other action, like make a copy of packet in the
flooding case or whatever is currently done in the application today.

If the default behavior is to return 0, the application can't take any
additional actions.

Submitting V2 of the patch with the rte_eth_tx_buffer_flush() call and
associated parameters removed and to continue the discussion on new API or
not.

Thanks,
Billy McFall


On Fri, Dec 16, 2016 at 11:24 AM, Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Fri, 16 Dec 2016 07:48:51 -0500
> Billy McFall <bmcfall@redhat.com> wrote:
>
> > Add support to the vHostdriver for the new API to force free consumed
> > buffers on TX ring. vHost does not cache the mbufs so there is no work
> > to do.
> >
> > Signed-off-by: Billy McFall <bmcfall@redhat.com>
> > ---
> >  drivers/net/vhost/rte_eth_vhost.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> > index 766d4ef..6493d56 100644
> > --- a/drivers/net/vhost/rte_eth_vhost.c
> > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > @@ -939,6 +939,16 @@ eth_queue_release(void *q)
> >  }
> >
> >  static int
> > +eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt
> __rte_unused)
> > +{
> > +     /*
> > +      * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
> > +      * and releases mbuf, so nothing to cleanup.
> > +      */
> > +     return 0;
> > +}
> > +
> > +static int
> >  eth_link_update(struct rte_eth_dev *dev __rte_unused,
> >               int wait_to_complete __rte_unused)
> >  {
> > @@ -979,6 +989,7 @@ static const struct eth_dev_ops ops = {
> >       .tx_queue_setup = eth_tx_queue_setup,
> >       .rx_queue_release = eth_queue_release,
> >       .tx_queue_release = eth_queue_release,
> > +     .tx_done_cleanup = eth_tx_done_cleanup,
> >       .link_update = eth_link_update,
> >       .stats_get = eth_stats_get,
> >       .stats_reset = eth_stats_reset,
>
> Rather than having to change every drive, why not make this the default
> behavior?
>

^ permalink raw reply

* [PATCH v2 0/3] new API to free consumed buffers in Tx ring
From: Billy McFall @ 2017-01-11 20:03 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall

Based on a request from Damjan Marion and seconded by Keith Wiles, see
dpdk-dev mailing list from 11/21/2016, add a new API to free consumed
buffers on TX ring. This addresses two scenarios:
1) Flooding a packet and want to reuse existing mbuf to avoid a packet
copy. Increment the reference count of the packet and poll new API until
reference count is decremented.
2) Application runs out of mbufs, or resets and is preparing for
additional run, call API to free consumed packets so processing can
continue.

API will return the number of packets freed (0-n) or error code if
feature not supported (-ENOTSUP) or input invalid (-ENODEV).

API for e1000 igb driver and vHost driver have been implemented. Other
drivers can be implemented over time. Some drivers implement a Tx done
flush routine that should be reused where possible. e1000 igb driver
and vHost driver do not have such functions.

---
v2:
* Removed rte_eth_tx_buffer_flush() call and associated parameters
  from new rte_eth_tx_done_cleanup() API.

* Remaining open issue is whether this should be a new API or overload 
  existing rte_eth_tx_buffer() API with input parameter nb_pkts set to
  0. My concern is that overloading existing API will not provided
  enough feedback to application. Application needs to know if feature
  is supported and driver is attempting to free mbufs or not.

* If new API is supported, second open issue is if parameter free_cnt
  should be included. It was in the original feature request.


Billy McFall (3):
  ethdev: new API to free consumed buffers in Tx ring
  net/e1000: e1000 igb support to free consumed buffers
  net/vhost: vHost support to free consumed buffers

 drivers/net/e1000/e1000_ethdev.h  |   2 +
 drivers/net/e1000/igb_ethdev.c    |   1 +
 drivers/net/e1000/igb_rxtx.c      | 126 ++++++++++++++++++++++++++++++++++++++
 drivers/net/vhost/rte_eth_vhost.c |  11 ++++
 lib/librte_ether/rte_ethdev.h     |  43 +++++++++++++
 5 files changed, 183 insertions(+)

-- 
2.9.3

^ permalink raw reply

* [PATCH v2 1/3] ethdev: new API to free consumed buffers in Tx ring
From: Billy McFall @ 2017-01-11 20:03 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall
In-Reply-To: <20170111200323.12938-1-bmcfall@redhat.com>

Add a new API to force free consumed buffers on Tx ring. API will return
the number of packets freed (0-n) or error code if feature not supported
(-ENOTSUP) or input invalid (-ENODEV).

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 lib/librte_ether/rte_ethdev.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 9678179..ce662b6 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1150,6 +1150,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @internal Check DD bit of specific RX descriptor */
 
+typedef int (*eth_tx_done_cleanup_t)(void *txq, uint32_t free_cnt);
+/**< @internal Force mbufs to be from TX ring. */
+
 typedef void (*eth_rxq_info_get_t)(struct rte_eth_dev *dev,
 	uint16_t rx_queue_id, struct rte_eth_rxq_info *qinfo);
 
@@ -1467,6 +1470,7 @@ struct eth_dev_ops {
 	eth_rx_disable_intr_t      rx_queue_intr_disable;
 	eth_tx_queue_setup_t       tx_queue_setup;/**< Set up device TX queue.*/
 	eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
+	eth_tx_done_cleanup_t      tx_done_cleanup;/**< Free tx ring mbufs */
 	eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
 	eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
 	flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
@@ -2943,6 +2947,45 @@ rte_eth_tx_buffer(uint8_t port_id, uint16_t queue_id,
 }
 
 /**
+ * Request the driver to free mbufs currently cached by the driver. The
+ * driver will only free the mbuf if it is no longer in use. It is the
+ * application's responsibity to ensure rte_eth_tx_buffer_flush(..) is
+ * called if needed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which output packets must be
+ *   sent.
+ *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
+ *   to rte_eth_dev_configure().
+ * @param free_cnt
+ *   Maximum number of packets to free. Use 0 to indicate all possible packets
+ *   should be freed. Note that a packet may be using multiple mbufs.
+ * @return
+ *   Failure: < 0
+ *     -ENODEV: Invalid interface
+ *     -ENOTSUP: Driver does not support function
+ *   Success: >= 0
+ *     0-n: Number of packets freed. More packets may still remain in ring that
+ *     are in use.
+ */
+
+static inline int
+rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	/* Validate Input Data. Bail if not valid or not supported. */
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
+
+	/* Call driver to free pending mbufs. */
+	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
+			free_cnt);
+}
+
+/**
  * Configure a callback for buffered packets which cannot be sent
  *
  * Register a specific callback to be called when an attempt is made to send
-- 
2.9.3

^ permalink raw reply related

* [PATCH v2 2/3] net/e1000: e1000 igb support to free consumed buffers
From: Billy McFall @ 2017-01-11 20:03 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall
In-Reply-To: <20170111200323.12938-1-bmcfall@redhat.com>

Add support to the e1000 igb driver for the new API to force free
consumed buffers on Tx ring. e1000 igb driver does not implement a
tx_rs_thresh to free mbufs, it frees a slot in the ring as needed, so
a new function needed to be written.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 drivers/net/e1000/e1000_ethdev.h |   2 +
 drivers/net/e1000/igb_ethdev.c   |   1 +
 drivers/net/e1000/igb_rxtx.c     | 126 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6c25c8d..fce0278 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -308,6 +308,8 @@ int eth_igb_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 		uint16_t nb_tx_desc, unsigned int socket_id,
 		const struct rte_eth_txconf *tx_conf);
 
+int eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt);
+
 int eth_igb_rx_init(struct rte_eth_dev *dev);
 
 void eth_igb_tx_init(struct rte_eth_dev *dev);
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 2fddf0c..4010dc4 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -402,6 +402,7 @@ static const struct eth_dev_ops eth_igb_ops = {
 	.rx_descriptor_done   = eth_igb_rx_descriptor_done,
 	.tx_queue_setup       = eth_igb_tx_queue_setup,
 	.tx_queue_release     = eth_igb_tx_queue_release,
+	.tx_done_cleanup      = eth_igb_tx_done_cleanup,
 	.dev_led_on           = eth_igb_led_on,
 	.dev_led_off          = eth_igb_led_off,
 	.flow_ctrl_get        = eth_igb_flow_ctrl_get,
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index dbd37ac..1d4fbcb 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1227,6 +1227,132 @@ eth_igb_tx_queue_release(void *txq)
 	igb_tx_queue_release(txq);
 }
 
+static int
+igb_tx_done_cleanup(struct igb_tx_queue *txq, uint32_t free_cnt)
+{
+	struct igb_tx_entry *sw_ring;
+	volatile union e1000_adv_tx_desc *txr;
+	uint16_t tx_first; /* First segment analyzed. */
+	uint16_t tx_id;    /* Current segment being processed. */
+	uint16_t tx_last;  /* Last segment in the current packet. */
+	uint16_t tx_next;  /* First segment of the next packet. */
+	int count;
+
+	if (txq != NULL) {
+		count = 0;
+		sw_ring = txq->sw_ring;
+		txr = txq->tx_ring;
+
+		/*
+		 * tx_tail is the last sent packet on the sw_ring. Goto the end
+		 * of that packet (the last segment in the packet chain) and
+		 * then the next segment will be the start of the oldest segment
+		 * in the sw_ring. This is the first packet that will be
+		 * attempted to be freed.
+		 */
+
+		/* Get last segment in most recently added packet. */
+		tx_first = sw_ring[txq->tx_tail].last_id;
+
+		/* Get the next segment, which is the oldest segment in ring. */
+		tx_first = sw_ring[tx_first].next_id;
+
+		/* Set the current index to the first. */
+		tx_id = tx_first;
+
+		/*
+		 * Loop through each packet. For each packet, verify that an
+		 * mbuf exists and that the last segment is free. If so, free
+		 * it and move on.
+		 */
+		while (1) {
+			tx_last = sw_ring[tx_id].last_id;
+
+			if (sw_ring[tx_last].mbuf) {
+				if (txr[tx_last].wb.status &
+						E1000_TXD_STAT_DD) {
+					/*
+					 * Increment the number of packets
+					 * freed.
+					 */
+					count++;
+
+					/* Get the start of the next packet. */
+					tx_next = sw_ring[tx_last].next_id;
+
+					/*
+					 * Loop through all segments in a
+					 * packet.
+					 */
+					do {
+						rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf);
+						sw_ring[tx_id].mbuf = NULL;
+						sw_ring[tx_id].last_id = tx_id;
+
+						/* Move to next segemnt. */
+						tx_id = sw_ring[tx_id].next_id;
+
+					} while (tx_id != tx_next);
+
+					if (unlikely(count == (int)free_cnt))
+						break;
+				} else
+					/*
+					 * mbuf still in use, nothing left to
+					 * free.
+					 */
+					break;
+			} else {
+				/*
+				 * There are multiple reasons to be here:
+				 * 1) All the packets on the ring have been
+				 *    freed - tx_id is equal to tx_first
+				 *    and some packets have been freed.
+				 *    - Done, exit
+				 * 2) Interfaces has not sent a rings worth of
+				 *    packets yet, so the segment after tail is
+				 *    still empty. Or a previous call to this
+				 *    function freed some of the segments but
+				 *    not all so there is a hole in the list.
+				 *    Hopefully this is a rare case.
+				 *    - Walk the list and find the next mbuf. If
+				 *      there isn't one, then done.
+				 */
+				if (likely((tx_id == tx_first) && (count != 0)))
+					break;
+
+				/*
+				 * Walk the list and find the next mbuf, if any.
+				 */
+				do {
+					/* Move to next segemnt. */
+					tx_id = sw_ring[tx_id].next_id;
+
+					if (sw_ring[tx_id].mbuf)
+						break;
+
+				} while (tx_id != tx_first);
+
+				/*
+				 * Determine why previous loop bailed. If there
+				 * is not an mbuf, done.
+				 */
+				if (sw_ring[tx_id].mbuf == NULL)
+					break;
+			}
+		}
+	} else
+		count = -ENODEV;
+
+	return count;
+}
+
+int
+eth_igb_tx_done_cleanup(void *txq, uint32_t free_cnt)
+{
+	return igb_tx_done_cleanup(txq, free_cnt);
+}
+
 static void
 igb_reset_tx_queue_stat(struct igb_tx_queue *txq)
 {
-- 
2.9.3

^ permalink raw reply related

* [PATCH v2 3/3] net/vhost: vHost support to free consumed buffers
From: Billy McFall @ 2017-01-11 20:03 UTC (permalink / raw)
  To: thomas.monjalon, wenzhuo.lu; +Cc: dev, Billy McFall
In-Reply-To: <20170111200323.12938-1-bmcfall@redhat.com>

Add support to the vHostdriver for the new API to force free consumed
buffers on Tx ring. vHost does not cache the mbufs so there is no work
to do.

Signed-off-by: Billy McFall <bmcfall@redhat.com>
---
 drivers/net/vhost/rte_eth_vhost.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 766d4ef..6493d56 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -939,6 +939,16 @@ eth_queue_release(void *q)
 }
 
 static int
+eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt __rte_unused)
+{
+	/*
+	 * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
+	 * and releases mbuf, so nothing to cleanup.
+	 */
+	return 0;
+}
+
+static int
 eth_link_update(struct rte_eth_dev *dev __rte_unused,
 		int wait_to_complete __rte_unused)
 {
@@ -979,6 +989,7 @@ static const struct eth_dev_ops ops = {
 	.tx_queue_setup = eth_tx_queue_setup,
 	.rx_queue_release = eth_queue_release,
 	.tx_queue_release = eth_queue_release,
+	.tx_done_cleanup = eth_tx_done_cleanup,
 	.link_update = eth_link_update,
 	.stats_get = eth_stats_get,
 	.stats_reset = eth_stats_reset,
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH 8/8] eal: VMBUS infrastructure
From: Jan Blunck @ 2017-01-11 21:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger
In-Reply-To: <CALe+Z02gqJAz3WVv4izqTX7SGvVFGt5aGT08+Hw-viXhTO+Yiw@mail.gmail.com>

On Wed, Jan 11, 2017 at 3:49 PM, Jan Blunck <jblunck@infradead.org> wrote:
> On Sat, Jan 7, 2017 at 7:17 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> Add support for VMBUS on Hyper-V/Azure. VMBUS is similar to PCI
>> but has different addressing and internal API's.
>>
>> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
>> ---
>>  lib/librte_eal/common/Makefile              |   2 +-
>>  lib/librte_eal/common/eal_common_devargs.c  |   7 +
>>  lib/librte_eal/common/eal_common_options.c  |  38 ++
>>  lib/librte_eal/common/eal_internal_cfg.h    |   1 +
>>  lib/librte_eal/common/eal_options.h         |   6 +
>>  lib/librte_eal/common/eal_private.h         |   5 +
>>  lib/librte_eal/common/include/rte_devargs.h |   8 +
>>  lib/librte_eal/common/include/rte_vmbus.h   | 249 ++++++++
>>  lib/librte_eal/linuxapp/eal/Makefile        |   6 +
>>  lib/librte_eal/linuxapp/eal/eal.c           |  13 +
>>  lib/librte_eal/linuxapp/eal/eal_vmbus.c     | 911 ++++++++++++++++++++++++++++
>>  lib/librte_ether/rte_ethdev.c               |  90 +++
>>  lib/librte_ether/rte_ethdev.h               |  31 +
>>  mk/rte.app.mk                               |   1 +
>>  14 files changed, 1367 insertions(+), 1 deletion(-)
>>  create mode 100644 lib/librte_eal/common/include/rte_vmbus.h
>>  create mode 100644 lib/librte_eal/linuxapp/eal/eal_vmbus.c
>>
>> diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
>> index 09a3d3af..ceb77bed 100644
>> --- a/lib/librte_eal/common/Makefile
>> +++ b/lib/librte_eal/common/Makefile
>> @@ -33,7 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
>>
>>  INC := rte_branch_prediction.h rte_common.h
>>  INC += rte_debug.h rte_eal.h rte_errno.h rte_launch.h rte_lcore.h
>> -INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h
>> +INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h rte_vmbus.h
>>  INC += rte_per_lcore.h rte_random.h
>>  INC += rte_tailq.h rte_interrupts.h rte_alarm.h
>>  INC += rte_string_fns.h rte_version.h
>> diff --git a/lib/librte_eal/common/eal_common_devargs.c b/lib/librte_eal/common/eal_common_devargs.c
>> index e403717b..934ca840 100644
>> --- a/lib/librte_eal/common/eal_common_devargs.c
>> +++ b/lib/librte_eal/common/eal_common_devargs.c
>> @@ -113,6 +113,13 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
>>                         goto fail;
>>
>>                 break;
>> +       case RTE_DEVTYPE_WHITELISTED_VMBUS:
>> +       case RTE_DEVTYPE_BLACKLISTED_VMBUS:
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +               if (uuid_parse(buf, devargs->uuid) == 0)
>> +                       break;
>> +#endif
>> +               goto fail;
>>         }
>>
>>         free(buf);
>> diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
>> index f36bc556..1a2b418c 100644
>> --- a/lib/librte_eal/common/eal_common_options.c
>> +++ b/lib/librte_eal/common/eal_common_options.c
>> @@ -95,6 +95,11 @@ eal_long_options[] = {
>>         {OPT_VFIO_INTR,         1, NULL, OPT_VFIO_INTR_NUM        },
>>         {OPT_VMWARE_TSC_MAP,    0, NULL, OPT_VMWARE_TSC_MAP_NUM   },
>>         {OPT_XEN_DOM0,          0, NULL, OPT_XEN_DOM0_NUM         },
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +       {OPT_NO_VMBUS,          0, NULL, OPT_NO_VMBUS_NUM         },
>> +       {OPT_VMBUS_BLACKLIST,   1, NULL, OPT_VMBUS_BLACKLIST_NUM  },
>> +       {OPT_VMBUS_WHITELIST,   1, NULL, OPT_VMBUS_WHITELIST_NUM  },
>> +#endif
>>         {0,                     0, NULL, 0                        }
>>  };
>>
>> @@ -858,6 +863,21 @@ eal_parse_common_option(int opt, const char *optarg,
>>                 conf->no_pci = 1;
>>                 break;
>>
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +       case OPT_NO_VMBUS_NUM:
>> +               conf->no_vmbus = 1;
>> +               break;
>> +       case OPT_VMBUS_BLACKLIST_NUM:
>> +               if (rte_eal_devargs_add(RTE_DEVTYPE_BLACKLISTED_VMBUS,
>> +                                       optarg) < 0)
>> +                       return -1;
>> +               break;
>> +       case OPT_VMBUS_WHITELIST_NUM:
>> +               if (rte_eal_devargs_add(RTE_DEVTYPE_WHITELISTED_VMBUS,
>> +                               optarg) < 0)
>> +                       return -1;
>> +               break;
>> +#endif
>>         case OPT_NO_HPET_NUM:
>>                 conf->no_hpet = 1;
>>                 break;
>> @@ -1017,6 +1037,14 @@ eal_check_common_options(struct internal_config *internal_cfg)
>>                 return -1;
>>         }
>>
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +       if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_VMBUS) != 0 &&
>> +               rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_VMBUS) != 0) {
>> +               RTE_LOG(ERR, EAL, "Options vmbus blacklist and whitelist "
>> +                       "cannot be used at the same time\n");
>> +               return -1;
>> +       }
>> +#endif
>>         return 0;
>>  }
>>
>> @@ -1066,5 +1094,15 @@ eal_common_usage(void)
>>                "  --"OPT_NO_PCI"            Disable PCI\n"
>>                "  --"OPT_NO_HPET"           Disable HPET\n"
>>                "  --"OPT_NO_SHCONF"         No shared config (mmap'd files)\n"
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +              "  --"OPT_NO_VMBUS"          Disable VMBUS\n"
>> +              "  --"OPT_VMBUS_BLACKLIST" Add a VMBUS device to black list.\n"
>> +              "                      Prevent EAL from using this PCI device. The argument\n"
>> +              "                      format is device UUID.\n"
>> +              "  --"OPT_VMBUS_WHITELIST" Add a VMBUS device to white list.\n"
>> +              "                      Only use the specified VMBUS devices. The argument format\n"
>> +              "                      is device UUID This option can be present\n"
>> +              "                      several times (once per device).\n"
>> +#endif
>>                "\n", RTE_MAX_LCORE);
>>  }
>> diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
>> index 5f1367eb..4b6af937 100644
>> --- a/lib/librte_eal/common/eal_internal_cfg.h
>> +++ b/lib/librte_eal/common/eal_internal_cfg.h
>> @@ -67,6 +67,7 @@ struct internal_config {
>>         unsigned hugepage_unlink;         /**< true to unlink backing files */
>>         volatile unsigned xen_dom0_support; /**< support app running on Xen Dom0*/
>>         volatile unsigned no_pci;         /**< true to disable PCI */
>> +       volatile unsigned no_vmbus;       /**< true to disable VMBUS */
>>         volatile unsigned no_hpet;        /**< true to disable HPET */
>>         volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
>>                                                                                 * instead of native TSC */
>> diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
>> index a881c62e..156727e7 100644
>> --- a/lib/librte_eal/common/eal_options.h
>> +++ b/lib/librte_eal/common/eal_options.h
>> @@ -83,6 +83,12 @@ enum {
>>         OPT_VMWARE_TSC_MAP_NUM,
>>  #define OPT_XEN_DOM0          "xen-dom0"
>>         OPT_XEN_DOM0_NUM,
>> +#define OPT_NO_VMBUS          "no-vmbus"
>> +       OPT_NO_VMBUS_NUM,
>> +#define OPT_VMBUS_BLACKLIST   "vmbus-blacklist"
>> +       OPT_VMBUS_BLACKLIST_NUM,
>> +#define OPT_VMBUS_WHITELIST   "vmbus-whitelist"
>> +       OPT_VMBUS_WHITELIST_NUM,
>>         OPT_LONG_MAX_NUM
>>  };
>>
>> diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
>> index 9e7d8f6b..c856c63e 100644
>> --- a/lib/librte_eal/common/eal_private.h
>> +++ b/lib/librte_eal/common/eal_private.h
>> @@ -210,6 +210,11 @@ int pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
>>                 struct mapped_pci_resource *uio_res, int map_idx);
>>
>>  /**
>> + * VMBUS related functions and structures
>> + */
>> +int rte_eal_vmbus_init(void);
>> +
>> +/**
>>   * Init tail queues for non-EAL library structures. This is to allow
>>   * the rings, mempools, etc. lists to be shared among multiple processes
>>   *
>> diff --git a/lib/librte_eal/common/include/rte_devargs.h b/lib/librte_eal/common/include/rte_devargs.h
>> index 88120a1c..c079d289 100644
>> --- a/lib/librte_eal/common/include/rte_devargs.h
>> +++ b/lib/librte_eal/common/include/rte_devargs.h
>> @@ -51,6 +51,9 @@ extern "C" {
>>  #include <stdio.h>
>>  #include <sys/queue.h>
>>  #include <rte_pci.h>
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +#include <uuid/uuid.h>
>> +#endif
>>
>>  /**
>>   * Type of generic device
>> @@ -59,6 +62,8 @@ enum rte_devtype {
>>         RTE_DEVTYPE_WHITELISTED_PCI,
>>         RTE_DEVTYPE_BLACKLISTED_PCI,
>>         RTE_DEVTYPE_VIRTUAL,
>> +       RTE_DEVTYPE_WHITELISTED_VMBUS,
>> +       RTE_DEVTYPE_BLACKLISTED_VMBUS,
>>  };
>>
>>  /**
>> @@ -88,6 +93,9 @@ struct rte_devargs {
>>                         /** Driver name. */
>>                         char drv_name[32];
>>                 } virt;
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +               uuid_t uuid;
>> +#endif
>>         };
>>         /** Arguments string as given by user or "" for no argument. */
>>         char *args;
>> diff --git a/lib/librte_eal/common/include/rte_vmbus.h b/lib/librte_eal/common/include/rte_vmbus.h
>> new file mode 100644
>> index 00000000..f96d753e
>> --- /dev/null
>> +++ b/lib/librte_eal/common/include/rte_vmbus.h
>> @@ -0,0 +1,249 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2013-2016 Brocade Communications Systems, Inc.
>> + *   Copyright(c) 2016 Microsoft Corporation
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *       notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above copyright
>> + *       notice, this list of conditions and the following disclaimer in
>> + *       the documentation and/or other materials provided with the
>> + *       distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *       contributors may be used to endorse or promote products derived
>> + *       from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + */
>> +
>> +#ifndef _RTE_VMBUS_H_
>> +#define _RTE_VMBUS_H_
>> +
>> +/**
>> + * @file
>> + *
>> + * RTE VMBUS Interface
>> + */
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <limits.h>
>> +#include <errno.h>
>> +#include <uuid/uuid.h>
>> +#include <sys/queue.h>
>> +#include <stdint.h>
>> +#include <inttypes.h>
>> +
>> +#include <rte_debug.h>
>> +#include <rte_interrupts.h>
>> +#include <rte_dev.h>
>> +
>> +TAILQ_HEAD(vmbus_device_list, rte_vmbus_device);
>> +TAILQ_HEAD(vmbus_driver_list, rte_vmbus_driver);
>> +
>> +extern struct vmbus_driver_list vmbus_driver_list;
>> +extern struct vmbus_device_list vmbus_device_list;
>> +
>> +/** Pathname of VMBUS devices directory. */
>> +#define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices"
>> +
>> +#define UUID_BUF_SZ    (36 + 1)
>> +
>> +
>> +/** Maximum number of VMBUS resources. */
>> +#define VMBUS_MAX_RESOURCE 7
>> +
>> +/**
>> + * A structure describing a VMBUS device.
>> + */
>> +struct rte_vmbus_device {
>> +       TAILQ_ENTRY(rte_vmbus_device) next;     /**< Next probed VMBUS device. */
>> +       struct rte_device device;               /**< Inherit core device */
>> +       uuid_t device_id;                       /**< VMBUS device id */
>> +       uuid_t class_id;                        /**< VMBUS device type */
>> +       uint32_t relid;                         /**< VMBUS id for notification */
>> +       uint8_t monitor_id;
>> +       struct rte_intr_handle intr_handle;     /**< Interrupt handle */
>> +       const struct rte_vmbus_driver *driver;  /**< Associated driver */
>> +
>> +       struct rte_mem_resource mem_resource[VMBUS_MAX_RESOURCE];
>> +                                               /**< VMBUS Memory Resource */
>> +       char sysfs_name[];                      /**< Name in sysfs bus directory */
>> +};
>> +
>> +struct rte_vmbus_driver;
>> +
>> +/**
>> + * Initialisation function for the driver called during VMBUS probing.
>> + */
>> +typedef int (vmbus_probe_t)(struct rte_vmbus_driver *,
>> +                           struct rte_vmbus_device *);
>> +
>> +/**
>> + * Uninitialisation function for the driver called during hotplugging.
>> + */
>> +typedef int (vmbus_remove_t)(struct rte_vmbus_device *);
>> +
>> +/**
>> + * A structure describing a VMBUS driver.
>> + */
>> +struct rte_vmbus_driver {
>> +       TAILQ_ENTRY(rte_vmbus_driver) next;     /**< Next in list. */
>> +       struct rte_driver driver;
>> +       vmbus_probe_t *probe;                   /**< Device Probe function. */
>> +       vmbus_remove_t *remove;                 /**< Device Remove function. */
>> +
>> +       const uuid_t *id_table;                 /**< ID table. */
>> +};
>> +
>> +struct vmbus_map {
>> +       void *addr;
>> +       char *path;
>> +       uint64_t offset;
>> +       uint64_t size;
>> +       uint64_t phaddr;
>> +};
>> +
>> +/*
>> + * For multi-process we need to reproduce all vmbus mappings in secondary
>> + * processes, so save them in a tailq.
>> + */
>> +struct mapped_vmbus_resource {
>> +       TAILQ_ENTRY(mapped_vmbus_resource) next;
>> +
>> +       uuid_t uuid;
>> +       char path[PATH_MAX];
>> +       int nb_maps;
>> +       struct vmbus_map maps[VMBUS_MAX_RESOURCE];
>> +};
>> +
>> +TAILQ_HEAD(mapped_vmbus_res_list, mapped_vmbus_resource);
>> +
>> +/**
>> + * Scan the content of the VMBUS bus, and the devices in the devices list
>> + *
>> + * @return
>> + *  0 on success, negative on error
>> + */
>> +int rte_eal_vmbus_scan(void);
>> +
>> +/**
>> + * Probe the VMBUS bus for registered drivers.
>> + *
>> + * Scan the content of the VMBUS bus, and call the probe() function for
>> + * all registered drivers that have a matching entry in its id_table
>> + * for discovered devices.
>> + *
>> + * @return
>> + *   - 0 on success.
>> + *   - Negative on error.
>> + */
>> +int rte_eal_vmbus_probe(void);
>> +
>> +/**
>> + * Map the VMBUS device resources in user space virtual memory address
>> + *
>> + * @param dev
>> + *   A pointer to a rte_vmbus_device structure describing the device
>> + *   to use
>> + *
>> + * @return
>> + *   0 on success, negative on error and positive if no driver
>> + *   is found for the device.
>> + */
>> +int rte_eal_vmbus_map_device(struct rte_vmbus_device *dev);
>> +
>> +/**
>> + * Unmap this device
>> + *
>> + * @param dev
>> + *   A pointer to a rte_vmbus_device structure describing the device
>> + *   to use
>> + */
>> +void rte_eal_vmbus_unmap_device(struct rte_vmbus_device *dev);
>> +
>> +/**
>> + * Probe the single VMBUS device.
>> + *
>> + * Scan the content of the VMBUS bus, and find the vmbus device
>> + * specified by device uuid, then call the probe() function for
>> + * registered driver that has a matching entry in its id_table for
>> + * discovered device.
>> + *
>> + * @param id
>> + *   The VMBUS device uuid.
>> + * @return
>> + *   - 0 on success.
>> + *   - Negative on error.
>> + */
>> +int rte_eal_vmbus_probe_one(uuid_t id);
>> +
>> +/**
>> + * Close the single VMBUS device.
>> + *
>> + * Scan the content of the VMBUS bus, and find the vmbus device id,
>> + * then call the remove() function for registered driver that has a
>> + * matching entry in its id_table for discovered device.
>> + *
>> + * @param id
>> + *   The VMBUS device uuid.
>> + * @return
>> + *   - 0 on success.
>> + *   - Negative on error.
>> + */
>> +int rte_eal_vmbus_detach(uuid_t id);
>> +
>> +/**
>> + * Register a VMBUS driver.
>> + *
>> + * @param driver
>> + *   A pointer to a rte_vmbus_driver structure describing the driver
>> + *   to be registered.
>> + */
>> +void rte_eal_vmbus_register(struct rte_vmbus_driver *driver);
>> +
>> +/** Helper for VMBUS device registration from driver nstance */
>> +#define RTE_PMD_REGISTER_VMBUS(nm, vmbus_drv) \
>> +RTE_INIT(vmbusinitfn_ ##nm); \
>> +static void vmbusinitfn_ ##nm(void) \
>> +{\
>> +       (vmbus_drv).driver.name = RTE_STR(nm);\
>> +       (vmbus_drv).driver.type = PMD_VMBUS; \
>> +       rte_eal_vmbus_register(&vmbus_drv); \
>> +} \
>> +RTE_PMD_EXPORT_NAME(nm, __COUNTER__)
>> +
>> +/**
>> + * Unregister a VMBUS driver.
>> + *
>> + * @param driver
>> + *   A pointer to a rte_vmbus_driver structure describing the driver
>> + *   to be unregistered.
>> + */
>> +void rte_eal_vmbus_unregister(struct rte_vmbus_driver *driver);
>
> The register/unregister need to get exported via the map file too.
>
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif /* _RTE_VMBUS_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
>> index 4e206f09..f6ca3848 100644
>> --- a/lib/librte_eal/linuxapp/eal/Makefile
>> +++ b/lib/librte_eal/linuxapp/eal/Makefile
>> @@ -71,6 +71,11 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
>>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
>>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
>>
>> +ifeq ($(CONFIG_RTE_LIBRTE_HV_PMD),y)
>> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_vmbus.c
>> +LDLIBS += -luuid
>> +endif
>> +
>>  # from common dir
>>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
>>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_timer.c
>> @@ -114,6 +119,7 @@ CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
>>  CFLAGS_eal_pci.o := -D_GNU_SOURCE
>>  CFLAGS_eal_pci_uio.o := -D_GNU_SOURCE
>>  CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
>> +CFLAGS_eal_vmbux.o := -D_GNU_SOURCE
>>  CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
>>  CFLAGS_eal_common_options.o := -D_GNU_SOURCE
>>  CFLAGS_eal_common_thread.o := -D_GNU_SOURCE
>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
>> index 16dd5b9c..1bc0814a 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>> @@ -70,6 +70,9 @@
>>  #include <rte_cpuflags.h>
>>  #include <rte_interrupts.h>
>>  #include <rte_pci.h>
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +#include <rte_vmbus.h>
>> +#endif
>>  #include <rte_dev.h>
>>  #include <rte_devargs.h>
>>  #include <rte_common.h>
>> @@ -830,6 +833,11 @@ rte_eal_init(int argc, char **argv)
>>
>>         eal_check_mem_on_local_socket();
>>
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +       if (rte_eal_vmbus_init() < 0)
>> +               RTE_LOG(ERR, EAL, "Cannot init VMBUS\n");
>> +#endif
>> +
>>         if (eal_plugins_init() < 0)
>>                 rte_panic("Cannot init plugins\n");
>>
>> @@ -884,6 +892,11 @@ rte_eal_init(int argc, char **argv)
>>         if (rte_eal_pci_probe())
>>                 rte_panic("Cannot probe PCI\n");
>>
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +       if (rte_eal_vmbus_probe() < 0)
>> +               rte_panic("Cannot probe VMBUS\n");
>> +#endif
>> +
>>         if (rte_eal_dev_init() < 0)
>>                 rte_panic("Cannot init pmd devices\n");
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vmbus.c b/lib/librte_eal/linuxapp/eal/eal_vmbus.c
>> new file mode 100644
>> index 00000000..729f93a9
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/eal/eal_vmbus.c
>> @@ -0,0 +1,911 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2013-2016 Brocade Communications Systems, Inc.
>> + *   Copyright(c) 2016 Microsoft Corporation
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *      notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above copyright
>> + *      notice, this list of conditions and the following disclaimer in
>> + *      the documentation and/or other materials provided with the
>> + *      distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *      contributors may be used to endorse or promote products derived
>> + *      from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + */
>> +
>> +#include <string.h>
>> +#include <unistd.h>
>> +#include <dirent.h>
>> +#include <fcntl.h>
>> +#include <sys/mman.h>
>> +
>> +#include <rte_eal.h>
>> +#include <rte_tailq.h>
>> +#include <rte_log.h>
>> +#include <rte_devargs.h>
>> +#include <rte_vmbus.h>
>> +#include <rte_malloc.h>
>> +
>> +#include "eal_private.h"
>> +#include "eal_pci_init.h"
>> +#include "eal_filesystem.h"
>> +
>> +struct vmbus_driver_list vmbus_driver_list =
>> +       TAILQ_HEAD_INITIALIZER(vmbus_driver_list);
>> +struct vmbus_device_list vmbus_device_list =
>> +       TAILQ_HEAD_INITIALIZER(vmbus_device_list);
>> +
>> +static void *vmbus_map_addr;
>> +
>> +static struct rte_tailq_elem rte_vmbus_uio_tailq = {
>> +       .name = "UIO_RESOURCE_LIST",

This should be VMBUS_UIO_RESOURCE_LIST to not collide with rte_uio_tailq.

>> +};
>> +EAL_REGISTER_TAILQ(rte_vmbus_uio_tailq);
>> +
>> +/*
>> + * parse a sysfs file containing one integer value
>> + * different to the eal version, as it needs to work with 64-bit values
>> + */
>> +static int
>> +vmbus_get_sysfs_uuid(const char *filename, uuid_t uu)
>> +{
>> +       char buf[BUFSIZ];
>> +       char *cp, *in = buf;
>> +       FILE *f;
>> +
>> +       f = fopen(filename, "r");
>> +       if (f == NULL) {
>> +               RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
>> +                               __func__, filename);
>> +               return -1;
>> +       }
>> +
>> +       if (fgets(buf, sizeof(buf), f) == NULL) {
>> +               RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
>> +                               __func__, filename);
>> +               fclose(f);
>> +               return -1;
>> +       }
>> +       fclose(f);
>> +
>> +       cp = strchr(buf, '\n');
>> +       if (cp)
>> +               *cp = '\0';
>> +
>> +       /* strip { } notation */
>> +       if (buf[0] == '{') {
>> +               in = buf + 1;
>> +               cp = strchr(in, '}');
>> +               if (cp)
>> +                       *cp = '\0';
>> +       }
>> +
>> +       if (uuid_parse(in, uu) < 0) {
>> +               RTE_LOG(ERR, EAL, "%s %s not a valid UUID\n",
>> +                       filename, buf);
>> +               return -1;
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +/* map a particular resource from a file */
>> +static void *
>> +vmbus_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
>> +                  int flags)
>> +{
>> +       void *mapaddr;
>> +
>> +       /* Map the memory resource of device */
>> +       mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
>> +                      MAP_SHARED | flags, fd, offset);
>> +       if (mapaddr == MAP_FAILED ||
>> +           (requested_addr != NULL && mapaddr != requested_addr)) {
>> +               RTE_LOG(ERR, EAL,
>> +                       "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s)\n",
>> +                       __func__, fd, requested_addr,
>> +                       (unsigned long)size, (unsigned long)offset,
>> +                       strerror(errno));
>> +       } else
>> +               RTE_LOG(DEBUG, EAL, "  VMBUS memory mapped at %p\n", mapaddr);
>> +
>> +       return mapaddr;
>> +}
>> +
>> +/* unmap a particular resource */
>> +static void
>> +vmbus_unmap_resource(void *requested_addr, size_t size)
>> +{
>> +       if (requested_addr == NULL)
>> +               return;
>> +
>> +       /* Unmap the VMBUS memory resource of device */
>> +       if (munmap(requested_addr, size)) {
>> +               RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n",
>> +                       __func__, requested_addr, (unsigned long)size,
>> +                       strerror(errno));
>> +       } else
>> +               RTE_LOG(DEBUG, EAL, "  VMBUS memory unmapped at %p\n",
>> +                               requested_addr);
>> +}
>> +
>> +/* Only supports current kernel version
>> + * Unlike PCI there is no option (or need) to create UIO device.
>> + */
>> +static int vmbus_get_uio_dev(const char *name,
>> +                            char *dstbuf, size_t buflen)
>> +{
>> +       char dirname[PATH_MAX];
>> +       unsigned int uio_num;
>> +       struct dirent *e;
>> +       DIR *dir;
>> +
>> +       snprintf(dirname, sizeof(dirname),
>> +                "/sys/bus/vmbus/devices/%s/uio", name);
>> +
>> +       dir = opendir(dirname);
>> +       if (dir == NULL) {
>> +               RTE_LOG(ERR, EAL, "Cannot map uio resources for %s: %s\n",
>> +                       name, strerror(errno));
>> +               return -1;
>> +       }
>> +
>> +       /* take the first file starting with "uio" */
>> +       while ((e = readdir(dir)) != NULL) {
>> +               if (sscanf(e->d_name, "uio%u", &uio_num) != 1)
>> +                       continue;
>> +
>> +               snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
>> +               break;
>> +       }
>> +       closedir(dir);
>> +
>> +       return e ? (int) uio_num : -1;
>> +}
>> +
>> +/*
>> + * parse a sysfs file containing one integer value
>> + * different to the eal version, as it needs to work with 64-bit values
>> + */
>> +static int
>> +vmbus_parse_sysfs_value(const char *dir, const char *name,
>> +                       uint64_t *val)
>> +{
>> +       char filename[PATH_MAX];
>> +       FILE *f;
>> +       char buf[BUFSIZ];
>> +       char *end = NULL;
>> +
>> +       snprintf(filename, sizeof(filename), "%s/%s", dir, name);
>> +       f = fopen(filename, "r");
>> +       if (f == NULL) {
>> +               RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
>> +                               __func__, filename);
>> +               return -1;
>> +       }
>> +
>> +       if (fgets(buf, sizeof(buf), f) == NULL) {
>> +               RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
>> +                               __func__, filename);
>> +               fclose(f);
>> +               return -1;
>> +       }
>> +       fclose(f);
>> +
>> +       *val = strtoull(buf, &end, 0);
>> +       if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
>> +               RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
>> +                               __func__, filename);
>> +               return -1;
>> +       }
>> +       return 0;
>> +}
>> +
>> +/* Get mappings out of values provided by uio */
>> +static int
>> +vmbus_uio_get_mappings(const char *uioname,
>> +                      struct vmbus_map maps[])
>> +{
>> +       int i;
>> +
>> +       for (i = 0; i != VMBUS_MAX_RESOURCE; i++) {
>> +               struct vmbus_map *map = &maps[i];
>> +               char dirname[PATH_MAX];
>> +
>> +               /* check if map directory exists */
>> +               snprintf(dirname, sizeof(dirname),
>> +                        "%s/maps/map%d", uioname, i);
>> +
>> +               if (access(dirname, F_OK) != 0)
>> +                       break;
>> +
>> +               /* get mapping offset */
>> +               if (vmbus_parse_sysfs_value(dirname, "offset",
>> +                                           &map->offset) < 0)
>> +                       return -1;
>> +
>> +               /* get mapping size */
>> +               if (vmbus_parse_sysfs_value(dirname, "size",
>> +                                           &map->size) < 0)
>> +                       return -1;
>> +
>> +               /* get mapping physical address */
>> +               if (vmbus_parse_sysfs_value(dirname, "addr",
>> +                                           &maps->phaddr) < 0)
>> +                       return -1;
>> +       }
>> +
>> +       return i;
>> +}
>> +
>> +static void
>> +vmbus_uio_free_resource(struct rte_vmbus_device *dev,
>> +               struct mapped_vmbus_resource *uio_res)
>> +{
>> +       rte_free(uio_res);
>> +
>> +       if (dev->intr_handle.fd) {
>> +               close(dev->intr_handle.fd);
>> +               dev->intr_handle.fd = -1;
>> +               dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
>> +       }
>> +}
>> +
>> +static struct mapped_vmbus_resource *
>> +vmbus_uio_alloc_resource(struct rte_vmbus_device *dev)
>> +{
>> +       struct mapped_vmbus_resource *uio_res;
>> +       char dirname[PATH_MAX], devname[PATH_MAX];
>> +       int uio_num, nb_maps;
>> +
>> +       uio_num = vmbus_get_uio_dev(dev->sysfs_name, dirname, sizeof(dirname));
>> +       if (uio_num < 0) {
>> +               RTE_LOG(WARNING, EAL,
>> +                       "  %s not managed by UIO driver, skipping\n",
>> +                       dev->sysfs_name);
>> +               return NULL;
>> +       }
>> +
>> +       /* allocate the mapping details for secondary processes*/
>> +       uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0);
>> +       if (uio_res == NULL) {
>> +               RTE_LOG(ERR, EAL,
>> +                       "%s(): cannot store uio mmap details\n", __func__);
>> +               goto error;
>> +       }
>> +
>> +       snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
>> +       dev->intr_handle.fd = open(devname, O_RDWR);
>> +       if (dev->intr_handle.fd < 0) {
>> +               RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> +                       devname, strerror(errno));
>> +               goto error;
>> +       }
>> +
>> +       dev->intr_handle.type = RTE_INTR_HANDLE_UIO_INTX;
>> +
>> +       snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
>> +       uuid_copy(uio_res->uuid, dev->device_id);
>> +
>> +       nb_maps = vmbus_uio_get_mappings(dirname, uio_res->maps);
>> +       if (nb_maps < 0)
>> +               goto error;
>> +
>> +       RTE_LOG(DEBUG, EAL, "Found %d memory maps for device %s\n",
>> +               nb_maps, dev->sysfs_name);
>> +
>> +       return uio_res;
>> +
>> + error:
>> +       vmbus_uio_free_resource(dev, uio_res);
>> +       return NULL;
>> +}
>> +
>> +static int
>> +vmbus_uio_map_resource_by_index(struct rte_vmbus_device *dev,
>> +                               unsigned int res_idx,
>> +                               struct mapped_vmbus_resource *uio_res,
>> +                               unsigned int map_idx)
>> +{
>> +       struct vmbus_map *maps = uio_res->maps;
>> +       char devname[PATH_MAX];
>> +       void *mapaddr;
>> +       int fd;
>> +
>> +       snprintf(devname, sizeof(devname),
>> +                "/sys/bus/vmbus/%s/resource%u", dev->sysfs_name, res_idx);
>> +
>> +       fd = open(devname, O_RDWR);
>> +       if (fd < 0) {
>> +               RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> +                               devname, strerror(errno));
>> +               return -1;
>> +       }
>> +
>> +       /* allocate memory to keep path */
>> +       maps[map_idx].path = rte_malloc(NULL, strlen(devname) + 1, 0);
>> +       if (maps[map_idx].path == NULL) {
>> +               RTE_LOG(ERR, EAL, "Cannot allocate memory for path: %s\n",
>> +                               strerror(errno));
>> +               return -1;
>> +       }
>> +
>> +       /* try mapping somewhere close to the end of hugepages */
>> +       if (vmbus_map_addr == NULL)
>> +               vmbus_map_addr = pci_find_max_end_va();
>> +
>> +       mapaddr = vmbus_map_resource(vmbus_map_addr, fd, 0,
>> +                                    dev->mem_resource[res_idx].len, 0);
>> +       close(fd);
>> +       if (mapaddr == MAP_FAILED) {
>> +               rte_free(maps[map_idx].path);
>> +               return -1;
>> +       }
>> +
>> +       vmbus_map_addr = RTE_PTR_ADD(mapaddr,
>> +                                    dev->mem_resource[res_idx].len);
>> +
>> +       maps[map_idx].phaddr = dev->mem_resource[res_idx].phys_addr;
>> +       maps[map_idx].size = dev->mem_resource[res_idx].len;
>> +       maps[map_idx].addr = mapaddr;
>> +       maps[map_idx].offset = 0;
>> +       strcpy(maps[map_idx].path, devname);
>> +       dev->mem_resource[res_idx].addr = mapaddr;
>> +
>> +       return 0;
>> +}
>> +
>> +static void
>> +vmbus_uio_unmap(struct mapped_vmbus_resource *uio_res)
>> +{
>> +       int i;
>> +
>> +       if (uio_res == NULL)
>> +               return;
>> +
>> +       for (i = 0; i != uio_res->nb_maps; i++) {
>> +               vmbus_unmap_resource(uio_res->maps[i].addr,
>> +                                    uio_res->maps[i].size);
>> +
>> +               if (rte_eal_process_type() == RTE_PROC_PRIMARY)
>> +                       rte_free(uio_res->maps[i].path);
>> +       }
>> +}
>> +
>> +static struct mapped_vmbus_resource *
>> +vmbus_uio_find_resource(struct rte_vmbus_device *dev)
>> +{
>> +       struct mapped_vmbus_resource *uio_res;
>> +       struct mapped_vmbus_res_list *uio_res_list =
>> +                       RTE_TAILQ_CAST(rte_vmbus_uio_tailq.head,
>> +                                      mapped_vmbus_res_list);
>> +
>> +       if (dev == NULL)
>> +               return NULL;
>> +
>> +       TAILQ_FOREACH(uio_res, uio_res_list, next) {
>> +               if (uuid_compare(uio_res->uuid, dev->device_id) == 0)
>> +                       return uio_res;
>> +       }
>> +       return NULL;
>> +}
>> +
>> +/* unmap the VMBUS resource of a VMBUS device in virtual memory */
>> +static void
>> +vmbus_uio_unmap_resource(struct rte_vmbus_device *dev)
>> +{
>> +       struct mapped_vmbus_resource *uio_res;
>> +       struct mapped_vmbus_res_list *uio_res_list =
>> +                       RTE_TAILQ_CAST(rte_vmbus_uio_tailq.head,
>> +                                      mapped_vmbus_res_list);
>> +
>> +       if (dev == NULL)
>> +               return;
>> +
>> +       /* find an entry for the device */
>> +       uio_res = vmbus_uio_find_resource(dev);
>> +       if (uio_res == NULL)
>> +               return;
>> +
>> +       /* secondary processes - just free maps */
>> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
>> +               return vmbus_uio_unmap(uio_res);
>> +
>> +       TAILQ_REMOVE(uio_res_list, uio_res, next);
>> +
>> +       /* unmap all resources */
>> +       vmbus_uio_unmap(uio_res);
>> +
>> +       /* free uio resource */
>> +       rte_free(uio_res);
>> +
>> +       /* close fd if in primary process */
>> +       close(dev->intr_handle.fd);
>> +       if (dev->intr_handle.uio_cfg_fd >= 0) {
>> +               close(dev->intr_handle.uio_cfg_fd);
>> +               dev->intr_handle.uio_cfg_fd = -1;
>> +       }
>> +
>> +       dev->intr_handle.fd = -1;
>> +       dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
>> +}
>> +
>> +static int
>> +vmbus_uio_map_secondary(struct rte_vmbus_device *dev)
>> +{
>> +       struct mapped_vmbus_resource *uio_res;
>> +       struct mapped_vmbus_res_list *uio_res_list =
>> +                       RTE_TAILQ_CAST(rte_vmbus_uio_tailq.head,
>> +                                      mapped_vmbus_res_list);
>> +
>> +       TAILQ_FOREACH(uio_res, uio_res_list, next) {
>> +               int i;
>> +
>> +               /* skip this element if it doesn't match our id */
>> +               if (uuid_compare(uio_res->uuid, dev->device_id))
>> +                       continue;
>> +
>> +               for (i = 0; i != uio_res->nb_maps; i++) {
>> +                       void *mapaddr;
>> +                       int fd;
>> +
>> +                       fd = open(uio_res->maps[i].path, O_RDWR);
>> +                       if (fd < 0) {
>> +                               RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> +                                       uio_res->maps[i].path, strerror(errno));
>> +                               return -1;
>> +                       }
>> +
>> +                       mapaddr = vmbus_map_resource(uio_res->maps[i].addr, fd,
>> +                                                    uio_res->maps[i].offset,
>> +                                                    uio_res->maps[i].size, 0);
>> +                       /* fd is not needed in slave process, close it */
>> +                       close(fd);
>> +
>> +                       if (mapaddr == uio_res->maps[i].addr)
>> +                               continue;
>> +
>> +                       RTE_LOG(ERR, EAL,
>> +                               "Cannot mmap device resource file %s to address: %p\n",
>> +                               uio_res->maps[i].path,
>> +                               uio_res->maps[i].addr);
>> +
>> +                       /* unmap addrs correctly mapped */
>> +                       while (i != 0) {
>> +                               --i;
>> +                               vmbus_unmap_resource(uio_res->maps[i].addr,
>> +                                                    uio_res->maps[i].size);
>> +                       }
>> +                       return -1;
>> +
>> +               }
>> +               return 0;
>> +       }
>> +
>> +       RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
>> +       return 1;
>> +}
>> +
>> +/* map the resources of a vmbus device in virtual memory */
>> +int
>> +rte_eal_vmbus_map_device(struct rte_vmbus_device *dev)
>> +{
>> +       struct mapped_vmbus_resource *uio_res;
>> +       struct mapped_vmbus_res_list *uio_res_list =
>> +               RTE_TAILQ_CAST(rte_vmbus_uio_tailq.head, mapped_vmbus_res_list);
>> +       int i, ret, map_idx = 0;
>> +
>> +       dev->intr_handle.fd = -1;
>> +       dev->intr_handle.uio_cfg_fd = -1;
>> +       dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
>> +
>> +       /* secondary processes - use already recorded details */
>> +       if (rte_eal_process_type() != RTE_PROC_PRIMARY)
>> +               return vmbus_uio_map_secondary(dev);
>> +
>> +       /* allocate uio resource */
>> +       uio_res = vmbus_uio_alloc_resource(dev);
>> +       if (uio_res == NULL)
>> +               return -1;
>> +
>> +       /* Map all BARs */
>> +       for (i = 0; i != VMBUS_MAX_RESOURCE; i++) {
>> +               uint64_t phaddr;
>> +
>> +               /* skip empty BAR */
>> +               phaddr = dev->mem_resource[i].phys_addr;
>> +               if (phaddr == 0)
>> +                       continue;
>> +
>> +               ret = vmbus_uio_map_resource_by_index(dev, i,
>> +                                                     uio_res, map_idx);
>> +               if (ret)
>> +                       goto error;
>> +
>> +               map_idx++;
>> +       }
>> +
>> +       uio_res->nb_maps = map_idx;
>> +
>> +       TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);
>> +
>> +       return 0;
>> +error:
>> +       for (i = 0; i < map_idx; i++) {
>> +               vmbus_unmap_resource(uio_res->maps[i].addr,
>> +                                    uio_res->maps[i].size);
>> +               rte_free(uio_res->maps[i].path);
>> +       }
>> +       vmbus_uio_free_resource(dev, uio_res);
>> +       return -1;
>> +}
>> +
>> +/* Scan one vmbus sysfs entry, and fill the devices list from it. */
>> +static int
>> +vmbus_scan_one(const char *name)
>> +{
>> +       struct rte_vmbus_device *dev, *dev2;
>> +       char filename[PATH_MAX];
>> +       char dirname[PATH_MAX];
>> +       unsigned long tmp;
>> +
>> +       dev = malloc(sizeof(*dev) + strlen(name) + 1);
>> +       if (dev == NULL)
>> +               return -1;
>> +
>> +       memset(dev, 0, sizeof(*dev));
>> +       strcpy(dev->sysfs_name, name);
>> +       if (dev->sysfs_name == NULL)
>> +               goto error;
>> +
>> +       /* sysfs base directory
>> +        *   /sys/bus/vmbus/devices/7a08391f-f5a0-4ac0-9802-d13fd964f8df
>> +        * or on older kernel
>> +        *   /sys/bus/vmbus/devices/vmbus_1
>> +        */
>> +       snprintf(dirname, sizeof(dirname), "%s/%s",
>> +                SYSFS_VMBUS_DEVICES, name);
>> +
>> +       /* get device id */
>> +       snprintf(filename, sizeof(filename), "%s/device_id", dirname);
>> +       if (vmbus_get_sysfs_uuid(filename, dev->device_id) < 0)
>> +               goto error;
>> +
>> +       /* get device class  */
>> +       snprintf(filename, sizeof(filename), "%s/class_id", dirname);
>> +       if (vmbus_get_sysfs_uuid(filename, dev->class_id) < 0)
>> +               goto error;
>> +
>> +       /* get relid */
>> +       snprintf(filename, sizeof(filename), "%s/id", dirname);
>> +       if (eal_parse_sysfs_value(filename, &tmp) < 0)
>> +               goto error;
>> +       dev->relid = tmp;
>> +
>> +       /* get monitor id */
>> +       snprintf(filename, sizeof(filename), "%s/monitor_id", dirname);
>> +       if (eal_parse_sysfs_value(filename, &tmp) < 0)
>> +               goto error;
>> +       dev->monitor_id = tmp;
>> +
>> +       /* get numa node */
>> +       snprintf(filename, sizeof(filename), "%s/numa_node",
>> +                dirname);
>> +       if (eal_parse_sysfs_value(filename, &tmp) < 0)
>> +               /* if no NUMA support, set default to 0 */
>> +               dev->device.numa_node = 0;
>> +       else
>> +               dev->device.numa_node = tmp;
>> +
>> +       /* device is valid, add in list (sorted) */
>> +       RTE_LOG(DEBUG, EAL, "Adding vmbus device %s\n", name);
>> +
>> +       TAILQ_FOREACH(dev2, &vmbus_device_list, next) {
>> +               int ret;
>> +
>> +               ret = uuid_compare(dev->device_id, dev->device_id);
>> +               if (ret > 0)
>> +                       continue;
>> +
>> +               if (ret < 0) {
>> +                       TAILQ_INSERT_BEFORE(dev2, dev, next);
>> +                       rte_eal_device_insert(&dev->device);
>> +               } else { /* already registered */
>> +                       memmove(dev2->mem_resource, dev->mem_resource,
>> +                               sizeof(dev->mem_resource));
>> +                       free(dev);
>> +               }
>> +               return 0;
>> +       }
>> +
>> +       rte_eal_device_insert(&dev->device);
>> +       TAILQ_INSERT_TAIL(&vmbus_device_list, dev, next);
>> +
>> +       return 0;
>> +error:
>> +       free(dev);
>> +       return -1;
>> +}
>> +
>> +/*
>> + * Scan the content of the vmbus, and the devices in the devices list
>> + */
>> +static int
>> +vmbus_scan(void)
>> +{
>> +       struct dirent *e;
>> +       DIR *dir;
>> +
>> +       dir = opendir(SYSFS_VMBUS_DEVICES);
>> +       if (dir == NULL) {
>> +               if (errno == ENOENT)
>> +                       return 0;
>> +
>> +               RTE_LOG(ERR, EAL, "%s(): opendir failed: %s\n",
>> +                       __func__, strerror(errno));
>> +               return -1;
>> +       }
>> +
>> +       while ((e = readdir(dir)) != NULL) {
>> +               if (e->d_name[0] == '.')
>> +                       continue;
>> +
>> +               if (vmbus_scan_one(e->d_name) < 0)
>> +                       goto error;
>> +       }
>> +       closedir(dir);
>> +       return 0;
>> +
>> +error:
>> +       closedir(dir);
>> +       return -1;
>> +}
>> +
>> +/* Init the VMBUS EAL subsystem */
>> +int rte_eal_vmbus_init(void)
>> +{
>> +       /* VMBUS can be disabled */
>> +       if (internal_config.no_vmbus)
>> +               return 0;
>> +
>> +       if (vmbus_scan() < 0) {
>> +               RTE_LOG(ERR, EAL, "%s(): Cannot scan vmbus\n", __func__);
>> +               return -1;
>> +       }
>> +       return 0;
>> +}
>> +
>> +/* Below is PROBE part of eal_vmbus library */
>> +
>> +/*
>> + * If device ID match, call the devinit() function of the driver.
>> + */
>> +static int
>> +rte_eal_vmbus_probe_one_driver(struct rte_vmbus_driver *dr,
>> +                              struct rte_vmbus_device *dev)
>> +{
>> +       const uuid_t *id_table;
>> +
>> +       RTE_LOG(DEBUG, EAL, "  probe driver: %s\n", dr->driver.name);
>> +
>> +       for (id_table = dr->id_table; !uuid_is_null(*id_table); ++id_table) {
>> +               struct rte_devargs *args;
>> +               char guid[UUID_BUF_SZ];
>> +               int ret;
>> +
>> +               /* skip devices not assocaited with this device class */
>> +               if (uuid_compare(*id_table, dev->class_id) != 0)
>> +                       continue;
>> +
>> +               uuid_unparse(dev->device_id, guid);
>> +               RTE_LOG(INFO, EAL, "VMBUS device %s on NUMA socket %i\n",
>> +                       guid, dev->device.numa_node);
>> +
>> +               /* no initialization when blacklisted, return without error */
>> +               args = dev->device.devargs;
>> +               if (args && args->type == RTE_DEVTYPE_BLACKLISTED_VMBUS) {
>> +                       RTE_LOG(INFO, EAL, "  Device is blacklisted, not initializing\n");
>> +                       return 1;
>> +               }
>> +
>> +               RTE_LOG(INFO, EAL, "  probe driver: %s\n", dr->driver.name);
>> +
>> +               /* map resources for device */
>> +               ret = rte_eal_vmbus_map_device(dev);
>> +               if (ret != 0)
>> +                       return ret;
>> +
>> +               /* reference driver structure */
>> +               dev->driver = dr;
>> +
>> +               /* call the driver probe() function */
>> +               ret = dr->probe(dr, dev);
>> +               if (ret)
>> +                       dev->driver = NULL;
>> +
>> +               return ret;
>> +       }
>> +
>> +       /* return positive value if driver doesn't support this device */
>> +       return 1;
>> +}
>> +
>> +
>> +/*
>> + * If vendor/device ID match, call the remove() function of the
>> + * driver.
>> + */
>> +static int
>> +vmbus_detach_dev(struct rte_vmbus_driver *dr,
>> +                struct rte_vmbus_device *dev)
>> +{
>> +       const uuid_t *id_table;
>> +
>> +       for (id_table = dr->id_table; !uuid_is_null(*id_table); ++id_table) {
>> +               char guid[UUID_BUF_SZ];
>> +
>> +               /* skip devices not assocaited with this device class */
>> +               if (uuid_compare(*id_table, dev->class_id) != 0)
>> +                       continue;
>> +
>> +               uuid_unparse(dev->device_id, guid);
>> +               RTE_LOG(INFO, EAL, "VMBUS device %s on NUMA socket %i\n",
>> +                       guid, dev->device.numa_node);
>> +
>> +               RTE_LOG(DEBUG, EAL, "  remove driver: %s\n", dr->driver.name);
>> +
>> +               if (dr->remove && (dr->remove(dev) < 0))
>> +                       return -1;      /* negative value is an error */
>> +
>> +               /* clear driver structure */
>> +               dev->driver = NULL;
>> +
>> +               vmbus_uio_unmap_resource(dev);
>> +               return 0;
>> +       }
>> +
>> +       /* return positive value if driver doesn't support this device */
>> +       return 1;
>> +}
>> +
>> +/*
>> + * call the devinit() function of all
>> + * registered drivers for the vmbus device. Return -1 if no driver is
>> + * found for this class of vmbus device.
>> + * The present assumption is that we have drivers only for vmbus network
>> + * devices. That's why we don't check driver's id_table now.
>> + */
>> +static int
>> +vmbus_probe_all_drivers(struct rte_vmbus_device *dev)
>> +{
>> +       struct rte_vmbus_driver *dr = NULL;
>> +       int ret;
>> +
>> +       TAILQ_FOREACH(dr, &vmbus_driver_list, next) {
>> +               ret = rte_eal_vmbus_probe_one_driver(dr, dev);
>> +               if (ret < 0) {
>> +                       /* negative value is an error */
>> +                       RTE_LOG(ERR, EAL, "Failed to probe driver %s\n",
>> +                               dr->driver.name);
>> +                       return -1;
>> +               }
>> +               /* positive value means driver doesn't support it */
>> +               if (ret > 0)
>> +                       continue;
>> +
>> +               return 0;
>> +       }
>> +
>> +       return 1;
>> +}
>> +
>> +
>> +/*
>> + * If device ID matches, call the remove() function of all
>> + * registered driver for the given device. Return -1 if initialization
>> + * failed, return 1 if no driver is found for this device.
>> + */
>> +static int
>> +vmbus_detach_all_drivers(struct rte_vmbus_device *dev)
>> +{
>> +       struct rte_vmbus_driver *dr;
>> +       int rc = 0;
>> +
>> +       if (dev == NULL)
>> +               return -1;
>> +
>> +       TAILQ_FOREACH(dr, &vmbus_driver_list, next) {
>> +               rc = vmbus_detach_dev(dr, dev);
>> +               if (rc < 0)
>> +                       /* negative value is an error */
>> +                       return -1;
>> +               if (rc > 0)
>> +                       /* positive value means driver doesn't support it */
>> +                       continue;
>> +               return 0;
>> +       }
>> +       return 1;
>> +}
>> +
>> +/* Detach device specified by its VMBUS id */
>> +int
>> +rte_eal_vmbus_detach(uuid_t device_id)
>> +{
>> +       struct rte_vmbus_device *dev;
>> +       char ubuf[UUID_BUF_SZ];
>> +
>> +       TAILQ_FOREACH(dev, &vmbus_device_list, next) {
>> +               if (uuid_compare(dev->device_id, device_id) != 0)
>> +                       continue;
>> +
>> +               if (vmbus_detach_all_drivers(dev) < 0)
>> +                       goto err_return;
>> +
>> +               TAILQ_REMOVE(&vmbus_device_list, dev, next);
>> +               free(dev);
>> +               return 0;
>> +       }
>> +       return -1;
>> +
>> +err_return:
>> +       uuid_unparse(device_id, ubuf);
>> +       RTE_LOG(WARNING, EAL, "Requested device %s cannot be used\n",
>> +               ubuf);
>> +       return -1;
>> +}
>> +
>> +/*
>> + * Scan the vmbus, and call the devinit() function for
>> + * all registered drivers that have a matching entry in its id_table
>> + * for discovered devices.
>> + */
>> +int
>> +rte_eal_vmbus_probe(void)
>> +{
>> +       struct rte_vmbus_device *dev = NULL;
>> +
>> +       TAILQ_FOREACH(dev, &vmbus_device_list, next) {
>> +               char ubuf[UUID_BUF_SZ];
>> +
>> +               uuid_unparse(dev->device_id, ubuf);
>> +
>> +               RTE_LOG(DEBUG, EAL, "Probing driver for device %s ...\n",
>> +                       ubuf);
>> +               vmbus_probe_all_drivers(dev);
>> +       }
>> +       return 0;
>> +}
>> +
>> +/* register vmbus driver */
>> +void
>> +rte_eal_vmbus_register(struct rte_vmbus_driver *driver)
>> +{
>> +       TAILQ_INSERT_TAIL(&vmbus_driver_list, driver, next);
>> +}
>> +
>> +/* unregister vmbus driver */
>> +void
>> +rte_eal_vmbus_unregister(struct rte_vmbus_driver *driver)
>> +{
>> +       TAILQ_REMOVE(&vmbus_driver_list, driver, next);
>> +}
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index 7c212096..b69af0f0 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -3334,3 +3334,93 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
>>                                 -ENOTSUP);
>>         return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
>>  }
>> +
>> +
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +int
>> +rte_eth_dev_vmbus_probe(struct rte_vmbus_driver *vmbus_drv,
>> +                       struct rte_vmbus_device *vmbus_dev)
>> +{
>> +       struct eth_driver  *eth_drv = (struct eth_driver *)vmbus_drv;
>> +       struct rte_eth_dev *eth_dev;
>> +       char ustr[UUID_BUF_SZ];
>> +       int diag;
>> +
>> +       uuid_unparse(vmbus_dev->device_id, ustr);
>> +
>> +       eth_dev = rte_eth_dev_allocate(ustr);
>> +       if (eth_dev == NULL)
>> +               return -ENOMEM;
>> +
>> +       if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
>> +               eth_dev->data->dev_private = rte_zmalloc("ethdev private structure",
>> +                                 eth_drv->dev_private_size,
>> +                                 RTE_CACHE_LINE_SIZE);
>> +               if (eth_dev->data->dev_private == NULL)
>> +                       rte_panic("Cannot allocate memzone for private port data\n");
>> +       }
>> +
>> +       eth_dev->device = &vmbus_dev->device;
>> +       eth_dev->driver = eth_drv;
>> +       eth_dev->data->rx_mbuf_alloc_failed = 0;
>> +
>> +       /* init user callbacks */
>> +       TAILQ_INIT(&(eth_dev->link_intr_cbs));
>> +
>> +       /*
>> +        * Set the default maximum frame size.
>> +        */
>> +       eth_dev->data->mtu = ETHER_MTU;
>> +
>> +       /* Invoke PMD device initialization function */
>> +       diag = (*eth_drv->eth_dev_init)(eth_dev);
>> +       if (diag == 0)
>> +               return 0;
>> +
>> +       RTE_PMD_DEBUG_TRACE("driver %s: eth_dev_init(%s) failed\n",
>> +                           vmbus_drv->driver.name, ustr);
>> +
>> +       if (rte_eal_process_type() == RTE_PROC_PRIMARY)
>> +               rte_free(eth_dev->data->dev_private);
>> +
>> +       return diag;
>> +}
>> +
>> +int
>> +rte_eth_dev_vmbus_remove(struct rte_vmbus_device *vmbus_dev)
>> +{
>> +       const struct eth_driver *eth_drv;
>> +       struct rte_eth_dev *eth_dev;
>> +       char ustr[UUID_BUF_SZ];
>> +       int ret;
>> +
>> +       if (vmbus_dev == NULL)
>> +               return -EINVAL;
>> +
>> +       uuid_unparse(vmbus_dev->device_id, ustr);
>> +       eth_dev = rte_eth_dev_allocated(ustr);
>> +       if (eth_dev == NULL)
>> +               return -ENODEV;
>> +
>> +       eth_drv = (const struct eth_driver *)vmbus_dev->driver;
>> +
>> +       /* Invoke PMD device uninit function */
>> +       if (*eth_drv->eth_dev_uninit) {
>> +               ret = (*eth_drv->eth_dev_uninit)(eth_dev);
>> +               if (ret)
>> +                       return ret;
>> +       }
>> +
>> +       /* free ether device */
>> +       rte_eth_dev_release_port(eth_dev);
>> +
>> +       if (rte_eal_process_type() == RTE_PROC_PRIMARY)
>> +               rte_free(eth_dev->data->dev_private);
>> +
>> +       eth_dev->device = NULL;
>> +       eth_dev->driver = NULL;
>> +       eth_dev->data = NULL;
>> +
>> +       return 0;
>> +}
>> +#endif
>> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> index 1a62a322..2a8c1eed 100644
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -180,6 +180,9 @@ extern "C" {
>>  #include <rte_log.h>
>>  #include <rte_interrupts.h>
>>  #include <rte_pci.h>
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +#include <rte_vmbus.h>
>> +#endif
>>  #include <rte_dev.h>
>>  #include <rte_devargs.h>
>>  #include <rte_errno.h>
>> @@ -1908,6 +1911,17 @@ struct rte_pci_eth_driver {
>>         struct eth_driver       eth_drv;        /**< Ethernet driver. */
>>  };
>>
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +/**
>> + * @internal
>> + * The structure associated with a PMD VMBUS Ethernet driver.
>> + */
>> +struct rte_vmbus_eth_driver {
>> +       struct rte_vmbus_driver vmbus_drv;      /**< Underlying VMBUS driver. */
>> +       struct eth_driver       eth_drv;        /**< Ethernet driver. */
>> +};
>> +#endif
>> +
>>  /**
>>   * Convert a numerical speed in Mbps to a bitmap flag that can be used in
>>   * the bitmap link_speeds of the struct rte_eth_conf
>> @@ -4543,6 +4557,23 @@ int rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
>>   */
>>  int rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev);
>>
>> +#ifdef RTE_LIBRTE_HV_PMD
>> +/**
>> + * @internal
>> + * Wrapper for use by vmbus drivers as a .probe function to attach to a ethdev
>> + * interface.
>> + */
>> +int rte_eth_dev_vmbus_probe(struct rte_vmbus_driver *vmbus_drv,
>> +                         struct rte_vmbus_device *vmbus_dev);
>> +
>> +/**
>> + * @internal
>> + * Wrapper for use by vmbus drivers as a .remove function to detach a ethdev
>> + * interface.
>> + */
>> +int rte_eth_dev_vmbus_remove(struct rte_vmbus_device *vmbus_dev);
>> +#endif
>> +
>>  #ifdef __cplusplus
>>  }
>>  #endif
>> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
>> index f75f0e24..6b304084 100644
>> --- a/mk/rte.app.mk
>> +++ b/mk/rte.app.mk
>> @@ -130,6 +130,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
>>  endif # $(CONFIG_RTE_LIBRTE_VHOST)
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)    += -lrte_pmd_vmxnet3_uio
>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_HV_PMD)        += -luuid
>>
>>  ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)    += -lrte_pmd_aesni_mb
>> --
>> 2.11.0
>>

^ permalink raw reply

* Re: [PATCH] app/testpmd: fix ixgbe private API calling
From: Lu, Wenzhuo @ 2017-01-12  1:08 UTC (permalink / raw)
  To: Iremonger, Bernard, dev@dpdk.org; +Cc: Wu, Jingjing, stable@dpdk.org
In-Reply-To: <8CEF83825BEC744B83065625E567D7C224D1CB14@IRSMSX108.ger.corp.intel.com>

Hi Bernard,

> -----Original Message-----
> From: Iremonger, Bernard
> Sent: Wednesday, January 11, 2017 6:27 PM
> To: Lu, Wenzhuo; dev@dpdk.org
> Cc: Wu, Jingjing; stable@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] app/testpmd: fix ixgbe private API calling
> 
> Hi Wenzhuo,
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wenzhuo Lu
> > Sent: Wednesday, January 11, 2017 2:48 AM
> > To: dev@dpdk.org
> > Cc: Wu, Jingjing <jingjing.wu@intel.com>; Lu, Wenzhuo
> > <wenzhuo.lu@intel.com>; stable@dpdk.org
> > Subject: [dpdk-dev] [PATCH] app/testpmd: fix ixgbe private API calling
> >
> > Some ixgbe private APIs are added to expose ixgbe specific functions.
> > When they're used by testpmd, there's no check for if the NICs are ixgbe.
> > Other NICs also have chance to  call these APIs.
> > This patch add the check and the feedback print.
> 
> I am not sure that testpmd is the right place to do this.
> The rte_pmd_ixgbe_* functions are public API's which can be called by other
> applications.
> The checks should be in the rte_pmd_ixgbe_* API's
To be safer, it's better to add a check in the APIs. 
But the APIs is so called private API, not really public.  Considering if we have the same function on different NICs, for example we have rte_pmd_ixgbe_a and rte_pmd_i40e_a.
APP still need to call them one by one, like
ret = rte_pmd_ixgbe_a;
ret = rte_pmd_i40e_a;

then, why not add the check, like
If (NIC is ixgbe)
	ret = rte_pmd_ixgbe_a;
if (NIC is i40e)
	ret = rte_pmd_i40e_a;

testpmd is an example to let the users to know how to use the APIs. They should follow the example.
How about your opinion?

^ permalink raw reply

* Re: [PATCH 8/8] eal: VMBUS infrastructure
From: Stephen Hemminger @ 2017-01-12  1:20 UTC (permalink / raw)
  To: Jan Blunck; +Cc: dev, Stephen Hemminger
In-Reply-To: <CALe+Z00rdh-cfRZuMB+b-8OwjR1wi+UNntboYizjLsMwLr=BJg@mail.gmail.com>

On Wed, 11 Jan 2017 22:13:32 +0100
Jan Blunck <jblunck@infradead.org> wrote:

> >> +static void *vmbus_map_addr;
> >> +
> >> +static struct rte_tailq_elem rte_vmbus_uio_tailq = {
> >> +       .name = "UIO_RESOURCE_LIST",  
> 
> This should be VMBUS_UIO_RESOURCE_LIST to not collide with rte_uio_tailq.

Ok, please trim review comments. Trying to find comment in middle of
patch is a nuisance.

^ permalink raw reply

* Re: [PATCH v7 2/5] net/e1000: add firmware version get
From: Yang, Qiming @ 2017-01-12  1:25 UTC (permalink / raw)
  To: Horton, Remy, dev@dpdk.org; +Cc: Yigit, Ferruh
In-Reply-To: <2ecd0288-6f58-9f55-48fc-2dc3e4e97983@intel.com>

-----Original Message-----
From: Horton, Remy 
Sent: Wednesday, January 11, 2017 11:45 PM
To: Yang, Qiming <qiming.yang@intel.com>; dev@dpdk.org
Cc: Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [PATCH v7 2/5] net/e1000: add firmware version get


On 11/01/2017 06:41, Qiming Yang wrote:
> This patch adds a new function eth_igb_fw_version_get.
>
> Signed-off-by: Qiming Yang <qiming.yang@intel.com>
[..]
> +static int
> +eth_igb_fw_version_get(struct rte_eth_dev *dev, char *fw_version,
> +		       size_t fw_size)
> +{
> +	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +	struct e1000_fw_version fw;
> +
> +	if (fw_size < 16) {

Magic number. Ought to be #define value (or at least a comment explaining the choice of size)..

Otherwise looking good.. :)

..Remy

Ok, I'll use a macro to define this number. Thanks~
Qiming 

^ permalink raw reply

* Re: [PATCH v2 3/3] net/vhost: vHost support to free consumed buffers
From: Stephen Hemminger @ 2017-01-12  1:39 UTC (permalink / raw)
  To: Billy McFall; +Cc: thomas.monjalon, wenzhuo.lu, dev
In-Reply-To: <20170111200323.12938-4-bmcfall@redhat.com>

On Wed, 11 Jan 2017 15:03:23 -0500
Billy McFall <bmcfall@redhat.com> wrote:

> Add support to the vHostdriver for the new API to force free consumed
> buffers on Tx ring. vHost does not cache the mbufs so there is no work
> to do.
> 
> Signed-off-by: Billy McFall <bmcfall@redhat.com>
> ---
>  drivers/net/vhost/rte_eth_vhost.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> index 766d4ef..6493d56 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -939,6 +939,16 @@ eth_queue_release(void *q)
>  }
>  
>  static int
> +eth_tx_done_cleanup(void *txq __rte_unused, uint32_t free_cnt __rte_unused)
> +{
> +	/*
> +	 * vHost does not hang onto mbuf. eth_vhost_tx() copies packet data
> +	 * and releases mbuf, so nothing to cleanup.
> +	 */
> +	return 0;
> +}
> +
> +static int
>  eth_link_update(struct rte_eth_dev *dev __rte_unused,
>  		int wait_to_complete __rte_unused)
>  {
> @@ -979,6 +989,7 @@ static const struct eth_dev_ops ops = {
>  	.tx_queue_setup = eth_tx_queue_setup,
>  	.rx_queue_release = eth_queue_release,
>  	.tx_queue_release = eth_queue_release,
> +	.tx_done_cleanup = eth_tx_done_cleanup,
>  	.link_update = eth_link_update,
>  	.stats_get = eth_stats_get,
>  	.stats_reset = eth_stats_reset,

Rather than change drivers, since this is not critical path, make
it optional to have tx_done_cleanup.

^ permalink raw reply

* Re: [PATCH v2 1/3] ethdev: new API to free consumed buffers in Tx ring
From: Stephen Hemminger @ 2017-01-12  1:41 UTC (permalink / raw)
  To: Billy McFall; +Cc: thomas.monjalon, wenzhuo.lu, dev
In-Reply-To: <20170111200323.12938-2-bmcfall@redhat.com>

On Wed, 11 Jan 2017 15:03:21 -0500
Billy McFall <bmcfall@redhat.com> wrote:

>  /**
> + * Request the driver to free mbufs currently cached by the driver. The
> + * driver will only free the mbuf if it is no longer in use. It is the
> + * application's responsibity to ensure rte_eth_tx_buffer_flush(..) is
> + * called if needed.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param free_cnt
> + *   Maximum number of packets to free. Use 0 to indicate all possible packets
> + *   should be freed. Note that a packet may be using multiple mbufs.
> + * @return
> + *   Failure: < 0
> + *     -ENODEV: Invalid interface
> + *     -ENOTSUP: Driver does not support function
> + *   Success: >= 0
> + *     0-n: Number of packets freed. More packets may still remain in ring that
> + *     are in use.
> + */
> +
> +static inline int
> +rte_eth_tx_done_cleanup(uint8_t port_id, uint16_t queue_id,  uint32_t free_cnt)

Extra white space.

> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +
> +	/* Validate Input Data. Bail if not valid or not supported. */
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
> +
> +	/* Call driver to free pending mbufs. */
> +	return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_queues[queue_id],
> +			free_cnt);
> +}
> +

It doesn't look like this is something that needs to be in critical
path. Therefore it makes no sense to inline it.

^ permalink raw reply

* Re: [PATCH v5 3/8] ethdev: reserve capability flags for PMD-specific API
From: Tiwei Bie @ 2017-01-12  2:08 UTC (permalink / raw)
  To: Olivier MATZ
  Cc: Thomas Monjalon, Ananyev, Konstantin, Adrien Mazarguil, dev,
	Lu, Wenzhuo, Mcnamara, John, Zhang, Helin, Dai, Wei, Wang, Xiao W
In-Reply-To: <20170111183248.38a27193@glumotte.dev.6wind.com>

On Wed, Jan 11, 2017 at 06:32:48PM +0100, Olivier MATZ wrote:
> Hi Tiwei, Hi Thomas,
> 
> On Mon, 09 Jan 2017 12:26:53 +0100, Thomas Monjalon
> <thomas.monjalon@6wind.com> wrote:
> > 2017-01-09 11:57, Tiwei Bie:
> > > On Sun, Jan 08, 2017 at 08:39:55PM +0800, Ananyev, Konstantin
> > > wrote:  
> > > > > Well my first reply to this thread was asking why isn't the
> > > > > whole API global from the start then?  
> > > > 
> > > > That's good question, and my preference would always be to have
> > > > the API to configure this feature as generic one.
> > > > I guess the main reason why it is not right now we don't reach an
> > > > agreement how this API should look like: 
> > > > http://dpdk.org/ml/archives/dev/2016-September/047810.html
> > > > But I'll leave it to the author to provide the real reason
> > > > here.   
> > > 
> > > Yes, currently this work just provided a thin layer over 82599's
> > > hardware MACsec offload support to allow users configure 82599's
> > > MACsec offload engine. The current API may be too specific and may
> > > need a rework to be used with other NICs.  
> > 
> > I think it is a really good approach to start such API privately in a
> > driver. It will give us more time and experience to design a proper
> > generic API.
> > 
> > Regarding the mbuf flag, it looks straight-forward, and as it is IEEE
> > standardized, I do not see any objection to add it now.
> > However, I will wait for the approval of Olivier - as maintainer of
> > mbuf.
> > 
> 
> Generally speaking, we have to be careful when introducing new mbuf
> flags, since we don't have so much of them (~25 remaining out of 64,
> which mean we may run out of them in 3-4 years).
> 
> In this particular case, despite the flag is added for an ixgbe-specific
> API, when MACsec will be implemented on another PMD, the exact same
> flag would also be needed. That's the same for the ethdev capability
> flag. Moreover, as Thomas stated, it's a standardized protocol so it's
> legitimate to have it included in rte_mbuf.h.
> 
> For me, having PMD-specific APIs for a new feature is not a problem,
> but I think we should try to have a generic API as soon as the feature
> is supported by several PMDs.
> 

Sure! Thank you very much!

Best regards,
Tiwei Bie

^ permalink raw reply

* Re: [dpdk-stable] [PATCH 1/2] net/virtio: fix performance regression due to TSO enabling
From: Yuanhan Liu @ 2017-01-12  2:30 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Jianbo Liu, Jerin Jacob, Jan Viktorin, Chao Zhu, dev,
	Tan Jianfeng, Wang Zhihong, Olivier Matz, Maxime Coquelin,
	Michael S. Tsirkin
In-Reply-To: <1610499.AMUobBPor6@xps13>

On Wed, Jan 11, 2017 at 03:51:22PM +0100, Thomas Monjalon wrote:
> 2017-01-11 12:27, Yuanhan Liu:
> > The fact that virtio net header is initiated to zero in PMD driver
> > init stage means that these costly writes are unnecessary and could
> > be avoided:
> > 
> >     if (hdr->csum_start != 0)
> >         hdr->csum_start = 0;
> > 
> > And that's what the macro ASSIGN_UNLESS_EQUAL does. With this, the
> > performance drop introduced by TSO enabling is recovered: it could
> > be up to 20% in micro benchmarking.
> 
> This patch is adding a condition to assignments.
> We need a benchmark on other architectures like ARM. Please anyone?

I think the cost of condition should be way lower than the cost from the
penalty introduced by the cache issue, that I don't see it would perform
bad on other platforms.

But, of course, testing is always welcome!

	--yliu
> 
> 
> [...]
> > +/* avoid write operation when necessary, to lessen cache issues */
> > +#define ASSIGN_UNLESS_EQUAL(var, val) do {	\
> > +	if ((var) != (val))			\
> > +		(var) = (val);			\
> > +} while (0)

^ permalink raw reply

* Re: [PATCH v5 3/8] ethdev: reserve capability flags for PMD-specific API
From: Yuanhan Liu @ 2017-01-12  2:42 UTC (permalink / raw)
  To: Olivier MATZ
  Cc: Thomas Monjalon, Tiwei Bie, Ananyev, Konstantin, Adrien Mazarguil,
	dev, Lu, Wenzhuo, Mcnamara, John, Zhang, Helin, Dai, Wei,
	Wang, Xiao W
In-Reply-To: <20170111183248.38a27193@glumotte.dev.6wind.com>

On Wed, Jan 11, 2017 at 06:32:48PM +0100, Olivier MATZ wrote:
> Generally speaking, we have to be careful when introducing new mbuf
> flags, since we don't have so much of them (~25 remaining out of 64,
> which mean we may run out of them in 3-4 years).
> 
> In this particular case, despite the flag is added for an ixgbe-specific
> API, when MACsec will be implemented on another PMD, the exact same
> flag would also be needed. That's the same for the ethdev capability
> flag. Moreover, as Thomas stated, it's a standardized protocol so it's
> legitimate to have it included in rte_mbuf.h.
> 
> For me, having PMD-specific APIs for a new feature is not a problem,
> but I think we should try to have a generic API as soon as the feature
> is supported by several PMDs.

JFYI, here is how the virtio handle the feature bits. It basically
just divides it to two parts.

>From virtio 1.0 spec (2.2 Feature Bits):

    Feature bits are allocated as follows:

    - 0 to 23 Feature bits for the specific device type

    - 24 to 32 Feature bits reserved for extensions to the queue and feature
      negotiation mechanisms

    - 33 and above Feature bits reserved for future extensions.

    Note: For example, feature bit 0 for a network device (i.e. Device ID 1)
    indicates that the device supports checksumming of packets.


	--yliu

^ permalink raw reply

* [PATCH] examples/vhost: fix calc err of nr_mbufs_per_core
From: Yong Wang @ 2017-01-12  3:52 UTC (permalink / raw)
  To: yuanhan.liu; +Cc: dev, Yong Wang

When calculating 'nr_mbufs_per_core', 'MAX_PKT_BURST' was mutiplied
twice. Fix it by removing one of them.

Fixes: bdb19b771e6f ("examples/vhost: fix mbuf allocation failure")

Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
---
 examples/vhost/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index ac1f6e2..81a6a8c 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1393,7 +1393,7 @@ static inline void __attribute__((always_inline))
 		mtu = 64 * 1024;
 
 	nr_mbufs_per_core  = (mtu + mbuf_size) * MAX_PKT_BURST /
-			(mbuf_size - RTE_PKTMBUF_HEADROOM) * MAX_PKT_BURST;
+			(mbuf_size - RTE_PKTMBUF_HEADROOM);
 	nr_mbufs_per_core += nr_rx_desc;
 	nr_mbufs_per_core  = RTE_MAX(nr_mbufs_per_core, nr_mbuf_cache);
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] examples/vhost: fix calc err of nr_mbufs_per_core
From: Yuanhan Liu @ 2017-01-12  2:58 UTC (permalink / raw)
  To: Yong Wang; +Cc: dev
In-Reply-To: <1484193137-4326-1-git-send-email-wang.yong19@zte.com.cn>

On Wed, Jan 11, 2017 at 10:52:17PM -0500, Yong Wang wrote:
> When calculating 'nr_mbufs_per_core', 'MAX_PKT_BURST' was mutiplied
> twice. Fix it by removing one of them.
> 
> Fixes: bdb19b771e6f ("examples/vhost: fix mbuf allocation failure")
> 
> Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>

Applied to dpdk-next-virtio.

Thanks.

	--yliu

^ permalink raw reply

* Re: [PATCH v2] examples/vhost: fix the wrong initialization of lcore_ids
From: Yuanhan Liu @ 2017-01-12  3:06 UTC (permalink / raw)
  To: Yong Wang; +Cc: huawei.xie, dev
In-Reply-To: <1484125186-31667-1-git-send-email-wang.yong19@zte.com.cn>

On Wed, Jan 11, 2017 at 03:59:46AM -0500, Yong Wang wrote:
> when "TAILQ_INIT()" was added to the loop of "for (lcore_id = 0; ...)"
> statement, the assignment to "lcore_ids" was removed out of the loop.
> It changed the original initialization of "lcore_ids".

opps.... and nice catch! Thanks.
> 
> Fix it by introducing two braces.
> 
> Fixes: 45657a5c6861 ("examples/vhost: use tailq to link vhost devices")
> 
> Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>

Applied to dpdk-next-virtio.

	--yliu

^ permalink raw reply

* Re: [PATCH v4] ethdev: fix port data mismatched in multiple process model
From: Yuanhan Liu @ 2017-01-12  3:10 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, stable, Bruce Richardson, Ferruh Yigit
In-Reply-To: <6191673.SI6i20B7c3@xps13>

On Wed, Jan 11, 2017 at 02:32:03PM +0100, Thomas Monjalon wrote:
> 2017-01-10 22:33, Yuanhan Liu:
> > On Mon, Jan 09, 2017 at 06:08:04PM +0100, Thomas Monjalon wrote:
> > > Hi Yuanhan,
> > > 
> > > Nit: the title should be "v4 1/6"
> > 
> > Oops, my bad.
> > 
> > > Except that, good patch :)
> > > 
> > > > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > > 
> > > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> > 
> > Thanks for review! Mind if I apply it to the next-virtio tree?
> 
> Yes, please do.

Thanks!

Series applied to dpdk-next-virtio.

	--yliu

^ permalink raw reply

* Re: [PATCH v8 00/25] Support VFD on i40e
From: Lu, Wenzhuo @ 2017-01-12  3:21 UTC (permalink / raw)
  To: Vincent JARDIN, Yigit, Ferruh, Zhang, Helin, dev@dpdk.org
  Cc: 'JOSHI, KAUSTUBH', 'DANIELS, EDWARD',
	'ZELEZNIAK, ALEX', Chen, Jing D
In-Reply-To: <ea5d8fdb-2de3-3dbf-bbb5-5c2099c8c73a@6wind.com>

Hi Vincent,

> -----Original Message-----
> From: Vincent JARDIN [mailto:vincent.jardin@6wind.com]
> Sent: Wednesday, January 11, 2017 9:14 PM
> To: Yigit, Ferruh; Zhang, Helin; Lu, Wenzhuo; dev@dpdk.org
> Cc: 'JOSHI, KAUSTUBH'; 'DANIELS, EDWARD'; 'ZELEZNIAK, ALEX'
> Subject: Re: [dpdk-dev] [PATCH v8 00/25] Support VFD on i40e
> 
> Le 10/01/2017 à 22:32, Ferruh Yigit a écrit :
> > What do you think to continue high level DPDK PF discussion in mail
> > thread for other pathset? So that we can continue to work on this one.
> 
> First, we need to assess or not if it makes sense to go toward Linux kernel or
> DPDK based PF. If Linux kernel is used, then DPDK does not need VFD related
> modifications.
> 
> VFD demonstrates that there are some needs of features, but it pushes the
> new path of a fork of PF drivers.
We're not sure if we want to change and maintain a totally new DPDK PF either. 
So, we only change PMD code and not expose it to abstraction layer. It can only be used by some users who have the needs. 
It's experimental and comes from the users' requirement. If it's good enough and accepted by other NICs, we can expose it. 
If it's not good, it still can be a choice before Linux kernel provides.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox