DPDK-dev Archive on lore.kernel.org
* [PATCH v2 00/32] Support more features in Solarflare PMD
From: Andrew Rybchenko @ 2016-12-15 12:50 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit
In-Reply-To: <1480664691-26561-1-git-send-email-arybchenko@solarflare.com>

The patch series adds a number of features to Solarflare libefx-based
PMD. Basically one patch per feature.

The patches are grouped into one series since they touch nearby lines
in either PMD feature list, or dev_ops structure, or documentation.
So, patches cannot be applied in arbitrary order.

---

v2:
* Fix ICC and clang warnings
* Slightly change sfc_tso_{alloc,free}_tsoh_objs() prototypes


Andrew Rybchenko (17):
  net/sfc: implement MCDI logging callback
  net/sfc: support parameter to choose performance profile
  net/sfc: implement ethdev hook to get basic statistics
  net/sfc: support extended statistics
  net/sfc: support flow control settings get/set
  net/sfc: support link status change interrupt
  net/sfc: implement device operation to change MTU
  net/sfc: support link speed and duplex settings
  net/sfc: support checksum offloads on receive
  net/sfc: handle received packet type info provided by HW
  net/sfc: support callback to get receive queue information
  net/sfc: support Rx free threshold
  net/sfc: add callback to get RxQ pending descriptors count
  net/sfc: add RxQ descriptor done callback
  net/sfc: support scattered Rx DMA
  net/sfc: support deferred start of receive queues
  net/sfc/base: do not use enum type when values are bitmask

Artem Andreev (1):
  net/sfc: support link up/down

Ivan Malov (14):
  net/sfc: support promiscuous and all-multicast control
  net/sfc: support main (the first) MAC address change
  net/sfc: support multicast addresses list controls
  net/sfc: add callback to get transmit queue information
  net/sfc: support Tx free threshold
  net/sfc: support deferred start of transmit queues
  net/sfc: support VLAN offload on transmit path
  net/sfc: add basic stubs for RSS support on driver attach
  net/sfc: support RSS hash offload
  net/sfc: add callback to query RSS key and hash types config
  net/sfc: add callback to set RSS key and hash types config
  net/sfc: add callback to query RSS redirection table
  net/sfc: add callback to update RSS redirection table
  net/sfc: support firmware-assisted TSOv2

 config/common_base                   |   1 +
 doc/guides/nics/features/sfc_efx.ini |  22 +-
 doc/guides/nics/sfc_efx.rst          |  58 ++-
 drivers/net/sfc/Makefile             |   4 +
 drivers/net/sfc/base/ef10_rx.c       |   8 +-
 drivers/net/sfc/base/efx.h           |  12 +-
 drivers/net/sfc/base/efx_rx.c        |   8 +-
 drivers/net/sfc/efsys.h              |   8 +-
 drivers/net/sfc/sfc.c                | 126 ++++-
 drivers/net/sfc/sfc.h                |  46 ++
 drivers/net/sfc/sfc_ethdev.c         | 893 ++++++++++++++++++++++++++++++++++-
 drivers/net/sfc/sfc_ev.c             |  64 ++-
 drivers/net/sfc/sfc_ev.h             |   2 +
 drivers/net/sfc/sfc_intr.c           | 204 ++++++++
 drivers/net/sfc/sfc_kvargs.c         |   2 +
 drivers/net/sfc/sfc_kvargs.h         |  12 +
 drivers/net/sfc/sfc_mcdi.c           |  69 +++
 drivers/net/sfc/sfc_port.c           | 107 ++++-
 drivers/net/sfc/sfc_rx.c             | 288 ++++++++++-
 drivers/net/sfc/sfc_rx.h             |  16 +
 drivers/net/sfc/sfc_tso.c            | 200 ++++++++
 drivers/net/sfc/sfc_tweak.h          |   3 +
 drivers/net/sfc/sfc_tx.c             | 166 ++++++-
 drivers/net/sfc/sfc_tx.h             |  41 +-
 24 files changed, 2274 insertions(+), 86 deletions(-)
 create mode 100644 drivers/net/sfc/sfc_tso.c

-- 
2.5.5


* Re: [PATCH 00/31] Support more features in Solarflare PMD
From: Andrew Rybchenko @ 2016-12-15 12:50 UTC (permalink / raw)
  To: Ferruh Yigit, dev
In-Reply-To: <b29c4041-8558-ac66-b690-8d1a8e0a2397@intel.com>

On 12/09/2016 08:34 PM, Ferruh Yigit wrote:
> On 12/2/2016 7:44 AM, Andrew Rybchenko wrote:
>> The patch series adds a number of features to Solarflare libefx-based
>> PMD. Basically one patch per feature.
>>
>> The patches are grouped into one series since they touch nearby lines
>> in either PMD feature list, or dev_ops structure, or documentation.
>> So, patches cannot be applied in arbitrary order.
>>
>> The patch series should be applied after
>> [PATCH v2 00/55] Solarflare libefx-based PMD
>> (Message-ID: 1480436367-20749-1-git-send-email-arybchenko@solarflare.com)
>>
>>
>> Andrew Rybchenko (16):
>>    net/sfc: implement MCDI logging callback
>>    net/sfc: support parameter to choose performance profile
>>    net/sfc: implement ethdev hook to get basic statistics
>>    net/sfc: support extended statistics
>>    net/sfc: support flow control settings get/set
>>    net/sfc: support link status change interrupt
>>    net/sfc: implement device operation to change MTU
>>    net/sfc: support link speed and duplex settings
>>    net/sfc: support checksum offloads on receive
>>    net/sfc: handle received packet type info provided by HW
>>    net/sfc: support callback to get receive queue information
>>    net/sfc: support Rx free threshold
>>    net/sfc: add callback to get RxQ pending descriptors count
>>    net/sfc: add RxQ descriptor done callback
>>    net/sfc: support scattered Rx DMA
>>    net/sfc: support deferred start of receive queues
>>
>> Artem Andreev (1):
>>    net/sfc: support link up/down
>>
>> Ivan Malov (14):
>>    net/sfc: support promiscuous and all-multicast control
>>    net/sfc: support main (the first) MAC address change
>>    net/sfc: support multicast addresses list controls
>>    net/sfc: add callback to get transmit queue information
>>    net/sfc: support Tx free threshold
>>    net/sfc: support deferred start of transmit queues
>>    net/sfc: support VLAN offload on transmit path
>>    net/sfc: add basic stubs for RSS support on driver attach
>>    net/sfc: support RSS hash offload
>>    net/sfc: add callback to query RSS key and hash types config
>>    net/sfc: add callback to set RSS key and hash types config
>>    net/sfc: add callback to query RSS redirection table
>>    net/sfc: add callback to update RSS redirection table
>>    net/sfc: support firmware-assisted TSOv2
> Hi Andrew,
>
> I am getting following build errors for clang [1] and ICC [2]. I have
> not investigated the root cause, just copy-pasting here.
>
> For ICC, since you explicitly noted it is not supported, and reported
> warning is known, I believe it is safe to ignore this warning via
> "-wd188" CFLAGS option in the Makefile.

Hi Ferruh,

I think I prefer to fix these warnings in v2. Maybe other compilers
will become more pedantic in the future.

Thanks,
Andrew.

>
> Thanks,
> ferruh
>
>
>
> [1] clang
> .../drivers/net/sfc/sfc_ethdev.c:1143:4: error: format specifies type
> 'unsigned short' but the argument has type 'int' [-Werror,-Wformat]
>                          EFX_RSS_TBL_SIZE);
>                          ^~~~~~~~~~~~~~~~
> .../drivers/net/sfc/base/efx.h:1866:26: note: expanded from macro
> 'EFX_RSS_TBL_SIZE'
> #define EFX_RSS_TBL_SIZE        128     /* Rows in RX indirection table */
>                                  ^~~
> .../drivers/net/sfc/sfc_log.h:51:19: note: expanded from macro 'sfc_err'
>          SFC_LOG(sa, ERR, __VA_ARGS__)
>                           ^~~~~~~~~~~
> .../drivers/net/sfc/sfc_log.h:47:18: note: expanded from macro 'SFC_LOG'
>                                  RTE_FMT_TAIL(__VA_ARGS__,)));           \
>                                               ^~~~~~~~~~~
> .../x86_64-native-linuxapp-clang/include/rte_common.h:345:32: note:
> expanded from macro 'RTE_FMT_TAIL'
> #define RTE_FMT_TAIL(fmt, ...) __VA_ARGS__
>                                 ^~~~~~~~~~~
> .../x86_64-native-linuxapp-clang/include/rte_common.h:343:39: note:
> expanded from macro 'RTE_FMT'
> #define RTE_FMT(fmt, ...) fmt "%.0s", __VA_ARGS__ ""
>                                        ^~~~~~~~~~~
> .../x86_64-native-linuxapp-clang/include/rte_log.h:258:32: note:
> expanded from macro 'RTE_LOG'
>                   RTE_LOGTYPE_ ## t, # t ": " __VA_ARGS__)
>                                               ^~~~~~~~~~~
>
>
>
> [2] ICC
> .../drivers/net/sfc/sfc_ethdev.c(1063): error #188: enumerated type
> mixed with another type
>                                     efx_hash_types, B_TRUE);
>                                     ^
>
> .../drivers/net/sfc/sfc_ethdev.c(1086): error #188: enumerated type
> mixed with another type
>                                    sa->rss_hash_types, B_TRUE) != 0)
>                                    ^
>
> compilation aborted for .../drivers/net/sfc/sfc_ethdev.c (code 2)
> make[7]: *** [sfc_ethdev.o] Error 2
> make[7]: *** Waiting for unfinished jobs....
> .../drivers/net/sfc/sfc_rx.c(820): error #188: enumerated type mixed
> with another type
>                                             sa->rss_hash_types, B_TRUE);
>                                             ^
>
> compilation aborted for .../drivers/net/sfc/sfc_rx.c (code 2)
>
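
The clang diagnostic quoted above arises because an unsuffixed integer macro such as 128 has type int, while the format string asks for an unsigned short. A minimal sketch of one way to make the format and the arguments agree follows. DEMO_TBL_SIZE and demo_format_size_error() are hypothetical names used only for illustration; they are not taken from the sfc driver, and the message text is invented.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a table-size macro. An unsuffixed integer
 * constant like this has type int. */
#define DEMO_TBL_SIZE 128

/* Format an error message about an unexpected table size.
 * reta_size is a uint16_t, so PRIu16 matches it. The macro is an int,
 * so it is printed with %d instead of a short conversion. */
static int
demo_format_size_error(char *buf, size_t len, uint16_t reta_size)
{
	return snprintf(buf, len,
			"table size %" PRIu16 " is wrong, expected %d",
			reta_size, DEMO_TBL_SIZE);
}
```

Casting the macro to the type named in the format string would be another option; which one a given driver should prefer is a style decision this sketch does not settle.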


* Re: [PATCH 00/22] Generic flow API (rte_flow)
From: Ferruh Yigit @ 2016-12-15 12:20 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: dev, Thomas Monjalon, Pablo de Lara, Olivier Matz
In-Reply-To: <20161208151935.GK10340@6wind.com>

On 12/8/2016 3:19 PM, Adrien Mazarguil wrote:
> Hi Ferruh,
> 
> On Fri, Dec 02, 2016 at 04:58:53PM +0000, Ferruh Yigit wrote:
>> Hi Adrien,
>>
>> On 11/16/2016 4:23 PM, Adrien Mazarguil wrote:
>>> As previously discussed in RFC v1 [1], RFC v2 [2], with changes
>>> described in [3] (also pasted below), here is the first non-draft series
>>> for this new API.
>>>
>>> Its capabilities are so generic that its name had to be vague, it may be
>>> called "Generic flow API", "Generic flow interface" (possibly shortened
>>> as "GFI") to refer to the name of the new filter type, or "rte_flow" from
>>> the prefix used for its public symbols. I personally favor the latter.
>>>
>>> While it is currently meant to supersede existing filter types in order for
>>> all PMDs to expose a common filtering/classification interface, it may
>>> eventually evolve to cover the following ideas as well:
>>>
>>> - Rx/Tx offloads configuration through automatic offloads for specific
>>>   packets, e.g. performing checksum on TCP packets could be expressed with
>>>   an egress rule with a TCP pattern and a kind of checksum action.
>>>
>>> - RSS configuration (already defined actually). Could be global or per rule
>>>   depending on hardware capabilities.
>>>
>>> - Switching configuration for devices with many physical ports; rules doing
>>>   both ingress and egress could even be used to completely bypass software
>>>   if supported by hardware.
>>>
>>>  [1] http://dpdk.org/ml/archives/dev/2016-July/043365.html
>>>  [2] http://dpdk.org/ml/archives/dev/2016-August/045383.html
>>>  [3] http://dpdk.org/ml/archives/dev/2016-November/050044.html
>>>
>>> Changes since RFC v2:
>>>
>>> - New separate VLAN pattern item (previously part of the ETH definition),
>>>   found to be much more convenient.
>>>
>>> - Removed useless "any" field from VF pattern item, the same effect can be
>>>   achieved by not providing a specification structure.
>>>
>>> - Replaced bit-fields from the VXLAN pattern item to avoid endianness
>>>   conversion issues on 24-bit fields.
>>>
>>> - Updated struct rte_flow_item with a new "last" field to create inclusive
>>>   ranges. They are defined as the interval between (spec & mask) and
>>>   (last & mask). All three parameters are optional.
>>>
>>> - Renamed ID action to MARK.
>>>
>>> - Renamed "queue" fields in actions QUEUE and DUP to "index".
>>>
>>> - "rss_conf" field in RSS action is now const.
>>>
>>> - VF action now uses a 32 bit ID like its pattern item counterpart.
>>>
>>> - Removed redundant struct rte_flow_pattern, API functions now expect
>>>   struct rte_flow_item lists terminated by END items.
>>>
>>> - Replaced struct rte_flow_actions for the same reason, with struct
>>>   rte_flow_action lists terminated by END actions.
>>>
>>> - Error types (enum rte_flow_error_type) have been updated and the cause
>>>   pointer in struct rte_flow_error is now const.
>>>
>>> - Function prototypes (rte_flow_create, rte_flow_validate) have also been
>>>   updated for clarity.
>>>
>>> Additions:
>>>
>>> - Public wrapper functions rte_flow_{validate|create|destroy|flush|query}
>>>   are now implemented in rte_flow.c, with their symbols exported and
>>>   versioned. Related filter type RTE_ETH_FILTER_GENERIC has been added.
>>>
>>> - A separate header (rte_flow_driver.h) has been added for driver-side
>>>   functionality, in particular struct rte_flow_ops which contains PMD
>>>   callbacks returned by RTE_ETH_FILTER_GENERIC query.
>>>
>>> - testpmd now exposes most of this API through the new "flow" command.
>>>
>>> What remains to be done:
>>>
>>> - Using endian-aware integer types (rte_beX_t) where necessary for clarity.
>>>
>>> - API documentation (based on RFC).
>>>
>>> - testpmd flow command documentation (although context-aware command
>>>   completion should already help quite a bit in this regard).
>>>
>>> - A few pattern item / action properties cannot be configured yet
>>>   (e.g. rss_conf parameter for RSS action) and a few completions
>>>   (e.g. possible queue IDs) should be added.
>>>
>>
>> <...>
>>
>> I was trying to check driver filter API patches, but hit a few compiler
>> errors with this patchset.
>>
>> [1] clang complains about variable bitfield value changed from -1 to 1.
>> Which is correct, but I guess that is intentional, but I don't know how
>> to tell this to clang?
>>
>> [2] shared library compilation error, because of missing rte_flow_flush
>> in rte_ether_version.map file
>>
>> [3] bunch of icc compilation errors, almost all are same type:
>> error #188: enumerated type mixed with another type
> 
> Thanks for the report, I'll attempt to address them all in v2. 

Hi Adrien,

I would like to remind you that there are driver patch sets that depend on
this patch.

The new version of this patch should leave the drivers some time to redo (if
required) their patch sets before the integration deadline.


Thanks,
ferruh



> However icc
> error #188 looks like a pain, I think I can work around it but do we really
> not tolerate the use of normal integers inside enum fields in DPDK?
> 

<...>
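
ICC error #188 ("enumerated type mixed with another type"), mentioned in this thread, typically shows up when OR-ed flag values are passed where an enum type is expected. One general way to sidestep it is sketched below: keep the individual flags as plain constants and carry combinations in an unsigned integer type. All DEMO_ and demo_ names are hypothetical, and this is not necessarily what the series discussed here ends up doing.

```c
#include <stdbool.h>

/* Hypothetical flag values, kept as plain constants rather than an enum
 * type so that OR-ed combinations have an ordinary integer type. */
#define DEMO_HASH_IPV4     (1u << 0)
#define DEMO_HASH_TCPIPV4  (1u << 1)
#define DEMO_HASH_IPV6     (1u << 2)
#define DEMO_HASH_TCPIPV6  (1u << 3)

/* A set of flags is carried in an unsigned integer, not in an enum. */
typedef unsigned int demo_hash_types_t;

/* Return true if any bit of 'flag' is set in 'types'. */
static bool
demo_hash_enabled(demo_hash_types_t types, demo_hash_types_t flag)
{
	return (types & flag) != 0;
}

/* Build a combined mask; no enum conversion is involved. */
static demo_hash_types_t
demo_hash_ipv4_all(void)
{
	return DEMO_HASH_IPV4 | DEMO_HASH_TCPIPV4;
}
```

The trade-off is that the compiler no longer checks that only known flag values are used, which is the price of treating the field as a bitmask.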


* Re: KNI broken again with 4.9 kernel
From: Mcnamara, John @ 2016-12-15 12:01 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org; +Cc: Yigit, Ferruh
In-Reply-To: <20161214154049.698de2e8@xeon-e3>



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Wednesday, December 14, 2016 11:41 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] KNI broken again with 4.9 kernel
> 
> /build/lib/librte_eal/linuxapp/kni/igb_main.c:2317:21: error:
> initialization from incompatible pointer type [-Werror=incompatible-
> pointer-types]
>   .ndo_set_vf_vlan = igb_ndo_set_vf_vlan,
>                      ^~~~~~~~~~~~~~~~~~~
> 
> I am sure Ferruh Yigit will fix it.
> 
> Which raises a couple of questions:
>  1. Why is DPDK still keeping KNI support for Intel specific ethtool
> functionality.
>     This always breaks, is code bloat, and means a 3rd copy of base code
> (Linux, DPDK PMD, + KNI)
> 
>  2. Why is KNI not upstream?
>     If not acceptable due to security or supportability then why does it
> still exist?
> 
>  3. If not upstream, then maintainer should track upstream kernel changes
> and fix DPDK before
>     kernel is released.  The ABI is normally set early in the rc cycle
> weeks before release.


Hi Stephen,

On point 2: The feedback we have always received is that the KNI code isn't upstreamable. Do you think there is an upstream path? 

> If not acceptable due to security or supportability then why does it
> still exist?

The most commonly expressed reason when we have asked this question in the past (and we did again at Userspace a few months ago) is that the people who use it want the performance.

On point 3: We do have an internal continuous integration system that runs nightly compiles of DPDK against the latest kernel and flags any issues.

John


* Re: KNI Questions
From: Ferruh Yigit @ 2016-12-15 11:53 UTC (permalink / raw)
  To: Stephen Hemminger, dev
In-Reply-To: <20161214154049.698de2e8@xeon-e3>

Hi Stephen,

<...>

> 
> Which raises a couple of questions:
>  1. Why is DPDK still keeping KNI support for Intel specific ethtool functionality.
>     This always breaks, is code bloat, and means a 3rd copy of base code (Linux, DPDK PMD, + KNI)

I agree with your comments related to the ethtool functionality,
but right now that is functionality that people may be using, so I think
we should not remove it without providing an alternative.

> 
>  2. Why is KNI not upstream?
>     If not acceptable due to security or supportability then why does it still exist?

I believe you are one of the most knowledgeable people on the mailing list
when it comes to upstreaming; any support is welcome.

> 
>  3. If not upstream, then maintainer should track upstream kernel changes and fix DPDK before
>     kernel is released.  The ABI is normally set early in the rc cycle weeks before release.

I am trying to track as much as possible; any help is appreciated.

> 


* Re: [PATCH 13/28] eal/arm64: override I/O device read/write access for arm64
From: Jerin Jacob @ 2016-12-15 11:08 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: dev, Ananyev, Konstantin, Thomas Monjalon, Bruce Richardson,
	Jan Viktorin
In-Reply-To: <CAP4Qi39A+sQ73Gt3E-mmfZTcEm4g7maB2WTRd=nEHytb5T7Kxg@mail.gmail.com>

On Thu, Dec 15, 2016 at 06:17:32PM +0800, Jianbo Liu wrote:
> On 15 December 2016 at 18:04, Jerin Jacob
> <jerin.jacob@caviumnetworks.com> wrote:
> > On Thu, Dec 15, 2016 at 05:53:05PM +0800, Jianbo Liu wrote:
> >> On 14 December 2016 at 09:55, Jerin Jacob
> >> <jerin.jacob@caviumnetworks.com> wrote:
> >> > Override the generic I/O device memory read/write access and implement it
> >> > using armv8 instructions for arm64.
> >> >
> >> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >> > ---
> >> >  lib/librte_eal/common/include/arch/arm/rte_io.h    |   4 +
> >> >  lib/librte_eal/common/include/arch/arm/rte_io_64.h | 183 +++++++++++++++++++++
> >> >  2 files changed, 187 insertions(+)
> >> >  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_io_64.h
> >> >
> >> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_io.h b/lib/librte_eal/common/include/arch/arm/rte_io.h
> >> > index 74c1f2c..9593b42 100644
> >> > --- a/lib/librte_eal/common/include/arch/arm/rte_io.h
> >> > +++ b/lib/librte_eal/common/include/arch/arm/rte_io.h
> >> > @@ -38,7 +38,11 @@
> >> >  extern "C" {
> >> >  #endif
> >> >
> >> > +#ifdef RTE_ARCH_64
> >> > +#include "rte_io_64.h"
> >> > +#else
> >> >  #include "generic/rte_io.h"
> >> > +#endif
> >> >
> >> >  #ifdef __cplusplus
> >> >  }
> >> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_io_64.h b/lib/librte_eal/common/include/arch/arm/rte_io_64.h
> >> > new file mode 100644
> >> > index 0000000..09e7a89
> >> > --- /dev/null
> >> > +++ b/lib/librte_eal/common/include/arch/arm/rte_io_64.h
> >> > @@ -0,0 +1,183 @@
> >> > +/*
> >> > + *   BSD LICENSE
> >> > + *
> >> > + *   Copyright (C) Cavium networks Ltd. 2016.
> >> > + *
> >> > + *   Redistribution and use in source and binary forms, with or without
> >> > + *   modification, are permitted provided that the following conditions
> >> > + *   are met:
> >> > + *
> >> > + *     * Redistributions of source code must retain the above copyright
> >> > + *       notice, this list of conditions and the following disclaimer.
> >> > + *     * Redistributions in binary form must reproduce the above copyright
> >> > + *       notice, this list of conditions and the following disclaimer in
> >> > + *       the documentation and/or other materials provided with the
> >> > + *       distribution.
> >> > + *     * Neither the name of Cavium networks nor the names of its
> >> > + *       contributors may be used to endorse or promote products derived
> >> > + *       from this software without specific prior written permission.
> >> > + *
> >> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> >> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> >> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> >> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> >> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> >> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> >> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> >> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> >> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> >> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> >> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> >> > + */
> >> > +
> >> > +#ifndef _RTE_IO_ARM64_H_
> >> > +#define _RTE_IO_ARM64_H_
> >> > +
> >> > +#ifdef __cplusplus
> >> > +extern "C" {
> >> > +#endif
> >> > +
> >> > +#include <stdint.h>
> >> > +
> >> > +#define RTE_OVERRIDE_IO_H
> >> > +
> >> > +#include "generic/rte_io.h"
> >> > +#include "rte_atomic_64.h"
> >> > +
> >> > +static inline __attribute__((always_inline)) uint8_t
> >> > +__rte_arm64_readb(const volatile void *addr)
> >> > +{
> >> > +       uint8_t val;
> >> > +
> >> > +       asm volatile(
> >> > +                   "ldrb %w[val], [%x[addr]]"
> >> > +                   : [val] "=r" (val)
> >> > +                   : [addr] "r" (addr));
> >> > +       return val;
> >> > +}
> >> > +
> >> > +static inline __attribute__((always_inline)) uint16_t
> >> > +__rte_arm64_readw(const volatile void *addr)
> >> > +{
> >> > +       uint16_t val;
> >> > +
> >> > +       asm volatile(
> >> > +                   "ldrh %w[val], [%x[addr]]"
> >> > +                   : [val] "=r" (val)
> >> > +                   : [addr] "r" (addr));
> >> > +       return val;
> >> > +}
> >> > +
> >> > +static inline __attribute__((always_inline)) uint32_t
> >> > +__rte_arm64_readl(const volatile void *addr)
> >> > +{
> >> > +       uint32_t val;
> >> > +
> >> > +       asm volatile(
> >> > +                   "ldr %w[val], [%x[addr]]"
> >> > +                   : [val] "=r" (val)
> >> > +                   : [addr] "r" (addr));
> >> > +       return val;
> >> > +}
> >> > +
> >> > +static inline __attribute__((always_inline)) uint64_t
> >> > +__rte_arm64_readq(const volatile void *addr)
> >> > +{
> >> > +       uint64_t val;
> >> > +
> >> > +       asm volatile(
> >> > +                   "ldr %x[val], [%x[addr]]"
> >> > +                   : [val] "=r" (val)
> >> > +                   : [addr] "r" (addr));
> >> > +       return val;
> >> > +}
> >> > +
> >> > +static inline __attribute__((always_inline)) void
> >> > +__rte_arm64_writeb(uint8_t val, volatile void *addr)
> >> > +{
> >> > +       asm volatile(
> >> > +                   "strb %w[val], [%x[addr]]"
> >> > +                   :
> >> > +                   : [val] "r" (val), [addr] "r" (addr));
> >> > +}
> >> > +
> >> > +static inline __attribute__((always_inline)) void
> >> > +__rte_arm64_writew(uint16_t val, volatile void *addr)
> >> > +{
> >> > +       asm volatile(
> >> > +                   "strh %w[val], [%x[addr]]"
> >> > +                   :
> >> > +                   : [val] "r" (val), [addr] "r" (addr));
> >> > +}
> >> > +
> >> > +static inline __attribute__((always_inline)) void
> >> > +__rte_arm64_writel(uint32_t val, volatile void *addr)
> >> > +{
> >> > +       asm volatile(
> >> > +                   "str %w[val], [%x[addr]]"
> >> > +                   :
> >> > +                   : [val] "r" (val), [addr] "r" (addr));
> >> > +}
> >> > +
> >> > +static inline __attribute__((always_inline)) void
> >> > +__rte_arm64_writeq(uint64_t val, volatile void *addr)
> >> > +{
> >> > +       asm volatile(
> >> > +                   "str %x[val], [%x[addr]]"
> >> > +                   :
> >> > +                   : [val] "r" (val), [addr] "r" (addr));
> >> > +}
> >>
> >> I'm not quite sure about these overridings. Can you explain the
> >> benefit to do so?
> >
> > Better to be native if there is an option. That's all. Do you see any issue?
> > Or what is the real concern?
> >
> 
> I think it's the same as the generic C version after compiling. Am I right?

I really don't think that is the case for all scenarios. For example, the
compiler may combine two 16-bit reads into one 32-bit read, which would affect
I/O register access.

But I am sure the proposed scheme generates the correct instruction in all cases.
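
For comparison with the inline-assembly accessors in the patch, here is a minimal sketch of what the "generic C version" under discussion might look like: volatile-qualified accesses of a fixed width. The demo_ names are hypothetical and this is not the content of generic/rte_io.h. Whether a given compiler always emits exactly one load or store of the requested width for such code is the point being debated above, and the sketch does not settle it.

```c
#include <stdint.h>

/* Generic C accessor: a 16-bit read through a volatile-qualified
 * pointer. The instruction selection is left to the compiler. */
static inline uint16_t
demo_io_read16(const volatile void *addr)
{
	return *(const volatile uint16_t *)addr;
}

/* Generic C accessor: a 16-bit write through a volatile-qualified
 * pointer. */
static inline void
demo_io_write16(uint16_t val, volatile void *addr)
{
	*(volatile uint16_t *)addr = val;
}

/* Read two adjacent 16-bit registers as two separate accesses and
 * combine them in software. With the asm-based accessors each read is
 * pinned to one ldrh instruction; here that depends on the compiler. */
static inline uint32_t
demo_read_reg_pair(const volatile void *base)
{
	uint32_t lo = demo_io_read16(base);
	uint32_t hi = demo_io_read16((const volatile uint8_t *)base + 2);

	return lo | (hi << 16);
}
```

In a test, ordinary memory can stand in for device registers, for example a two-element uint16_t array passed as base.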


* Re: [PATCH 1/4] eal/common: introduce rte_memset on IA platform
From: Ananyev, Konstantin @ 2016-12-15 10:53 UTC (permalink / raw)
  To: Yang, Zhiyong, Thomas Monjalon
  Cc: dev@dpdk.org, yuanhan.liu@linux.intel.com, Richardson, Bruce,
	De Lara Guarch, Pablo
In-Reply-To: <E182254E98A5DA4EB1E657AC7CB9BD2A3EB599D4@BGSMSX101.gar.corp.intel.com>

Hi Zhiyong,

> -----Original Message-----
> From: Yang, Zhiyong
> Sent: Thursday, December 15, 2016 6:51 AM
> To: Yang, Zhiyong <zhiyong.yang@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform
> 
> Hi, Thomas, Konstantin:
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yang, Zhiyong
> > Sent: Sunday, December 11, 2016 8:33 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > Monjalon <thomas.monjalon@6wind.com>
> > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> > IA platform
> >
> > Hi, Konstantin, Bruce:
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Thursday, December 8, 2016 6:31 PM
> > > To: Yang, Zhiyong <zhiyong.yang@intel.com>; Thomas Monjalon
> > > <thomas.monjalon@6wind.com>
> > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > > <pablo.de.lara.guarch@intel.com>
> > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > on IA platform
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Yang, Zhiyong
> > > > Sent: Thursday, December 8, 2016 9:53 AM
> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > > > Monjalon <thomas.monjalon@6wind.com>
> > > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > > > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > > > <pablo.de.lara.guarch@intel.com>
> > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > > on IA platform
> > > >
> > > extern void *(*__rte_memset_vector)(void *s, int c, size_t n);
> > >
> > > static inline void*
> > > rte_memset_huge(void *s, int c, size_t n) {
> > >    return __rte_memset_vector(s, c, n); }
> > >
> > > static inline void *
> > > rte_memset(void *s, int c, size_t n)
> > > {
> > > 	if (n < XXX)
> > > 		return rte_memset_scalar(s, c, n);
> > > 	else
> > > 		return rte_memset_huge(s, c, n);
> > > }
> > >
> > > XXX could be either a define or a variable, so that it can
> > > be set up at startup, depending on the architecture.
> > >
> > > Would that work?
> > > Konstantin
> > >
> I have implemented the code for choosing the functions at run time.
> rte_memcpy is used more frequently, so I tested it at run time.
> 
> typedef void *(*rte_memcpy_vector_t)(void *dst, const void *src, size_t n);
> extern rte_memcpy_vector_t rte_memcpy_vector;
> static inline void *
> rte_memcpy(void *dst, const void *src, size_t n)
> {
>         return rte_memcpy_vector(dst, src, n);
> }
> In order to reduce the overhead at run time,
> I assign the function address to the variable rte_memcpy_vector in a constructor, which runs before main() starts.
> 
> static void __attribute__((constructor))
> rte_memcpy_init(void)
> {
> 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> 	{
> 		rte_memcpy_vector = rte_memcpy_avx2;
> 	}
> 	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
> 	{
> 		rte_memcpy_vector = rte_memcpy_sse;
> 	}
> 	else
> 	{
> 		rte_memcpy_vector = memcpy;
> 	}
> 
> }

I thought we discussed a slightly different approach,
in which rte_memcpy_vector() (or rte_memset_vector()) would be called only after some cutoff point, i.e.:

static inline void *
rte_memcpy(void *dst, const void *src, size_t len)
{
	if (len < N)
		return memcpy(dst, src, len);
	return rte_memcpy_vector(dst, src, len);
}

If you always call rte_memcpy_vector() for every len,
the compiler most likely always has to generate a real call
(no inlining happens).
For small lengths the price of the extra function call would probably outweigh any
potential gain from the SSE/AVX2 implementation.

Konstantin 
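
The cutoff idea described above can be sketched in a self-contained form as follows. DEMO_COPY_CUTOFF, demo_copy_large and the other demo_ names are hypothetical, the threshold of 128 is an arbitrary placeholder rather than a measured value, and the "large" implementation here simply forwards to memcpy() where a real one would use SSE/AVX2 code selected at startup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff; a real value would have to be tuned per platform. */
#define DEMO_COPY_CUTOFF 128

typedef void *(*demo_copy_fn_t)(void *dst, const void *src, size_t n);

/* Counts calls through the function pointer, for demonstration only. */
static unsigned long demo_large_copy_calls;

/* Placeholder for a vectorized implementation chosen at run time. */
static void *
demo_copy_large_generic(void *dst, const void *src, size_t n)
{
	demo_large_copy_calls++;
	return memcpy(dst, src, n);
}

/* Function pointer that startup code could redirect after checking
 * CPU flags. */
static demo_copy_fn_t demo_copy_large = demo_copy_large_generic;

/* Small copies stay inline-able; only copies at or above the cutoff pay
 * for the indirect call. */
static inline void *
demo_copy(void *dst, const void *src, size_t n)
{
	if (n < DEMO_COPY_CUTOFF)
		return memcpy(dst, src, n);
	return demo_copy_large(dst, src, n);
}
```

With this shape, a 16-byte copy never touches the function pointer, while a 256-byte copy goes through it exactly once.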

> I ran the same virtio/vhost loopback tests without a NIC.
> I can see the throughput drop when choosing functions at run time,
> compared to the original code, as follows on the same platform (my machine is Haswell):
> 	Packet size	perf drop
> 	64 		-4%
> 	256 		-5.4%
> 	1024		-5%
> 	1500		-2.5%
> Another thing: I ran memcpy_perf_autotest. When N <= 128,
> the rte_memcpy perf gains almost disappear
> when choosing functions at run time. For other values of N, the perf gains become narrower.
> 
> Thanks
> Zhiyong


* Re: KNI broken again with 4.9 kernel
From: Ferruh Yigit @ 2016-12-15 10:25 UTC (permalink / raw)
  To: Stephen Hemminger, dev
In-Reply-To: <20161214154049.698de2e8@xeon-e3>

Hi Stephen,

I can't reproduce the error. I think it is not broken; that issue should
already be fixed by commit [1].



On 12/14/2016 11:40 PM, Stephen Hemminger wrote:
> /build/lib/librte_eal/linuxapp/kni/igb_main.c:2317:21: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
>   .ndo_set_vf_vlan = igb_ndo_set_vf_vlan,
>                      ^~~~~~~~~~~~~~~~~~~

Thanks for the report, not everyone is working with latest kernel, so
these reports are useful.

> 
> I am sure Ferruh Yigit will fix it.

I think I already did :) Please check commit [1], which was merged in
v16.11. Can you please double-check the DPDK version?


[1]
 commit 6445198f802d993c73f4b246353b2ceb2dfafc32
 Refs: v16.11-rc1-2-g6445198
 Author:     Ferruh Yigit <ferruh.yigit@intel.com>
 AuthorDate: Mon Oct 17 11:23:14 2016 +0100
 Commit:     Thomas Monjalon <thomas.monjalon@6wind.com>
 CommitDate: Tue Oct 25 16:20:43 2016 +0200

     kni: fix build with kernel 4.9

     compile error:
       CC [M]  .../lib/librte_eal/linuxapp/kni/igb_main.o
     .../lib/librte_eal/linuxapp/kni/igb_main.c:2317:21:
     error: initialization from incompatible pointer type
             [-Werror=incompatible-pointer-types]
       .ndo_set_vf_vlan = igb_ndo_set_vf_vlan,
                          ^~~~~~~~~~~~~~~~~~~

     Linux kernel 4.9 updates API for ndo_set_vf_vlan:
     Linux: 79aab093a0b5 ("net: Update API for VF vlan protocol 802.1ad support")

     Use new API for Linux kernels >= 4.9

     Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
     Tested-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>


<...>
I will reply to the questions under a different subject...


* Re: [PATCH 13/28] eal/arm64: override I/O device read/write access for arm64
From: Jianbo Liu @ 2016-12-15 10:17 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Ananyev, Konstantin, Thomas Monjalon, Bruce Richardson,
	Jan Viktorin
In-Reply-To: <20161215100423.GA6712@localhost.localdomain>

On 15 December 2016 at 18:04, Jerin Jacob
<jerin.jacob@caviumnetworks.com> wrote:
> On Thu, Dec 15, 2016 at 05:53:05PM +0800, Jianbo Liu wrote:
>> On 14 December 2016 at 09:55, Jerin Jacob
>> <jerin.jacob@caviumnetworks.com> wrote:
>> > Override the generic I/O device memory read/write access and implement it
>> > using armv8 instructions for arm64.
>> >
>> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> > ---
>> >  lib/librte_eal/common/include/arch/arm/rte_io.h    |   4 +
>> >  lib/librte_eal/common/include/arch/arm/rte_io_64.h | 183 +++++++++++++++++++++
>> >  2 files changed, 187 insertions(+)
>> >  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_io_64.h
>> >
>> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_io.h b/lib/librte_eal/common/include/arch/arm/rte_io.h
>> > index 74c1f2c..9593b42 100644
>> > --- a/lib/librte_eal/common/include/arch/arm/rte_io.h
>> > +++ b/lib/librte_eal/common/include/arch/arm/rte_io.h
>> > @@ -38,7 +38,11 @@
>> >  extern "C" {
>> >  #endif
>> >
>> > +#ifdef RTE_ARCH_64
>> > +#include "rte_io_64.h"
>> > +#else
>> >  #include "generic/rte_io.h"
>> > +#endif
>> >
>> >  #ifdef __cplusplus
>> >  }
>> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_io_64.h b/lib/librte_eal/common/include/arch/arm/rte_io_64.h
>> > new file mode 100644
>> > index 0000000..09e7a89
>> > --- /dev/null
>> > +++ b/lib/librte_eal/common/include/arch/arm/rte_io_64.h
>> > @@ -0,0 +1,183 @@
>> > +/*
>> > + *   BSD LICENSE
>> > + *
>> > + *   Copyright (C) Cavium networks Ltd. 2016.
>> > + *
>> > + *   Redistribution and use in source and binary forms, with or without
>> > + *   modification, are permitted provided that the following conditions
>> > + *   are met:
>> > + *
>> > + *     * Redistributions of source code must retain the above copyright
>> > + *       notice, this list of conditions and the following disclaimer.
>> > + *     * Redistributions in binary form must reproduce the above copyright
>> > + *       notice, this list of conditions and the following disclaimer in
>> > + *       the documentation and/or other materials provided with the
>> > + *       distribution.
>> > + *     * Neither the name of Cavium networks nor the names of its
>> > + *       contributors may be used to endorse or promote products derived
>> > + *       from this software without specific prior written permission.
>> > + *
>> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> > + */
>> > +
>> > +#ifndef _RTE_IO_ARM64_H_
>> > +#define _RTE_IO_ARM64_H_
>> > +
>> > +#ifdef __cplusplus
>> > +extern "C" {
>> > +#endif
>> > +
>> > +#include <stdint.h>
>> > +
>> > +#define RTE_OVERRIDE_IO_H
>> > +
>> > +#include "generic/rte_io.h"
>> > +#include "rte_atomic_64.h"
>> > +
>> > +static inline __attribute__((always_inline)) uint8_t
>> > +__rte_arm64_readb(const volatile void *addr)
>> > +{
>> > +       uint8_t val;
>> > +
>> > +       asm volatile(
>> > +                   "ldrb %w[val], [%x[addr]]"
>> > +                   : [val] "=r" (val)
>> > +                   : [addr] "r" (addr));
>> > +       return val;
>> > +}
>> > +
>> > +static inline __attribute__((always_inline)) uint16_t
>> > +__rte_arm64_readw(const volatile void *addr)
>> > +{
>> > +       uint16_t val;
>> > +
>> > +       asm volatile(
>> > +                   "ldrh %w[val], [%x[addr]]"
>> > +                   : [val] "=r" (val)
>> > +                   : [addr] "r" (addr));
>> > +       return val;
>> > +}
>> > +
>> > +static inline __attribute__((always_inline)) uint32_t
>> > +__rte_arm64_readl(const volatile void *addr)
>> > +{
>> > +       uint32_t val;
>> > +
>> > +       asm volatile(
>> > +                   "ldr %w[val], [%x[addr]]"
>> > +                   : [val] "=r" (val)
>> > +                   : [addr] "r" (addr));
>> > +       return val;
>> > +}
>> > +
>> > +static inline __attribute__((always_inline)) uint64_t
>> > +__rte_arm64_readq(const volatile void *addr)
>> > +{
>> > +       uint64_t val;
>> > +
>> > +       asm volatile(
>> > +                   "ldr %x[val], [%x[addr]]"
>> > +                   : [val] "=r" (val)
>> > +                   : [addr] "r" (addr));
>> > +       return val;
>> > +}
>> > +
>> > +static inline __attribute__((always_inline)) void
>> > +__rte_arm64_writeb(uint8_t val, volatile void *addr)
>> > +{
>> > +       asm volatile(
>> > +                   "strb %w[val], [%x[addr]]"
>> > +                   :
>> > +                   : [val] "r" (val), [addr] "r" (addr));
>> > +}
>> > +
>> > +static inline __attribute__((always_inline)) void
>> > +__rte_arm64_writew(uint16_t val, volatile void *addr)
>> > +{
>> > +       asm volatile(
>> > +                   "strh %w[val], [%x[addr]]"
>> > +                   :
>> > +                   : [val] "r" (val), [addr] "r" (addr));
>> > +}
>> > +
>> > +static inline __attribute__((always_inline)) void
>> > +__rte_arm64_writel(uint32_t val, volatile void *addr)
>> > +{
>> > +       asm volatile(
>> > +                   "str %w[val], [%x[addr]]"
>> > +                   :
>> > +                   : [val] "r" (val), [addr] "r" (addr));
>> > +}
>> > +
>> > +static inline __attribute__((always_inline)) void
>> > +__rte_arm64_writeq(uint64_t val, volatile void *addr)
>> > +{
>> > +       asm volatile(
>> > +                   "str %x[val], [%x[addr]]"
>> > +                   :
>> > +                   : [val] "r" (val), [addr] "r" (addr));
>> > +}
>>
>> I'm not quite sure about these overridings. Can you explain the
>> benefit to do so?
>
> Better to be native if there is option. That all. Do you see any issue?
> or what is the real concern?
>

I think it's the same as the generic C version after compiling. Am I right?
If there is no apparent benefit, I don't think we need the override.
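
For comparison, a minimal sketch of what a generic C accessor presumably looks
like: a plain volatile load or store through a cast pointer. The sketch_*
names are hypothetical and this is not the content of generic/rte_io.h.
Whether a given compiler emits the same single ldr/str as the inline asm is
exactly what would have to be checked in the disassembly.

```c
#include <stdint.h>

/* Volatile access keeps the compiler from eliding or merging the load. */
static inline uint32_t
sketch_read32(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline uint64_t
sketch_read64(const volatile void *addr)
{
	return *(const volatile uint64_t *)addr;
}

/* Same idea for stores: one volatile write of the requested width. */
static inline void
sketch_write32(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

static inline void
sketch_write64(uint64_t val, volatile void *addr)
{
	*(volatile uint64_t *)addr = val;
}
```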

^ permalink raw reply

* Re: [PATCH 1/4] eal/common: introduce rte_memset on IA platform
From: Bruce Richardson @ 2016-12-15 10:12 UTC (permalink / raw)
  To: Yang, Zhiyong
  Cc: Ananyev, Konstantin, Thomas Monjalon, dev@dpdk.org,
	yuanhan.liu@linux.intel.com, De Lara Guarch, Pablo
In-Reply-To: <E182254E98A5DA4EB1E657AC7CB9BD2A3EB599D4@BGSMSX101.gar.corp.intel.com>

On Thu, Dec 15, 2016 at 06:51:08AM +0000, Yang, Zhiyong wrote:
> Hi, Thomas, Konstantin:
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yang, Zhiyong
> > Sent: Sunday, December 11, 2016 8:33 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > Monjalon <thomas.monjalon@6wind.com>
> > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> > IA platform
> > 
> > Hi, Konstantin, Bruce:
> > 
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Thursday, December 8, 2016 6:31 PM
> > > To: Yang, Zhiyong <zhiyong.yang@intel.com>; Thomas Monjalon
> > > <thomas.monjalon@6wind.com>
> > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > > <pablo.de.lara.guarch@intel.com>
> > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > on IA platform
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Yang, Zhiyong
> > > > Sent: Thursday, December 8, 2016 9:53 AM
> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > > > Monjalon <thomas.monjalon@6wind.com>
> > > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > > > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > > > <pablo.de.lara.guarch@intel.com>
> > > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > > on IA platform
> > > >
> > > extern void *(*__rte_memset_vector)(void *s, int c, size_t n);
> > >
> > > static inline void *
> > > rte_memset_huge(void *s, int c, size_t n)
> > > {
> > > 	return __rte_memset_vector(s, c, n);
> > > }
> > >
> > > static inline void *
> > > rte_memset(void *s, int c, size_t n)
> > > {
> > > 	if (n < XXX)
> > > 		return rte_memset_scalar(s, c, n);
> > > 	else
> > > 		return rte_memset_huge(s, c, n);
> > > }
> > >
> > > XXX could be either a define or a variable, so it can be set up at
> > > startup, depending on the architecture.
> > >
> > > Would that work?
> > > Konstantin
> > >
> I have implemented the code for choosing the functions at run time.
> rte_memcpy is used more frequently, so I tested run-time selection with it.
> 
> typedef void *(*rte_memcpy_vector_t)(void *dst, const void *src, size_t n);
> extern rte_memcpy_vector_t rte_memcpy_vector;
> static inline void *
> rte_memcpy(void *dst, const void *src, size_t n)
> {
>         return rte_memcpy_vector(dst, src, n);
> }
> To reduce the overhead at run time, I assign the function address to
> rte_memcpy_vector in a constructor, before main() starts.
> 
> static void __attribute__((constructor))
> rte_memcpy_init(void)
> {
> 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> 	{
> 		rte_memcpy_vector = rte_memcpy_avx2;
> 	}
> 	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
> 	{
> 		rte_memcpy_vector = rte_memcpy_sse;
> 	}
> 	else
> 	{
> 		rte_memcpy_vector = memcpy;
> 	}
> 
> }
> I ran the same virtio/vhost loopback tests without a NIC.
> I see a throughput drop when choosing functions at run time,
> compared to the original code on the same platform (my machine is Haswell):
> 	Packet size	perf drop
> 	64 		-4%
> 	256 		-5.4%
> 	1024		-5%
> 	1500		-2.5%
> Another thing: I ran memcpy_perf_autotest. When N <= 128,
> the rte_memcpy perf gains almost disappear
> when choosing functions at run time. For other values of N, the perf gains narrow.
> 
How narrow? How significant is the improvement that we gain from having
to maintain our own copy of memcpy? If the libc version is nearly as
good, we should just use that.

/Bruce
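
The two ideas discussed above (an inline size threshold, plus a function
pointer filled in by a constructor) can be combined roughly as follows. This is
a sketch under assumptions, not the proposed patch: all sketch_* names and the
threshold value of 128 are hypothetical, the "vector" routine is just libc
memset wrapped with a counter, and a real version would pick the routine from
CPU flags as in the quoted rte_memcpy_init().

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MEMSET_THRESHOLD 128	/* hypothetical cut-over size */

typedef void *(*sketch_memset_fn)(void *s, int c, size_t n);

/* Defaults to libc so the pointer is valid even before the constructor runs. */
static sketch_memset_fn sketch_memset_vector = memset;

/* Counts how often the indirect path is taken; for illustration only. */
static unsigned long sketch_indirect_calls;

static void *
sketch_memset_counted(void *s, int c, size_t n)
{
	sketch_indirect_calls++;
	return memset(s, c, n);
}

/* Runs before main(); a real version would test CPU flags here. */
static void __attribute__((constructor))
sketch_memset_init(void)
{
	sketch_memset_vector = sketch_memset_counted;
}

/* Small sizes stay inline and never pay for the indirect call. */
static inline void *
sketch_memset_scalar(void *s, int c, size_t n)
{
	unsigned char *p = s;

	while (n--)
		*p++ = (unsigned char)c;
	return s;
}

static inline void *
sketch_memset(void *s, int c, size_t n)
{
	if (n < SKETCH_MEMSET_THRESHOLD)
		return sketch_memset_scalar(s, c, n);
	return sketch_memset_vector(s, c, n);
}
```

With this layout only calls at or above the threshold go through the pointer,
which is the part the measurements above suggest is costly for small sizes.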

^ permalink raw reply

* Re: [PATCH 13/28] eal/arm64: override I/O device read/write access for arm64
From: Jerin Jacob @ 2016-12-15 10:04 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: dev, Ananyev, Konstantin, Thomas Monjalon, Bruce Richardson,
	Jan Viktorin
In-Reply-To: <CAP4Qi38jm3R8Xf1=18x-mR=E=Z2Hmi9JR9fGxFdn6hoRrwnsWw@mail.gmail.com>

On Thu, Dec 15, 2016 at 05:53:05PM +0800, Jianbo Liu wrote:
> On 14 December 2016 at 09:55, Jerin Jacob
> <jerin.jacob@caviumnetworks.com> wrote:
> > Override the generic I/O device memory read/write access and implement it
> > using armv8 instructions for arm64.
> >
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> > [...]
> 
> I'm not quite sure about these overridings. Can you explain the
> benefit to do so?

Better to be native if there is an option. That's all. Do you see any issue,
or what is the real concern?

^ permalink raw reply

* Re: [PATCH 13/28] eal/arm64: override I/O device read/write access for arm64
From: Jianbo Liu @ 2016-12-15  9:53 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Ananyev, Konstantin, Thomas Monjalon, Bruce Richardson,
	Jan Viktorin
In-Reply-To: <1481680558-4003-14-git-send-email-jerin.jacob@caviumnetworks.com>

On 14 December 2016 at 09:55, Jerin Jacob
<jerin.jacob@caviumnetworks.com> wrote:
> Override the generic I/O device memory read/write access and implement it
> using armv8 instructions for arm64.
>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_eal/common/include/arch/arm/rte_io.h    |   4 +
>  lib/librte_eal/common/include/arch/arm/rte_io_64.h | 183 +++++++++++++++++++++
>  2 files changed, 187 insertions(+)
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_io_64.h
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_io.h b/lib/librte_eal/common/include/arch/arm/rte_io.h
> index 74c1f2c..9593b42 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_io.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_io.h
> @@ -38,7 +38,11 @@
>  extern "C" {
>  #endif
>
> +#ifdef RTE_ARCH_64
> +#include "rte_io_64.h"
> +#else
>  #include "generic/rte_io.h"
> +#endif
>
>  #ifdef __cplusplus
>  }
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_io_64.h b/lib/librte_eal/common/include/arch/arm/rte_io_64.h
> new file mode 100644
> index 0000000..09e7a89
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_io_64.h
> @@ -0,0 +1,183 @@
> +/*
> + *   BSD LICENSE
> + *
> + *   Copyright (C) Cavium networks Ltd. 2016.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Cavium networks nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_IO_ARM64_H_
> +#define _RTE_IO_ARM64_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdint.h>
> +
> +#define RTE_OVERRIDE_IO_H
> +
> +#include "generic/rte_io.h"
> +#include "rte_atomic_64.h"
> +
> +static inline __attribute__((always_inline)) uint8_t
> +__rte_arm64_readb(const volatile void *addr)
> +{
> +       uint8_t val;
> +
> +       asm volatile(
> +                   "ldrb %w[val], [%x[addr]]"
> +                   : [val] "=r" (val)
> +                   : [addr] "r" (addr));
> +       return val;
> +}
> +
> +static inline __attribute__((always_inline)) uint16_t
> +__rte_arm64_readw(const volatile void *addr)
> +{
> +       uint16_t val;
> +
> +       asm volatile(
> +                   "ldrh %w[val], [%x[addr]]"
> +                   : [val] "=r" (val)
> +                   : [addr] "r" (addr));
> +       return val;
> +}
> +
> +static inline __attribute__((always_inline)) uint32_t
> +__rte_arm64_readl(const volatile void *addr)
> +{
> +       uint32_t val;
> +
> +       asm volatile(
> +                   "ldr %w[val], [%x[addr]]"
> +                   : [val] "=r" (val)
> +                   : [addr] "r" (addr));
> +       return val;
> +}
> +
> +static inline __attribute__((always_inline)) uint64_t
> +__rte_arm64_readq(const volatile void *addr)
> +{
> +       uint64_t val;
> +
> +       asm volatile(
> +                   "ldr %x[val], [%x[addr]]"
> +                   : [val] "=r" (val)
> +                   : [addr] "r" (addr));
> +       return val;
> +}
> +
> +static inline __attribute__((always_inline)) void
> +__rte_arm64_writeb(uint8_t val, volatile void *addr)
> +{
> +       asm volatile(
> +                   "strb %w[val], [%x[addr]]"
> +                   :
> +                   : [val] "r" (val), [addr] "r" (addr));
> +}
> +
> +static inline __attribute__((always_inline)) void
> +__rte_arm64_writew(uint16_t val, volatile void *addr)
> +{
> +       asm volatile(
> +                   "strh %w[val], [%x[addr]]"
> +                   :
> +                   : [val] "r" (val), [addr] "r" (addr));
> +}
> +
> +static inline __attribute__((always_inline)) void
> +__rte_arm64_writel(uint32_t val, volatile void *addr)
> +{
> +       asm volatile(
> +                   "str %w[val], [%x[addr]]"
> +                   :
> +                   : [val] "r" (val), [addr] "r" (addr));
> +}
> +
> +static inline __attribute__((always_inline)) void
> +__rte_arm64_writeq(uint64_t val, volatile void *addr)
> +{
> +       asm volatile(
> +                   "str %x[val], [%x[addr]]"
> +                   :
> +                   : [val] "r" (val), [addr] "r" (addr));
> +}

I'm not quite sure about these overrides. Can you explain the
benefit of doing so?

> +
> +#define rte_readb_relaxed(addr) \
> +       ({ uint8_t __v = __rte_arm64_readb(addr); __v; })
> +
> +#define rte_readw_relaxed(addr) \
> +       ({ uint16_t __v = __rte_arm64_readw(addr); __v; })
> +
> +#define rte_readl_relaxed(addr) \
> +       ({ uint32_t __v = __rte_arm64_readl(addr); __v; })
> +
> +#define rte_readq_relaxed(addr) \
> +       ({ uint64_t __v = __rte_arm64_readq(addr); __v; })
> +
> +#define rte_writeb_relaxed(value, addr) \
> +       ({ __rte_arm64_writeb(value, addr); })
> +
> +#define rte_writew_relaxed(value, addr) \
> +       ({ __rte_arm64_writew(value, addr); })
> +
> +#define rte_writel_relaxed(value, addr) \
> +       ({ __rte_arm64_writel(value, addr); })
> +
> +#define rte_writeq_relaxed(value, addr) \
> +       ({ __rte_arm64_writeq(value, addr); })
> +
> +#define rte_readb(addr) \
> +       ({ uint8_t __v = __rte_arm64_readb(addr); rte_io_rmb(); __v; })
> +
> +#define rte_readw(addr) \
> +       ({ uint16_t __v = __rte_arm64_readw(addr); rte_io_rmb(); __v; })
> +
> +#define rte_readl(addr) \
> +       ({ uint32_t __v = __rte_arm64_readl(addr); rte_io_rmb(); __v; })
> +
> +#define rte_readq(addr) \
> +       ({ uint64_t __v = __rte_arm64_readq(addr); rte_io_rmb(); __v; })
> +
> +#define rte_writeb(value, addr) \
> +       ({ rte_io_wmb(); rte_writeb_relaxed(value, addr); })
> +
> +#define rte_writew(value, addr) \
> +       ({ rte_io_wmb(); rte_writew_relaxed(value, addr); })
> +
> +#define rte_writel(value, addr) \
> +       ({ rte_io_wmb(); rte_writel_relaxed(value, addr); })
> +
> +#define rte_writeq(value, addr) \
> +       ({ rte_io_wmb(); rte_writeq_relaxed(value, addr); })
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_IO_ARM64_H_ */
> --
> 2.5.5
>
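
The layering in the macros quoted above (relaxed accessors, then ordered ones
that add a barrier after a read and before a write) can be sketched in portable
C. The sketch_* names are hypothetical, and the C11 fences are only stand-ins
for rte_io_rmb()/rte_io_wmb(): they are not equivalent to device barriers on
real hardware and are used here just to show where the barrier sits.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-ins for rte_io_rmb()/rte_io_wmb(); NOT real I/O barriers. */
#define sketch_io_rmb() atomic_thread_fence(memory_order_acquire)
#define sketch_io_wmb() atomic_thread_fence(memory_order_release)

/* Relaxed forms: just the access, no ordering guarantee. */
static inline uint32_t
sketch_readl_relaxed(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline void
sketch_writel_relaxed(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/* Ordered read: barrier AFTER the load, so later accesses cannot pass it. */
static inline uint32_t
sketch_readl(const volatile void *addr)
{
	uint32_t val = sketch_readl_relaxed(addr);

	sketch_io_rmb();
	return val;
}

/* Ordered write: barrier BEFORE the store, so earlier writes land first. */
static inline void
sketch_writel(uint32_t val, volatile void *addr)
{
	sketch_io_wmb();
	sketch_writel_relaxed(val, addr);
}
```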

^ permalink raw reply

* Re: [PATCH 23/28] net/ixgbe: use eal I/O device memory read/write API
From: Jianbo Liu @ 2016-12-15  8:37 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Ananyev, Konstantin, Thomas Monjalon, Bruce Richardson,
	Jan Viktorin, Santosh Shukla, Helin Zhang
In-Reply-To: <1481680558-4003-24-git-send-email-jerin.jacob@caviumnetworks.com>

On 14 December 2016 at 09:55, Jerin Jacob
<jerin.jacob@caviumnetworks.com> wrote:
> From: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>
> Replace the raw I/O device memory read/write access with eal
> abstraction for I/O device memory read/write access to fix
> portability issues across different architectures.
>
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: Helin Zhang <helin.zhang@intel.com>
> CC: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  drivers/net/ixgbe/base/ixgbe_osdep.h | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ixgbe/base/ixgbe_osdep.h b/drivers/net/ixgbe/base/ixgbe_osdep.h
> index 77f0af5..9d16c21 100644
> --- a/drivers/net/ixgbe/base/ixgbe_osdep.h
> +++ b/drivers/net/ixgbe/base/ixgbe_osdep.h
> @@ -44,6 +44,7 @@
>  #include <rte_cycles.h>
>  #include <rte_log.h>
>  #include <rte_byteorder.h>
> +#include <rte_io.h>
>
>  #include "../ixgbe_logs.h"
>  #include "../ixgbe_bypass_defines.h"
> @@ -121,16 +122,20 @@ typedef int               bool;
>
>  #define prefetch(x) rte_prefetch0(x)
>
> -#define IXGBE_PCI_REG(reg) (*((volatile uint32_t *)(reg)))
> +#define IXGBE_PCI_REG(reg) ({                  \
> +       uint32_t __val;                         \
> +       __val = rte_readl(reg);                 \
> +       __val;                                  \
> +})
>
>  static inline uint32_t ixgbe_read_addr(volatile void* addr)
>  {
>         return rte_le_to_cpu_32(IXGBE_PCI_REG(addr));
>  }
>
> -#define IXGBE_PCI_REG_WRITE(reg, value) do { \
> -       IXGBE_PCI_REG((reg)) = (rte_cpu_to_le_32(value)); \
> -} while(0)
> +#define IXGBE_PCI_REG_WRITE(reg, value) ({             \
> +       rte_writel(rte_cpu_to_le_32(value), reg);       \
> +})
>

Your change puts the memory barrier inside the IXGBE_PCI_REG read/write
macros, but I found that rte_*mb is already called before these macros in
some places.
Can you remove all these redundant calls? And please do the same
check for the other drivers.

>  #define IXGBE_PCI_REG_ADDR(hw, reg) \
>         ((volatile uint32_t *)((char *)(hw)->hw_addr + (reg)))
> --
> 2.5.5
>
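
The redundancy described above can be made visible with a small mock in which
the barrier only bumps a counter. Everything here is hypothetical
(sketch_wmb, sketch_reg_write, the tail-update helpers); it does not reflect
actual ixgbe call sites, only the pattern of a caller issuing a barrier in
front of an accessor that now issues its own.

```c
#include <stdint.h>

/* Mock barrier: counts invocations instead of ordering anything. */
static unsigned int sketch_wmb_count;

static inline void
sketch_wmb(void)
{
	sketch_wmb_count++;
}

static inline void
sketch_reg_write_relaxed(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/* Accessor after the change: the barrier is built in. */
static inline void
sketch_reg_write(volatile uint32_t *reg, uint32_t val)
{
	sketch_wmb();
	sketch_reg_write_relaxed(reg, val);
}

/* Call site written before the change: its explicit barrier is now a duplicate. */
static inline void
sketch_tail_update_old(volatile uint32_t *tail, uint32_t val)
{
	sketch_wmb();
	sketch_reg_write(tail, val);
}

/* Cleaned-up call site: relies on the accessor's barrier alone. */
static inline void
sketch_tail_update_new(volatile uint32_t *tail, uint32_t val)
{
	sketch_reg_write(tail, val);
}
```

The old call site pays for two barriers per register write, the cleaned-up one
for a single barrier.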

^ permalink raw reply

* Re: [PATCH 08/28] eal/arm64: define smp barrier definition for arm64
From: Jerin Jacob @ 2016-12-15  8:20 UTC (permalink / raw)
  To: Jianbo Liu
  Cc: dev, Ananyev, Konstantin, Thomas Monjalon, Bruce Richardson,
	Jan Viktorin
In-Reply-To: <CAP4Qi38SCGZHAybjiR4=hhvZbEEnjmCVGWb23+EgqjzAowmAiQ@mail.gmail.com>

On Thu, Dec 15, 2016 at 04:13:33PM +0800, Jianbo Liu wrote:
> On 14 December 2016 at 09:55, Jerin Jacob
> <jerin.jacob@caviumnetworks.com> wrote:
> > dmb instruction based barrier is used for smp version of memory barrier.
> >
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> >  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > index bc7de64..78ebea2 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > @@ -82,11 +82,11 @@ static inline void rte_rmb(void)
> >         dsb(ld);
> >  }
> >
> > -#define rte_smp_mb() rte_mb()
> > +#define rte_smp_mb() dmb(ish)
> >
> > -#define rte_smp_wmb() rte_wmb()
> > +#define rte_smp_wmb() dmb(ishst)
> >
> > -#define rte_smp_rmb() rte_rmb()
> > +#define rte_smp_rmb() dmb(ishld)
> >
> 
> rte_*mb are inline functions, while rte_smp_*mb are macro. As they are
> all derived from dsb/dmb, can you keep them consistent?

OK. I will add a separate patch in the v2 series to change the existing
inline functions to macros to keep them consistent.


> 
> >  #ifdef __cplusplus
> >  }
> > --
> > 2.5.5
> >

^ permalink raw reply

* Re: [PATCH 08/28] eal/arm64: define smp barrier definition for arm64
From: Jianbo Liu @ 2016-12-15  8:13 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Ananyev, Konstantin, Thomas Monjalon, Bruce Richardson,
	Jan Viktorin
In-Reply-To: <1481680558-4003-9-git-send-email-jerin.jacob@caviumnetworks.com>

On 14 December 2016 at 09:55, Jerin Jacob
<jerin.jacob@caviumnetworks.com> wrote:
> dmb instruction based barrier is used for smp version of memory barrier.
>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> index bc7de64..78ebea2 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> @@ -82,11 +82,11 @@ static inline void rte_rmb(void)
>         dsb(ld);
>  }
>
> -#define rte_smp_mb() rte_mb()
> +#define rte_smp_mb() dmb(ish)
>
> -#define rte_smp_wmb() rte_wmb()
> +#define rte_smp_wmb() dmb(ishst)
>
> -#define rte_smp_rmb() rte_rmb()
> +#define rte_smp_rmb() dmb(ishld)
>

rte_*mb are inline functions, while rte_smp_*mb are macros. As they are
all derived from dsb/dmb, can you keep them consistent?

>  #ifdef __cplusplus
>  }
> --
> 2.5.5
>
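
As background for the patch above, smp barriers only need to order memory
accesses between CPUs, which is the usual publish/consume pattern. A minimal
sketch, assuming C11 fences as portable stand-ins for dmb(ishst)/dmb(ishld)
(they are not exact equivalents), with hypothetical sketch_* names:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Portable stand-ins for rte_smp_wmb()/rte_smp_rmb(); an approximation only. */
#define sketch_smp_wmb() atomic_thread_fence(memory_order_release)
#define sketch_smp_rmb() atomic_thread_fence(memory_order_acquire)

struct sketch_mailbox {
	uint32_t payload;	/* data written by the producer */
	atomic_uint ready;	/* flag that publishes the payload */
};

/* Producer: write the payload, then the barrier, then the flag. */
static inline void
sketch_publish(struct sketch_mailbox *m, uint32_t val)
{
	m->payload = val;
	sketch_smp_wmb();
	atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Consumer: read the flag, then the barrier, then the payload. */
static inline int
sketch_consume(struct sketch_mailbox *m, uint32_t *out)
{
	if (atomic_load_explicit(&m->ready, memory_order_relaxed) == 0)
		return 0;
	sketch_smp_rmb();
	*out = m->payload;
	return 1;
}
```

No device is involved in this pattern, which is why a CPU-to-CPU barrier is
enough here.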

^ permalink raw reply

* Re: [PATCH 16/32] net/dpaa2: dpio add support to check SOC type
From: Hemant Agrawal @ 2016-12-15  7:01 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, thomas.monjalon, bruce.richardson, shreyansh.jain
In-Reply-To: <20161215063409.GC19354@localhost.localdomain>

On 12/15/2016 12:04 PM, Jerin Jacob wrote:
> On Sun, Dec 04, 2016 at 11:47:11PM +0530, Hemant Agrawal wrote:
>> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>> ---
>>  drivers/net/dpaa2/base/dpaa2_hw_dpio.c | 74 ++++++++++++++++++++++++++++++++++
>>  1 file changed, 74 insertions(+)
>>
>> diff --git a/drivers/net/dpaa2/base/dpaa2_hw_dpio.c b/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
>> index 9c6eb96..3b8f87d 100644
>> --- a/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
>> +++ b/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
>> @@ -70,6 +70,18 @@
>>  static struct dpio_device_list *dpio_dev_list; /*!< DPIO device list */
>>  static uint32_t io_space_count;
>>
>> +#define ARM_CORTEX_A53		0xD03
>> +#define ARM_CORTEX_A57		0xD07
>> +#define ARM_CORTEX_A72		0xD08
>
> It may not be a good idea to have generic ARM part number definitions in
> a driver file.
>
>> +
>> +static int dpaa2_soc_core = ARM_CORTEX_A72;
>> +
>> +#define NXP_LS2085	1
>> +#define NXP_LS2088	2
>> +#define NXP_LS1088	3
>> +
>> +static int dpaa2_soc_family  = NXP_LS2088;
>> +
>>  /*Stashing Macros default for LS208x*/
>>  static int dpaa2_core_cluster_base = 0x04;
>>  static int dpaa2_cluster_sz = 2;
>> @@ -101,6 +113,58 @@
>>  	return dpaa2_core_cluster_base + x;
>>  }
>>
>> +static int cpuinfo_arm(FILE *file)
>> +{
>> +	char str[128], *pos;
>> +	int part = -1;
>> +
>> +	#define ARM_CORTEX_A53_INFO	"Cortex-A53"
>> +	#define ARM_CORTEX_A57_INFO	"Cortex-A57"
>> +	#define ARM_CORTEX_A72_INFO	"Cortex-A72"
>> +
>> +	while (fgets(str, sizeof(str), file) != NULL) {
>> +		if (part >= 0)
>> +			break;
>> +		pos = strstr(str, "CPU part");
>> +		if (pos != NULL) {
>> +			pos = strchr(pos, ':');
>> +			if (pos != NULL)
>> +				sscanf(++pos, "%x", &part);
>> +		}
>> +	}
>> +
>> +	dpaa2_soc_core = part;
>> +	if (part == ARM_CORTEX_A53) {
>> +		dpaa2_soc_family = NXP_LS1088;
>> +		printf("\n########## Detected NXP LS108x with %s\n",
>> +		       ARM_CORTEX_A53_INFO);
>> +	} else if (part == ARM_CORTEX_A57) {
>> +		dpaa2_soc_family = NXP_LS2085;
>> +		printf("\n########## Detected NXP LS208x Rev1.0 with %s\n",
>> +		       ARM_CORTEX_A57_INFO);
>> +	} else if (part == ARM_CORTEX_A72) {
>> +		dpaa2_soc_family = NXP_LS2088;
>> +		printf("\n########## Detected NXP LS208x with %s\n",
>> +		       ARM_CORTEX_A72_INFO);
>> +	}
>> +	return 0;
>> +}
>> +
>> +static void
>> +check_cpu_part(void)
>> +{
>> +	FILE *stream;
>> +
>> +	stream = fopen("/proc/cpuinfo", "r");
>> +	if (!stream) {
>> +		PMD_INIT_LOG(WARNING, "Unable to open /proc/cpuinfo\n");
>> +		return;
>> +	}
>> +	cpuinfo_arm(stream);
>> +
>> +	fclose(stream);
>> +}
>> +
>>  static int
>>  configure_dpio_qbman_swp(struct dpaa2_dpio_dev *dpio_dev)
>>  {
>> @@ -326,6 +390,16 @@ static inline struct dpaa2_dpio_dev *dpaa2_get_qbman_swp(void)
>>  {
>>  	struct dpaa2_dpio_dev *dpio_dev;
>>  	struct vfio_region_info reg_info = { .argsz = sizeof(reg_info)};
>> +	static int first_time;
>> +
>> +	if (!first_time) {
>> +		check_cpu_part();
>> +		if (dpaa2_soc_family == NXP_LS1088) {
>> +			dpaa2_core_cluster_base = 0x02;
>> +			dpaa2_cluster_sz = 4;
> Can this device configuration information be passed through dt, or through
> the means by which you are populating the fsl bus for dpio?
>
> If not, the arm64 CPU part identification code can go in arm64 common
> code. Even better if we have an EAL API for the same. It looks like x86 has
> a similar attribute called "model".
>

It is a good idea to have something equivalent in EAL. Let me
make an attempt at it.
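
As a rough starting point, the parsing half of such a helper might look like
the sketch below. Both function names are hypothetical (no such EAL API
exists in this thread), and the sketch assumes the "CPU part : 0x..." line
format that the patch above already relies on in /proc/cpuinfo.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not an existing EAL function: parse one
 * /proc/cpuinfo-style line and return the "CPU part" value, or -1 if the
 * line does not carry one.
 */
static int
example_parse_cpu_part_line(const char *line)
{
	unsigned int part;
	const char *pos = strstr(line, "CPU part");

	if (pos == NULL)
		return -1;
	pos = strchr(pos, ':');
	/* %x skips leading whitespace and accepts an optional 0x prefix. */
	if (pos == NULL || sscanf(pos + 1, "%x", &part) != 1)
		return -1;
	return (int)part;
}

/* Return the first "CPU part" value found in the stream, or -1 if none. */
static int
example_cpu_part_from_stream(FILE *f)
{
	char line[128];

	while (fgets(line, sizeof(line), f) != NULL) {
		int part = example_parse_cpu_part_line(line);

		if (part >= 0)
			return part;
	}
	return -1;
}
```

Unlike the driver version, the return value of sscanf() is checked here, so
a malformed line cannot leave the result in an unknown state.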

^ permalink raw reply

* Re: [PATCH 17/32] net/dpaa2: dpbp based mempool hw offload driver
From: Jerin Jacob @ 2016-12-15  6:54 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: Hemant Agrawal, dev, thomas.monjalon, bruce.richardson
In-Reply-To: <8a6fe787-f8e6-e326-c1b9-42b001644885@nxp.com>

On Thu, Dec 15, 2016 at 12:07:51PM +0530, Shreyansh Jain wrote:
> On Thursday 15 December 2016 11:39 AM, Jerin Jacob wrote:
> > On Sun, Dec 04, 2016 at 11:47:12PM +0530, Hemant Agrawal wrote:
> > > DPBP represents a buffer pool instance in the DPAA2-QBMAN
> > > HW accelerator.
> > > 
> > > All buffers need to be programmed in the HW accelerator.
> > > 
> > > Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> > > ---
> > >  config/defconfig_arm64-dpaa2-linuxapp-gcc |   5 +
> > >  drivers/net/dpaa2/Makefile                |   2 +
> > >  drivers/net/dpaa2/base/dpaa2_hw_dpbp.c    | 366 ++++++++++++++++++++++++++++++
> > >  drivers/net/dpaa2/base/dpaa2_hw_dpbp.h    | 101 +++++++++
> > >  drivers/net/dpaa2/base/dpaa2_hw_pvt.h     |   7 +
> > 
> > 
> > How about moving the external mempool driver to RTE_SDK/driver/pool.
> > We are planning to push our external mempool driver to driver/pool.
> 
> I really like the idea of this separation:
> 
> So,
> ..drivers/net/<all PMDs>
> ..drivers/crypto/<all crypto PMDs>
> ..drivers/bus/<all bus handlers/drivers>
> ..drivers/pool/<all Pool handlers/drivers>
> 
> The only concern I see for now is resolving symbol dependencies across this
> structure. For example, DPAA2 Pool would be dependent on some DPAA2 specific
> objects - which then are again used in crypto/ and net/.
> 
> It is possible to have drivers/common (which DPAA2 PMD patchset is already
> doing). How are you doing that?

Same approach: a driver/common/octeontx directory for common octeontx driver code.

^ permalink raw reply

* Re: [PATCH 1/4] eal/common: introduce rte_memset on IA platform
From: Yang, Zhiyong @ 2016-12-15  6:51 UTC (permalink / raw)
  To: Yang, Zhiyong, Ananyev, Konstantin, Thomas Monjalon
  Cc: dev@dpdk.org, yuanhan.liu@linux.intel.com, Richardson, Bruce,
	De Lara Guarch, Pablo
In-Reply-To: <E182254E98A5DA4EB1E657AC7CB9BD2A3EB58E90@BGSMSX101.gar.corp.intel.com>

Hi, Thomas, Konstantin:

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yang, Zhiyong
> Sent: Sunday, December 11, 2016 8:33 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> Monjalon <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> IA platform
> 
> Hi, Konstantin, Bruce:
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, December 8, 2016 6:31 PM
> > To: Yang, Zhiyong <zhiyong.yang@intel.com>; Thomas Monjalon
> > <thomas.monjalon@6wind.com>
> > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > on IA platform
> >
> >
> >
> > > -----Original Message-----
> > > From: Yang, Zhiyong
> > > Sent: Thursday, December 8, 2016 9:53 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > > Monjalon <thomas.monjalon@6wind.com>
> > > Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Richardson, Bruce
> > > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > > <pablo.de.lara.guarch@intel.com>
> > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > on IA platform
> > >
> > extern void *(*__rte_memset_vector)(void *s, int c, size_t n);
> >
> > static inline void*
> > rte_memset_huge(void *s, int c, size_t n) {
> >    return __rte_memset_vector(s, c, n); }
> >
> > static inline void *
> > rte_memset(void *s, int c, size_t n)
> > {
> > 	if (n < XXX)
> > 		return rte_memset_scalar(s, c, n);
> > 	else
> > 		return rte_memset_huge(s, c, n);
> > }
> >
> > XXX could be either a define, or could also be a variable, so it can
> > be set up at startup, depending on the architecture.
> >
> > Would that work?
> > Konstantin
> >
I have implemented the code for choosing the functions at run time.
rte_memcpy is used more frequently, so I tested with it at run time.

typedef void *(*rte_memcpy_vector_t)(void *dst, const void *src, size_t n);
extern rte_memcpy_vector_t rte_memcpy_vector;
static inline void *
rte_memcpy(void *dst, const void *src, size_t n)
{
        return rte_memcpy_vector(dst, src, n);
}
To reduce the overhead at run time, I assign the function address to the
variable rte_memcpy_vector in a constructor, which runs before main() starts.

static void __attribute__((constructor))
rte_memcpy_init(void)
{
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
	{
		rte_memcpy_vector = rte_memcpy_avx2;
	}
	else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
	{
		rte_memcpy_vector = rte_memcpy_sse;
	}
	else
	{
		rte_memcpy_vector = memcpy;
	}

}
I ran the same virtio/vhost loopback tests without a NIC.
I see a throughput drop when choosing the functions at run time,
compared to the original code on the same platform (my machine is Haswell):
	Packet size	perf drop
	64		-4%
	256		-5.4%
	1024		-5%
	1500		-2.5%
Another thing: when I run memcpy_perf_autotest with N <= 128,
the rte_memcpy perf gains almost disappear
when choosing functions at run time. For other values of N, the perf gains become narrower.
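
For comparison, the threshold form Konstantin suggested for rte_memset could
be applied to memcpy roughly as sketched below: small copies stay inline and
only large ones pay for the indirect call. All example_* names are
hypothetical, the threshold of 128 is an arbitrary placeholder, and plain
memcpy stands in for the AVX2/SSE variants. I have not measured this form,
so whether it recovers the small-size loss is an open question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; the threshold is an arbitrary placeholder. */
#define EXAMPLE_MEMCPY_THRESHOLD 128

typedef void *(*example_memcpy_fn)(void *dst, const void *src, size_t n);

/* Counts calls through the pointer, only so the dispatch can be observed. */
static unsigned int example_large_calls;

/* Stand-in for a vectorised implementation such as an AVX2 or SSE copy. */
static void *
example_memcpy_generic(void *dst, const void *src, size_t n)
{
	example_large_calls++;
	return memcpy(dst, src, n);
}

static example_memcpy_fn example_memcpy_large = example_memcpy_generic;

/* Runs before main(); a real build would select by CPU flag here. */
static void __attribute__((constructor))
example_memcpy_init(void)
{
	example_memcpy_large = example_memcpy_generic;
}

static inline void *
example_memcpy(void *dst, const void *src, size_t n)
{
	if (n < EXAMPLE_MEMCPY_THRESHOLD) {
		/* Small copy: handled inline, no indirect call. */
		unsigned char *d = dst;
		const unsigned char *s = src;
		size_t i;

		for (i = 0; i < n; i++)
			d[i] = s[i];
		return dst;
	}
	/* Large copy: one indirect call to the implementation chosen at init. */
	return example_memcpy_large(dst, src, n);
}
```

The point of the split is that the function-pointer call cannot be inlined,
so keeping it off the small-size path avoids the cost where it is
proportionally largest.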

Thanks
Zhiyong

^ permalink raw reply

* Re: [PATCH 2/2] hyperv: VMBUS support infrastucture
From: Shreyansh Jain @ 2016-12-15  6:49 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Stephen Hemminger
In-Reply-To: <20161214235920.12877-3-sthemmin@microsoft.com>

On Thursday 15 December 2016 05:29 AM, Stephen Hemminger wrote:
> Generalize existing bus support to handle VMBUS in Hyper-V.
> Most of the code is based on the existing model for PCI; the difference
> is how the bus is represented in sysfs and how addressing works.
>
> This is based on earlier code contributed by Brocade.
> It supports only 4.9 or later versions of the Linux kernel
> at this time (not older kernels or BSD).
>
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---
>  lib/librte_eal/common/Makefile              |   2 +-
>  lib/librte_eal/common/eal_common_devargs.c  |   7 +
>  lib/librte_eal/common/eal_common_options.c  |  38 ++
>  lib/librte_eal/common/eal_internal_cfg.h    |   3 +-
>  lib/librte_eal/common/eal_options.h         |   6 +
>  lib/librte_eal/common/eal_private.h         |   5 +
>  lib/librte_eal/common/include/rte_devargs.h |   8 +
>  lib/librte_eal/common/include/rte_vmbus.h   | 247 ++++++++
>  lib/librte_eal/linuxapp/eal/Makefile        |   6 +
>  lib/librte_eal/linuxapp/eal/eal.c           |  11 +
>  lib/librte_eal/linuxapp/eal/eal_vmbus.c     | 906 ++++++++++++++++++++++++++++
>  lib/librte_ether/rte_ethdev.c               |  90 +++
>  lib/librte_ether/rte_ethdev.h               |  28 +-
>  mk/rte.app.mk                               |   1 +
>  14 files changed, 1354 insertions(+), 4 deletions(-)
>  create mode 100644 lib/librte_eal/common/include/rte_vmbus.h
>  create mode 100644 lib/librte_eal/linuxapp/eal/eal_vmbus.c
>
> diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
> index a92c984..9254bae 100644
> --- a/lib/librte_eal/common/Makefile
> +++ b/lib/librte_eal/common/Makefile
> @@ -33,7 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
>
>  INC := rte_branch_prediction.h rte_common.h
>  INC += rte_debug.h rte_eal.h rte_errno.h rte_launch.h rte_lcore.h
> -INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h
> +INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h rte_vmbus.h
>  INC += rte_per_lcore.h rte_random.h
>  INC += rte_tailq.h rte_interrupts.h rte_alarm.h
>  INC += rte_string_fns.h rte_version.h
> diff --git a/lib/librte_eal/common/eal_common_devargs.c b/lib/librte_eal/common/eal_common_devargs.c
> index e403717..934ca84 100644
> --- a/lib/librte_eal/common/eal_common_devargs.c
> +++ b/lib/librte_eal/common/eal_common_devargs.c
> @@ -113,6 +113,13 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str)
>  			goto fail;
>
>  		break;
> +	case RTE_DEVTYPE_WHITELISTED_VMBUS:
> +	case RTE_DEVTYPE_BLACKLISTED_VMBUS:
> +#ifdef RTE_LIBRTE_HV_PMD
> +		if (uuid_parse(buf, devargs->uuid) == 0)
> +			break;
> +#endif
> +		goto fail;
>  	}
>
>  	free(buf);
> diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
> index 6ca8af1..6aea87d 100644
> --- a/lib/librte_eal/common/eal_common_options.c
> +++ b/lib/librte_eal/common/eal_common_options.c
> @@ -95,6 +95,11 @@ eal_long_options[] = {
>  	{OPT_VFIO_INTR,         1, NULL, OPT_VFIO_INTR_NUM        },
>  	{OPT_VMWARE_TSC_MAP,    0, NULL, OPT_VMWARE_TSC_MAP_NUM   },
>  	{OPT_XEN_DOM0,          0, NULL, OPT_XEN_DOM0_NUM         },
> +#ifdef RTE_LIBRTE_HV_PMD
> +	{OPT_NO_VMBUS,          0, NULL, OPT_NO_VMBUS_NUM         },
> +	{OPT_VMBUS_BLACKLIST,   1, NULL, OPT_VMBUS_BLACKLIST_NUM  },
> +	{OPT_VMBUS_WHITELIST,   1, NULL, OPT_VMBUS_WHITELIST_NUM  },
> +#endif
>  	{0,                     0, NULL, 0                        }
>  };
>
> @@ -855,6 +860,21 @@ eal_parse_common_option(int opt, const char *optarg,
>  		conf->no_pci = 1;
>  		break;
>
> +#ifdef RTE_LIBRTE_HV_PMD
> +	case OPT_NO_VMBUS_NUM:
> +		conf->no_vmbus = 1;
> +		break;
> +	case OPT_VMBUS_BLACKLIST_NUM:
> +		if (rte_eal_devargs_add(RTE_DEVTYPE_BLACKLISTED_VMBUS,
> +					optarg) < 0)
> +			return -1;
> +		break;
> +	case OPT_VMBUS_WHITELIST_NUM:
> +		if (rte_eal_devargs_add(RTE_DEVTYPE_WHITELISTED_VMBUS,
> +				optarg) < 0)
> +			return -1;
> +		break;
> +#endif
>  	case OPT_NO_HPET_NUM:
>  		conf->no_hpet = 1;
>  		break;
> @@ -987,6 +1007,14 @@ eal_check_common_options(struct internal_config *internal_cfg)
>  		return -1;
>  	}
>
> +#ifdef RTE_LIBRTE_HV_PMD
> +	if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_VMBUS) != 0 &&
> +		rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_VMBUS) != 0) {
> +		RTE_LOG(ERR, EAL, "Options vmbus blacklist and whitelist "
> +			"cannot be used at the same time\n");
> +		return -1;
> +	}
> +#endif
>  	return 0;
>  }
>
> @@ -1036,5 +1064,15 @@ eal_common_usage(void)
>  	       "  --"OPT_NO_PCI"            Disable PCI\n"
>  	       "  --"OPT_NO_HPET"           Disable HPET\n"
>  	       "  --"OPT_NO_SHCONF"         No shared config (mmap'd files)\n"
> +#ifdef RTE_LIBRTE_HV_PMD
> +	       "  --"OPT_NO_VMBUS"          Disable VMBUS\n"
> +	       "  --"OPT_VMBUS_BLACKLIST" Add a VMBUS device to black list.\n"
> +	       "                      Prevent EAL from using this PCI device. The argument\n"
> +	       "                      format is device UUID.\n"
> +	       "  --"OPT_VMBUS_WHITELIST" Add a VMBUS device to white list.\n"
> +	       "                      Only use the specified VMBUS devices. The argument format\n"
> +	       "                      is device UUID This option can be present\n"
> +	       "                      several times (once per device).\n"
> +#endif
>  	       "\n", RTE_MAX_LCORE);
>  }
> diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
> index 5f1367e..1827194 100644
> --- a/lib/librte_eal/common/eal_internal_cfg.h
> +++ b/lib/librte_eal/common/eal_internal_cfg.h
> @@ -69,7 +69,8 @@ struct internal_config {
>  	volatile unsigned no_pci;         /**< true to disable PCI */
>  	volatile unsigned no_hpet;        /**< true to disable HPET */
>  	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
> -										* instead of native TSC */
> +					   * instead of native TSC */
> +	volatile unsigned no_vmbus;       /**< true to disable VMBUS */
>  	volatile unsigned no_shconf;      /**< true if there is no shared config */
>  	volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
>  	volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
> diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
> index a881c62..156727e 100644
> --- a/lib/librte_eal/common/eal_options.h
> +++ b/lib/librte_eal/common/eal_options.h
> @@ -83,6 +83,12 @@ enum {
>  	OPT_VMWARE_TSC_MAP_NUM,
>  #define OPT_XEN_DOM0          "xen-dom0"
>  	OPT_XEN_DOM0_NUM,
> +#define OPT_NO_VMBUS          "no-vmbus"
> +	OPT_NO_VMBUS_NUM,
> +#define OPT_VMBUS_BLACKLIST   "vmbus-blacklist"
> +	OPT_VMBUS_BLACKLIST_NUM,
> +#define OPT_VMBUS_WHITELIST   "vmbus-whitelist"
> +	OPT_VMBUS_WHITELIST_NUM,
>  	OPT_LONG_MAX_NUM
>  };
>
> diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
> index 9e7d8f6..c856c63 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -210,6 +210,11 @@ int pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
>  		struct mapped_pci_resource *uio_res, int map_idx);
>
>  /**
> + * VMBUS related functions and structures
> + */
> +int rte_eal_vmbus_init(void);
> +
> +/**
>   * Init tail queues for non-EAL library structures. This is to allow
>   * the rings, mempools, etc. lists to be shared among multiple processes
>   *
> diff --git a/lib/librte_eal/common/include/rte_devargs.h b/lib/librte_eal/common/include/rte_devargs.h
> index 88120a1..c079d28 100644
> --- a/lib/librte_eal/common/include/rte_devargs.h
> +++ b/lib/librte_eal/common/include/rte_devargs.h
> @@ -51,6 +51,9 @@ extern "C" {
>  #include <stdio.h>
>  #include <sys/queue.h>
>  #include <rte_pci.h>
> +#ifdef RTE_LIBRTE_HV_PMD
> +#include <uuid/uuid.h>
> +#endif
>
>  /**
>   * Type of generic device
> @@ -59,6 +62,8 @@ enum rte_devtype {
>  	RTE_DEVTYPE_WHITELISTED_PCI,
>  	RTE_DEVTYPE_BLACKLISTED_PCI,
>  	RTE_DEVTYPE_VIRTUAL,
> +	RTE_DEVTYPE_WHITELISTED_VMBUS,
> +	RTE_DEVTYPE_BLACKLISTED_VMBUS,
>  };
>
>  /**
> @@ -88,6 +93,9 @@ struct rte_devargs {
>  			/** Driver name. */
>  			char drv_name[32];
>  		} virt;
> +#ifdef RTE_LIBRTE_HV_PMD
> +		uuid_t uuid;
> +#endif
>  	};
>  	/** Arguments string as given by user or "" for no argument. */
>  	char *args;
> diff --git a/lib/librte_eal/common/include/rte_vmbus.h b/lib/librte_eal/common/include/rte_vmbus.h
> new file mode 100644
> index 0000000..8540539
> --- /dev/null
> +++ b/lib/librte_eal/common/include/rte_vmbus.h
> @@ -0,0 +1,247 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2013-2016 Brocade Communications Systems, Inc.
> + *   Copyright(c) 2016 Microsoft Corporation
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + */
> +
> +#ifndef _RTE_VMBUS_H_
> +#define _RTE_VMBUS_H_
> +
> +/**
> + * @file
> + *
> + * RTE VMBUS Interface
> + */
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <limits.h>
> +#include <errno.h>
> +#include <uuid/uuid.h>
> +#include <sys/queue.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +
> +#include <rte_debug.h>
> +#include <rte_interrupts.h>
> +#include <rte_dev.h>
> +
> +TAILQ_HEAD(vmbus_device_list, rte_vmbus_device);
> +TAILQ_HEAD(vmbus_driver_list, rte_vmbus_driver);
> +
> +extern struct vmbus_driver_list vmbus_driver_list;
> +extern struct vmbus_device_list vmbus_device_list;
> +
> +/** Pathname of VMBUS devices directory. */
> +#define SYSFS_VMBUS_DEVICES "/sys/bus/vmbus/devices"
> +
> +#define UUID_BUF_SZ	(36 + 1)
> +	
> +
> +/** Maximum number of VMBUS resources. */
> +#define VMBUS_MAX_RESOURCE 7
> +
> +/**
> + * A structure describing a VMBUS device.
> + */
> +struct rte_vmbus_device {
> +	TAILQ_ENTRY(rte_vmbus_device) next;     /**< Next probed VMBUS device. */
> +	struct rte_device device;               /**< Inherit core device */
> +	uuid_t device_id;			/**< VMBUS device id */
> +	uuid_t class_id;			/**< VMBUS device type */
> +	uint32_t relid;				/**< VMBUS id for notification */
> +	uint8_t	monitor_id;
> +	struct rte_intr_handle intr_handle;     /**< Interrupt handle */
> +	const struct rte_vmbus_driver *driver;  /**< Associated driver */
> +
> +	struct rte_mem_resource mem_resource[VMBUS_MAX_RESOURCE];
> +						/**< VMBUS Memory Resource */
> +	char sysfs_name[];			/**< Name in sysfs bus directory */
> +};
> +
> +struct rte_vmbus_driver;
> +
> +/**
> + * Initialisation function for the driver called during VMBUS probing.
> + */
> +typedef int (vmbus_probe_t)(struct rte_vmbus_driver *, struct rte_vmbus_device *);
> +
> +/**
> + * Uninitialisation function for the driver called during hotplugging.
> + */
> +typedef int (vmbus_remove_t)(struct rte_vmbus_device *);
> +
> +/**
> + * A structure describing a VMBUS driver.
> + */
> +struct rte_vmbus_driver {
> +	TAILQ_ENTRY(rte_vmbus_driver) next;     /**< Next in list. */
> +	struct rte_driver driver;
> +	vmbus_probe_t *probe;                   /**< Device Probe function. */
> +	vmbus_remove_t *remove;                 /**< Device Remove function. */
> +
> +	const uuid_t *id_table;			/**< ID table, NULL terminated. */
> +};
> +
> +struct vmbus_map {
> +	void *addr;
> +	char *path;
> +	uint64_t offset;
> +	uint64_t size;
> +	uint64_t phaddr;
> +};
> +
> +/*
> + * For multi-process we need to reproduce all vmbus mappings in secondary
> + * processes, so save them in a tailq.
> + */
> +struct mapped_vmbus_resource {
> +	TAILQ_ENTRY(mapped_vmbus_resource) next;
> +
> +	uuid_t uuid;
> +	char path[PATH_MAX];
> +	int nb_maps;
> +	struct vmbus_map maps[VMBUS_MAX_RESOURCE];
> +};
> +
> +TAILQ_HEAD(mapped_vmbus_res_list, mapped_vmbus_resource);
> +
> +/**
> + * Scan the content of the VMBUS bus, and the devices in the devices list
> + *
> + * @return
> + *  0 on success, negative on error
> + */
> +int rte_eal_vmbus_scan(void);
> +
> +/**
> + * Probe the VMBUS bus for registered drivers.
> + *
> + * Scan the content of the VMBUS bus, and call the probe() function for
> + * all registered drivers that have a matching entry in its id_table
> + * for discovered devices.
> + *
> + * @return
> + *   - 0 on success.
> + *   - Negative on error.
> + */
> +int rte_eal_vmbus_probe(void);
> +
> +/**
> + * Map the VMBUS device resources in user space virtual memory address
> + *
> + * @param dev
> + *   A pointer to a rte_vmbus_device structure describing the device
> + *   to use
> + *
> + * @return
> + *   0 on success, negative on error and positive if no driver
> + *   is found for the device.
> + */
> +int rte_eal_vmbus_map_device(struct rte_vmbus_device *dev);
> +
> +/**
> + * Unmap this device
> + *
> + * @param dev
> + *   A pointer to a rte_vmbus_device structure describing the device
> + *   to use
> + */
> +void rte_eal_vmbus_unmap_device(struct rte_vmbus_device *dev);
> +
> +/**
> + * Probe the single VMBUS device.
> + *
> + * Scan the content of the VMBUS bus, and find the vmbus device
> + * specified by device uuid, then call the probe() function for
> + * registered driver that has a matching entry in its id_table for
> + * discovered device.
> + *
> + * @param id
> + * 	The VMBUS device uuid.
> + * @return
> + *   - 0 on success.
> + *   - Negative on error.
> + */
> +int rte_eal_vmbus_probe_one(uuid_t id);
> +
> +/**
> + * Close the single VMBUS device.
> + *
> + * Scan the content of the VMBUS bus, and find the vmbus device id,
> + * then call the remove() function for registered driver that has a
> + * matching entry in its id_table for discovered device.
> + *
> + * @param id
> + * 	The VMBUS device uuid.
> + * @return
> + *   - 0 on success.
> + *   - Negative on error.
> + */
> +int rte_eal_vmbus_detach(uuid_t id);
> +
> +/**
> + * Register a VMBUS driver.
> + *
> + * @param driver
> + *   A pointer to a rte_vmbus_driver structure describing the driver
> + *   to be registered.
> + */
> +void rte_eal_vmbus_register(struct rte_vmbus_driver *driver);
> +
> +/** Helper for VMBUS device registration from driver nstance */
> +#define RTE_PMD_REGISTER_VMBUS(nm, vmbus_drv) \
> +RTE_INIT(vmbusinitfn_ ##nm); \
> +static void vmbusinitfn_ ##nm(void) \
> +{\
> +	(vmbus_drv).driver.name = RTE_STR(nm);\
> +	rte_eal_vmbus_register(&vmbus_drv); \
> +} \
> +RTE_PMD_EXPORT_NAME(nm, __COUNTER__)
> +
> +/**
> + * Unregister a VMBUS driver.
> + *
> + * @param driver
> + *   A pointer to a rte_vmbus_driver structure describing the driver
> + *   to be unregistered.
> + */
> +void rte_eal_vmbus_unregister(struct rte_vmbus_driver *driver);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_VMBUS_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
> index 4e206f0..f6ca384 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -71,6 +71,11 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
>
> +ifeq ($(CONFIG_RTE_LIBRTE_HV_PMD),y)
> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_vmbus.c
> +LDLIBS += -luuid
> +endif
> +
>  # from common dir
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_timer.c
> @@ -114,6 +119,7 @@ CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
>  CFLAGS_eal_pci.o := -D_GNU_SOURCE
>  CFLAGS_eal_pci_uio.o := -D_GNU_SOURCE
>  CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
> +CFLAGS_eal_vmbux.o := -D_GNU_SOURCE
>  CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
>  CFLAGS_eal_common_options.o := -D_GNU_SOURCE
>  CFLAGS_eal_common_thread.o := -D_GNU_SOURCE
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> index 2075282..71083ec 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -70,6 +70,7 @@
>  #include <rte_cpuflags.h>
>  #include <rte_interrupts.h>
>  #include <rte_pci.h>
> +#include <rte_vmbus.h>
>  #include <rte_dev.h>
>  #include <rte_devargs.h>
>  #include <rte_common.h>
> @@ -830,6 +831,11 @@ rte_eal_init(int argc, char **argv)
>
>  	eal_check_mem_on_local_socket();
>
> +#ifdef RTE_LIBRTE_HV_PMD
> +	if (rte_eal_vmbus_init() < 0)
> +		RTE_LOG(ERR, EAL, "Cannot init VMBUS\n");
> +#endif
> +
>  	if (eal_plugins_init() < 0)
>  		rte_panic("Cannot init plugins\n");
>
> @@ -887,6 +893,11 @@ rte_eal_init(int argc, char **argv)
>  	if (rte_eal_pci_probe())
>  		rte_panic("Cannot probe PCI\n");
>
> +#ifdef RTE_LIBRTE_HV_PMD
> +	if (rte_eal_vmbus_probe() < 0)
> +		rte_panic("Cannot probe VMBUS\n");
> +#endif
> +
>  	rte_eal_mcfg_complete();
>
>  	return fctret;
> diff --git a/lib/librte_eal/linuxapp/eal/eal_vmbus.c b/lib/librte_eal/linuxapp/eal/eal_vmbus.c
> new file mode 100644
> index 0000000..cbd8bd1
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/eal_vmbus.c
> @@ -0,0 +1,906 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2013-2016 Brocade Communications Systems, Inc.
> + *   Copyright(c) 2016 Microsoft Corporation
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + */
> +
> +#include <string.h>
> +#include <unistd.h>
> +#include <dirent.h>
> +#include <fcntl.h>
> +#include <sys/mman.h>
> +
> +#include <rte_eal.h>
> +#include <rte_tailq.h>
> +#include <rte_log.h>
> +#include <rte_devargs.h>
> +#include <rte_vmbus.h>
> +#include <rte_malloc.h>
> +
> +#include "eal_private.h"
> +#include "eal_pci_init.h"
> +#include "eal_filesystem.h"
> +
> +struct vmbus_driver_list vmbus_driver_list =
> +	TAILQ_HEAD_INITIALIZER(vmbus_driver_list);
> +struct vmbus_device_list vmbus_device_list =
> +	TAILQ_HEAD_INITIALIZER(vmbus_device_list);
> +
> +static void *vmbus_map_addr;
> +
> +static struct rte_tailq_elem rte_vmbus_uio_tailq = {
> +	.name = "UIO_RESOURCE_LIST",
> +};
> +EAL_REGISTER_TAILQ(rte_vmbus_uio_tailq);
> +
> +/*
> + * parse a sysfs file containing one integer value
> + * different to the eal version, as it needs to work with 64-bit values
> + */
> +static int
> +vmbus_get_sysfs_uuid(const char *filename, uuid_t uu)
> +{
> +	char buf[BUFSIZ];
> +	char *cp = NULL;
> +	FILE *f;
> +
> +	f = fopen(filename, "r");
> +	if (f == NULL) {
> +		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
> +				__func__, filename);
> +		return -1;
> +	}
> +
> +	if (fgets(buf, sizeof(buf), f) == NULL) {
> +		RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
> +				__func__, filename);
> +		fclose(f);
> +		return -1;
> +	}
> +	fclose(f);
> +
> +	cp = strchr(cp, '\n');
> +	if (cp)
> +		*cp = '\0';
> +
> +	/* strip { } notation */
> +	if (buf[0] == '{' && (cp = strchr(buf, '}')))
> +		*cp = '\0';
> +
> +	if (uuid_parse(buf, uu) < 0) {
> +		RTE_LOG(ERR, EAL, "%s %s not a valid UUID\n",
> +			filename, buf);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* map a particular resource from a file */
> +static void *
> +vmbus_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
> +		   int flags)
> +{
> +	void *mapaddr;
> +
> +	/* Map the memory resource of device */
> +	mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
> +		       MAP_SHARED | flags, fd, offset);
> +	if (mapaddr == MAP_FAILED ||
> +	    (requested_addr != NULL && mapaddr != requested_addr)) {
> +		RTE_LOG(ERR, EAL,
> +			"%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s)\n",
> +			__func__, fd, requested_addr,
> +			(unsigned long)size, (unsigned long)offset,
> +			strerror(errno));
> +	} else
> +		RTE_LOG(DEBUG, EAL, "  VMBUS memory mapped at %p\n", mapaddr);
> +
> +	return mapaddr;
> +}
> +
> +/* unmap a particular resource */
> +static void
> +vmbus_unmap_resource(void *requested_addr, size_t size)
> +{
> +	if (requested_addr == NULL)
> +		return;
> +
> +	/* Unmap the VMBUS memory resource of device */
> +	if (munmap(requested_addr, size)) {
> +		RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n",
> +			__func__, requested_addr, (unsigned long)size,
> +			strerror(errno));
> +	} else
> +		RTE_LOG(DEBUG, EAL, "  VMBUS memory unmapped at %p\n",
> +				requested_addr);
> +}
> +
> +/* Only supports current kernel version
> + * Unlike PCI there is no option (or need) to create UIO device.
> + */
> +static int vmbus_get_uio_dev(const char *name,
> +			     char *dstbuf, size_t buflen)
> +{
> +	char dirname[PATH_MAX];
> +	unsigned int uio_num;
> +	struct dirent *e;
> +	DIR *dir;
> +
> +	snprintf(dirname, sizeof(dirname),
> +		 "/sys/bus/vmbus/devices/%s/uio", name);
> +
> +	dir = opendir(dirname);
> +	if (dir == NULL) {
> +		RTE_LOG(ERR, EAL, "Cannot map uio resources for %s: %s\n",
> +			name, strerror(errno));
> +		return -1;
> +	}
> +
> +	/* take the first file starting with "uio" */
> +	while ((e = readdir(dir)) != NULL) {
> +		if (sscanf(e->d_name, "uio%u", &uio_num) != 1)
> +			continue;
> +
> +		snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
> +		break;
> +	}
> +	closedir(dir);
> +
> +	return e ? (int) uio_num : -1;
> +}
> +
> +/*
> + * parse a sysfs file containing one integer value
> + * different to the eal version, as it needs to work with 64-bit values
> + */
> +static int
> +vmbus_parse_sysfs_value(const char *dir, const char *name,
> +			uint64_t *val)
> +{
> +	char filename[PATH_MAX];
> +	FILE *f;
> +	char buf[BUFSIZ];
> +	char *end = NULL;
> +
> +	snprintf(filename, sizeof(filename), "%s/%s", dir, name);
> +	f = fopen(filename, "r");
> +	if (f == NULL) {
> +		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
> +				__func__, filename);
> +		return -1;
> +	}
> +
> +	if (fgets(buf, sizeof(buf), f) == NULL) {
> +		RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
> +				__func__, filename);
> +		fclose(f);
> +		return -1;
> +	}
> +	fclose(f);
> +
> +	*val = strtoull(buf, &end, 0);
> +	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
> +		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
> +				__func__, filename);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +/* Get mappings out of values provided by uio */
> +static int
> +vmbus_uio_get_mappings(const char *uioname,
> +		       struct vmbus_map maps[])
> +{
> +	int i;
> +
> +	for (i = 0; i != VMBUS_MAX_RESOURCE; i++) {
> +		struct vmbus_map *map = &maps[i];
> +		char dirname[PATH_MAX];
> +
> +		/* check if map directory exists */
> +		snprintf(dirname, sizeof(dirname),
> +			 "%s/maps/map%d", uioname, i);
> +
> +		if (access(dirname, F_OK) != 0)
> +			break;
> +
> +		/* get mapping offset */
> +		if (vmbus_parse_sysfs_value(dirname, "offset",
> +					    &map->offset) < 0)
> +			return -1;
> +
> +		/* get mapping size */
> +		if (vmbus_parse_sysfs_value(dirname, "size",
> +					    &map->size) < 0)
> +			return -1;
> +
> +		/* get mapping physical address */
> +		if (vmbus_parse_sysfs_value(dirname, "addr",
> +					    &map->phaddr) < 0)
> +			return -1;
> +	}
> +
> +	return i;
> +}
> +
> +static void
> +vmbus_uio_free_resource(struct rte_vmbus_device *dev,
> +		struct mapped_vmbus_resource *uio_res)
> +{
> +	rte_free(uio_res);
> +
> +	if (dev->intr_handle.fd >= 0) {
> +		close(dev->intr_handle.fd);
> +		dev->intr_handle.fd = -1;
> +		dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> +	}
> +}
> +
> +static struct mapped_vmbus_resource *
> +vmbus_uio_alloc_resource(struct rte_vmbus_device *dev)
> +{
> +	struct mapped_vmbus_resource *uio_res;
> +	char dirname[PATH_MAX], devname[PATH_MAX];
> +	int uio_num, nb_maps;
> +
> +	uio_num = vmbus_get_uio_dev(dev->sysfs_name, dirname, sizeof(dirname));
> +	if (uio_num < 0) {
> +		RTE_LOG(WARNING, EAL,
> +			"  %s not managed by UIO driver, skipping\n",
> +			dev->sysfs_name);
> +		return NULL;
> +	}
> +
> +	/* allocate the mapping details for secondary processes*/
> +	uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0);
> +	if (uio_res == NULL) {
> +		RTE_LOG(ERR, EAL,
> +			"%s(): cannot store uio mmap details\n", __func__);
> +		goto error;
> +	}
> +
> +	snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
> +	dev->intr_handle.fd = open(devname, O_RDWR);
> +	if (dev->intr_handle.fd < 0) {
> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> +			devname, strerror(errno));
> +		goto error;
> +	}
> +
> +	dev->intr_handle.type = RTE_INTR_HANDLE_UIO_INTX;
> +
> +	snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
> +	uuid_copy(uio_res->uuid, dev->device_id);
> +
> +	nb_maps = vmbus_uio_get_mappings(dirname, uio_res->maps);
> +	if (nb_maps < 0)
> +		goto error;
> +
> +	RTE_LOG(DEBUG, EAL, "Found %d memory maps for device %s\n",
> +		nb_maps, dev->sysfs_name);
> +
> +	return uio_res;
> +
> + error:
> +	vmbus_uio_free_resource(dev, uio_res);
> +	return NULL;
> +}
> +
> +static int
> +vmbus_uio_map_resource_by_index(struct rte_vmbus_device *dev,
> +				unsigned int res_idx,
> +				struct mapped_vmbus_resource *uio_res,
> +				unsigned int map_idx)
> +{
> +	struct vmbus_map *maps = uio_res->maps;
> +	char devname[PATH_MAX];
> +	void *mapaddr;
> +	int fd;
> +
> +	snprintf(devname, sizeof(devname),
> +		 "/sys/bus/vmbus/%s/resource%u", dev->sysfs_name, res_idx);
> +
> +	fd = open(devname, O_RDWR);
> +	if (fd < 0) {
> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> +				devname, strerror(errno));
> +		return -1;
> +	}
> +
> +	/* allocate memory to keep path */
> +	maps[map_idx].path = rte_malloc(NULL, strlen(devname) + 1, 0);
> +	if (maps[map_idx].path == NULL) {
> +		RTE_LOG(ERR, EAL, "Cannot allocate memory for path: %s\n",
> +				strerror(errno));
> +		return -1;
> +	}
> +
> +	/* try mapping somewhere close to the end of hugepages */
> +	if (vmbus_map_addr == NULL)
> +		vmbus_map_addr = pci_find_max_end_va();
> +
> +	mapaddr = vmbus_map_resource(vmbus_map_addr, fd, 0,
> +				     dev->mem_resource[res_idx].len, 0);
> +	close(fd);
> +	if (mapaddr == MAP_FAILED) {
> +		rte_free(maps[map_idx].path);
> +		return -1;
> +	}
> +
> +	vmbus_map_addr = RTE_PTR_ADD(mapaddr,
> +				     dev->mem_resource[res_idx].len);
> +
> +	maps[map_idx].phaddr = dev->mem_resource[res_idx].phys_addr;
> +	maps[map_idx].size = dev->mem_resource[res_idx].len;
> +	maps[map_idx].addr = mapaddr;
> +	maps[map_idx].offset = 0;
> +	strcpy(maps[map_idx].path, devname);
> +	dev->mem_resource[res_idx].addr = mapaddr;
> +
> +	return 0;
> +}
> +
> +static void
> +vmbus_uio_unmap(struct mapped_vmbus_resource *uio_res)
> +{
> +	int i;
> +
> +	if (uio_res == NULL)
> +		return;
> +
> +	for (i = 0; i != uio_res->nb_maps; i++) {
> +		vmbus_unmap_resource(uio_res->maps[i].addr,
> +				     uio_res->maps[i].size);
> +
> +		if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +			rte_free(uio_res->maps[i].path);
> +	}
> +}
> +
> +static struct mapped_vmbus_resource *
> +vmbus_uio_find_resource(struct rte_vmbus_device *dev)
> +{
> +	struct mapped_vmbus_resource *uio_res;
> +	struct mapped_vmbus_res_list *uio_res_list =
> +			RTE_TAILQ_CAST(rte_vmbus_uio_tailq.head, mapped_vmbus_res_list);
> +
> +	if (dev == NULL)
> +		return NULL;
> +
> +	TAILQ_FOREACH(uio_res, uio_res_list, next) {
> +		if (uuid_compare(uio_res->uuid, dev->device_id) == 0)
> +			return uio_res;
> +	}
> +	return NULL;
> +}
> +
> +/* unmap the VMBUS resource of a VMBUS device in virtual memory */
> +static void
> +vmbus_uio_unmap_resource(struct rte_vmbus_device *dev)
> +{
> +	struct mapped_vmbus_resource *uio_res;
> +	struct mapped_vmbus_res_list *uio_res_list =
> +			RTE_TAILQ_CAST(rte_vmbus_uio_tailq.head, mapped_vmbus_res_list);
> +
> +	if (dev == NULL)
> +		return;
> +
> +	/* find an entry for the device */
> +	uio_res = vmbus_uio_find_resource(dev);
> +	if (uio_res == NULL)
> +		return;
> +
> +	/* secondary processes - just free maps */
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +		return vmbus_uio_unmap(uio_res);
> +
> +	TAILQ_REMOVE(uio_res_list, uio_res, next);
> +
> +	/* unmap all resources */
> +	vmbus_uio_unmap(uio_res);
> +
> +	/* free uio resource */
> +	rte_free(uio_res);
> +
> +	/* close fd if in primary process */
> +	close(dev->intr_handle.fd);
> +	if (dev->intr_handle.uio_cfg_fd >= 0) {
> +		close(dev->intr_handle.uio_cfg_fd);
> +		dev->intr_handle.uio_cfg_fd = -1;
> +	}
> +
> +	dev->intr_handle.fd = -1;
> +	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> +}
> +
> +static int
> +vmbus_uio_map_secondary(struct rte_vmbus_device *dev)
> +{
> +	struct mapped_vmbus_resource *uio_res;
> +	struct mapped_vmbus_res_list *uio_res_list =
> +			RTE_TAILQ_CAST(rte_vmbus_uio_tailq.head,
> +				       mapped_vmbus_res_list);
> +
> +	TAILQ_FOREACH(uio_res, uio_res_list, next) {
> +		int i;
> +
> +		/* skip this element if it doesn't match our id */
> +		if (uuid_compare(uio_res->uuid, dev->device_id))
> +			continue;
> +
> +		for (i = 0; i != uio_res->nb_maps; i++) {
> +			void *mapaddr;
> +			int fd;
> +
> +			fd = open(uio_res->maps[i].path, O_RDWR);
> +			if (fd < 0) {
> +				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> +					uio_res->maps[i].path, strerror(errno));
> +				return -1;
> +			}
> +
> +			mapaddr = vmbus_map_resource(uio_res->maps[i].addr, fd,
> +						     uio_res->maps[i].offset,
> +						     uio_res->maps[i].size, 0);
> +			/* fd is not needed in slave process, close it */
> +			close(fd);
> +
> +			if (mapaddr == uio_res->maps[i].addr)
> +				continue;
> +
> +			RTE_LOG(ERR, EAL,
> +				"Cannot mmap device resource file %s to address: %p\n",
> +				uio_res->maps[i].path,
> +				uio_res->maps[i].addr);
> +
> +			/* unmap addrs correctly mapped */
> +			while (i != 0) {
> +				--i;
> +				vmbus_unmap_resource(uio_res->maps[i].addr,
> +						     uio_res->maps[i].size);
> +			}
> +			return -1;
> +
> +		}
> +		return 0;
> +	}
> +
> +	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
> +	return 1;
> +}
> +
> +/* map the resources of a vmbus device in virtual memory */
> +int
> +rte_eal_vmbus_map_device(struct rte_vmbus_device *dev)
> +{
> +	struct mapped_vmbus_resource *uio_res;
> +	struct mapped_vmbus_res_list *uio_res_list =
> +		RTE_TAILQ_CAST(rte_vmbus_uio_tailq.head, mapped_vmbus_res_list);
> +	int i, ret, map_idx = 0;
> +
> +	dev->intr_handle.fd = -1;
> +	dev->intr_handle.uio_cfg_fd = -1;
> +	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> +
> +	/* secondary processes - use already recorded details */
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +		return vmbus_uio_map_secondary(dev);
> +
> +	/* allocate uio resource */
> +	uio_res = vmbus_uio_alloc_resource(dev);
> +	if (uio_res == NULL)
> +		return -1;
> +
> +	/* Map all BARs */
> +	for (i = 0; i != VMBUS_MAX_RESOURCE; i++) {
> +		uint64_t phaddr;
> +
> +		/* skip empty BAR */
> +		phaddr = dev->mem_resource[i].phys_addr;
> +		if (phaddr == 0)
> +			continue;
> +
> +		ret = vmbus_uio_map_resource_by_index(dev, i,
> +						      uio_res, map_idx);
> +		if (ret)
> +			goto error;
> +
> +		map_idx++;
> +	}
> +
> +	uio_res->nb_maps = map_idx;
> +
> +	TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);
> +
> +	return 0;
> +error:
> +	for (i = 0; i < map_idx; i++) {
> +		vmbus_unmap_resource(uio_res->maps[i].addr,
> +				     uio_res->maps[i].size);
> +		rte_free(uio_res->maps[i].path);
> +	}
> +	vmbus_uio_free_resource(dev, uio_res);
> +	return -1;
> +}
> +
> +/* Scan one vmbus sysfs entry, and fill the devices list from it. */
> +static int
> +vmbus_scan_one(const char *name)
> +{
> +	struct rte_vmbus_device *dev, *dev2;
> +	char filename[PATH_MAX];
> +	char dirname[PATH_MAX];
> +	unsigned long tmp;
> +
> +	dev = malloc(sizeof(*dev) + strlen(name) + 1);
> +	if (dev == NULL)
> +		return -1;
> +
> +	memset(dev, 0, sizeof(*dev));
> +	strcpy(dev->sysfs_name, name);
> +	if (dev->sysfs_name == NULL)
> +		goto error;
> +
> +	/* sysfs base directory
> +	 *   /sys/bus/vmbus/devices/7a08391f-f5a0-4ac0-9802-d13fd964f8df
> +	 * or on older kernel
> +	 *   /sys/bus/vmbus/devices/vmbus_1
> +	 */
> +	snprintf(dirname, sizeof(dirname), "%s/%s",
> +		 SYSFS_VMBUS_DEVICES, name);
> +
> +	/* get device id */
> +	snprintf(filename, sizeof(filename), "%s/device_id", dirname);
> +	if (vmbus_get_sysfs_uuid(filename, dev->device_id) < 0)
> +		goto error;
> +
> +	/* get device class  */
> +	snprintf(filename, sizeof(filename), "%s/class_id", dirname);
> +	if (vmbus_get_sysfs_uuid(filename, dev->class_id) < 0)
> +		goto error;
> +
> +	/* get relid */
> +	snprintf(filename, sizeof(filename), "%s/id", dirname);
> +	if (eal_parse_sysfs_value(filename, &tmp) < 0)
> +		goto error;
> +	dev->relid = tmp;
> +
> +	/* get monitor id */
> +	snprintf(filename, sizeof(filename), "%s/monitor_id", dirname);
> +	if (eal_parse_sysfs_value(filename, &tmp) < 0)
> +		goto error;
> +	dev->monitor_id = tmp;
> +
> +	/* get numa node */
> +	snprintf(filename, sizeof(filename), "%s/numa_node",
> +		 dirname);
> +	if (eal_parse_sysfs_value(filename, &tmp) < 0)
> +		/* if no NUMA support, set default to 0 */
> +		dev->device.numa_node = 0;
> +	else
> +		dev->device.numa_node = tmp;
> +
> +	/* device is valid, add in list (sorted) */
> +	RTE_LOG(DEBUG, EAL, "Adding vmbus device %s\n", name);
> +
> +	TAILQ_FOREACH(dev2, &vmbus_device_list, next) {
> +		int ret;
> +
> +		ret = uuid_compare(dev->device_id, dev2->device_id);
> +		if (ret > 0)
> +			continue;
> +
> +		if (ret < 0) {
> +			TAILQ_INSERT_BEFORE(dev2, dev, next);
> +			rte_eal_device_insert(&dev->device);
> +		} else { /* already registered */
> +			memmove(dev2->mem_resource, dev->mem_resource,
> +				sizeof(dev->mem_resource));
> +			free(dev);
> +		}
> +		return 0;
> +	}
> +
> +	rte_eal_device_insert(&dev->device);
> +	TAILQ_INSERT_TAIL(&vmbus_device_list, dev, next);
> +
> +	return 0;
> +error:
> +	free(dev);
> +	return -1;
> +}
> +
> +/*
> + * Scan the content of the vmbus, and add the devices to the devices list
> + */
> +static int
> +vmbus_scan(void)
> +{
> +	struct dirent *e;
> +	DIR *dir;
> +
> +	dir = opendir(SYSFS_VMBUS_DEVICES);
> +	if (dir == NULL) {
> +		if (errno == ENOENT)
> +			return 0;
> +		else {
> +			RTE_LOG(ERR, EAL, "%s(): opendir failed: %s\n",
> +					__func__, strerror(errno));
> +			return -1;
> +		}
> +	}
> +
> +	while ((e = readdir(dir)) != NULL) {
> +		if (e->d_name[0] == '.')
> +			continue;
> +
> +		if (vmbus_scan_one(e->d_name) < 0)
> +			goto error;
> +	}
> +	closedir(dir);
> +	return 0;
> +
> +error:
> +	closedir(dir);
> +	return -1;
> +}
> +
> +/* Init the VMBUS EAL subsystem */
> +int rte_eal_vmbus_init(void)
> +{
> +	/* VMBUS can be disabled */
> +	if (internal_config.no_vmbus)
> +		return 0;
> +
> +	if (vmbus_scan() < 0) {
> +		RTE_LOG(ERR, EAL, "%s(): Cannot scan vmbus\n", __func__);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +/* Below is PROBE part of eal_vmbus library */
> +
> +/*
> + * If device ID match, call the devinit() function of the driver.
> + */
> +static int
> +rte_eal_vmbus_probe_one_driver(struct rte_vmbus_driver *dr,
> +			       struct rte_vmbus_device *dev)
> +{
> +	const uuid_t *id_table;
> +
> +	RTE_LOG(DEBUG, EAL, "  probe driver: %s\n", dr->driver.name);
> +
> +	for (id_table = dr->id_table; !uuid_is_null(*id_table); ++id_table) {
> +		struct rte_devargs *args;
> +		char guid[UUID_BUF_SZ];
> +		int ret;
> +
> +		/* skip devices not associated with this device class */
> +		if (uuid_compare(*id_table, dev->class_id) != 0)
> +			continue;
> +
> +		uuid_unparse(dev->device_id, guid);
> +		RTE_LOG(INFO, EAL, "VMBUS device %s on NUMA socket %i\n",
> +			guid, dev->device.numa_node);
> +
> +		/* no initialization when blacklisted, return without error */
> +		args = dev->device.devargs;
> +		if (args && args->type == RTE_DEVTYPE_BLACKLISTED_VMBUS) {
> +			RTE_LOG(INFO, EAL, "  Device is blacklisted, not initializing\n");
> +			return 1;
> +		}
> +
> +		RTE_LOG(INFO, EAL, "  probe driver: %s\n", dr->driver.name);
> +
> +		/* map resources for device */
> +		ret = rte_eal_vmbus_map_device(dev);
> +		if (ret != 0)
> +			return ret;
> +
> +		/* reference driver structure */
> +		dev->driver = dr;
> +
> +		/* call the driver probe() function */
> +		ret = dr->probe(dr, dev);
> +		if (ret)
> +			dev->driver = NULL;
> +
> +		return ret;
> +	}
> +
> +	/* return positive value if driver doesn't support this device */
> +	return 1;
> +}
> +
> +
> +/*
> + * If vendor/device ID match, call the remove() function of the
> + * driver.
> + */
> +static int
> +vmbus_detach_dev(struct rte_vmbus_driver *dr,
> +		 struct rte_vmbus_device *dev)
> +{
> +	const uuid_t *id_table;
> +
> +	for (id_table = dr->id_table; !uuid_is_null(*id_table); ++id_table) {
> +		char guid[UUID_BUF_SZ];
> +
> +		/* skip devices not associated with this device class */
> +		if (uuid_compare(*id_table, dev->class_id) != 0)
> +			continue;
> +
> +		uuid_unparse(dev->device_id, guid);
> +		RTE_LOG(INFO, EAL, "VMBUS device %s on NUMA socket %i\n",
> +			guid, dev->device.numa_node);
> +
> +		RTE_LOG(DEBUG, EAL, "  remove driver: %s\n", dr->driver.name);
> +
> +		if (dr->remove && (dr->remove(dev) < 0))
> +			return -1;	/* negative value is an error */
> +
> +		/* clear driver structure */
> +		dev->driver = NULL;
> +
> +		vmbus_uio_unmap_resource(dev);
> +		return 0;
> +	}
> +
> +	/* return positive value if driver doesn't support this device */
> +	return 1;
> +}
> +
> +/*
> + * call the devinit() function of all
> + * registered drivers for the vmbus device. Return -1 if no driver is
> + * found for this class of vmbus device.
> + * The present assumption is that we have drivers only for vmbus network
> + * devices. That's why we don't check driver's id_table now.
> + */
> +static int
> +vmbus_probe_all_drivers(struct rte_vmbus_device *dev)
> +{
> +	struct rte_vmbus_driver *dr = NULL;
> +	int ret;
> +
> +	TAILQ_FOREACH(dr, &vmbus_driver_list, next) {
> +		ret = rte_eal_vmbus_probe_one_driver(dr, dev);
> +		if (ret < 0) {
> +			/* negative value is an error */
> +			RTE_LOG(ERR, EAL, "Failed to probe driver %s\n",
> +				dr->driver.name);
> +			return -1;
> +		}
> +		/* positive value means driver doesn't support it */
> +		if (ret > 0)
> +			continue;
> +
> +		return 0;
> +	}
> +
> +	return 1;
> +}
> +
> +
> +/*
> + * If device ID matches, call the remove() function of all
> + * registered driver for the given device. Return -1 if initialization
> + * failed, return 1 if no driver is found for this device.
> + */
> +static int
> +vmbus_detach_all_drivers(struct rte_vmbus_device *dev)
> +{
> +	struct rte_vmbus_driver *dr;
> +	int rc = 0;
> +
> +	if (dev == NULL)
> +		return -1;
> +
> +	TAILQ_FOREACH(dr, &vmbus_driver_list, next) {
> +		rc = vmbus_detach_dev(dr, dev);
> +		if (rc < 0)
> +			/* negative value is an error */
> +			return -1;
> +		if (rc > 0)
> +			/* positive value means driver doesn't support it */
> +			continue;
> +		return 0;
> +	}
> +	return 1;
> +}
> +
> +/* Detach device specified by its VMBUS id */
> +int
> +rte_eal_vmbus_detach(uuid_t device_id)
> +{
> +	struct rte_vmbus_device *dev;
> +	char ubuf[UUID_BUF_SZ];
> +
> +	TAILQ_FOREACH(dev, &vmbus_device_list, next) {
> +		if (uuid_compare(dev->device_id, device_id) != 0)
> +			continue;
> +
> +		if (vmbus_detach_all_drivers(dev) < 0)
> +			goto err_return;
> +
> +		TAILQ_REMOVE(&vmbus_device_list, dev, next);
> +		free(dev);
> +		return 0;
> +	}
> +	return -1;
> +
> +err_return:
> +	uuid_unparse(device_id, ubuf);
> +	RTE_LOG(WARNING, EAL, "Requested device %s cannot be used\n",
> +		ubuf);
> +	return -1;
> +}
> +
> +/*
> + * Scan the vmbus, and call the devinit() function for
> + * all registered drivers that have a matching entry in its id_table
> + * for discovered devices.
> + */
> +int
> +rte_eal_vmbus_probe(void)
> +{
> +	struct rte_vmbus_device *dev = NULL;
> +
> +	TAILQ_FOREACH(dev, &vmbus_device_list, next) {
> +		char ubuf[UUID_BUF_SZ];
> +
> +		uuid_unparse(dev->device_id, ubuf);
> +
> +		RTE_LOG(DEBUG, EAL, "Probing driver for device %s ...\n",
> +			ubuf);
> +		vmbus_probe_all_drivers(dev);
> +	}
> +	return 0;
> +}
> +
> +/* register vmbus driver */
> +void
> +rte_eal_vmbus_register(struct rte_vmbus_driver *driver)
> +{
> +	TAILQ_INSERT_TAIL(&vmbus_driver_list, driver, next);
> +}
> +
> +/* unregister vmbus driver */
> +void
> +rte_eal_vmbus_unregister(struct rte_vmbus_driver *driver)
> +{
> +	TAILQ_REMOVE(&vmbus_driver_list, driver, next);
> +}
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 1e0f206..6298a8d 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -3282,3 +3282,93 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
>  				-ENOTSUP);
>  	return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
>  }
> +
> +
> +#ifdef RTE_LIBRTE_HV_PMD
> +int
> +rte_eth_dev_vmbus_probe(struct rte_vmbus_driver *vmbus_drv,
> +			struct rte_vmbus_device *vmbus_dev)
> +{
> +	struct eth_driver  *eth_drv = (struct eth_driver *)vmbus_drv;
> +	struct rte_eth_dev *eth_dev;
> +	char ustr[UUID_BUF_SZ];
> +	int diag;
> +
> +	uuid_unparse(vmbus_dev->device_id, ustr);
> +
> +	eth_dev = rte_eth_dev_allocate(ustr);
> +	if (eth_dev == NULL)
> +		return -ENOMEM;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		eth_dev->data->dev_private = rte_zmalloc("ethdev private structure",
> +				  eth_drv->dev_private_size,
> +				  RTE_CACHE_LINE_SIZE);
> +		if (eth_dev->data->dev_private == NULL)
> +			rte_panic("Cannot allocate memzone for private port data\n");
> +	}
> +
> +	eth_dev->vmbus_dev = vmbus_dev;
> +	eth_dev->driver = eth_drv;
> +	eth_dev->data->rx_mbuf_alloc_failed = 0;
> +
> +	/* init user callbacks */
> +	TAILQ_INIT(&(eth_dev->link_intr_cbs));
> +
> +	/*
> +	 * Set the default maximum frame size.
> +	 */
> +	eth_dev->data->mtu = ETHER_MTU;
> +
> +	/* Invoke PMD device initialization function */
> +	diag = (*eth_drv->eth_dev_init)(eth_dev);
> +	if (diag == 0)
> +		return 0;
> +
> +	RTE_PMD_DEBUG_TRACE("driver %s: eth_dev_init(%s) failed\n",
> +			    vmbus_drv->driver.name, ustr);
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +		rte_free(eth_dev->data->dev_private);
> +
> +	return diag;
> +}
> +
> +int
> +rte_eth_dev_vmbus_remove(struct rte_vmbus_device *vmbus_dev)
> +{
> +	const struct eth_driver *eth_drv;
> +	struct rte_eth_dev *eth_dev;
> +	char ustr[UUID_BUF_SZ];
> +	int ret;
> +
> +	if (vmbus_dev == NULL)
> +		return -EINVAL;
> +
> +	uuid_unparse(vmbus_dev->device_id, ustr);
> +	eth_dev = rte_eth_dev_allocated(ustr);
> +	if (eth_dev == NULL)
> +		return -ENODEV;
> +
> +	eth_drv = (const struct eth_driver *)vmbus_dev->driver;
> +
> +	/* Invoke PMD device uninit function */
> +	if (*eth_drv->eth_dev_uninit) {
> +		ret = (*eth_drv->eth_dev_uninit)(eth_dev);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	/* free ether device */
> +	rte_eth_dev_release_port(eth_dev);
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +		rte_free(eth_dev->data->dev_private);
> +
> +	eth_dev->vmbus_dev = NULL;
> +	eth_dev->driver = NULL;
> +	eth_dev->data = NULL;
> +
> +	return 0;
> +}
> +#endif
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 3c85e33..5050087 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -180,6 +180,7 @@ extern "C" {
>  #include <rte_log.h>
>  #include <rte_interrupts.h>
>  #include <rte_pci.h>
> +#include <rte_vmbus.h>
>  #include <rte_dev.h>
>  #include <rte_devargs.h>
>  #include "rte_ether.h"
> @@ -1628,7 +1629,11 @@ struct rte_eth_dev {
>  	struct rte_eth_dev_data *data;  /**< Pointer to device data */
>  	const struct eth_driver *driver;/**< Driver for this device */
>  	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> -	struct rte_pci_device *pci_dev; /**< PCI info. supplied by probing */
> +	union {
> +		struct rte_pci_device *pci_dev; /**< PCI info. */
> +		struct rte_vmbus_device *vmbus_dev; /**< VMBUS info. */
> +	};
> +
>  	/** User application callbacks for NIC interrupts */
>  	struct rte_eth_dev_cb_list link_intr_cbs;
>  	/**
> @@ -1866,7 +1871,11 @@ typedef int (*eth_dev_uninit_t)(struct rte_eth_dev *eth_dev);
>   * - The size of the private data to allocate for each matching device.
>   */
>  struct eth_driver {
> -	struct rte_pci_driver pci_drv;    /**< The PMD is also a PCI driver. */
> +	union {
> +		struct rte_pci_driver pci_drv;    /**< The PMD PCI driver. */
> +		struct rte_vmbus_driver vmbus_drv;/**< The PMD VMBUS drv. */
> +	};
> +
>  	eth_dev_init_t eth_dev_init;      /**< Device init function. */
>  	eth_dev_uninit_t eth_dev_uninit;  /**< Device uninit function. */
>  	unsigned int dev_private_size;    /**< Size of device private data. */

This model does not scale: eth_driver and rte_eth_dev would have to
change for every new device type other than PCI. VMBus may be close
enough to PCI that no changes are needed in the PCI layer (common,
linuxapp, bsdapp), but for other buses it won't stop there.

At the very least, rte_pci_driver and rte_pci_device should be removed
from eth_driver and rte_eth_dev, respectively, which should rely on
rte_driver and rte_device instead.

This is the primary reason for the work on the SoC patchset and, now,
on the new Bus model.
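
To make the suggestion concrete, here is a minimal sketch of what a
bus-agnostic ethdev could look like. It is only an illustration under
my own assumptions: none of the struct or function names below are the
real DPDK definitions (generic_device stands in for rte_device, and
eth_dev_to_vmbus() is a made-up helper). The ethdev keeps a single
pointer to the generic device, and each bus recovers its own structure
from that pointer, so no per-bus union is needed.

```c
#include <stddef.h>

/* Hypothetical stand-in for a generic device; not the DPDK rte_device. */
struct generic_device {
	const char *name;
	int numa_node;
};

/* Each bus embeds the generic device in its own device structure. */
struct pci_device {
	struct generic_device device;
	unsigned int vendor_id;
};

struct vmbus_device {
	struct generic_device device;
	unsigned int relid;
};

/* The ethdev only knows about the generic device, not about any bus. */
struct eth_dev {
	struct generic_device *device;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical helper for a VMBus PMD: valid only when the ethdev was
 * created from a vmbus_device, which the caller must guarantee.
 */
struct vmbus_device *
eth_dev_to_vmbus(const struct eth_dev *eth)
{
	return container_of(eth->device, struct vmbus_device, device);
}
```

With this layout, adding another bus means adding one more structure
that embeds the generic device; the ethdev structure itself does not
change.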

> @@ -4383,6 +4392,21 @@ int rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
>   */
>  int rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev);
>
> +/**
> + * @internal
> + * Wrapper for use by vmbus drivers as a .probe function to attach to a ethdev
> + * interface.
> + */
> +int rte_eth_dev_vmbus_probe(struct rte_vmbus_driver *vmbus_drv,
> +			  struct rte_vmbus_device *vmbus_dev);
> +
> +/**
> + * @internal
> + * Wrapper for use by vmbus drivers as a .remove function to detach a ethdev
> + * interface.
> + */
> +int rte_eth_dev_vmbus_remove(struct rte_vmbus_device *vmbus_dev);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index f75f0e2..6b30408 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -130,6 +130,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
>  endif # $(CONFIG_RTE_LIBRTE_VHOST)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)    += -lrte_pmd_vmxnet3_uio
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_HV_PMD)	    += -luuid
>
>  ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)    += -lrte_pmd_aesni_mb
>

-
Shreyansh

^ permalink raw reply

* Re: [PATCH 08/32] mk/dpaa2: add the crc support to the machine type
From: Jerin Jacob @ 2016-12-15  6:35 UTC (permalink / raw)
  To: Hemant Agrawal; +Cc: dev, thomas.monjalon, bruce.richardson, shreyansh.jain
In-Reply-To: <1480875447-23680-9-git-send-email-hemant.agrawal@nxp.com>

On Sun, Dec 04, 2016 at 11:47:03PM +0530, Hemant Agrawal wrote:
> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

> ---
>  mk/machine/dpaa2/rte.vars.mk | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mk/machine/dpaa2/rte.vars.mk b/mk/machine/dpaa2/rte.vars.mk
> index 8541633..e4735c2 100644
> --- a/mk/machine/dpaa2/rte.vars.mk
> +++ b/mk/machine/dpaa2/rte.vars.mk
> @@ -1,6 +1,7 @@
>  #   BSD LICENSE
>  #
> -#   Copyright(c) 2016 Freescale Semiconductor, Inc. All rights reserved.
> +#   Copyright (c) 2016 Freescale Semiconductor, Inc. All rights reserved.
> +#   Copyright (c) 2016 NXP. All rights reserved.
>  #
>  #   Redistribution and use in source and binary forms, with or without
>  #   modification, are permitted provided that the following conditions
> @@ -53,7 +54,7 @@
>  # CPU_CFLAGS =
>  # CPU_LDFLAGS =
>  # CPU_ASFLAGS =
> -MACHINE_CFLAGS += -march=armv8-a
> +MACHINE_CFLAGS += -march=armv8-a+crc
>  
>  ifdef CONFIG_RTE_ARCH_ARM_TUNE
>  MACHINE_CFLAGS += -mcpu=$(CONFIG_RTE_ARCH_ARM_TUNE)
> -- 
> 1.9.1
> 

^ permalink raw reply

* Re: [PATCH 17/32] net/dpaa2: dpbp based mempool hw offload driver
From: Shreyansh Jain @ 2016-12-15  6:37 UTC (permalink / raw)
  To: Jerin Jacob, Hemant Agrawal; +Cc: dev, thomas.monjalon, bruce.richardson
In-Reply-To: <20161215060927.GB19354@localhost.localdomain>

On Thursday 15 December 2016 11:39 AM, Jerin Jacob wrote:
> On Sun, Dec 04, 2016 at 11:47:12PM +0530, Hemant Agrawal wrote:
>> DPBP represent a buffer pool instance in DPAA2-QBMAN
>> HW accelerator.
>>
>> All buffers needs to be programmed in the HW accelerator.
>>
>> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>> ---
>>  config/defconfig_arm64-dpaa2-linuxapp-gcc |   5 +
>>  drivers/net/dpaa2/Makefile                |   2 +
>>  drivers/net/dpaa2/base/dpaa2_hw_dpbp.c    | 366 ++++++++++++++++++++++++++++++
>>  drivers/net/dpaa2/base/dpaa2_hw_dpbp.h    | 101 +++++++++
>>  drivers/net/dpaa2/base/dpaa2_hw_pvt.h     |   7 +
>
>
> How about moving the external mempool driver to RTE_SDK/driver/pool.
> We are planning to push our external mempool driver to driver/pool.

I really like the idea of this separation:

So,
..drivers/net/<all PMDs>
..drivers/crypto/<all crypto PMDs>
..drivers/bus/<all bus handlers/drivers>
..drivers/pool/<all Pool handlers/drivers>

The only concern I see for now is resolving symbol dependencies across
this structure. For example, the DPAA2 pool driver would depend on some
DPAA2-specific objects, which are in turn used in crypto/ and net/.

One option is a drivers/common directory (which the DPAA2 PMD patchset
already uses). How are you handling that?

>
>>  drivers/net/dpaa2/dpaa2_vfio.c            |  13 +-
>>  6 files changed, 493 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/net/dpaa2/base/dpaa2_hw_dpbp.c
>>  create mode 100644 drivers/net/dpaa2/base/dpaa2_hw_dpbp.h
>>
>

^ permalink raw reply

* Re: [PATCH 16/32] net/dpaa2: dpio add support to check SOC type
From: Jerin Jacob @ 2016-12-15  6:34 UTC (permalink / raw)
  To: Hemant Agrawal; +Cc: dev, thomas.monjalon, bruce.richardson, shreyansh.jain
In-Reply-To: <1480875447-23680-17-git-send-email-hemant.agrawal@nxp.com>

On Sun, Dec 04, 2016 at 11:47:11PM +0530, Hemant Agrawal wrote:
> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---
>  drivers/net/dpaa2/base/dpaa2_hw_dpio.c | 74 ++++++++++++++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
> 
> diff --git a/drivers/net/dpaa2/base/dpaa2_hw_dpio.c b/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
> index 9c6eb96..3b8f87d 100644
> --- a/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
> +++ b/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
> @@ -70,6 +70,18 @@
>  static struct dpio_device_list *dpio_dev_list; /*!< DPIO device list */
>  static uint32_t io_space_count;
>  
> +#define ARM_CORTEX_A53		0xD03
> +#define ARM_CORTEX_A57		0xD07
> +#define ARM_CORTEX_A72		0xD08

It may not be a good idea to have generic ARM part number definitions
in a driver file.

> +
> +static int dpaa2_soc_core = ARM_CORTEX_A72;
> +
> +#define NXP_LS2085	1
> +#define NXP_LS2088	2
> +#define NXP_LS1088	3
> +
> +static int dpaa2_soc_family  = NXP_LS2088;
> +
>  /*Stashing Macros default for LS208x*/
>  static int dpaa2_core_cluster_base = 0x04;
>  static int dpaa2_cluster_sz = 2;
> @@ -101,6 +113,58 @@
>  	return dpaa2_core_cluster_base + x;
>  }
>  
> +static int cpuinfo_arm(FILE *file)
> +{
> +	char str[128], *pos;
> +	int part = -1;
> +
> +	#define ARM_CORTEX_A53_INFO	"Cortex-A53"
> +	#define ARM_CORTEX_A57_INFO	"Cortex-A57"
> +	#define ARM_CORTEX_A72_INFO	"Cortex-A72"
> +
> +	while (fgets(str, sizeof(str), file) != NULL) {
> +		if (part >= 0)
> +			break;
> +		pos = strstr(str, "CPU part");
> +		if (pos != NULL) {
> +			pos = strchr(pos, ':');
> +			if (pos != NULL)
> +				sscanf(++pos, "%x", &part);
> +		}
> +	}
> +
> +	dpaa2_soc_core = part;
> +	if (part == ARM_CORTEX_A53) {
> +		dpaa2_soc_family = NXP_LS1088;
> +		printf("\n########## Detected NXP LS108x with %s\n",
> +		       ARM_CORTEX_A53_INFO);
> +	} else if (part == ARM_CORTEX_A57) {
> +		dpaa2_soc_family = NXP_LS2085;
> +		printf("\n########## Detected NXP LS208x Rev1.0 with %s\n",
> +		       ARM_CORTEX_A57_INFO);
> +	} else if (part == ARM_CORTEX_A72) {
> +		dpaa2_soc_family = NXP_LS2088;
> +		printf("\n########## Detected NXP LS208x with %s\n",
> +		       ARM_CORTEX_A72_INFO);
> +	}
> +	return 0;
> +}
> +
> +static void
> +check_cpu_part(void)
> +{
> +	FILE *stream;
> +
> +	stream = fopen("/proc/cpuinfo", "r");
> +	if (!stream) {
> +		PMD_INIT_LOG(WARNING, "Unable to open /proc/cpuinfo\n");
> +		return;
> +	}
> +	cpuinfo_arm(stream);
> +
> +	fclose(stream);
> +}
> +
>  static int
>  configure_dpio_qbman_swp(struct dpaa2_dpio_dev *dpio_dev)
>  {
> @@ -326,6 +390,16 @@ static inline struct dpaa2_dpio_dev *dpaa2_get_qbman_swp(void)
>  {
>  	struct dpaa2_dpio_dev *dpio_dev;
>  	struct vfio_region_info reg_info = { .argsz = sizeof(reg_info)};
> +	static int first_time;
> +
> +	if (!first_time) {
> +		check_cpu_part();
> +		if (dpaa2_soc_family == NXP_LS1088) {
> +			dpaa2_core_cluster_base = 0x02;
> +			dpaa2_cluster_sz = 4;
Can this device configuration information be passed through the DT, or
through whatever means you use to populate the fsl bus for dpio?

If not, the arm64 CPU part identification code can go in arm64 common
code. Even better if we have an EAL API for the same. It looks like x86 has
a similar attribute called "model".
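
To make the suggestion above concrete, here is a minimal sketch of what a
driver-independent "CPU part" lookup might look like. The names
sketch_parse_cpu_part() and sketch_read_cpu_part() are hypothetical; no
such EAL API exists in this thread. It reuses the parsing approach from
the quoted patch, but reads into an unsigned int (which is what "%x"
expects) and checks the sscanf() result.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not an existing EAL API: extract the hexadecimal
 * "CPU part" value from one /proc/cpuinfo line.
 * Returns the part number, or -1 if the line does not carry one.
 */
static int
sketch_parse_cpu_part(const char *line)
{
	const char *pos;
	unsigned int part;

	pos = strstr(line, "CPU part");
	if (pos == NULL)
		return -1;
	pos = strchr(pos, ':');
	if (pos == NULL)
		return -1;
	/* "%x" skips leading whitespace and accepts an optional 0x prefix */
	if (sscanf(pos + 1, "%x", &part) != 1)
		return -1;
	return (int)part;
}

/* Hypothetical helper: scan a cpuinfo-style stream and return the first
 * CPU part found, or -1 if none is present.
 */
static int
sketch_read_cpu_part(FILE *f)
{
	char line[128];
	int part;

	while (fgets(line, sizeof(line), f) != NULL) {
		part = sketch_parse_cpu_part(line);
		if (part >= 0)
			return part;
	}
	return -1;
}
```

With something like this in common code, the driver would only map the
returned part number to its SoC family and would not need its own ARM
part number definitions.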

^ permalink raw reply

* Re: [PATCH 17/32] net/dpaa2: dpbp based mempool hw offload driver
From: Jerin Jacob @ 2016-12-15  6:09 UTC (permalink / raw)
  To: Hemant Agrawal; +Cc: dev, thomas.monjalon, bruce.richardson, shreyansh.jain
In-Reply-To: <1480875447-23680-18-git-send-email-hemant.agrawal@nxp.com>

On Sun, Dec 04, 2016 at 11:47:12PM +0530, Hemant Agrawal wrote:
> DPBP represent a buffer pool instance in DPAA2-QBMAN
> HW accelerator.
> 
> All buffers needs to be programmed in the HW accelerator.
> 
> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---
>  config/defconfig_arm64-dpaa2-linuxapp-gcc |   5 +
>  drivers/net/dpaa2/Makefile                |   2 +
>  drivers/net/dpaa2/base/dpaa2_hw_dpbp.c    | 366 ++++++++++++++++++++++++++++++
>  drivers/net/dpaa2/base/dpaa2_hw_dpbp.h    | 101 +++++++++
>  drivers/net/dpaa2/base/dpaa2_hw_pvt.h     |   7 +


How about moving the external mempool driver to RTE_SDK/driver/pool?
We are planning to push our external mempool driver to driver/pool.

>  drivers/net/dpaa2/dpaa2_vfio.c            |  13 +-
>  6 files changed, 493 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/dpaa2/base/dpaa2_hw_dpbp.c
>  create mode 100644 drivers/net/dpaa2/base/dpaa2_hw_dpbp.h
> 

^ permalink raw reply

* Re: [PATCH 02/32] drivers/common: introducing dpaa2 mc driver
From: Jerin Jacob @ 2016-12-15  6:04 UTC (permalink / raw)
  To: Hemant Agrawal
  Cc: dev, thomas.monjalon, bruce.richardson, shreyansh.jain,
	Cristian Sovaiala
In-Reply-To: <1480875447-23680-3-git-send-email-hemant.agrawal@nxp.com>

On Sun, Dec 04, 2016 at 11:46:57PM +0530, Hemant Agrawal wrote:
> This patch intoduces the DPAA2 MC(Management complex Driver)
> 
> This driver is common to be used by various DPAA2 net, crypto
> and other drivers
> 
> Signed-off-by: Cristian Sovaiala <cristian.sovaiala@nxp.com>
> [Hemant:rebase and conversion to library for DPDK]
> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> +#ifndef _FSL_MC_SYS_H
> +#define _FSL_MC_SYS_H
> +
> +#ifdef __linux_driver__
> +
> +#include <linux/errno.h>
> +#include <asm/io.h>
> +#include <linux/slab.h>
> +
> +struct fsl_mc_io {
> +	void *regs;
> +};
> +
> +#ifndef ENOTSUP
> +#define ENOTSUP		95
> +#endif
> +
> +#define ioread64(_p)	    readq(_p)
> +#define iowrite64(_v, _p)   writeq(_v, _p)
> +
> +#else /* __linux_driver__ */
> +
> +#include <stdio.h>
> +#include <libio.h>
> +#include <stdint.h>
> +#include <errno.h>
> +#include <sys/uio.h>
> +#include <linux/byteorder/little_endian.h>
> +
> +#define cpu_to_le64(x) __cpu_to_le64(x)
> +#ifndef dmb
> +#define dmb() {__asm__ __volatile__("" : : : "memory"); }
> +#endif

Better to use DPDK macros here.

> +#define __iormb()       dmb()
> +#define __iowmb()       dmb()
> +#define __arch_getq(a)                  (*(volatile unsigned long *)(a))
> +#define __arch_putq(v, a)                (*(volatile unsigned long *)(a) = (v))
> +#define __arch_putq32(v, a)                (*(volatile unsigned int *)(a) = (v))
> +#define readq(c)        \
> +	({ uint64_t __v = __arch_getq(c); __iormb(); __v; })
> +#define writeq(v, c)     \
> +	({ uint64_t __v = v; __iowmb(); __arch_putq(__v, c); __v; })
> +#define writeq32(v, c) \
> +	({ uint32_t __v = v; __iowmb(); __arch_putq32(__v, c); __v; })
> +#define ioread64(_p)	    readq(_p)
> +#define iowrite64(_v, _p)   writeq(_v, _p)
> +#define iowrite32(_v, _p)   writeq32(_v, _p)

Hopefully, we can clean all this up once rte_read32 and rte_write32 become
mainline:

http://dpdk.org/dev/patchwork/patch/17935/
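
Until those helpers are available, a rough sketch of the accessors built
on the existing DPDK barrier macros rte_rmb() and rte_wmb() from
rte_atomic.h could look like the following. The sketch_* names and the
SKETCH_HAVE_DPDK switch are hypothetical; the fallback definitions are
only stand-ins so the fragment builds without DPDK, and are not the DPDK
implementations. Note that it uses fixed-width types instead of
"unsigned long" for the 64-bit accesses.

```c
#include <stdint.h>

#ifdef SKETCH_HAVE_DPDK
#include <rte_atomic.h>		/* rte_rmb(), rte_wmb() */
#else
/* Stand-ins so the sketch builds without DPDK; NOT the DPDK definitions. */
#define rte_rmb() __sync_synchronize()
#define rte_wmb() __sync_synchronize()
#endif

/* Hypothetical replacement for the patch's readq() macro. */
static inline uint64_t
sketch_ioread64(const volatile void *addr)
{
	uint64_t v = *(const volatile uint64_t *)addr;

	rte_rmb();	/* order the register read before later loads */
	return v;
}

/* Hypothetical replacement for the patch's writeq() macro. */
static inline void
sketch_iowrite64(uint64_t v, volatile void *addr)
{
	rte_wmb();	/* order earlier stores before the register write */
	*(volatile uint64_t *)addr = v;
}

/* Hypothetical replacement for the patch's writeq32() macro. */
static inline void
sketch_iowrite32(uint32_t v, volatile void *addr)
{
	rte_wmb();	/* order earlier stores before the register write */
	*(volatile uint32_t *)addr = v;
}
```

Whether rte_rmb()/rte_wmb() are the right strength for this device memory
is for the driver authors to confirm; the point is only that the local
dmb() definition, which is a compiler barrier, would no longer be needed.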

> +#define __iomem
> +
> +struct fsl_mc_io {
> +	void *regs;
> +};
> +
> +#ifndef ENOTSUP
> +#define ENOTSUP		95
> +#endif
> +
> +/*GPP is supposed to use MC commands with low priority*/
> +#define CMD_PRI_LOW          0 /*!< Low Priority command indication */
> +
> +struct mc_command;
> +
> +int mc_send_command(struct fsl_mc_io *mc_io, struct mc_command *cmd);
> +
> +#endif /* __linux_driver__ */
> +
> +#endif /* _FSL_MC_SYS_H */
> +
> +/** User space framework uses MC Portal in shared mode. Following change
> +* introduces lock in MC FLIB
> +*/
> +
> +/**
> +* The mc_spinlock_t type.
> +*/
> +typedef struct {
> +	volatile int locked; /**< lock status 0 = unlocked, 1 = locked */
> +} mc_spinlock_t;
> +
> +/**
> +* A static spinlock initializer.
> +*/
> +static mc_spinlock_t mc_portal_lock = { 0 };
> +
> +static inline void mc_pause(void) {}
> +
> +static inline void mc_spinlock_lock(mc_spinlock_t *sl)
> +{
> +	while (__sync_lock_test_and_set(&sl->locked, 1))
> +		while (sl->locked)
> +			mc_pause();
> +}
> +
> +static inline void mc_spinlock_unlock(mc_spinlock_t *sl)
> +{
> +	__sync_lock_release(&sl->locked);
> +}
> +

The DPDK spinlock (rte_spinlock) can be used here.
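
As a minimal sketch of that replacement, assuming the portal lock stays a
file-scope static as in the patch: rte_spinlock_t,
RTE_SPINLOCK_INITIALIZER, rte_spinlock_lock() and rte_spinlock_unlock()
come from rte_spinlock.h. The SKETCH_HAVE_DPDK switch, the counter and
sketch_locked_increment() are hypothetical, and the fallback definitions
are stand-ins so the fragment builds without DPDK.

```c
#ifdef SKETCH_HAVE_DPDK
#include <rte_spinlock.h>
#else
/* Stand-ins so the sketch builds without DPDK; NOT the DPDK definitions. */
typedef struct {
	volatile int locked;	/* 0 = unlocked, 1 = locked */
} rte_spinlock_t;
#define RTE_SPINLOCK_INITIALIZER { 0 }

static inline void
rte_spinlock_lock(rte_spinlock_t *sl)
{
	while (__sync_lock_test_and_set(&sl->locked, 1))
		while (sl->locked)
			;	/* spin until the holder releases the lock */
}

static inline void
rte_spinlock_unlock(rte_spinlock_t *sl)
{
	__sync_lock_release(&sl->locked);
}
#endif

/* Replaces mc_spinlock_t and its open-coded lock/unlock helpers. */
static rte_spinlock_t mc_portal_lock = RTE_SPINLOCK_INITIALIZER;

/* Hypothetical shared state standing in for the shared MC portal. */
static int sketch_portal_counter;

/* Hypothetical example of a critical section guarded by the portal lock. */
static int
sketch_locked_increment(void)
{
	int v;

	rte_spinlock_lock(&mc_portal_lock);
	v = ++sketch_portal_counter;
	rte_spinlock_unlock(&mc_portal_lock);
	return v;
}
```

In the MC flib, the lock/unlock pair would presumably wrap the command
write and the wait for completion inside mc_send_command().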

^ permalink raw reply

* Re: [PATCH 27/28] net/vmxnet3: use eal I/O device memory read/write API
From: Santosh Shukla @ 2016-12-15  5:48 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Jerin Jacob, dev, konstantin.ananyev, thomas.monjalon,
	bruce.richardson, jianbo.liu, viktorin, Yong Wang
In-Reply-To: <20161214025534.GG18991@yliu-dev.sh.intel.com>

On Wed, Dec 14, 2016 at 10:55:34AM +0800, Yuanhan Liu wrote:
> On Wed, Dec 14, 2016 at 07:25:57AM +0530, Jerin Jacob wrote:
> > From: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> > 
> > Replace the raw I/O device memory read/write access with eal
> > abstraction for I/O device memory read/write access to fix
> > portability issues across different architectures.
> > 
> > Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > CC: Yong Wang <yongwang@vmware.com>
> > ---
> >  drivers/net/vmxnet3/vmxnet3_ethdev.h | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.h b/drivers/net/vmxnet3/vmxnet3_ethdev.h
> > index 7d3b11e..5b6501b 100644
> > --- a/drivers/net/vmxnet3/vmxnet3_ethdev.h
> > +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.h
> > @@ -34,6 +34,8 @@
> >  #ifndef _VMXNET3_ETHDEV_H_
> >  #define _VMXNET3_ETHDEV_H_
> >  
> > +#include <rte_io.h>
> > +
> >  #define VMXNET3_MAX_MAC_ADDRS 1
> >  
> >  /* UPT feature to negotiate */
> > @@ -120,7 +122,11 @@ struct vmxnet3_hw {
> >  
> >  /* Config space read/writes */
> >  
> > -#define VMXNET3_PCI_REG(reg) (*((volatile uint32_t *)(reg)))
> > +#define VMXNET3_PCI_REG(reg) ({		\
> > +	uint32_t __val;			\
> > +	__val = rte_readl(reg);		\
> > +	__val;				\
> > +})
> 
> Why not simply using rte_readl directly?
> 
> 	#define VMXNET3_PCI_REG(reg)	rte_readl(reg)
>

Ok.

> >  
> >  static inline uint32_t
> >  vmxnet3_read_addr(volatile void *addr)
> > @@ -128,9 +134,9 @@ vmxnet3_read_addr(volatile void *addr)
> >  	return VMXNET3_PCI_REG(addr);
> >  }
> >  
> > -#define VMXNET3_PCI_REG_WRITE(reg, value) do { \
> > -	VMXNET3_PCI_REG((reg)) = (value); \
> > -} while(0)
> > +#define VMXNET3_PCI_REG_WRITE(reg, value) ({	\
> > +	rte_writel(value, reg);			\
> > +})
> 
> I think this could be done in one line.
>

Ok, will take care of it in v2.
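
For reference, a sketch of how the two macros might look after both
comments are applied. The read macro is the one-liner suggested above;
the write macro is an assumed equivalent. rte_readl() and rte_writel()
are the accessors proposed in this series; the SKETCH_HAVE_DPDK switch is
hypothetical, and the fallback definitions (including their exact
signatures) are stand-ins so the fragment builds without DPDK.

```c
#include <stdint.h>

#ifdef SKETCH_HAVE_DPDK
#include <rte_io.h>	/* rte_readl(), rte_writel() as proposed in this series */
#else
/* Stand-ins so the sketch builds without DPDK; signatures are assumed. */
static inline uint32_t
rte_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline void
rte_writel(uint32_t value, volatile void *addr)
{
	*(volatile uint32_t *)addr = value;
}
#endif

/* Config space read/writes */
#define VMXNET3_PCI_REG(reg)			rte_readl(reg)
#define VMXNET3_PCI_REG_WRITE(reg, value)	rte_writel((value), (reg))

static inline uint32_t
vmxnet3_read_addr(volatile void *addr)
{
	return VMXNET3_PCI_REG(addr);
}
```

One side effect: VMXNET3_PCI_REG() no longer expands to an lvalue, so any
remaining code that assigns through it would have to switch to
VMXNET3_PCI_REG_WRITE().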

> 	--yliu

^ permalink raw reply

