Netdev List
 help / color / mirror / Atom feed
* Re: Face some error after applying commit 7dfa4b414d4(net/mlx4_en: Code cleanups in tx path)
From: Eric Dumazet @ 2014-11-10  2:46 UTC (permalink / raw)
  To: Wei Yang; +Cc: Eric Dumazet, Amir Vadai, David Miller, netdev
In-Reply-To: <20141110015933.GB6294@richard>

On Mon, 2014-11-10 at 09:59 +0800, Wei Yang wrote:
> On Fri, Nov 07, 2014 at 07:38:15PM -0800, Eric Dumazet wrote:
> >On Fri, Nov 7, 2014 at 6:57 PM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> >> Eric and Amir
> >>
> >> I am testing the VF on PowerNV platform with 3.18-rc2.
> >> After applying this patch I face some errors.
> >>
> >> First is the compiling error.
> >>
> >>     drivers/net/ethernet/mellanox/mlx4//en_tx.c: In function ‘mlx4_en_xmit’:
> >>     drivers/net/ethernet/mellanox/mlx4//en_tx.c:802:8: error: ‘shinfo’ undeclared (first use in this function)
> >>             shinfo->tx_flags & SKBTX_HW_TSTAMP)) {
> >>             ^
> >>     include/linux/compiler.h:160:42: note: in definition of macro ‘unlikely’
> >>      # define unlikely(x) __builtin_expect(!!(x), 0)
> >>                                               ^
> >>     drivers/net/ethernet/mellanox/mlx4//en_tx.c:802:8: note: each undeclared identifier is reported only once for each function it appears in
> >>             shinfo->tx_flags & SKBTX_HW_TSTAMP)) {
> >>             ^
> >>     include/linux/compiler.h:160:42: note: in definition of macro ‘unlikely’
> >>      # define unlikely(x) __builtin_expect(!!(x), 0)
> >>                                               ^
> >>     make[1]: *** [drivers/net/ethernet/mellanox/mlx4//en_tx.o] Error 1
> >>     make: *** [_module_drivers/net/ethernet/mellanox/mlx4/] Error 2
> >>
> >
> >
> >This compilation error seems strange.
> >
> >Are you sure your tree is pristine, not corrupted in any way ?
> 
> I believe I did the revert one by one with git revert.
> 
> >
> >
> >> I tried to fix this with following change:
> >>
> >>     [root@tian-lp1 3.18]# git diff
> >>     diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/m
> >>     index eaf23eb..d2f06a7 100644
> >>     --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> >>     +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> >>     @@ -799,8 +799,8 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_dev
> >>              * set flag for further reference
> >>              */
> >>             if (unlikely(ring->hwtstamp_tx_type == HWTSTAMP_TX_ON &&
> >>     -                    shinfo->tx_flags & SKBTX_HW_TSTAMP)) {
> >>     -               shinfo->tx_flags |= SKBTX_IN_PROGRESS;
> >>     +                    skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) {
> >>     +               skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
> >>                     tx_info->ts_requested = 1;
> >>             }
> >>
> >> But seems to face another error.
> >>
> >
> >I suspect your tree is not the official tree, I do not see how you got
> >this compilation error.
> 
> 
> I checked the upstream git tree again, and find this commit:
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7dfa4b414d4eec8da56e44fb2b4aea3e549b092f
> 
> 
> And I want to say the shinfo local variable is introduced in commit:
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b9d8839a44092cb4268ef2813c34d5dbf3363603
> 
> And in my log tree, also checked the upstream, this one is applyed after the
> first one. And the compiling error will disappear untill I apply this one.
> 
> So this compiling issue can't reproduced at your side? You have reset --hard
> to the "Code cleanup" one, and can't see the error? That is strange.
> 

Okay, your message was not clear : I thought you had a compilation error
on current tree.

The true story of these patches is that Mellanox split an initial big
chunk [1] I gave into multiple patches.

Maybe they missed that one patch did not actually compile.

[1] https://patchwork.ozlabs.org/patch/394256/

Now, it is done, there is nothing we can do.

I'll let Mellanox comment, but it looks like your hardware does not like
something.

Have you tried to disable Blue Frame ?

^ permalink raw reply

* linux-next: manual merge of the tiny tree with the net-next tree
From: Stephen Rothwell @ 2014-11-10  3:25 UTC (permalink / raw)
  To: Josh Triplett, David Miller, netdev
  Cc: linux-next, linux-kernel, Hannes Frederic Sowa, Catalina Mocanu

[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]

Hi Josh,

Today's linux-next merge of the tiny tree got a conflict in
lib/Makefile between commit e5a2c8999576 ("fast_hash: avoid indirect
function calls") from the net-next tree and commit 4ecea0db79ef ("lib:
rhashtable: Make rhashtable.c optional") from the tiny tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc lib/Makefile
index 04e53dd16070,47b8305288e2..000000000000
--- a/lib/Makefile
+++ b/lib/Makefile
@@@ -22,11 -22,14 +22,14 @@@ lib-$(CONFIG_SMP) += cpumask.
  lib-y	+= kobject.o klist.o
  obj-y	+= lockref.o
  
- obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
+ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
  	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
- 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
+ 	 gcd.o lcm.o list_sort.o uuid.o iovec.o clz_ctz.o \
  	 bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
- 	 percpu-refcount.o percpu_ida.o rhashtable.o
 -	 percpu-refcount.o percpu_ida.o hash.o
++	 percpu-refcount.o percpu_ida.o
+ obj-$(CONFIG_FLEX_ARRAY) += flex_array.o
+ obj-$(CONFIG_HALFMD4) += halfmd4.o
+ obj-$(CONFIG_RHASHTABLE) += rhashtable.o
  obj-y += string_helpers.o
  obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
  obj-y += kstrtox.o

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* RE: [PATCH net-next 2/2] r8152: adjust rtl_start_rx
From: Hayes Wang @ 2014-11-10  3:29 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, nic_swsd, linux-kernel@vger.kernel.org,
	linux-usb@vger.kernel.org
In-Reply-To: <20141107.113522.837502028522211960.davem@davemloft.net>

 David Miller [mailto:davem@davemloft.net] 
> Sent: Saturday, November 08, 2014 12:35 AM
[...]
> Does this even work?
> 
> If you leave a hole in the ring, the device is going to stop there
> anyways.

Excuse me. I don't sure I understand your meaning clearly.

The behavior is different for PCI(e) and USB ethernet device.
The PCI nic could know the ring buffer by certain way, so
the device could fill the data into the buffer one by one
automatically. However, for usb nic, the driver has to
indicate (i.e. submit) each buffer for each data. The device
doesn't know what is the next buffer by itself. That is,
the driver determines the order by which the data would be
filled.

Therefore, when I try to submit 10 rx buffers and some of
them fail, I could get the data depending on the order of
the successful ones. Besides, the driver has to submit the
buffer for next data continually, so the previous unsuccessful
ones could be tried again for the same time.
 
Best Regards,
Hayes

^ permalink raw reply

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
From: Jamal Hadi Salim @ 2014-11-10  3:31 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-1-git-send-email-jiri@resnulli.us>

Jiri,

I am hoping you have considered what Ben Lahaise's, John Fastabend's,
and Roopa's patches after all those discussions and
meetings we have had (in which you promised you will merge patches
in). I am not seeing much of that here or mention of anything of that
sort.
At least please get their sign on - this  is such an important piece of
new work that you should make sure you get consensus.
Otherwise we are back to square one and everyone is going their way with
their patches;

Ben/Roopa/John - please issue either a signed-off as well
if you agree with this approach otherwise i am hoping none of these
patches are merged in.

I will look at the patches and comment.

cheers,
jamal

On 11/09/14 05:51, Jiri Pirko wrote:
> Hi all.
>
> This patchset is just the first phase of switch and switch-ish device
> support api in kernel. Note that the api will extend (our complete work
> can be pulled from https://github.com/jpirko/net-next-rocker).
>
> So what this patchset includes:
> - introduce switchdev api for implementing switch drivers (so far
>    only linux bridge fdb offload is covered)
> - introduce rocker switch driver which implements switchdev api
>
> As to the discussion if there is need to have specific class of device
> representing the switch itself, so far we found no need to introduce that.
> But we are generally ok with the idea and when the time comes and it will
> be needed, it can be easily introduced without any disturbance.
>
> This patchset introduces switch id export through rtnetlink and sysfs,
> which is similar to what we have for port id in SR-IOV. I will send iproute2
> patchset for showing the switch id for port netdevs once this is applied.
>
> For detailed description, please see individual patches.
>
> v1->v2:
> - addressed all DaveM's comments
>
> Jiri Pirko (5):
>    net: rename netdev_phys_port_id to more generic name
>    net: introduce generic switch devices support
>    rtnl: expose physical switch id for particular device
>    net-sysfs: expose physical switch id for particular device
>    rocker: introduce rocker switch driver
>
> Scott Feldman (5):
>    bridge: introduce fdb offloading via switchdev
>    bridge: call netdev_sw_port_stp_update when bridge port STP status
>      changes
>    bridge: add API to notify bridge driver of learned FBD on offloaded
>      device
>    rocker: implement rocker ofdpa flow table manipulation
>    rocker: implement L2 bridge offloading
>
>   Documentation/networking/switchdev.txt           |   59 +
>   MAINTAINERS                                      |   14 +
>   drivers/net/ethernet/Kconfig                     |    1 +
>   drivers/net/ethernet/Makefile                    |    1 +
>   drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
>   drivers/net/ethernet/intel/i40e/i40e_main.c      |    2 +-
>   drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |    2 +-
>   drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
>   drivers/net/ethernet/rocker/Kconfig              |   27 +
>   drivers/net/ethernet/rocker/Makefile             |    5 +
>   drivers/net/ethernet/rocker/rocker.c             | 4182 ++++++++++++++++++++++
>   drivers/net/ethernet/rocker/rocker.h             |  427 +++
>   include/linux/if_bridge.h                        |   18 +
>   include/linux/netdevice.h                        |   48 +-
>   include/net/switchdev.h                          |   53 +
>   include/uapi/linux/if_link.h                     |    1 +
>   net/Kconfig                                      |    1 +
>   net/Makefile                                     |    3 +
>   net/bridge/br_fdb.c                              |   94 +-
>   net/bridge/br_netlink.c                          |    2 +
>   net/bridge/br_stp.c                              |    4 +
>   net/bridge/br_stp_if.c                           |    3 +
>   net/bridge/br_stp_timer.c                        |    2 +
>   net/core/dev.c                                   |    2 +-
>   net/core/net-sysfs.c                             |   26 +-
>   net/core/rtnetlink.c                             |   30 +-
>   net/switchdev/Kconfig                            |   13 +
>   net/switchdev/Makefile                           |    5 +
>   net/switchdev/switchdev.c                        |   93 +
>   29 files changed, 5104 insertions(+), 18 deletions(-)
>   create mode 100644 Documentation/networking/switchdev.txt
>   create mode 100644 drivers/net/ethernet/rocker/Kconfig
>   create mode 100644 drivers/net/ethernet/rocker/Makefile
>   create mode 100644 drivers/net/ethernet/rocker/rocker.c
>   create mode 100644 drivers/net/ethernet/rocker/rocker.h
>   create mode 100644 include/net/switchdev.h
>   create mode 100644 net/switchdev/Kconfig
>   create mode 100644 net/switchdev/Makefile
>   create mode 100644 net/switchdev/switchdev.c
>

^ permalink raw reply

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
From: Jamal Hadi Salim @ 2014-11-10  3:35 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-2-git-send-email-jiri@resnulli.us>


On 11/09/14 05:51, Jiri Pirko wrote:
> So this can be reused for identification of other "items" as well.
>




>
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 9dd0669..55dc4da 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -387,7 +387,7 @@ static ssize_t phys_port_id_show(struct device *dev,
>   		return restart_syscall();
>
>   	if (dev_isalive(netdev)) {
> -		struct netdev_phys_port_id ppid;
> +		struct netdev_phys_item_id ppid;
>
>   		ret = dev_get_phys_port_id(netdev, &ppid);
>   		if (!ret)
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index a688268..1087c6d 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -868,7 +868,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
>   	       + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
>   	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
>   	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
> -	       + nla_total_size(MAX_PHYS_PORT_ID_LEN); /* IFLA_PHYS_PORT_ID */
> +	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
>   }
>

[...]
>   static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
> @@ -952,7 +952,7 @@ static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev,
>   static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
>   {
>   	int err;
> -	struct netdev_phys_port_id ppid;
> +	struct netdev_phys_item_id ppid;
>
>   	err = dev_get_phys_port_id(dev, &ppid);
>   	if (err) {
> @@ -1196,7 +1196,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>   	[IFLA_PROMISCUITY]	= { .type = NLA_U32 },
>   	[IFLA_NUM_TX_QUEUES]	= { .type = NLA_U32 },
>   	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
> -	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
> +	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>   	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
>   };


wouldnt this just break an existing ABI? You may need to introduce a new 
attribute.

cheers,
jamal

^ permalink raw reply

* Re: [patch net-next v2 03/10] rtnl: expose physical switch id for particular device
From: Jamal Hadi Salim @ 2014-11-10  3:43 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-4-git-send-email-jiri@resnulli.us>

On 11/09/14 05:51, Jiri Pirko wrote:
> The netdevice represents a port in a switch, it will expose
> IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value
> belong to one physical switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   include/uapi/linux/if_link.h |  1 +
>   net/core/rtnetlink.c         | 26 +++++++++++++++++++++++++-
>   2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 7072d83..4163753 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>   	IFLA_CARRIER,
>   	IFLA_PHYS_PORT_ID,
>   	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,
>   	__IFLA_MAX
>   };
>


> @@ -1198,6 +1221,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>   	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
>   	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>   	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
> +	[IFLA_PHYS_SWITCH_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
>   };
>

Ok, looking at this compared to #1 i can see you are introducing 
IFLA_PHYS_SWITCH_ID but then why did you need to change 
IFLA_PHYS_PORT_ID earlier?

cheers,
jamal

>   static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
>

^ permalink raw reply

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
From: Simon Horman @ 2014-11-10  3:46 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl
In-Reply-To: <5460319B.2010605@mojatatu.com>

Hi Jamal, Hi Jiri,

On a somewhat related note I am also wondering what if any progress has
been made regarding discussions of (and code for) the following:

1. Exposing flow tables to user-space
   - I realise that this is Open vSwitch specific to some extent
     but I am in no way implying that it should be done instead of
     non-Open vSwitch specific work.
   - Jiri, IIRC this was part ~v2 of your earlier offload patchset

2. Describing Switch Hardware
   - I see John Fastabend moving forwards on this in his git repository
     https://github.com/jrfastab/flow-net-next

The way that I see things is that both of the above could be exposed via
netlink. And that the first at least could be backed by NDOs.  As such I
see this work as complementary and perhaps applying on top of this
patchset. If I am mistaken in this regards it would be good to know :)

I am of course also interested to know if the above are moving forwards.
To be clear I am very interested in being able to use these APIs to
perform Open vSwitch offloads and I am very happy to help.
(Jamal: I'm also interested in non-Open vSwitch offloads :)

On Sun, Nov 09, 2014 at 10:31:39PM -0500, Jamal Hadi Salim wrote:
> Jiri,
> 
> I am hoping you have considered what Ben Lahaise's, John Fastabend's,
> and Roopa's patches after all those discussions and
> meetings we have had (in which you promised you will merge patches
> in). I am not seeing much of that here or mention of anything of that
> sort.
> At least please get their sign on - this  is such an important piece of
> new work that you should make sure you get consensus.
> Otherwise we are back to square one and everyone is going their way with
> their patches;
> 
> Ben/Roopa/John - please issue either a signed-off as well
> if you agree with this approach otherwise i am hoping none of these
> patches are merged in.
> 
> I will look at the patches and comment.
> 
> cheers,
> jamal
> 
> On 11/09/14 05:51, Jiri Pirko wrote:
> >Hi all.
> >
> >This patchset is just the first phase of switch and switch-ish device
> >support api in kernel. Note that the api will extend (our complete work
> >can be pulled from https://github.com/jpirko/net-next-rocker).
> >
> >So what this patchset includes:
> >- introduce switchdev api for implementing switch drivers (so far
> >   only linux bridge fdb offload is covered)
> >- introduce rocker switch driver which implements switchdev api
> >
> >As to the discussion if there is need to have specific class of device
> >representing the switch itself, so far we found no need to introduce that.
> >But we are generally ok with the idea and when the time comes and it will
> >be needed, it can be easily introduced without any disturbance.
> >
> >This patchset introduces switch id export through rtnetlink and sysfs,
> >which is similar to what we have for port id in SR-IOV. I will send iproute2
> >patchset for showing the switch id for port netdevs once this is applied.
> >
> >For detailed description, please see individual patches.
> >
> >v1->v2:
> >- addressed all DaveM's comments
> >
> >Jiri Pirko (5):
> >   net: rename netdev_phys_port_id to more generic name
> >   net: introduce generic switch devices support
> >   rtnl: expose physical switch id for particular device
> >   net-sysfs: expose physical switch id for particular device
> >   rocker: introduce rocker switch driver
> >
> >Scott Feldman (5):
> >   bridge: introduce fdb offloading via switchdev
> >   bridge: call netdev_sw_port_stp_update when bridge port STP status
> >     changes
> >   bridge: add API to notify bridge driver of learned FBD on offloaded
> >     device
> >   rocker: implement rocker ofdpa flow table manipulation
> >   rocker: implement L2 bridge offloading
> >
> >  Documentation/networking/switchdev.txt           |   59 +
> >  MAINTAINERS                                      |   14 +
> >  drivers/net/ethernet/Kconfig                     |    1 +
> >  drivers/net/ethernet/Makefile                    |    1 +
> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
> >  drivers/net/ethernet/intel/i40e/i40e_main.c      |    2 +-
> >  drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |    2 +-
> >  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    2 +-
> >  drivers/net/ethernet/rocker/Kconfig              |   27 +
> >  drivers/net/ethernet/rocker/Makefile             |    5 +
> >  drivers/net/ethernet/rocker/rocker.c             | 4182 ++++++++++++++++++++++
> >  drivers/net/ethernet/rocker/rocker.h             |  427 +++
> >  include/linux/if_bridge.h                        |   18 +
> >  include/linux/netdevice.h                        |   48 +-
> >  include/net/switchdev.h                          |   53 +
> >  include/uapi/linux/if_link.h                     |    1 +
> >  net/Kconfig                                      |    1 +
> >  net/Makefile                                     |    3 +
> >  net/bridge/br_fdb.c                              |   94 +-
> >  net/bridge/br_netlink.c                          |    2 +
> >  net/bridge/br_stp.c                              |    4 +
> >  net/bridge/br_stp_if.c                           |    3 +
> >  net/bridge/br_stp_timer.c                        |    2 +
> >  net/core/dev.c                                   |    2 +-
> >  net/core/net-sysfs.c                             |   26 +-
> >  net/core/rtnetlink.c                             |   30 +-
> >  net/switchdev/Kconfig                            |   13 +
> >  net/switchdev/Makefile                           |    5 +
> >  net/switchdev/switchdev.c                        |   93 +
> >  29 files changed, 5104 insertions(+), 18 deletions(-)
> >  create mode 100644 Documentation/networking/switchdev.txt
> >  create mode 100644 drivers/net/ethernet/rocker/Kconfig
> >  create mode 100644 drivers/net/ethernet/rocker/Makefile
> >  create mode 100644 drivers/net/ethernet/rocker/rocker.c
> >  create mode 100644 drivers/net/ethernet/rocker/rocker.h
> >  create mode 100644 include/net/switchdev.h
> >  create mode 100644 net/switchdev/Kconfig
> >  create mode 100644 net/switchdev/Makefile
> >  create mode 100644 net/switchdev/switchdev.c
> >
> 

^ permalink raw reply

* Re: [patch net-next v2 06/10] bridge: introduce fdb offloading via switchdev
From: Jamal Hadi Salim @ 2014-11-10  3:47 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-7-git-send-email-jiri@resnulli.us>

On 11/09/14 05:51, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> Add two new ndos: ndo_sw_port_fdb_add/del to offload static bridge
> fdb entries.  Static bridge FDB entries are installed, for example,
> using iproute2 bridge cmd:
>
>         bridge fdb add ADDR dev DEV master vlan VID
>
> This would install ADDR into the bridge's FDB for port DEV on vlan VID.  The
> switch driver implements two ndo_swdev ops to add/delete FDB entries in the
> switch device:
>
>         int ndo_sw_port_fdb_add(struct net_device *dev,
>                                 const unsigned char *addr,
>                                 u16 vid);
>
>         int ndo_sw_port_fdb_del(struct net_device *dev,
>                                 const unsigned char *addr,
>                                 u16 vid);
>
> The driver returns 0 on success, negative error code on failure.
>
> Note: the switch driver would not implement ndo_fdb_add/del/dump on a port
> netdev as these are intended for devices maintaining their own FDB.  In our
> case, we want the Linux bridge to own the FBD.
>
> Note: by default, the bridge does not filter on VLAN and only bridges untagged
> traffic.  To enable VLAN support, turn on VLAN filtering:
>
>        echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering
>

Sorry - why is the current fdb_add/del insufficient? It needs a vlanid
and the master/self flags should indicate intent to add to h/w vs s/w.

cheers,
jamal

^ permalink raw reply

* Re: [patch net-next v2 10/10] rocker: implement L2 bridge offloading
From: Jamal Hadi Salim @ 2014-11-10  3:53 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415530280-9190-11-git-send-email-jiri@resnulli.us>

On 11/09/14 05:51, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> Add L2 bridge offloading support to rocker driver.  Here, the Linux bridge
> driver is used to collect swdev ports into a tagged (or untagged) VLAN
> bridge.  The swdev will offload from the bridge driver the following L2
> bridging functions:
>
>   - Learning of neighbor MAC addresses on VLAN X  Learned mac/vlan is
> installed in bridge FDB.  (And removed when device unlearns mac/vlan).
> Learning must be turned off on each bridge port to disable the feature in
> the bridge driver.
>

I have quiet a few use cases where the above is a no-no. I dont want
learning of any sort (we have a knob for that in the bridge).
And i dont want learning in hardware to be reflected in software.
Basically this is a policy decision. Introduce a knob to choose whether
hardware learnt addresses should be reflected in software.


Have to run, but will comment when i get the time.

cheers,
jamal

^ permalink raw reply

* [GIT net-next] Open vSwitch
From: Pravin B Shelar @ 2014-11-10  3:58 UTC (permalink / raw)
  To: davem; +Cc: netdev

Following batch of patches brings feature parity between upstream
ovs and out of tree ovs module.

Two features are added, first adds support to export egress
tunnel information for a packet. This is used to improve
visibility in network traffic. Second feature allows userspace
vswitchd process to probe ovs module features. Other patches
are optimization and code cleanup.

----------------------------------------------------------------
The following changes since commit c0560b9c523341516eabf0f3b51832256caa7bbb:

  dccp: Convert DCCP_WARN to net_warn_ratelimited (2014-11-08 21:22:54 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch.git net_next_ovs

for you to fetch changes up to 05da5898a96c05e32aa9850c9cd89eef29471b13:

  openvswitch: Add support for OVS_FLOW_ATTR_PROBE. (2014-11-09 18:58:44 -0800)

----------------------------------------------------------------
Jarno Rajahalme (1):
      openvswitch: Add support for OVS_FLOW_ATTR_PROBE.

Pravin B Shelar (3):
      openvswitch: Export symbols as GPL symbols.
      openvswitch: Optimize recirc action.
      openvswitch: Remove redundant key ref from upcall_info.

Thomas Graf (1):
      openvswitch: Constify various function arguments

Wenyu Zhang (1):
      openvswitch: Extend packet attribute for egress tunnel info

 include/uapi/linux/openvswitch.h |  15 ++
 net/openvswitch/actions.c        | 180 ++++++++++++++------
 net/openvswitch/datapath.c       | 129 ++++++++------
 net/openvswitch/datapath.h       |  22 +--
 net/openvswitch/flow.c           |   8 +-
 net/openvswitch/flow.h           |  71 ++++++--
 net/openvswitch/flow_netlink.c   | 357 +++++++++++++++++++++++----------------
 net/openvswitch/flow_netlink.h   |  13 +-
 net/openvswitch/flow_table.c     |  12 +-
 net/openvswitch/flow_table.h     |   8 +-
 net/openvswitch/vport-geneve.c   |  23 ++-
 net/openvswitch/vport-gre.c      |  12 +-
 net/openvswitch/vport-netdev.c   |   2 +-
 net/openvswitch/vport-vxlan.c    |  24 ++-
 net/openvswitch/vport.c          |  81 +++++++--
 net/openvswitch/vport.h          |  20 ++-
 16 files changed, 664 insertions(+), 313 deletions(-)

^ permalink raw reply

* [PATCH net-next 1/6] openvswitch: Export symbols as GPL symbols.
From: Pravin B Shelar @ 2014-11-10  3:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar, Thomas Graf

vport can be compiled as modules, therefore openvswitch needs
to export few symbols. Export them as GPL symbols.

CC: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/datapath.c |  4 ++--
 net/openvswitch/vport.c    | 12 ++++++------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 014485e..6cfb44f 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -59,7 +59,7 @@
 #include "vport-netdev.h"
 
 int ovs_net_id __read_mostly;
-EXPORT_SYMBOL(ovs_net_id);
+EXPORT_SYMBOL_GPL(ovs_net_id);
 
 static struct genl_family dp_packet_genl_family;
 static struct genl_family dp_flow_genl_family;
@@ -131,7 +131,7 @@ int lockdep_ovsl_is_held(void)
 	else
 		return 1;
 }
-EXPORT_SYMBOL(lockdep_ovsl_is_held);
+EXPORT_SYMBOL_GPL(lockdep_ovsl_is_held);
 #endif
 
 static struct vport *new_vport(const struct vport_parms *);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 8168ef0..4b5dd18 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -90,7 +90,7 @@ errout:
 	ovs_unlock();
 	return err;
 }
-EXPORT_SYMBOL(ovs_vport_ops_register);
+EXPORT_SYMBOL_GPL(ovs_vport_ops_register);
 
 void ovs_vport_ops_unregister(struct vport_ops *ops)
 {
@@ -98,7 +98,7 @@ void ovs_vport_ops_unregister(struct vport_ops *ops)
 	list_del(&ops->list);
 	ovs_unlock();
 }
-EXPORT_SYMBOL(ovs_vport_ops_unregister);
+EXPORT_SYMBOL_GPL(ovs_vport_ops_unregister);
 
 /**
  *	ovs_vport_locate - find a port that has already been created
@@ -165,7 +165,7 @@ struct vport *ovs_vport_alloc(int priv_size, const struct vport_ops *ops,
 
 	return vport;
 }
-EXPORT_SYMBOL(ovs_vport_alloc);
+EXPORT_SYMBOL_GPL(ovs_vport_alloc);
 
 /**
  *	ovs_vport_free - uninitialize and free vport
@@ -186,7 +186,7 @@ void ovs_vport_free(struct vport *vport)
 	free_percpu(vport->percpu_stats);
 	kfree(vport);
 }
-EXPORT_SYMBOL(ovs_vport_free);
+EXPORT_SYMBOL_GPL(ovs_vport_free);
 
 static struct vport_ops *ovs_vport_lookup(const struct vport_parms *parms)
 {
@@ -493,7 +493,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
 	}
 	ovs_dp_process_packet(skb, &key);
 }
-EXPORT_SYMBOL(ovs_vport_receive);
+EXPORT_SYMBOL_GPL(ovs_vport_receive);
 
 /**
  *	ovs_vport_send - send a packet on a device
@@ -572,4 +572,4 @@ void ovs_vport_deferred_free(struct vport *vport)
 
 	call_rcu(&vport->rcu, free_vport_rcu);
 }
-EXPORT_SYMBOL(ovs_vport_deferred_free);
+EXPORT_SYMBOL_GPL(ovs_vport_deferred_free);
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 2/6] openvswitch: Extend packet attribute for egress tunnel info
From: Pravin B Shelar @ 2014-11-10  3:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Wenyu Zhang, Pravin B Shelar

From: Wenyu Zhang <wenyuz@vmware.com>

OVS vswitch has extended IPFIX exporter to export tunnel headers
to improve network visibility.
To export this information userspace needs to know egress tunnel
for given packet. By extending packet attributes datapath can
export egress tunnel info for given packet. So that userspace
can ask for egress tunnel info in userspace action. This
information is used to build IPFIX data for given flow.

Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Acked-by: Romain Lenglet <rlenglet@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 include/uapi/linux/openvswitch.h | 13 +++++++++
 net/openvswitch/actions.c        | 19 ++++++++++++
 net/openvswitch/datapath.c       | 21 +++++++++++---
 net/openvswitch/datapath.h       |  2 ++
 net/openvswitch/flow.h           | 62 +++++++++++++++++++++++++++++++---------
 net/openvswitch/flow_netlink.c   | 54 +++++++++++++++++++++++++++-------
 net/openvswitch/flow_netlink.h   |  3 ++
 net/openvswitch/vport-geneve.c   | 21 +++++++++++++-
 net/openvswitch/vport-gre.c      | 12 +++++++-
 net/openvswitch/vport-vxlan.c    | 24 +++++++++++++++-
 net/openvswitch/vport.c          | 61 +++++++++++++++++++++++++++++++++++++++
 net/openvswitch/vport.h          | 14 +++++++++
 12 files changed, 275 insertions(+), 31 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 26c36c4..cf81856 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -157,6 +157,11 @@ enum ovs_packet_cmd {
  * notification if the %OVS_ACTION_ATTR_USERSPACE action specified an
  * %OVS_USERSPACE_ATTR_USERDATA attribute, with the same length and content
  * specified there.
+ * @OVS_PACKET_ATTR_EGRESS_TUN_KEY: Present for an %OVS_PACKET_CMD_ACTION
+ * notification if the %OVS_ACTION_ATTR_USERSPACE action specified an
+ * %OVS_USERSPACE_ATTR_EGRESS_TUN_PORT attribute, which is sent only if the
+ * output port is actually a tunnel port. Contains the output tunnel key
+ * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
  *
  * These attributes follow the &struct ovs_header within the Generic Netlink
  * payload for %OVS_PACKET_* commands.
@@ -167,6 +172,8 @@ enum ovs_packet_attr {
 	OVS_PACKET_ATTR_KEY,         /* Nested OVS_KEY_ATTR_* attributes. */
 	OVS_PACKET_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* attributes. */
 	OVS_PACKET_ATTR_USERDATA,    /* OVS_ACTION_ATTR_USERSPACE arg. */
+	OVS_PACKET_ATTR_EGRESS_TUN_KEY,  /* Nested OVS_TUNNEL_KEY_ATTR_*
+					    attributes. */
 	__OVS_PACKET_ATTR_MAX
 };
 
@@ -315,6 +322,8 @@ enum ovs_tunnel_key_attr {
 	OVS_TUNNEL_KEY_ATTR_CSUM,               /* No argument. CSUM packet. */
 	OVS_TUNNEL_KEY_ATTR_OAM,                /* No argument. OAM frame.  */
 	OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS,        /* Array of Geneve options. */
+	OVS_TUNNEL_KEY_ATTR_TP_SRC,		/* be16 src Transport Port. */
+	OVS_TUNNEL_KEY_ATTR_TP_DST,		/* be16 dst Transport Port. */
 	__OVS_TUNNEL_KEY_ATTR_MAX
 };
 
@@ -480,11 +489,15 @@ enum ovs_sample_attr {
  * message should be sent.  Required.
  * @OVS_USERSPACE_ATTR_USERDATA: If present, its variable-length argument is
  * copied to the %OVS_PACKET_CMD_ACTION message as %OVS_PACKET_ATTR_USERDATA.
+ * @OVS_USERSPACE_ATTR_EGRESS_TUN_PORT: If present, u32 output port to get
+ * tunnel info.
  */
 enum ovs_userspace_attr {
 	OVS_USERSPACE_ATTR_UNSPEC,
 	OVS_USERSPACE_ATTR_PID,	      /* u32 Netlink PID to receive upcalls. */
 	OVS_USERSPACE_ATTR_USERDATA,  /* Optional user-specified cookie. */
+	OVS_USERSPACE_ATTR_EGRESS_TUN_PORT,  /* Optional, u32 output port
+					      * to get tunnel info. */
 	__OVS_USERSPACE_ATTR_MAX
 };
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index f7e5891..ceb618c 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -564,6 +564,7 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
 static int output_userspace(struct datapath *dp, struct sk_buff *skb,
 			    struct sw_flow_key *key, const struct nlattr *attr)
 {
+	struct ovs_tunnel_info info;
 	struct dp_upcall_info upcall;
 	const struct nlattr *a;
 	int rem;
@@ -572,6 +573,7 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
 	upcall.key = key;
 	upcall.userdata = NULL;
 	upcall.portid = 0;
+	upcall.egress_tun_info = NULL;
 
 	for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
 		 a = nla_next(a, &rem)) {
@@ -583,7 +585,24 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
 		case OVS_USERSPACE_ATTR_PID:
 			upcall.portid = nla_get_u32(a);
 			break;
+
+		case OVS_USERSPACE_ATTR_EGRESS_TUN_PORT: {
+			/* Get out tunnel info. */
+			struct vport *vport;
+
+			vport = ovs_vport_rcu(dp, nla_get_u32(a));
+			if (vport) {
+				int err;
+
+				err = ovs_vport_get_egress_tun_info(vport, skb,
+								    &info);
+				if (!err)
+					upcall.egress_tun_info = &info;
+			}
+			break;
 		}
+
+		} /* End of switch. */
 	}
 
 	return ovs_dp_upcall(dp, skb, &upcall);
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 6cfb44f..c2ac340 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -274,6 +274,7 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
 		upcall.key = key;
 		upcall.userdata = NULL;
 		upcall.portid = ovs_vport_find_upcall_portid(p, skb);
+		upcall.egress_tun_info = NULL;
 		error = ovs_dp_upcall(dp, skb, &upcall);
 		if (unlikely(error))
 			kfree_skb(skb);
@@ -375,7 +376,7 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
 	return err;
 }
 
-static size_t upcall_msg_size(const struct nlattr *userdata,
+static size_t upcall_msg_size(const struct dp_upcall_info *upcall_info,
 			      unsigned int hdrlen)
 {
 	size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
@@ -383,8 +384,12 @@ static size_t upcall_msg_size(const struct nlattr *userdata,
 		+ nla_total_size(ovs_key_attr_size()); /* OVS_PACKET_ATTR_KEY */
 
 	/* OVS_PACKET_ATTR_USERDATA */
-	if (userdata)
-		size += NLA_ALIGN(userdata->nla_len);
+	if (upcall_info->userdata)
+		size += NLA_ALIGN(upcall_info->userdata->nla_len);
+
+	/* OVS_PACKET_ATTR_EGRESS_TUN_KEY */
+	if (upcall_info->egress_tun_info)
+		size += nla_total_size(ovs_tun_key_attr_size());
 
 	return size;
 }
@@ -440,7 +445,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 	else
 		hlen = skb->len;
 
-	len = upcall_msg_size(upcall_info->userdata, hlen);
+	len = upcall_msg_size(upcall_info, hlen);
 	user_skb = genlmsg_new_unicast(len, &info, GFP_ATOMIC);
 	if (!user_skb) {
 		err = -ENOMEM;
@@ -461,6 +466,14 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 			  nla_len(upcall_info->userdata),
 			  nla_data(upcall_info->userdata));
 
+	if (upcall_info->egress_tun_info) {
+		nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_EGRESS_TUN_KEY);
+		err = ovs_nla_put_egress_tunnel_key(user_skb,
+						    upcall_info->egress_tun_info);
+		BUG_ON(err);
+		nla_nest_end(user_skb, nla);
+	}
+
 	/* Only reserve room for attribute header, packet data is added
 	 * in skb_zerocopy() */
 	if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0))) {
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 1c56a80..2bc577b 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -114,12 +114,14 @@ struct ovs_skb_cb {
  * @pid: Netlink PID to which packet should be sent.  If @pid is 0 then no
  * packet is sent and the packet is accounted in the datapath's @n_lost
  * counter.
+ * @egress_tun_info: If nonnull, becomes %OVS_PACKET_ATTR_EGRESS_TUN_KEY.
  */
 struct dp_upcall_info {
 	u8 cmd;
 	const struct sw_flow_key *key;
 	const struct nlattr *userdata;
 	u32 portid;
+	const struct ovs_tunnel_info *egress_tun_info;
 };
 
 /**
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 4962bee..543b358 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -37,8 +37,8 @@ struct sk_buff;
 
 /* Used to memset ovs_key_ipv4_tunnel padding. */
 #define OVS_TUNNEL_KEY_SIZE					\
-	(offsetof(struct ovs_key_ipv4_tunnel, ipv4_ttl) +	\
-	FIELD_SIZEOF(struct ovs_key_ipv4_tunnel, ipv4_ttl))
+	(offsetof(struct ovs_key_ipv4_tunnel, tp_dst) +		\
+	 FIELD_SIZEOF(struct ovs_key_ipv4_tunnel, tp_dst))
 
 struct ovs_key_ipv4_tunnel {
 	__be64 tun_id;
@@ -47,6 +47,8 @@ struct ovs_key_ipv4_tunnel {
 	__be16 tun_flags;
 	u8   ipv4_tos;
 	u8   ipv4_ttl;
+	__be16 tp_src;
+	__be16 tp_dst;
 } __packed __aligned(4); /* Minimize padding. */
 
 struct ovs_tunnel_info {
@@ -64,27 +66,59 @@ struct ovs_tunnel_info {
 			       FIELD_SIZEOF(struct sw_flow_key, tun_opts) - \
 			       opt_len))
 
-static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
-					  const struct iphdr *iph,
-					  __be64 tun_id, __be16 tun_flags,
-					  struct geneve_opt *opts,
-					  u8 opts_len)
+static inline void __ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
+					    __be32 saddr, __be32 daddr,
+					    u8 tos, u8 ttl,
+					    __be16 tp_src,
+					    __be16 tp_dst,
+					    __be64 tun_id,
+					    __be16 tun_flags,
+					    struct geneve_opt *opts,
+					    u8 opts_len)
 {
 	tun_info->tunnel.tun_id = tun_id;
-	tun_info->tunnel.ipv4_src = iph->saddr;
-	tun_info->tunnel.ipv4_dst = iph->daddr;
-	tun_info->tunnel.ipv4_tos = iph->tos;
-	tun_info->tunnel.ipv4_ttl = iph->ttl;
+	tun_info->tunnel.ipv4_src = saddr;
+	tun_info->tunnel.ipv4_dst = daddr;
+	tun_info->tunnel.ipv4_tos = tos;
+	tun_info->tunnel.ipv4_ttl = ttl;
 	tun_info->tunnel.tun_flags = tun_flags;
 
-	/* clear struct padding. */
-	memset((unsigned char *)&tun_info->tunnel + OVS_TUNNEL_KEY_SIZE, 0,
-	       sizeof(tun_info->tunnel) - OVS_TUNNEL_KEY_SIZE);
+	/* For the tunnel types on the top of IPsec, the tp_src and tp_dst of
+	 * the upper tunnel are used.
+	 * E.g: GRE over IPSEC, the tp_src and tp_port are zero.
+	 */
+	tun_info->tunnel.tp_src = tp_src;
+	tun_info->tunnel.tp_dst = tp_dst;
+
+	/* Clear struct padding. */
+	if (sizeof(tun_info->tunnel) != OVS_TUNNEL_KEY_SIZE)
+		memset((unsigned char *)&tun_info->tunnel + OVS_TUNNEL_KEY_SIZE,
+		       0, sizeof(tun_info->tunnel) - OVS_TUNNEL_KEY_SIZE);
 
 	tun_info->options = opts;
 	tun_info->options_len = opts_len;
 }
 
+static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
+					  const struct iphdr *iph,
+					  __be16 tp_src,
+					  __be16 tp_dst,
+					  __be64 tun_id,
+					  __be16 tun_flags,
+					  struct geneve_opt *opts,
+					  u8 opts_len)
+{
+	__ovs_flow_tun_info_init(tun_info, iph->saddr, iph->daddr,
+				 iph->tos, iph->ttl,
+				 tp_src, tp_dst,
+				 tun_id, tun_flags,
+				 opts, opts_len);
+}
+
+#define OVS_SW_FLOW_KEY_METADATA_SIZE			\
+	(offsetof(struct sw_flow_key, recirc_id) +	\
+	FIELD_SIZEOF(struct sw_flow_key, recirc_id))
+
 struct sw_flow_key {
 	u8 tun_opts[255];
 	u8 tun_opts_len;
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index ed31097..98a3e96 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -245,6 +245,24 @@ static bool match_validate(const struct sw_flow_match *match,
 	return true;
 }
 
+size_t ovs_tun_key_attr_size(void)
+{
+	/* Whenever adding new OVS_TUNNEL_KEY_ FIELDS, we should consider
+	 * updating this function.
+	 */
+	return    nla_total_size(8)    /* OVS_TUNNEL_KEY_ATTR_ID */
+		+ nla_total_size(4)    /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */
+		+ nla_total_size(4)    /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */
+		+ nla_total_size(1)    /* OVS_TUNNEL_KEY_ATTR_TOS */
+		+ nla_total_size(1)    /* OVS_TUNNEL_KEY_ATTR_TTL */
+		+ nla_total_size(0)    /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */
+		+ nla_total_size(0)    /* OVS_TUNNEL_KEY_ATTR_CSUM */
+		+ nla_total_size(0)    /* OVS_TUNNEL_KEY_ATTR_OAM */
+		+ nla_total_size(256)  /* OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS */
+		+ nla_total_size(2)    /* OVS_TUNNEL_KEY_ATTR_TP_SRC */
+		+ nla_total_size(2);   /* OVS_TUNNEL_KEY_ATTR_TP_DST */
+}
+
 size_t ovs_key_attr_size(void)
 {
 	/* Whenever adding new OVS_KEY_ FIELDS, we should consider
@@ -254,15 +272,7 @@ size_t ovs_key_attr_size(void)
 
 	return    nla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
 		+ nla_total_size(0)   /* OVS_KEY_ATTR_TUNNEL */
-		  + nla_total_size(8)   /* OVS_TUNNEL_KEY_ATTR_ID */
-		  + nla_total_size(4)   /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */
-		  + nla_total_size(4)   /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */
-		  + nla_total_size(1)   /* OVS_TUNNEL_KEY_ATTR_TOS */
-		  + nla_total_size(1)   /* OVS_TUNNEL_KEY_ATTR_TTL */
-		  + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */
-		  + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_CSUM */
-		  + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_OAM */
-		  + nla_total_size(256) /* OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS */
+		  + ovs_tun_key_attr_size()
 		+ nla_total_size(4)   /* OVS_KEY_ATTR_IN_PORT */
 		+ nla_total_size(4)   /* OVS_KEY_ATTR_SKB_MARK */
 		+ nla_total_size(4)   /* OVS_KEY_ATTR_DP_HASH */
@@ -393,6 +403,8 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 			[OVS_TUNNEL_KEY_ATTR_TTL] = 1,
 			[OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT] = 0,
 			[OVS_TUNNEL_KEY_ATTR_CSUM] = 0,
+			[OVS_TUNNEL_KEY_ATTR_TP_SRC] = sizeof(u16),
+			[OVS_TUNNEL_KEY_ATTR_TP_DST] = sizeof(u16),
 			[OVS_TUNNEL_KEY_ATTR_OAM] = 0,
 			[OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS] = -1,
 		};
@@ -440,6 +452,14 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 		case OVS_TUNNEL_KEY_ATTR_CSUM:
 			tun_flags |= TUNNEL_CSUM;
 			break;
+		case OVS_TUNNEL_KEY_ATTR_TP_SRC:
+			SW_FLOW_KEY_PUT(match, tun_key.tp_src,
+					nla_get_be16(a), is_mask);
+			break;
+		case OVS_TUNNEL_KEY_ATTR_TP_DST:
+			SW_FLOW_KEY_PUT(match, tun_key.tp_dst,
+					nla_get_be16(a), is_mask);
+			break;
 		case OVS_TUNNEL_KEY_ATTR_OAM:
 			tun_flags |= TUNNEL_OAM;
 			break;
@@ -548,6 +568,12 @@ static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
 	if ((output->tun_flags & TUNNEL_CSUM) &&
 	    nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_CSUM))
 		return -EMSGSIZE;
+	if (output->tp_src &&
+	    nla_put_be16(skb, OVS_TUNNEL_KEY_ATTR_TP_SRC, output->tp_src))
+		return -EMSGSIZE;
+	if (output->tp_dst &&
+	    nla_put_be16(skb, OVS_TUNNEL_KEY_ATTR_TP_DST, output->tp_dst))
+		return -EMSGSIZE;
 	if ((output->tun_flags & TUNNEL_OAM) &&
 	    nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_OAM))
 		return -EMSGSIZE;
@@ -559,7 +585,6 @@ static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
 	return 0;
 }
 
-
 static int ipv4_tun_to_nlattr(struct sk_buff *skb,
 			      const struct ovs_key_ipv4_tunnel *output,
 			      const struct geneve_opt *tun_opts,
@@ -580,6 +605,14 @@ static int ipv4_tun_to_nlattr(struct sk_buff *skb,
 	return 0;
 }
 
+int ovs_nla_put_egress_tunnel_key(struct sk_buff *skb,
+				  const struct ovs_tunnel_info *egress_tun_info)
+{
+	return __ipv4_tun_to_nlattr(skb, &egress_tun_info->tunnel,
+				    egress_tun_info->options,
+				    egress_tun_info->options_len);
+}
+
 static int metadata_from_nlattrs(struct sw_flow_match *match,  u64 *attrs,
 				 const struct nlattr **a, bool is_mask)
 {
@@ -1653,6 +1686,7 @@ static int validate_userspace(const struct nlattr *attr)
 	static const struct nla_policy userspace_policy[OVS_USERSPACE_ATTR_MAX + 1] = {
 		[OVS_USERSPACE_ATTR_PID] = {.type = NLA_U32 },
 		[OVS_USERSPACE_ATTR_USERDATA] = {.type = NLA_UNSPEC },
+		[OVS_USERSPACE_ATTR_EGRESS_TUN_PORT] = {.type = NLA_U32 },
 	};
 	struct nlattr *a[OVS_USERSPACE_ATTR_MAX + 1];
 	int error;
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index eb0b177..90bbe37 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -37,6 +37,7 @@
 
 #include "flow.h"
 
+size_t ovs_tun_key_attr_size(void);
 size_t ovs_key_attr_size(void);
 
 void ovs_match_init(struct sw_flow_match *match,
@@ -49,6 +50,8 @@ int ovs_nla_get_flow_metadata(const struct nlattr *, struct sw_flow_key *);
 int ovs_nla_get_match(struct sw_flow_match *match,
 		      const struct nlattr *,
 		      const struct nlattr *);
+int ovs_nla_put_egress_tunnel_key(struct sk_buff *,
+				  const struct ovs_tunnel_info *);
 
 int ovs_nla_copy_actions(const struct nlattr *attr,
 			 const struct sw_flow_key *key,
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 70c9765..e31f19c 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -97,7 +97,9 @@ static void geneve_rcv(struct geneve_sock *gs, struct sk_buff *skb)
 
 	key = vni_to_tunnel_id(geneveh->vni);
 
-	ovs_flow_tun_info_init(&tun_info, ip_hdr(skb), key, flags,
+	ovs_flow_tun_info_init(&tun_info, ip_hdr(skb),
+			       udp_hdr(skb)->source, udp_hdr(skb)->dest,
+			       key, flags,
 			       geneveh->options, opts_len);
 
 	ovs_vport_receive(vport, skb, &tun_info);
@@ -228,6 +230,22 @@ static const char *geneve_get_name(const struct vport *vport)
 	return geneve_port->name;
 }
 
+static int geneve_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
+				      struct ovs_tunnel_info *egress_tun_info)
+{
+	struct geneve_port *geneve_port = geneve_vport(vport);
+	struct net *net = ovs_dp_get_net(vport->dp);
+	__be16 dport = inet_sk(geneve_port->gs->sock->sk)->inet_sport;
+	__be16 sport = udp_flow_src_port(net, skb, 1, USHRT_MAX, true);
+
+	/* Get tp_src and tp_dst, refert to geneve_build_header().
+	 */
+	return ovs_tunnel_get_egress_info(egress_tun_info,
+					  ovs_dp_get_net(vport->dp),
+					  OVS_CB(skb)->egress_tun_info,
+					  IPPROTO_UDP, skb->mark, sport, dport);
+}
+
 static struct vport_ops ovs_geneve_vport_ops = {
 	.type		= OVS_VPORT_TYPE_GENEVE,
 	.create		= geneve_tnl_create,
@@ -236,6 +254,7 @@ static struct vport_ops ovs_geneve_vport_ops = {
 	.get_options	= geneve_get_options,
 	.send		= geneve_tnl_send,
 	.owner          = THIS_MODULE,
+	.get_egress_tun_info	= geneve_get_egress_tun_info,
 };
 
 static int __init ovs_geneve_tnl_init(void)
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index 00270b6..8e61a5c 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -108,7 +108,7 @@ static int gre_rcv(struct sk_buff *skb,
 		return PACKET_REJECT;
 
 	key = key_to_tunnel_id(tpi->key, tpi->seq);
-	ovs_flow_tun_info_init(&tun_info, ip_hdr(skb), key,
+	ovs_flow_tun_info_init(&tun_info, ip_hdr(skb), 0, 0, key,
 			       filter_tnl_flags(tpi->flags), NULL, 0);
 
 	ovs_vport_receive(vport, skb, &tun_info);
@@ -284,12 +284,22 @@ static void gre_tnl_destroy(struct vport *vport)
 	gre_exit();
 }
 
+static int gre_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
+				   struct ovs_tunnel_info *egress_tun_info)
+{
+	return ovs_tunnel_get_egress_info(egress_tun_info,
+					  ovs_dp_get_net(vport->dp),
+					  OVS_CB(skb)->egress_tun_info,
+					  IPPROTO_GRE, skb->mark, 0, 0);
+}
+
 static struct vport_ops ovs_gre_vport_ops = {
 	.type		= OVS_VPORT_TYPE_GRE,
 	.create		= gre_create,
 	.destroy	= gre_tnl_destroy,
 	.get_name	= gre_get_name,
 	.send		= gre_tnl_send,
+	.get_egress_tun_info	= gre_get_egress_tun_info,
 	.owner		= THIS_MODULE,
 };
 
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 965e750..38f95a5 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -69,7 +69,9 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
 	/* Save outer tunnel values */
 	iph = ip_hdr(skb);
 	key = cpu_to_be64(ntohl(vx_vni) >> 8);
-	ovs_flow_tun_info_init(&tun_info, iph, key, TUNNEL_KEY, NULL, 0);
+	ovs_flow_tun_info_init(&tun_info, iph,
+			       udp_hdr(skb)->source, udp_hdr(skb)->dest,
+			       key, TUNNEL_KEY, NULL, 0);
 
 	ovs_vport_receive(vport, skb, &tun_info);
 }
@@ -189,6 +191,25 @@ error:
 	return err;
 }
 
+static int vxlan_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
+				     struct ovs_tunnel_info *egress_tun_info)
+{
+	struct net *net = ovs_dp_get_net(vport->dp);
+	struct vxlan_port *vxlan_port = vxlan_vport(vport);
+	__be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
+	__be16 src_port;
+	int port_min;
+	int port_max;
+
+	inet_get_local_port_range(net, &port_min, &port_max);
+	src_port = udp_flow_src_port(net, skb, 0, 0, true);
+
+	return ovs_tunnel_get_egress_info(egress_tun_info, net,
+					  OVS_CB(skb)->egress_tun_info,
+					  IPPROTO_UDP, skb->mark,
+					  src_port, dst_port);
+}
+
 static const char *vxlan_get_name(const struct vport *vport)
 {
 	struct vxlan_port *vxlan_port = vxlan_vport(vport);
@@ -202,6 +223,7 @@ static struct vport_ops ovs_vxlan_vport_ops = {
 	.get_name	= vxlan_get_name,
 	.get_options	= vxlan_get_options,
 	.send		= vxlan_tnl_send,
+	.get_egress_tun_info	= vxlan_get_egress_tun_info,
 	.owner		= THIS_MODULE,
 };
 
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 4b5dd18..630e819 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -573,3 +573,64 @@ void ovs_vport_deferred_free(struct vport *vport)
 	call_rcu(&vport->rcu, free_vport_rcu);
 }
 EXPORT_SYMBOL_GPL(ovs_vport_deferred_free);
+
+int ovs_tunnel_get_egress_info(struct ovs_tunnel_info *egress_tun_info,
+			       struct net *net,
+			       const struct ovs_tunnel_info *tun_info,
+			       u8 ipproto,
+			       u32 skb_mark,
+			       __be16 tp_src,
+			       __be16 tp_dst)
+{
+	const struct ovs_key_ipv4_tunnel *tun_key;
+	struct rtable *rt;
+	struct flowi4 fl;
+
+	if (unlikely(!tun_info))
+		return -EINVAL;
+
+	tun_key = &tun_info->tunnel;
+
+	/* Route lookup to get srouce IP address.
+	 * The process may need to be changed if the corresponding process
+	 * in vports ops changed.
+	 */
+	memset(&fl, 0, sizeof(fl));
+	fl.daddr = tun_key->ipv4_dst;
+	fl.saddr = tun_key->ipv4_src;
+	fl.flowi4_tos = RT_TOS(tun_key->ipv4_tos);
+	fl.flowi4_mark = skb_mark;
+	fl.flowi4_proto = IPPROTO_GRE;
+
+	rt = ip_route_output_key(net, &fl);
+	if (IS_ERR(rt))
+		return PTR_ERR(rt);
+
+	ip_rt_put(rt);
+
+	/* Generate egress_tun_info based on tun_info,
+	 * saddr, tp_src and tp_dst
+	 */
+	__ovs_flow_tun_info_init(egress_tun_info,
+				 fl.saddr, tun_key->ipv4_dst,
+				 tun_key->ipv4_tos,
+				 tun_key->ipv4_ttl,
+				 tp_src, tp_dst,
+				 tun_key->tun_id,
+				 tun_key->tun_flags,
+				 tun_info->options,
+				 tun_info->options_len);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ovs_tunnel_get_egress_info);
+
+int ovs_vport_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
+				  struct ovs_tunnel_info *info)
+{
+	/* get_egress_tun_info() is only implemented on tunnel ports. */
+	if (unlikely(!vport->ops->get_egress_tun_info))
+		return -EINVAL;
+
+	return vport->ops->get_egress_tun_info(vport, skb, info);
+}
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index e41c3fa..0635d1d 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -58,6 +58,16 @@ u32 ovs_vport_find_upcall_portid(const struct vport *, struct sk_buff *);
 
 int ovs_vport_send(struct vport *, struct sk_buff *);
 
+int ovs_tunnel_get_egress_info(struct ovs_tunnel_info *egress_tun_info,
+			       struct net *net,
+			       const struct ovs_tunnel_info *tun_info,
+			       u8 ipproto,
+			       u32 skb_mark,
+			       __be16 tp_src,
+			       __be16 tp_dst);
+int ovs_vport_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
+				  struct ovs_tunnel_info *info);
+
 /* The following definitions are for implementers of vport devices: */
 
 struct vport_err_stats {
@@ -146,6 +156,8 @@ struct vport_parms {
  * @get_name: Get the device's name.
  * @send: Send a packet on the device.  Returns the length of the packet sent,
  * zero for dropped packets or negative for error.
+ * @get_egress_tun_info: Get the egress tunnel 5-tuple and other info for
+ * a packet.
  */
 struct vport_ops {
 	enum ovs_vport_type type;
@@ -161,6 +173,8 @@ struct vport_ops {
 	const char *(*get_name)(const struct vport *);
 
 	int (*send)(struct vport *, struct sk_buff *);
+	int (*get_egress_tun_info)(struct vport *, struct sk_buff *,
+				   struct ovs_tunnel_info *);
 
 	struct module *owner;
 	struct list_head list;
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 3/6] openvswitch: Optimize recirc action.
From: Pravin B Shelar @ 2014-11-10  3:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

OVS need to flow key for flow lookup in recic action. OVS
does key extract in recic action. Most of cases we could
use OVS_CB packet key directly and can avoid packet flow key
extract. SET action we can update flow-key along with packet
to keep it consistent. But there are some action like MPLS
pop which forces OVS to do flow-extract. In such cases we
can mark flow key as invalid so that subsequent recirc
action can do full flow extract.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
---
 net/openvswitch/actions.c | 151 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 106 insertions(+), 45 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index ceb618c..d4c2f73 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -109,6 +109,16 @@ static struct deferred_action *add_deferred_actions(struct sk_buff *skb,
 	return da;
 }
 
+static void invalidate_flow_key(struct sw_flow_key *key)
+{
+	key->eth.type = htons(0);
+}
+
+static bool is_flow_key_valid(const struct sw_flow_key *key)
+{
+	return !!key->eth.type;
+}
+
 static int make_writable(struct sk_buff *skb, int write_len)
 {
 	if (!pskb_may_pull(skb, write_len))
@@ -120,7 +130,7 @@ static int make_writable(struct sk_buff *skb, int write_len)
 	return pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
 }
 
-static int push_mpls(struct sk_buff *skb,
+static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 		     const struct ovs_action_push_mpls *mpls)
 {
 	__be32 *new_mpls_lse;
@@ -151,10 +161,12 @@ static int push_mpls(struct sk_buff *skb,
 	skb_set_inner_protocol(skb, skb->protocol);
 	skb->protocol = mpls->mpls_ethertype;
 
+	invalidate_flow_key(key);
 	return 0;
 }
 
-static int pop_mpls(struct sk_buff *skb, const __be16 ethertype)
+static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
+		    const __be16 ethertype)
 {
 	struct ethhdr *hdr;
 	int err;
@@ -181,10 +193,13 @@ static int pop_mpls(struct sk_buff *skb, const __be16 ethertype)
 	hdr->h_proto = ethertype;
 	if (eth_p_mpls(skb->protocol))
 		skb->protocol = ethertype;
+
+	invalidate_flow_key(key);
 	return 0;
 }
 
-static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse)
+static int set_mpls(struct sk_buff *skb, struct sw_flow_key *key,
+		    const __be32 *mpls_lse)
 {
 	__be32 *stack;
 	int err;
@@ -196,13 +211,12 @@ static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse)
 	stack = (__be32 *)skb_mpls_header(skb);
 	if (skb->ip_summed == CHECKSUM_COMPLETE) {
 		__be32 diff[] = { ~(*stack), *mpls_lse };
-
 		skb->csum = ~csum_partial((char *)diff, sizeof(diff),
 					  ~skb->csum);
 	}
 
 	*stack = *mpls_lse;
-
+	key->mpls.top_lse = *mpls_lse;
 	return 0;
 }
 
@@ -237,7 +251,7 @@ static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
 	return 0;
 }
 
-static int pop_vlan(struct sk_buff *skb)
+static int pop_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 {
 	__be16 tci;
 	int err;
@@ -255,9 +269,12 @@ static int pop_vlan(struct sk_buff *skb)
 	}
 	/* move next vlan tag to hw accel tag */
 	if (likely(skb->protocol != htons(ETH_P_8021Q) ||
-		   skb->len < VLAN_ETH_HLEN))
+		   skb->len < VLAN_ETH_HLEN)) {
+		key->eth.tci = 0;
 		return 0;
+	}
 
+	invalidate_flow_key(key);
 	err = __pop_vlan_tci(skb, &tci);
 	if (unlikely(err))
 		return err;
@@ -266,7 +283,8 @@ static int pop_vlan(struct sk_buff *skb)
 	return 0;
 }
 
-static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vlan)
+static int push_vlan(struct sk_buff *skb, struct sw_flow_key *key,
+		     const struct ovs_action_push_vlan *vlan)
 {
 	if (unlikely(vlan_tx_tag_present(skb))) {
 		u16 current_tag;
@@ -283,12 +301,15 @@ static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vla
 			skb->csum = csum_add(skb->csum, csum_partial(skb->data
 					+ (2 * ETH_ALEN), VLAN_HLEN, 0));
 
+		invalidate_flow_key(key);
+	} else {
+		key->eth.tci = vlan->vlan_tci;
 	}
 	__vlan_hwaccel_put_tag(skb, vlan->vlan_tpid, ntohs(vlan->vlan_tci) & ~VLAN_TAG_PRESENT);
 	return 0;
 }
 
-static int set_eth_addr(struct sk_buff *skb,
+static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *key,
 			const struct ovs_key_ethernet *eth_key)
 {
 	int err;
@@ -303,11 +324,13 @@ static int set_eth_addr(struct sk_buff *skb,
 
 	ovs_skb_postpush_rcsum(skb, eth_hdr(skb), ETH_ALEN * 2);
 
+	ether_addr_copy(key->eth.src, eth_key->eth_src);
+	ether_addr_copy(key->eth.dst, eth_key->eth_dst);
 	return 0;
 }
 
 static void set_ip_addr(struct sk_buff *skb, struct iphdr *nh,
-				__be32 *addr, __be32 new_addr)
+			__be32 *addr, __be32 new_addr)
 {
 	int transport_len = skb->len - skb_transport_offset(skb);
 
@@ -386,7 +409,8 @@ static void set_ip_ttl(struct sk_buff *skb, struct iphdr *nh, u8 new_ttl)
 	nh->ttl = new_ttl;
 }
 
-static int set_ipv4(struct sk_buff *skb, const struct ovs_key_ipv4 *ipv4_key)
+static int set_ipv4(struct sk_buff *skb, struct sw_flow_key *key,
+		    const struct ovs_key_ipv4 *ipv4_key)
 {
 	struct iphdr *nh;
 	int err;
@@ -398,22 +422,31 @@ static int set_ipv4(struct sk_buff *skb, const struct ovs_key_ipv4 *ipv4_key)
 
 	nh = ip_hdr(skb);
 
-	if (ipv4_key->ipv4_src != nh->saddr)
+	if (ipv4_key->ipv4_src != nh->saddr) {
 		set_ip_addr(skb, nh, &nh->saddr, ipv4_key->ipv4_src);
+		key->ipv4.addr.src = ipv4_key->ipv4_src;
+	}
 
-	if (ipv4_key->ipv4_dst != nh->daddr)
+	if (ipv4_key->ipv4_dst != nh->daddr) {
 		set_ip_addr(skb, nh, &nh->daddr, ipv4_key->ipv4_dst);
+		key->ipv4.addr.dst = ipv4_key->ipv4_dst;
+	}
 
-	if (ipv4_key->ipv4_tos != nh->tos)
+	if (ipv4_key->ipv4_tos != nh->tos) {
 		ipv4_change_dsfield(nh, 0, ipv4_key->ipv4_tos);
+		key->ip.tos = nh->tos;
+	}
 
-	if (ipv4_key->ipv4_ttl != nh->ttl)
+	if (ipv4_key->ipv4_ttl != nh->ttl) {
 		set_ip_ttl(skb, nh, ipv4_key->ipv4_ttl);
+		key->ip.ttl = ipv4_key->ipv4_ttl;
+	}
 
 	return 0;
 }
 
-static int set_ipv6(struct sk_buff *skb, const struct ovs_key_ipv6 *ipv6_key)
+static int set_ipv6(struct sk_buff *skb, struct sw_flow_key *key,
+		    const struct ovs_key_ipv6 *ipv6_key)
 {
 	struct ipv6hdr *nh;
 	int err;
@@ -429,9 +462,12 @@ static int set_ipv6(struct sk_buff *skb, const struct ovs_key_ipv6 *ipv6_key)
 	saddr = (__be32 *)&nh->saddr;
 	daddr = (__be32 *)&nh->daddr;
 
-	if (memcmp(ipv6_key->ipv6_src, saddr, sizeof(ipv6_key->ipv6_src)))
+	if (memcmp(ipv6_key->ipv6_src, saddr, sizeof(ipv6_key->ipv6_src))) {
 		set_ipv6_addr(skb, ipv6_key->ipv6_proto, saddr,
 			      ipv6_key->ipv6_src, true);
+		memcpy(&key->ipv6.addr.src, ipv6_key->ipv6_src,
+		       sizeof(ipv6_key->ipv6_src));
+	}
 
 	if (memcmp(ipv6_key->ipv6_dst, daddr, sizeof(ipv6_key->ipv6_dst))) {
 		unsigned int offset = 0;
@@ -445,12 +481,18 @@ static int set_ipv6(struct sk_buff *skb, const struct ovs_key_ipv6 *ipv6_key)
 
 		set_ipv6_addr(skb, ipv6_key->ipv6_proto, daddr,
 			      ipv6_key->ipv6_dst, recalc_csum);
+		memcpy(&key->ipv6.addr.dst, ipv6_key->ipv6_dst,
+		       sizeof(ipv6_key->ipv6_dst));
 	}
 
 	set_ipv6_tc(nh, ipv6_key->ipv6_tclass);
+	key->ip.tos = ipv6_get_dsfield(nh);
+
 	set_ipv6_fl(nh, ntohl(ipv6_key->ipv6_label));
-	nh->hop_limit = ipv6_key->ipv6_hlimit;
+	key->ipv6.label = *(__be32 *)nh & htonl(IPV6_FLOWINFO_FLOWLABEL);
 
+	nh->hop_limit = ipv6_key->ipv6_hlimit;
+	key->ip.ttl = ipv6_key->ipv6_hlimit;
 	return 0;
 }
 
@@ -478,7 +520,8 @@ static void set_udp_port(struct sk_buff *skb, __be16 *port, __be16 new_port)
 	}
 }
 
-static int set_udp(struct sk_buff *skb, const struct ovs_key_udp *udp_port_key)
+static int set_udp(struct sk_buff *skb, struct sw_flow_key *key,
+		   const struct ovs_key_udp *udp_port_key)
 {
 	struct udphdr *uh;
 	int err;
@@ -489,16 +532,21 @@ static int set_udp(struct sk_buff *skb, const struct ovs_key_udp *udp_port_key)
 		return err;
 
 	uh = udp_hdr(skb);
-	if (udp_port_key->udp_src != uh->source)
+	if (udp_port_key->udp_src != uh->source) {
 		set_udp_port(skb, &uh->source, udp_port_key->udp_src);
+		key->tp.src = udp_port_key->udp_src;
+	}
 
-	if (udp_port_key->udp_dst != uh->dest)
+	if (udp_port_key->udp_dst != uh->dest) {
 		set_udp_port(skb, &uh->dest, udp_port_key->udp_dst);
+		key->tp.dst = udp_port_key->udp_dst;
+	}
 
 	return 0;
 }
 
-static int set_tcp(struct sk_buff *skb, const struct ovs_key_tcp *tcp_port_key)
+static int set_tcp(struct sk_buff *skb, struct sw_flow_key *key,
+		   const struct ovs_key_tcp *tcp_port_key)
 {
 	struct tcphdr *th;
 	int err;
@@ -509,17 +557,21 @@ static int set_tcp(struct sk_buff *skb, const struct ovs_key_tcp *tcp_port_key)
 		return err;
 
 	th = tcp_hdr(skb);
-	if (tcp_port_key->tcp_src != th->source)
+	if (tcp_port_key->tcp_src != th->source) {
 		set_tp_port(skb, &th->source, tcp_port_key->tcp_src, &th->check);
+		key->tp.src = tcp_port_key->tcp_src;
+	}
 
-	if (tcp_port_key->tcp_dst != th->dest)
+	if (tcp_port_key->tcp_dst != th->dest) {
 		set_tp_port(skb, &th->dest, tcp_port_key->tcp_dst, &th->check);
+		key->tp.dst = tcp_port_key->tcp_dst;
+	}
 
 	return 0;
 }
 
-static int set_sctp(struct sk_buff *skb,
-		     const struct ovs_key_sctp *sctp_port_key)
+static int set_sctp(struct sk_buff *skb, struct sw_flow_key *key,
+		    const struct ovs_key_sctp *sctp_port_key)
 {
 	struct sctphdr *sh;
 	int err;
@@ -546,6 +598,8 @@ static int set_sctp(struct sk_buff *skb,
 		sh->checksum = old_csum ^ old_correct_csum ^ new_csum;
 
 		skb_clear_hash(skb);
+		key->tp.src = sctp_port_key->sctp_src;
+		key->tp.dst = sctp_port_key->sctp_dst;
 	}
 
 	return 0;
@@ -675,18 +729,20 @@ static void execute_hash(struct sk_buff *skb, struct sw_flow_key *key,
 	key->ovs_flow_hash = hash;
 }
 
-static int execute_set_action(struct sk_buff *skb,
-				 const struct nlattr *nested_attr)
+static int execute_set_action(struct sk_buff *skb, struct sw_flow_key *key,
+			      const struct nlattr *nested_attr)
 {
 	int err = 0;
 
 	switch (nla_type(nested_attr)) {
 	case OVS_KEY_ATTR_PRIORITY:
 		skb->priority = nla_get_u32(nested_attr);
+		key->phy.priority = skb->priority;
 		break;
 
 	case OVS_KEY_ATTR_SKB_MARK:
 		skb->mark = nla_get_u32(nested_attr);
+		key->phy.skb_mark = skb->mark;
 		break;
 
 	case OVS_KEY_ATTR_TUNNEL_INFO:
@@ -694,31 +750,31 @@ static int execute_set_action(struct sk_buff *skb,
 		break;
 
 	case OVS_KEY_ATTR_ETHERNET:
-		err = set_eth_addr(skb, nla_data(nested_attr));
+		err = set_eth_addr(skb, key, nla_data(nested_attr));
 		break;
 
 	case OVS_KEY_ATTR_IPV4:
-		err = set_ipv4(skb, nla_data(nested_attr));
+		err = set_ipv4(skb, key, nla_data(nested_attr));
 		break;
 
 	case OVS_KEY_ATTR_IPV6:
-		err = set_ipv6(skb, nla_data(nested_attr));
+		err = set_ipv6(skb, key, nla_data(nested_attr));
 		break;
 
 	case OVS_KEY_ATTR_TCP:
-		err = set_tcp(skb, nla_data(nested_attr));
+		err = set_tcp(skb, key, nla_data(nested_attr));
 		break;
 
 	case OVS_KEY_ATTR_UDP:
-		err = set_udp(skb, nla_data(nested_attr));
+		err = set_udp(skb, key, nla_data(nested_attr));
 		break;
 
 	case OVS_KEY_ATTR_SCTP:
-		err = set_sctp(skb, nla_data(nested_attr));
+		err = set_sctp(skb, key, nla_data(nested_attr));
 		break;
 
 	case OVS_KEY_ATTR_MPLS:
-		err = set_mpls(skb, nla_data(nested_attr));
+		err = set_mpls(skb, key, nla_data(nested_attr));
 		break;
 	}
 
@@ -730,11 +786,15 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
 			  const struct nlattr *a, int rem)
 {
 	struct deferred_action *da;
-	int err;
 
-	err = ovs_flow_key_update(skb, key);
-	if (err)
-		return err;
+	if (!is_flow_key_valid(key)) {
+		int err;
+
+		err = ovs_flow_key_update(skb, key);
+		if (err)
+			return err;
+	}
+	BUG_ON(!is_flow_key_valid(key));
 
 	if (!nla_is_last(a, rem)) {
 		/* Recirc action is the not the last action
@@ -771,7 +831,8 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 	/* Every output action needs a separate clone of 'skb', but the common
 	 * case is just a single output action, so that doing a clone and
 	 * then freeing the original skbuff is wasteful.  So the following code
-	 * is slightly obscure just to avoid that. */
+	 * is slightly obscure just to avoid that.
+	 */
 	int prev_port = -1;
 	const struct nlattr *a;
 	int rem;
@@ -803,21 +864,21 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			break;
 
 		case OVS_ACTION_ATTR_PUSH_MPLS:
-			err = push_mpls(skb, nla_data(a));
+			err = push_mpls(skb, key, nla_data(a));
 			break;
 
 		case OVS_ACTION_ATTR_POP_MPLS:
-			err = pop_mpls(skb, nla_get_be16(a));
+			err = pop_mpls(skb, key, nla_get_be16(a));
 			break;
 
 		case OVS_ACTION_ATTR_PUSH_VLAN:
-			err = push_vlan(skb, nla_data(a));
+			err = push_vlan(skb, key, nla_data(a));
 			if (unlikely(err)) /* skb already freed. */
 				return err;
 			break;
 
 		case OVS_ACTION_ATTR_POP_VLAN:
-			err = pop_vlan(skb);
+			err = pop_vlan(skb, key);
 			break;
 
 		case OVS_ACTION_ATTR_RECIRC:
@@ -832,7 +893,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			break;
 
 		case OVS_ACTION_ATTR_SET:
-			err = execute_set_action(skb, nla_data(a));
+			err = execute_set_action(skb, key, nla_data(a));
 			break;
 
 		case OVS_ACTION_ATTR_SAMPLE:
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 4/6] openvswitch: Remove redundant key ref from upcall_info.
From: Pravin B Shelar @ 2014-11-10  3:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

struct dp_upcall_info has pointer to pkt_key which is already
available in OVS_CB.  This also simplifies upcall handling
for gso packet.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
---
 net/openvswitch/actions.c  |  3 +--
 net/openvswitch/datapath.c | 45 ++++++++++++++++++++++++++-------------------
 net/openvswitch/datapath.h | 12 +++++-------
 3 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index d4c2f73..10c94ac 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -624,7 +624,6 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
 	int rem;
 
 	upcall.cmd = OVS_PACKET_CMD_ACTION;
-	upcall.key = key;
 	upcall.userdata = NULL;
 	upcall.portid = 0;
 	upcall.egress_tun_info = NULL;
@@ -659,7 +658,7 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
 		} /* End of switch. */
 	}
 
-	return ovs_dp_upcall(dp, skb, &upcall);
+	return ovs_dp_upcall(dp, skb, key, &upcall);
 }
 
 static int sample(struct datapath *dp, struct sk_buff *skb,
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c2ac340..7146b38 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -136,8 +136,10 @@ EXPORT_SYMBOL_GPL(lockdep_ovsl_is_held);
 
 static struct vport *new_vport(const struct vport_parms *);
 static int queue_gso_packets(struct datapath *dp, struct sk_buff *,
+			     const struct sw_flow_key *,
 			     const struct dp_upcall_info *);
 static int queue_userspace_packet(struct datapath *dp, struct sk_buff *,
+				  const struct sw_flow_key *,
 				  const struct dp_upcall_info *);
 
 /* Must be called with rcu_read_lock. */
@@ -271,11 +273,10 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
 		int error;
 
 		upcall.cmd = OVS_PACKET_CMD_MISS;
-		upcall.key = key;
 		upcall.userdata = NULL;
 		upcall.portid = ovs_vport_find_upcall_portid(p, skb);
 		upcall.egress_tun_info = NULL;
-		error = ovs_dp_upcall(dp, skb, &upcall);
+		error = ovs_dp_upcall(dp, skb, key, &upcall);
 		if (unlikely(error))
 			kfree_skb(skb);
 		else
@@ -299,6 +300,7 @@ out:
 }
 
 int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
+		  const struct sw_flow_key *key,
 		  const struct dp_upcall_info *upcall_info)
 {
 	struct dp_stats_percpu *stats;
@@ -310,9 +312,9 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
 	}
 
 	if (!skb_is_gso(skb))
-		err = queue_userspace_packet(dp, skb, upcall_info);
+		err = queue_userspace_packet(dp, skb, key, upcall_info);
 	else
-		err = queue_gso_packets(dp, skb, upcall_info);
+		err = queue_gso_packets(dp, skb, key, upcall_info);
 	if (err)
 		goto err;
 
@@ -329,39 +331,43 @@ err:
 }
 
 static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
+			     const struct sw_flow_key *key,
 			     const struct dp_upcall_info *upcall_info)
 {
 	unsigned short gso_type = skb_shinfo(skb)->gso_type;
-	struct dp_upcall_info later_info;
 	struct sw_flow_key later_key;
 	struct sk_buff *segs, *nskb;
+	struct ovs_skb_cb ovs_cb;
 	int err;
 
+	ovs_cb = *OVS_CB(skb);
 	segs = __skb_gso_segment(skb, NETIF_F_SG, false);
+	*OVS_CB(skb) = ovs_cb;
 	if (IS_ERR(segs))
 		return PTR_ERR(segs);
 	if (segs == NULL)
 		return -EINVAL;
 
+	if (gso_type & SKB_GSO_UDP) {
+		/* The initial flow key extracted by ovs_flow_key_extract()
+		 * in this case is for a first fragment, so we need to
+		 * properly mark later fragments.
+		 */
+		later_key = *key;
+		later_key.ip.frag = OVS_FRAG_TYPE_LATER;
+	}
+
 	/* Queue all of the segments. */
 	skb = segs;
 	do {
-		err = queue_userspace_packet(dp, skb, upcall_info);
+		*OVS_CB(skb) = ovs_cb;
+		if (gso_type & SKB_GSO_UDP && skb != segs)
+			key = &later_key;
+
+		err = queue_userspace_packet(dp, skb, key, upcall_info);
 		if (err)
 			break;
 
-		if (skb == segs && gso_type & SKB_GSO_UDP) {
-			/* The initial flow key extracted by ovs_flow_extract()
-			 * in this case is for a first fragment, so we need to
-			 * properly mark later fragments.
-			 */
-			later_key = *upcall_info->key;
-			later_key.ip.frag = OVS_FRAG_TYPE_LATER;
-
-			later_info = *upcall_info;
-			later_info.key = &later_key;
-			upcall_info = &later_info;
-		}
 	} while ((skb = skb->next));
 
 	/* Free all of the segments. */
@@ -395,6 +401,7 @@ static size_t upcall_msg_size(const struct dp_upcall_info *upcall_info,
 }
 
 static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
+				  const struct sw_flow_key *key,
 				  const struct dp_upcall_info *upcall_info)
 {
 	struct ovs_header *upcall;
@@ -457,7 +464,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 	upcall->dp_ifindex = dp_ifindex;
 
 	nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_KEY);
-	err = ovs_nla_put_flow(upcall_info->key, upcall_info->key, user_skb);
+	err = ovs_nla_put_flow(key, key, user_skb);
 	BUG_ON(err);
 	nla_nest_end(user_skb, nla);
 
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 2bc577b..8de9f7e 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -108,20 +108,18 @@ struct ovs_skb_cb {
 /**
  * struct dp_upcall - metadata to include with a packet to send to userspace
  * @cmd: One of %OVS_PACKET_CMD_*.
- * @key: Becomes %OVS_PACKET_ATTR_KEY.  Must be nonnull.
  * @userdata: If nonnull, its variable-length value is passed to userspace as
  * %OVS_PACKET_ATTR_USERDATA.
- * @pid: Netlink PID to which packet should be sent.  If @pid is 0 then no
- * packet is sent and the packet is accounted in the datapath's @n_lost
+ * @portid: Netlink portid to which packet should be sent.  If @portid is 0
+ * then no packet is sent and the packet is accounted in the datapath's @n_lost
  * counter.
  * @egress_tun_info: If nonnull, becomes %OVS_PACKET_ATTR_EGRESS_TUN_KEY.
  */
 struct dp_upcall_info {
-	u8 cmd;
-	const struct sw_flow_key *key;
+	const struct ovs_tunnel_info *egress_tun_info;
 	const struct nlattr *userdata;
 	u32 portid;
-	const struct ovs_tunnel_info *egress_tun_info;
+	u8 cmd;
 };
 
 /**
@@ -187,7 +185,7 @@ extern struct genl_family dp_vport_genl_family;
 void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key);
 void ovs_dp_detach_port(struct vport *);
 int ovs_dp_upcall(struct datapath *, struct sk_buff *,
-		  const struct dp_upcall_info *);
+		  const struct sw_flow_key *, const struct dp_upcall_info *);
 
 const char *ovs_dp_name(const struct datapath *dp);
 struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 pid, u32 seq,
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 5/6] openvswitch: Constify various function arguments
From: Pravin B Shelar @ 2014-11-10  3:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Thomas Graf, Pravin B Shelar

From: Thomas Graf <tgraf@noironetworks.com>

Help produce better optimized code.

Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/actions.c      |  7 ++++---
 net/openvswitch/datapath.c     | 10 +++++-----
 net/openvswitch/datapath.h     |  4 ++--
 net/openvswitch/flow.c         |  4 ++--
 net/openvswitch/flow.h         | 11 ++++++-----
 net/openvswitch/flow_table.c   | 12 ++++++------
 net/openvswitch/flow_table.h   |  8 ++++----
 net/openvswitch/vport-geneve.c |  2 +-
 net/openvswitch/vport-netdev.c |  2 +-
 net/openvswitch/vport.c        |  8 ++++----
 net/openvswitch/vport.h        |  6 +++---
 11 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 10c94ac..394efa6 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -69,7 +69,7 @@ static void action_fifo_init(struct action_fifo *fifo)
 	fifo->tail = 0;
 }
 
-static bool action_fifo_is_empty(struct action_fifo *fifo)
+static bool action_fifo_is_empty(const struct action_fifo *fifo)
 {
 	return (fifo->head == fifo->tail);
 }
@@ -92,7 +92,7 @@ static struct deferred_action *action_fifo_put(struct action_fifo *fifo)
 
 /* Return true if fifo is not full */
 static struct deferred_action *add_deferred_actions(struct sk_buff *skb,
-						    struct sw_flow_key *key,
+						    const struct sw_flow_key *key,
 						    const struct nlattr *attr)
 {
 	struct action_fifo *fifo;
@@ -944,7 +944,8 @@ static void process_deferred_actions(struct datapath *dp)
 
 /* Execute a list of actions against 'skb'. */
 int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
-			struct sw_flow_actions *acts, struct sw_flow_key *key)
+			const struct sw_flow_actions *acts,
+			struct sw_flow_key *key)
 {
 	int level = this_cpu_read(exec_actions_level);
 	int err;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 7146b38..65561eb 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -178,7 +178,7 @@ const char *ovs_dp_name(const struct datapath *dp)
 	return vport->ops->get_name(vport);
 }
 
-static int get_dpifindex(struct datapath *dp)
+static int get_dpifindex(const struct datapath *dp)
 {
 	struct vport *local;
 	int ifindex;
@@ -633,7 +633,7 @@ static struct genl_family dp_packet_genl_family = {
 	.n_ops = ARRAY_SIZE(dp_packet_genl_ops),
 };
 
-static void get_dp_stats(struct datapath *dp, struct ovs_dp_stats *stats,
+static void get_dp_stats(const struct datapath *dp, struct ovs_dp_stats *stats,
 			 struct ovs_dp_megaflow_stats *mega_stats)
 {
 	int i;
@@ -1352,7 +1352,7 @@ static struct sk_buff *ovs_dp_cmd_alloc_info(struct genl_info *info)
 
 /* Called with rcu_read_lock or ovs_mutex. */
 static struct datapath *lookup_datapath(struct net *net,
-					struct ovs_header *ovs_header,
+					const struct ovs_header *ovs_header,
 					struct nlattr *a[OVS_DP_ATTR_MAX + 1])
 {
 	struct datapath *dp;
@@ -1380,7 +1380,7 @@ static void ovs_dp_reset_user_features(struct sk_buff *skb, struct genl_info *in
 	dp->user_features = 0;
 }
 
-static void ovs_dp_change(struct datapath *dp, struct nlattr **a)
+static void ovs_dp_change(struct datapath *dp, struct nlattr *a[])
 {
 	if (a[OVS_DP_ATTR_USER_FEATURES])
 		dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
@@ -1744,7 +1744,7 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *vport, u32 portid,
 
 /* Called with ovs_mutex or RCU read lock. */
 static struct vport *lookup_vport(struct net *net,
-				  struct ovs_header *ovs_header,
+				  const struct ovs_header *ovs_header,
 				  struct nlattr *a[OVS_VPORT_ATTR_MAX + 1])
 {
 	struct datapath *dp;
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 8de9f7e..8389c1d 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -149,7 +149,7 @@ int lockdep_ovsl_is_held(void);
 #define rcu_dereference_ovsl(p)					\
 	rcu_dereference_check(p, lockdep_ovsl_is_held())
 
-static inline struct net *ovs_dp_get_net(struct datapath *dp)
+static inline struct net *ovs_dp_get_net(const struct datapath *dp)
 {
 	return read_pnet(&dp->net);
 }
@@ -192,7 +192,7 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 pid, u32 seq,
 					 u8 cmd);
 
 int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
-			struct sw_flow_actions *acts, struct sw_flow_key *);
+			const struct sw_flow_actions *, struct sw_flow_key *);
 
 void ovs_dp_notify_wq(struct work_struct *work);
 
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 90a2101..25e9abc 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -66,7 +66,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies)
 #define TCP_FLAGS_BE16(tp) (*(__be16 *)&tcp_flag_word(tp) & htons(0x0FFF))
 
 void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags,
-			   struct sk_buff *skb)
+			   const struct sk_buff *skb)
 {
 	struct flow_stats *stats;
 	int node = numa_node_id();
@@ -679,7 +679,7 @@ int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
 	return key_extract(skb, key);
 }
 
-int ovs_flow_key_extract(struct ovs_tunnel_info *tun_info,
+int ovs_flow_key_extract(const struct ovs_tunnel_info *tun_info,
 			 struct sk_buff *skb, struct sw_flow_key *key)
 {
 	/* Extract metadata from packet. */
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 543b358..9e0a787 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -53,7 +53,7 @@ struct ovs_key_ipv4_tunnel {
 
 struct ovs_tunnel_info {
 	struct ovs_key_ipv4_tunnel tunnel;
-	struct geneve_opt *options;
+	const struct geneve_opt *options;
 	u8 options_len;
 };
 
@@ -73,7 +73,7 @@ static inline void __ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
 					    __be16 tp_dst,
 					    __be64 tun_id,
 					    __be16 tun_flags,
-					    struct geneve_opt *opts,
+					    const struct geneve_opt *opts,
 					    u8 opts_len)
 {
 	tun_info->tunnel.tun_id = tun_id;
@@ -105,7 +105,7 @@ static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
 					  __be16 tp_dst,
 					  __be64 tun_id,
 					  __be16 tun_flags,
-					  struct geneve_opt *opts,
+					  const struct geneve_opt *opts,
 					  u8 opts_len)
 {
 	__ovs_flow_tun_info_init(tun_info, iph->saddr, iph->daddr,
@@ -244,14 +244,15 @@ struct arp_eth_header {
 } __packed;
 
 void ovs_flow_stats_update(struct sw_flow *, __be16 tcp_flags,
-			   struct sk_buff *);
+			   const struct sk_buff *);
 void ovs_flow_stats_get(const struct sw_flow *, struct ovs_flow_stats *,
 			unsigned long *used, __be16 *tcp_flags);
 void ovs_flow_stats_clear(struct sw_flow *);
 u64 ovs_flow_used_time(unsigned long flow_jiffies);
 
 int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key);
-int ovs_flow_key_extract(struct ovs_tunnel_info *tun_info, struct sk_buff *skb,
+int ovs_flow_key_extract(const struct ovs_tunnel_info *tun_info,
+			 struct sk_buff *skb,
 			 struct sw_flow_key *key);
 /* Extract key from packet coming from userspace. */
 int ovs_flow_key_extract_userspace(const struct nlattr *attr,
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 90f8b40..e0a7fef 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -107,7 +107,7 @@ err:
 	return ERR_PTR(-ENOMEM);
 }
 
-int ovs_flow_tbl_count(struct flow_table *table)
+int ovs_flow_tbl_count(const struct flow_table *table)
 {
 	return table->count;
 }
@@ -401,7 +401,7 @@ static bool flow_cmp_masked_key(const struct sw_flow *flow,
 }
 
 bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
-			       struct sw_flow_match *match)
+			       const struct sw_flow_match *match)
 {
 	struct sw_flow_key *key = match->key;
 	int key_start = flow_key_start(key);
@@ -412,7 +412,7 @@ bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
 
 static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
 					  const struct sw_flow_key *unmasked,
-					  struct sw_flow_mask *mask)
+					  const struct sw_flow_mask *mask)
 {
 	struct sw_flow *flow;
 	struct hlist_head *head;
@@ -460,7 +460,7 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
 }
 
 struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
-					  struct sw_flow_match *match)
+					  const struct sw_flow_match *match)
 {
 	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
 	struct sw_flow_mask *mask;
@@ -563,7 +563,7 @@ static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl,
 
 /* Add 'mask' into the mask list, if it is not already there. */
 static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
-			    struct sw_flow_mask *new)
+			    const struct sw_flow_mask *new)
 {
 	struct sw_flow_mask *mask;
 	mask = flow_mask_find(tbl, new);
@@ -586,7 +586,7 @@ static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
 
 /* Must be called with OVS mutex held. */
 int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
-			struct sw_flow_mask *mask)
+			const struct sw_flow_mask *mask)
 {
 	struct table_instance *new_ti = NULL;
 	struct table_instance *ti;
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index f682c8c..309fa64 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -61,12 +61,12 @@ struct sw_flow *ovs_flow_alloc(void);
 void ovs_flow_free(struct sw_flow *, bool deferred);
 
 int ovs_flow_tbl_init(struct flow_table *);
-int ovs_flow_tbl_count(struct flow_table *table);
+int ovs_flow_tbl_count(const struct flow_table *table);
 void ovs_flow_tbl_destroy(struct flow_table *table);
 int ovs_flow_tbl_flush(struct flow_table *flow_table);
 
 int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
-			struct sw_flow_mask *mask);
+			const struct sw_flow_mask *mask);
 void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow);
 int  ovs_flow_tbl_num_masks(const struct flow_table *table);
 struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *table,
@@ -77,9 +77,9 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *,
 struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *,
 				    const struct sw_flow_key *);
 struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
-					  struct sw_flow_match *match);
+					  const struct sw_flow_match *match);
 bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
-			       struct sw_flow_match *match);
+			       const struct sw_flow_match *match);
 
 void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
 		       const struct sw_flow_mask *mask);
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index e31f19c..347fa23 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -68,7 +68,7 @@ static void tunnel_id_to_vni(__be64 tun_id, __u8 *vni)
 }
 
 /* Convert 24 bit VNI to 64 bit tunnel ID. */
-static __be64 vni_to_tunnel_id(__u8 *vni)
+static __be64 vni_to_tunnel_id(const __u8 *vni)
 {
 #ifdef __BIG_ENDIAN
 	return (vni[0] << 16) | (vni[1] << 8) | vni[2];
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 877ee74..4776282 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -77,7 +77,7 @@ static rx_handler_result_t netdev_frame_hook(struct sk_buff **pskb)
 	return RX_HANDLER_CONSUMED;
 }
 
-static struct net_device *get_dpdev(struct datapath *dp)
+static struct net_device *get_dpdev(const struct datapath *dp)
 {
 	struct vport *local;
 
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 630e819..e771a46 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -68,7 +68,7 @@ void ovs_vport_exit(void)
 	kfree(dev_table);
 }
 
-static struct hlist_head *hash_bucket(struct net *net, const char *name)
+static struct hlist_head *hash_bucket(const struct net *net, const char *name)
 {
 	unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
 	return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
@@ -107,7 +107,7 @@ EXPORT_SYMBOL_GPL(ovs_vport_ops_unregister);
  *
  * Must be called with ovs or RCU read lock.
  */
-struct vport *ovs_vport_locate(struct net *net, const char *name)
+struct vport *ovs_vport_locate(const struct net *net, const char *name)
 {
 	struct hlist_head *bucket = hash_bucket(net, name);
 	struct vport *vport;
@@ -380,7 +380,7 @@ int ovs_vport_get_options(const struct vport *vport, struct sk_buff *skb)
  *
  * Must be called with ovs_mutex.
  */
-int ovs_vport_set_upcall_portids(struct vport *vport,  struct nlattr *ids)
+int ovs_vport_set_upcall_portids(struct vport *vport, const struct nlattr *ids)
 {
 	struct vport_portids *old, *vport_portids;
 
@@ -471,7 +471,7 @@ u32 ovs_vport_find_upcall_portid(const struct vport *vport, struct sk_buff *skb)
  * skb->data should point to the Ethernet header.
  */
 void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
-		       struct ovs_tunnel_info *tun_info)
+		       const struct ovs_tunnel_info *tun_info)
 {
 	struct pcpu_sw_netstats *stats;
 	struct sw_flow_key key;
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 0635d1d..99c8e71 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -45,14 +45,14 @@ void ovs_vport_exit(void);
 struct vport *ovs_vport_add(const struct vport_parms *);
 void ovs_vport_del(struct vport *);
 
-struct vport *ovs_vport_locate(struct net *net, const char *name);
+struct vport *ovs_vport_locate(const struct net *net, const char *name);
 
 void ovs_vport_get_stats(struct vport *, struct ovs_vport_stats *);
 
 int ovs_vport_set_options(struct vport *, struct nlattr *options);
 int ovs_vport_get_options(const struct vport *, struct sk_buff *);
 
-int ovs_vport_set_upcall_portids(struct vport *, struct nlattr *pids);
+int ovs_vport_set_upcall_portids(struct vport *, const struct nlattr *pids);
 int ovs_vport_get_upcall_portids(const struct vport *, struct sk_buff *);
 u32 ovs_vport_find_upcall_portid(const struct vport *, struct sk_buff *);
 
@@ -224,7 +224,7 @@ static inline struct vport *vport_from_priv(void *priv)
 }
 
 void ovs_vport_receive(struct vport *, struct sk_buff *,
-		       struct ovs_tunnel_info *);
+		       const struct ovs_tunnel_info *);
 
 static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb,
 				      const void *start, unsigned int len)
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 6/6] openvswitch: Add support for OVS_FLOW_ATTR_PROBE.
From: Pravin B Shelar @ 2014-11-10  3:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Jarno Rajahalme, Pravin B Shelar

From: Jarno Rajahalme <jrajahalme@nicira.com>

This new flag is useful for suppressing error logging while probing
for datapath features using flow commands.  For backwards
compatibility reasons the commands are executed normally, but error
logging is suppressed.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 include/uapi/linux/openvswitch.h |   2 +
 net/openvswitch/datapath.c       |  49 ++++---
 net/openvswitch/datapath.h       |   6 +-
 net/openvswitch/flow.c           |   4 +-
 net/openvswitch/flow.h           |   2 +-
 net/openvswitch/flow_netlink.c   | 303 +++++++++++++++++++++------------------
 net/openvswitch/flow_netlink.h   |  10 +-
 7 files changed, 208 insertions(+), 168 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index cf81856..3a6dcaa 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -457,6 +457,8 @@ enum ovs_flow_attr {
 	OVS_FLOW_ATTR_USED,      /* u64 msecs last used in monotonic time. */
 	OVS_FLOW_ATTR_CLEAR,     /* Flag to clear stats, tcp_flags, used. */
 	OVS_FLOW_ATTR_MASK,      /* Sequence of OVS_KEY_ATTR_* attributes. */
+	OVS_FLOW_ATTR_PROBE,     /* Flow operation is a feature probe, error
+				  * logging should be suppressed. */
 	__OVS_FLOW_ATTR_MAX
 };
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 65561eb..ab141d4 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -526,6 +526,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	struct vport *input_vport;
 	int len;
 	int err;
+	bool log = !a[OVS_FLOW_ATTR_PROBE];
 
 	err = -EINVAL;
 	if (!a[OVS_PACKET_ATTR_PACKET] || !a[OVS_PACKET_ATTR_KEY] ||
@@ -559,12 +560,12 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		goto err_kfree_skb;
 
 	err = ovs_flow_key_extract_userspace(a[OVS_PACKET_ATTR_KEY], packet,
-					     &flow->key);
+					     &flow->key, log);
 	if (err)
 		goto err_flow_free;
 
 	err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS],
-				   &flow->key, &acts);
+				   &flow->key, &acts, log);
 	if (err)
 		goto err_flow_free;
 
@@ -855,15 +856,16 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	struct sw_flow_actions *acts;
 	struct sw_flow_match match;
 	int error;
+	bool log = !a[OVS_FLOW_ATTR_PROBE];
 
 	/* Must have key and actions. */
 	error = -EINVAL;
 	if (!a[OVS_FLOW_ATTR_KEY]) {
-		OVS_NLERR("Flow key attribute not present in new flow.\n");
+		OVS_NLERR(log, "Flow key attr not present in new flow.");
 		goto error;
 	}
 	if (!a[OVS_FLOW_ATTR_ACTIONS]) {
-		OVS_NLERR("Flow actions attribute not present in new flow.\n");
+		OVS_NLERR(log, "Flow actions attr not present in new flow.");
 		goto error;
 	}
 
@@ -878,8 +880,8 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 
 	/* Extract key. */
 	ovs_match_init(&match, &new_flow->unmasked_key, &mask);
-	error = ovs_nla_get_match(&match,
-				  a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]);
+	error = ovs_nla_get_match(&match, a[OVS_FLOW_ATTR_KEY],
+				  a[OVS_FLOW_ATTR_MASK], log);
 	if (error)
 		goto err_kfree_flow;
 
@@ -887,9 +889,9 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 
 	/* Validate actions. */
 	error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], &new_flow->key,
-				     &acts);
+				     &acts, log);
 	if (error) {
-		OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
+		OVS_NLERR(log, "Flow actions may not be safe on all matching packets.");
 		goto err_kfree_flow;
 	}
 
@@ -942,6 +944,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		}
 		/* The unmasked key has to be the same for flow updates. */
 		if (unlikely(!ovs_flow_cmp_unmasked_key(flow, &match))) {
+			/* Look for any overlapping flow. */
 			flow = ovs_flow_tbl_lookup_exact(&dp->table, &match);
 			if (!flow) {
 				error = -ENOENT;
@@ -984,16 +987,18 @@ error:
 /* Factor out action copy to avoid "Wframe-larger-than=1024" warning. */
 static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
 						const struct sw_flow_key *key,
-						const struct sw_flow_mask *mask)
+						const struct sw_flow_mask *mask,
+						bool log)
 {
 	struct sw_flow_actions *acts;
 	struct sw_flow_key masked_key;
 	int error;
 
 	ovs_flow_mask_key(&masked_key, key, mask);
-	error = ovs_nla_copy_actions(a, &masked_key, &acts);
+	error = ovs_nla_copy_actions(a, &masked_key, &acts, log);
 	if (error) {
-		OVS_NLERR("Actions may not be safe on all matching packets.\n");
+		OVS_NLERR(log,
+			  "Actions may not be safe on all matching packets");
 		return ERR_PTR(error);
 	}
 
@@ -1012,23 +1017,25 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	struct sw_flow_actions *old_acts = NULL, *acts = NULL;
 	struct sw_flow_match match;
 	int error;
+	bool log = !a[OVS_FLOW_ATTR_PROBE];
 
 	/* Extract key. */
 	error = -EINVAL;
 	if (!a[OVS_FLOW_ATTR_KEY]) {
-		OVS_NLERR("Flow key attribute not present in set flow.\n");
+		OVS_NLERR(log, "Flow key attribute not present in set flow.");
 		goto error;
 	}
 
 	ovs_match_init(&match, &key, &mask);
-	error = ovs_nla_get_match(&match,
-				  a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]);
+	error = ovs_nla_get_match(&match, a[OVS_FLOW_ATTR_KEY],
+				  a[OVS_FLOW_ATTR_MASK], log);
 	if (error)
 		goto error;
 
 	/* Validate actions. */
 	if (a[OVS_FLOW_ATTR_ACTIONS]) {
-		acts = get_flow_actions(a[OVS_FLOW_ATTR_ACTIONS], &key, &mask);
+		acts = get_flow_actions(a[OVS_FLOW_ATTR_ACTIONS], &key, &mask,
+					log);
 		if (IS_ERR(acts)) {
 			error = PTR_ERR(acts);
 			goto error;
@@ -1109,14 +1116,16 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
 	struct datapath *dp;
 	struct sw_flow_match match;
 	int err;
+	bool log = !a[OVS_FLOW_ATTR_PROBE];
 
 	if (!a[OVS_FLOW_ATTR_KEY]) {
-		OVS_NLERR("Flow get message rejected, Key attribute missing.\n");
+		OVS_NLERR(log,
+			  "Flow get message rejected, Key attribute missing.");
 		return -EINVAL;
 	}
 
 	ovs_match_init(&match, &key, NULL);
-	err = ovs_nla_get_match(&match, a[OVS_FLOW_ATTR_KEY], NULL);
+	err = ovs_nla_get_match(&match, a[OVS_FLOW_ATTR_KEY], NULL, log);
 	if (err)
 		return err;
 
@@ -1157,10 +1166,12 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	struct datapath *dp;
 	struct sw_flow_match match;
 	int err;
+	bool log = !a[OVS_FLOW_ATTR_PROBE];
 
 	if (likely(a[OVS_FLOW_ATTR_KEY])) {
 		ovs_match_init(&match, &key, NULL);
-		err = ovs_nla_get_match(&match, a[OVS_FLOW_ATTR_KEY], NULL);
+		err = ovs_nla_get_match(&match, a[OVS_FLOW_ATTR_KEY], NULL,
+					log);
 		if (unlikely(err))
 			return err;
 	}
@@ -1250,8 +1261,10 @@ static int ovs_flow_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
 
 static const struct nla_policy flow_policy[OVS_FLOW_ATTR_MAX + 1] = {
 	[OVS_FLOW_ATTR_KEY] = { .type = NLA_NESTED },
+	[OVS_FLOW_ATTR_MASK] = { .type = NLA_NESTED },
 	[OVS_FLOW_ATTR_ACTIONS] = { .type = NLA_NESTED },
 	[OVS_FLOW_ATTR_CLEAR] = { .type = NLA_FLAG },
+	[OVS_FLOW_ATTR_PROBE] = { .type = NLA_FLAG },
 };
 
 static const struct genl_ops dp_flow_genl_ops[] = {
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 8389c1d..3ece945 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -199,9 +199,9 @@ void ovs_dp_notify_wq(struct work_struct *work);
 int action_fifos_init(void);
 void action_fifos_exit(void);
 
-#define OVS_NLERR(fmt, ...)					\
+#define OVS_NLERR(logging_allowed, fmt, ...)			\
 do {								\
-	if (net_ratelimit())					\
-		pr_info("netlink: " fmt, ##__VA_ARGS__);	\
+	if (logging_allowed && net_ratelimit())			\
+		pr_info("netlink: " fmt "\n", ##__VA_ARGS__);	\
 } while (0)
 #endif /* datapath.h */
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 25e9abc..70bef2a 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -712,12 +712,12 @@ int ovs_flow_key_extract(const struct ovs_tunnel_info *tun_info,
 
 int ovs_flow_key_extract_userspace(const struct nlattr *attr,
 				   struct sk_buff *skb,
-				   struct sw_flow_key *key)
+				   struct sw_flow_key *key, bool log)
 {
 	int err;
 
 	/* Extract metadata from netlink attributes. */
-	err = ovs_nla_get_flow_metadata(attr, key);
+	err = ovs_nla_get_flow_metadata(attr, key, log);
 	if (err)
 		return err;
 
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 9e0a787..a8b30f3 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -257,6 +257,6 @@ int ovs_flow_key_extract(const struct ovs_tunnel_info *tun_info,
 /* Extract key from packet coming from userspace. */
 int ovs_flow_key_extract_userspace(const struct nlattr *attr,
 				   struct sk_buff *skb,
-				   struct sw_flow_key *key);
+				   struct sw_flow_key *key, bool log);
 
 #endif /* flow.h */
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 98a3e96..c0d066d 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -112,7 +112,7 @@ static void update_range(struct sw_flow_match *match,
 	} while (0)
 
 static bool match_validate(const struct sw_flow_match *match,
-			   u64 key_attrs, u64 mask_attrs)
+			   u64 key_attrs, u64 mask_attrs, bool log)
 {
 	u64 key_expected = 1 << OVS_KEY_ATTR_ETHERNET;
 	u64 mask_allowed = key_attrs;  /* At most allow all key attributes */
@@ -230,15 +230,17 @@ static bool match_validate(const struct sw_flow_match *match,
 
 	if ((key_attrs & key_expected) != key_expected) {
 		/* Key attributes check failed. */
-		OVS_NLERR("Missing expected key attributes (key_attrs=%llx, expected=%llx).\n",
-				(unsigned long long)key_attrs, (unsigned long long)key_expected);
+		OVS_NLERR(log, "Missing key (keys=%llx, expected=%llx)",
+			  (unsigned long long)key_attrs,
+			  (unsigned long long)key_expected);
 		return false;
 	}
 
 	if ((mask_attrs & mask_allowed) != mask_attrs) {
 		/* Mask attributes check failed. */
-		OVS_NLERR("Contain more than allowed mask fields (mask_attrs=%llx, mask_allowed=%llx).\n",
-				(unsigned long long)mask_attrs, (unsigned long long)mask_allowed);
+		OVS_NLERR(log, "Unexpected mask (mask=%llx, allowed=%llx)",
+			  (unsigned long long)mask_attrs,
+			  (unsigned long long)mask_allowed);
 		return false;
 	}
 
@@ -328,7 +330,7 @@ static bool is_all_zero(const u8 *fp, size_t size)
 
 static int __parse_flow_nlattrs(const struct nlattr *attr,
 				const struct nlattr *a[],
-				u64 *attrsp, bool nz)
+				u64 *attrsp, bool log, bool nz)
 {
 	const struct nlattr *nla;
 	u64 attrs;
@@ -340,21 +342,20 @@ static int __parse_flow_nlattrs(const struct nlattr *attr,
 		int expected_len;
 
 		if (type > OVS_KEY_ATTR_MAX) {
-			OVS_NLERR("Unknown key attribute (type=%d, max=%d).\n",
+			OVS_NLERR(log, "Key type %d is out of range max %d",
 				  type, OVS_KEY_ATTR_MAX);
 			return -EINVAL;
 		}
 
 		if (attrs & (1 << type)) {
-			OVS_NLERR("Duplicate key attribute (type %d).\n", type);
+			OVS_NLERR(log, "Duplicate key (type %d).", type);
 			return -EINVAL;
 		}
 
 		expected_len = ovs_key_lens[type];
 		if (nla_len(nla) != expected_len && expected_len != -1) {
-			OVS_NLERR("Key attribute has unexpected length (type=%d"
-				  ", length=%d, expected=%d).\n", type,
-				  nla_len(nla), expected_len);
+			OVS_NLERR(log, "Key %d has unexpected len %d expected %d",
+				  type, nla_len(nla), expected_len);
 			return -EINVAL;
 		}
 
@@ -364,7 +365,7 @@ static int __parse_flow_nlattrs(const struct nlattr *attr,
 		}
 	}
 	if (rem) {
-		OVS_NLERR("Message has %d unknown bytes.\n", rem);
+		OVS_NLERR(log, "Message has %d unknown bytes.", rem);
 		return -EINVAL;
 	}
 
@@ -373,28 +374,84 @@ static int __parse_flow_nlattrs(const struct nlattr *attr,
 }
 
 static int parse_flow_mask_nlattrs(const struct nlattr *attr,
-				   const struct nlattr *a[], u64 *attrsp)
+				   const struct nlattr *a[], u64 *attrsp,
+				   bool log)
 {
-	return __parse_flow_nlattrs(attr, a, attrsp, true);
+	return __parse_flow_nlattrs(attr, a, attrsp, log, true);
 }
 
 static int parse_flow_nlattrs(const struct nlattr *attr,
-			      const struct nlattr *a[], u64 *attrsp)
+			      const struct nlattr *a[], u64 *attrsp,
+			      bool log)
 {
-	return __parse_flow_nlattrs(attr, a, attrsp, false);
+	return __parse_flow_nlattrs(attr, a, attrsp, log, false);
+}
+
+static int genev_tun_opt_from_nlattr(const struct nlattr *a,
+				     struct sw_flow_match *match, bool is_mask,
+				     bool log)
+{
+	unsigned long opt_key_offset;
+
+	if (nla_len(a) > sizeof(match->key->tun_opts)) {
+		OVS_NLERR(log, "Geneve option length err (len %d, max %zu).",
+			  nla_len(a), sizeof(match->key->tun_opts));
+		return -EINVAL;
+	}
+
+	if (nla_len(a) % 4 != 0) {
+		OVS_NLERR(log, "Geneve opt len %d is not a multiple of 4.",
+			  nla_len(a));
+		return -EINVAL;
+	}
+
+	/* We need to record the length of the options passed
+	 * down, otherwise packets with the same format but
+	 * additional options will be silently matched.
+	 */
+	if (!is_mask) {
+		SW_FLOW_KEY_PUT(match, tun_opts_len, nla_len(a),
+				false);
+	} else {
+		/* This is somewhat unusual because it looks at
+		 * both the key and mask while parsing the
+		 * attributes (and by extension assumes the key
+		 * is parsed first). Normally, we would verify
+		 * that each is the correct length and that the
+		 * attributes line up in the validate function.
+		 * However, that is difficult because this is
+		 * variable length and we won't have the
+		 * information later.
+		 */
+		if (match->key->tun_opts_len != nla_len(a)) {
+			OVS_NLERR(log, "Geneve option len %d != mask len %d",
+				  match->key->tun_opts_len, nla_len(a));
+			return -EINVAL;
+		}
+
+		SW_FLOW_KEY_PUT(match, tun_opts_len, 0xff, true);
+	}
+
+	opt_key_offset = (unsigned long)GENEVE_OPTS((struct sw_flow_key *)0,
+						    nla_len(a));
+	SW_FLOW_KEY_MEMCPY_OFFSET(match, opt_key_offset, nla_data(a),
+				  nla_len(a), is_mask);
+	return 0;
 }
 
 static int ipv4_tun_from_nlattr(const struct nlattr *attr,
-				struct sw_flow_match *match, bool is_mask)
+				struct sw_flow_match *match, bool is_mask,
+				bool log)
 {
 	struct nlattr *a;
 	int rem;
 	bool ttl = false;
 	__be16 tun_flags = 0;
-	unsigned long opt_key_offset;
 
 	nla_for_each_nested(a, attr, rem) {
 		int type = nla_type(a);
+		int err;
+
 		static const u32 ovs_tunnel_key_lens[OVS_TUNNEL_KEY_ATTR_MAX + 1] = {
 			[OVS_TUNNEL_KEY_ATTR_ID] = sizeof(u64),
 			[OVS_TUNNEL_KEY_ATTR_IPV4_SRC] = sizeof(u32),
@@ -410,15 +467,14 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 		};
 
 		if (type > OVS_TUNNEL_KEY_ATTR_MAX) {
-			OVS_NLERR("Unknown IPv4 tunnel attribute (type=%d, max=%d).\n",
-			type, OVS_TUNNEL_KEY_ATTR_MAX);
+			OVS_NLERR(log, "Tunnel attr %d out of range max %d",
+				  type, OVS_TUNNEL_KEY_ATTR_MAX);
 			return -EINVAL;
 		}
 
 		if (ovs_tunnel_key_lens[type] != nla_len(a) &&
 		    ovs_tunnel_key_lens[type] != -1) {
-			OVS_NLERR("IPv4 tunnel attribute type has unexpected "
-				  " length (type=%d, length=%d, expected=%d).\n",
+			OVS_NLERR(log, "Tunnel attr %d has unexpected len %d expected %d",
 				  type, nla_len(a), ovs_tunnel_key_lens[type]);
 			return -EINVAL;
 		}
@@ -464,58 +520,14 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 			tun_flags |= TUNNEL_OAM;
 			break;
 		case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS:
-			tun_flags |= TUNNEL_OPTIONS_PRESENT;
-			if (nla_len(a) > sizeof(match->key->tun_opts)) {
-				OVS_NLERR("Geneve option length exceeds maximum size (len %d, max %zu).\n",
-					  nla_len(a),
-					  sizeof(match->key->tun_opts));
-				return -EINVAL;
-			}
-
-			if (nla_len(a) % 4 != 0) {
-				OVS_NLERR("Geneve option length is not a multiple of 4 (len %d).\n",
-					  nla_len(a));
-				return -EINVAL;
-			}
-
-			/* We need to record the length of the options passed
-			 * down, otherwise packets with the same format but
-			 * additional options will be silently matched.
-			 */
-			if (!is_mask) {
-				SW_FLOW_KEY_PUT(match, tun_opts_len, nla_len(a),
-						false);
-			} else {
-				/* This is somewhat unusual because it looks at
-				 * both the key and mask while parsing the
-				 * attributes (and by extension assumes the key
-				 * is parsed first). Normally, we would verify
-				 * that each is the correct length and that the
-				 * attributes line up in the validate function.
-				 * However, that is difficult because this is
-				 * variable length and we won't have the
-				 * information later.
-				 */
-				if (match->key->tun_opts_len != nla_len(a)) {
-					OVS_NLERR("Geneve option key length (%d) is different from mask length (%d).",
-						  match->key->tun_opts_len,
-						  nla_len(a));
-					return -EINVAL;
-				}
-
-				SW_FLOW_KEY_PUT(match, tun_opts_len, 0xff,
-						true);
-			}
+			err = genev_tun_opt_from_nlattr(a, match, is_mask, log);
+			if (err)
+				return err;
 
-			opt_key_offset = (unsigned long)GENEVE_OPTS(
-					  (struct sw_flow_key *)0,
-					  nla_len(a));
-			SW_FLOW_KEY_MEMCPY_OFFSET(match, opt_key_offset,
-						  nla_data(a), nla_len(a),
-						  is_mask);
+			tun_flags |= TUNNEL_OPTIONS_PRESENT;
 			break;
 		default:
-			OVS_NLERR("Unknown IPv4 tunnel attribute (%d).\n",
+			OVS_NLERR(log, "Unknown IPv4 tunnel attribute %d",
 				  type);
 			return -EINVAL;
 		}
@@ -524,18 +536,19 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 	SW_FLOW_KEY_PUT(match, tun_key.tun_flags, tun_flags, is_mask);
 
 	if (rem > 0) {
-		OVS_NLERR("IPv4 tunnel attribute has %d unknown bytes.\n", rem);
+		OVS_NLERR(log, "IPv4 tunnel attribute has %d unknown bytes.",
+			  rem);
 		return -EINVAL;
 	}
 
 	if (!is_mask) {
 		if (!match->key->tun_key.ipv4_dst) {
-			OVS_NLERR("IPv4 tunnel destination address is zero.\n");
+			OVS_NLERR(log, "IPv4 tunnel dst address is zero");
 			return -EINVAL;
 		}
 
 		if (!ttl) {
-			OVS_NLERR("IPv4 tunnel TTL not specified.\n");
+			OVS_NLERR(log, "IPv4 tunnel TTL not specified.");
 			return -EINVAL;
 		}
 	}
@@ -614,7 +627,8 @@ int ovs_nla_put_egress_tunnel_key(struct sk_buff *skb,
 }
 
 static int metadata_from_nlattrs(struct sw_flow_match *match,  u64 *attrs,
-				 const struct nlattr **a, bool is_mask)
+				 const struct nlattr **a, bool is_mask,
+				 bool log)
 {
 	if (*attrs & (1 << OVS_KEY_ATTR_DP_HASH)) {
 		u32 hash_val = nla_get_u32(a[OVS_KEY_ATTR_DP_HASH]);
@@ -642,7 +656,7 @@ static int metadata_from_nlattrs(struct sw_flow_match *match,  u64 *attrs,
 		if (is_mask) {
 			in_port = 0xffffffff; /* Always exact match in_port. */
 		} else if (in_port >= DP_MAX_PORTS) {
-			OVS_NLERR("Port (%d) exceeds maximum allowable (%d).\n",
+			OVS_NLERR(log, "Port %d exceeds max allowable %d",
 				  in_port, DP_MAX_PORTS);
 			return -EINVAL;
 		}
@@ -661,7 +675,7 @@ static int metadata_from_nlattrs(struct sw_flow_match *match,  u64 *attrs,
 	}
 	if (*attrs & (1 << OVS_KEY_ATTR_TUNNEL)) {
 		if (ipv4_tun_from_nlattr(a[OVS_KEY_ATTR_TUNNEL], match,
-					 is_mask))
+					 is_mask, log))
 			return -EINVAL;
 		*attrs &= ~(1 << OVS_KEY_ATTR_TUNNEL);
 	}
@@ -669,11 +683,12 @@ static int metadata_from_nlattrs(struct sw_flow_match *match,  u64 *attrs,
 }
 
 static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
-				const struct nlattr **a, bool is_mask)
+				const struct nlattr **a, bool is_mask,
+				bool log)
 {
 	int err;
 
-	err = metadata_from_nlattrs(match, &attrs, a, is_mask);
+	err = metadata_from_nlattrs(match, &attrs, a, is_mask, log);
 	if (err)
 		return err;
 
@@ -694,9 +709,9 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 		tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
 		if (!(tci & htons(VLAN_TAG_PRESENT))) {
 			if (is_mask)
-				OVS_NLERR("VLAN TCI mask does not have exact match for VLAN_TAG_PRESENT bit.\n");
+				OVS_NLERR(log, "VLAN TCI mask does not have exact match for VLAN_TAG_PRESENT bit.");
 			else
-				OVS_NLERR("VLAN TCI does not have VLAN_TAG_PRESENT bit set.\n");
+				OVS_NLERR(log, "VLAN TCI does not have VLAN_TAG_PRESENT bit set.");
 
 			return -EINVAL;
 		}
@@ -713,8 +728,8 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 			/* Always exact match EtherType. */
 			eth_type = htons(0xffff);
 		} else if (ntohs(eth_type) < ETH_P_802_3_MIN) {
-			OVS_NLERR("EtherType is less than minimum (type=%x, min=%x).\n",
-					ntohs(eth_type), ETH_P_802_3_MIN);
+			OVS_NLERR(log, "EtherType %x is less than min %x",
+				  ntohs(eth_type), ETH_P_802_3_MIN);
 			return -EINVAL;
 		}
 
@@ -729,8 +744,8 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 
 		ipv4_key = nla_data(a[OVS_KEY_ATTR_IPV4]);
 		if (!is_mask && ipv4_key->ipv4_frag > OVS_FRAG_TYPE_MAX) {
-			OVS_NLERR("Unknown IPv4 fragment type (value=%d, max=%d).\n",
-				ipv4_key->ipv4_frag, OVS_FRAG_TYPE_MAX);
+			OVS_NLERR(log, "IPv4 frag type %d is out of range max %d",
+				  ipv4_key->ipv4_frag, OVS_FRAG_TYPE_MAX);
 			return -EINVAL;
 		}
 		SW_FLOW_KEY_PUT(match, ip.proto,
@@ -753,8 +768,8 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 
 		ipv6_key = nla_data(a[OVS_KEY_ATTR_IPV6]);
 		if (!is_mask && ipv6_key->ipv6_frag > OVS_FRAG_TYPE_MAX) {
-			OVS_NLERR("Unknown IPv6 fragment type (value=%d, max=%d).\n",
-				ipv6_key->ipv6_frag, OVS_FRAG_TYPE_MAX);
+			OVS_NLERR(log, "IPv6 frag type %d is out of range max %d",
+				  ipv6_key->ipv6_frag, OVS_FRAG_TYPE_MAX);
 			return -EINVAL;
 		}
 		SW_FLOW_KEY_PUT(match, ipv6.label,
@@ -784,7 +799,7 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 
 		arp_key = nla_data(a[OVS_KEY_ATTR_ARP]);
 		if (!is_mask && (arp_key->arp_op & htons(0xff00))) {
-			OVS_NLERR("Unknown ARP opcode (opcode=%d).\n",
+			OVS_NLERR(log, "Unknown ARP opcode (opcode=%d).",
 				  arp_key->arp_op);
 			return -EINVAL;
 		}
@@ -885,7 +900,7 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
 	}
 
 	if (attrs != 0) {
-		OVS_NLERR("Unknown key attributes (%llx).\n",
+		OVS_NLERR(log, "Unknown key attributes %llx",
 			  (unsigned long long)attrs);
 		return -EINVAL;
 	}
@@ -926,10 +941,14 @@ static void mask_set_nlattr(struct nlattr *attr, u8 val)
  * of this flow.
  * @mask: Optional. Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink
  * attribute specifies the mask field of the wildcarded flow.
+ * @log: Boolean to allow kernel error logging.  Normally true, but when
+ * probing for feature compatibility this should be passed in as false to
+ * suppress unnecessary error logging.
  */
 int ovs_nla_get_match(struct sw_flow_match *match,
 		      const struct nlattr *nla_key,
-		      const struct nlattr *nla_mask)
+		      const struct nlattr *nla_mask,
+		      bool log)
 {
 	const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
 	const struct nlattr *encap;
@@ -939,7 +958,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 	bool encap_valid = false;
 	int err;
 
-	err = parse_flow_nlattrs(nla_key, a, &key_attrs);
+	err = parse_flow_nlattrs(nla_key, a, &key_attrs, log);
 	if (err)
 		return err;
 
@@ -950,7 +969,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 
 		if (!((key_attrs & (1 << OVS_KEY_ATTR_VLAN)) &&
 		      (key_attrs & (1 << OVS_KEY_ATTR_ENCAP)))) {
-			OVS_NLERR("Invalid Vlan frame.\n");
+			OVS_NLERR(log, "Invalid Vlan frame.");
 			return -EINVAL;
 		}
 
@@ -961,22 +980,22 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 		encap_valid = true;
 
 		if (tci & htons(VLAN_TAG_PRESENT)) {
-			err = parse_flow_nlattrs(encap, a, &key_attrs);
+			err = parse_flow_nlattrs(encap, a, &key_attrs, log);
 			if (err)
 				return err;
 		} else if (!tci) {
 			/* Corner case for truncated 802.1Q header. */
 			if (nla_len(encap)) {
-				OVS_NLERR("Truncated 802.1Q header has non-zero encap attribute.\n");
+				OVS_NLERR(log, "Truncated 802.1Q header has non-zero encap attribute.");
 				return -EINVAL;
 			}
 		} else {
-			OVS_NLERR("Encap attribute is set for a non-VLAN frame.\n");
+			OVS_NLERR(log, "Encap attr is set for non-VLAN frame");
 			return  -EINVAL;
 		}
 	}
 
-	err = ovs_key_from_nlattrs(match, key_attrs, a, false);
+	err = ovs_key_from_nlattrs(match, key_attrs, a, false, log);
 	if (err)
 		return err;
 
@@ -1010,7 +1029,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 			nla_mask = newmask;
 		}
 
-		err = parse_flow_mask_nlattrs(nla_mask, a, &mask_attrs);
+		err = parse_flow_mask_nlattrs(nla_mask, a, &mask_attrs, log);
 		if (err)
 			goto free_newmask;
 
@@ -1022,7 +1041,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 			__be16 tci = 0;
 
 			if (!encap_valid) {
-				OVS_NLERR("Encap mask attribute is set for non-VLAN frame.\n");
+				OVS_NLERR(log, "Encap mask attribute is set for non-VLAN frame.");
 				err = -EINVAL;
 				goto free_newmask;
 			}
@@ -1034,12 +1053,13 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 			if (eth_type == htons(0xffff)) {
 				mask_attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
 				encap = a[OVS_KEY_ATTR_ENCAP];
-				err = parse_flow_mask_nlattrs(encap, a, &mask_attrs);
+				err = parse_flow_mask_nlattrs(encap, a,
+							      &mask_attrs, log);
 				if (err)
 					goto free_newmask;
 			} else {
-				OVS_NLERR("VLAN frames must have an exact match on the TPID (mask=%x).\n",
-						ntohs(eth_type));
+				OVS_NLERR(log, "VLAN frames must have an exact match on the TPID (mask=%x).",
+					  ntohs(eth_type));
 				err = -EINVAL;
 				goto free_newmask;
 			}
@@ -1048,18 +1068,19 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 				tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
 
 			if (!(tci & htons(VLAN_TAG_PRESENT))) {
-				OVS_NLERR("VLAN tag present bit must have an exact match (tci_mask=%x).\n", ntohs(tci));
+				OVS_NLERR(log, "VLAN tag present bit must have an exact match (tci_mask=%x).",
+					  ntohs(tci));
 				err = -EINVAL;
 				goto free_newmask;
 			}
 		}
 
-		err = ovs_key_from_nlattrs(match, mask_attrs, a, true);
+		err = ovs_key_from_nlattrs(match, mask_attrs, a, true, log);
 		if (err)
 			goto free_newmask;
 	}
 
-	if (!match_validate(match, key_attrs, mask_attrs))
+	if (!match_validate(match, key_attrs, mask_attrs, log))
 		err = -EINVAL;
 
 free_newmask:
@@ -1072,6 +1093,9 @@ free_newmask:
  * @key: Receives extracted in_port, priority, tun_key and skb_mark.
  * @attr: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute
  * sequence.
+ * @log: Boolean to allow kernel error logging.  Normally true, but when
+ * probing for feature compatibility this should be passed in as false to
+ * suppress unnecessary error logging.
  *
  * This parses a series of Netlink attributes that form a flow key, which must
  * take the same form accepted by flow_from_nlattrs(), but only enough of it to
@@ -1080,14 +1104,15 @@ free_newmask:
  */
 
 int ovs_nla_get_flow_metadata(const struct nlattr *attr,
-			      struct sw_flow_key *key)
+			      struct sw_flow_key *key,
+			      bool log)
 {
 	const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
 	struct sw_flow_match match;
 	u64 attrs = 0;
 	int err;
 
-	err = parse_flow_nlattrs(attr, a, &attrs);
+	err = parse_flow_nlattrs(attr, a, &attrs, log);
 	if (err)
 		return -EINVAL;
 
@@ -1096,7 +1121,7 @@ int ovs_nla_get_flow_metadata(const struct nlattr *attr,
 
 	key->phy.in_port = DP_MAX_PORTS;
 
-	return metadata_from_nlattrs(&match, &attrs, a, false);
+	return metadata_from_nlattrs(&match, &attrs, a, false, log);
 }
 
 int ovs_nla_put_flow(const struct sw_flow_key *swkey,
@@ -1316,12 +1341,12 @@ nla_put_failure:
 
 #define MAX_ACTIONS_BUFSIZE	(32 * 1024)
 
-static struct sw_flow_actions *nla_alloc_flow_actions(int size)
+static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log)
 {
 	struct sw_flow_actions *sfa;
 
 	if (size > MAX_ACTIONS_BUFSIZE) {
-		OVS_NLERR("Flow action size (%u bytes) exceeds maximum", size);
+		OVS_NLERR(log, "Flow action size %u bytes exceeds max", size);
 		return ERR_PTR(-EINVAL);
 	}
 
@@ -1341,7 +1366,7 @@ void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
 }
 
 static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
-				       int attr_len)
+				       int attr_len, bool log)
 {
 
 	struct sw_flow_actions *acts;
@@ -1361,7 +1386,7 @@ static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
 		new_acts_size = MAX_ACTIONS_BUFSIZE;
 	}
 
-	acts = nla_alloc_flow_actions(new_acts_size);
+	acts = nla_alloc_flow_actions(new_acts_size, log);
 	if (IS_ERR(acts))
 		return (void *)acts;
 
@@ -1376,11 +1401,11 @@ out:
 }
 
 static struct nlattr *__add_action(struct sw_flow_actions **sfa,
-				   int attrtype, void *data, int len)
+				   int attrtype, void *data, int len, bool log)
 {
 	struct nlattr *a;
 
-	a = reserve_sfa_size(sfa, nla_attr_size(len));
+	a = reserve_sfa_size(sfa, nla_attr_size(len), log);
 	if (IS_ERR(a))
 		return a;
 
@@ -1395,11 +1420,11 @@ static struct nlattr *__add_action(struct sw_flow_actions **sfa,
 }
 
 static int add_action(struct sw_flow_actions **sfa, int attrtype,
-		      void *data, int len)
+		      void *data, int len, bool log)
 {
 	struct nlattr *a;
 
-	a = __add_action(sfa, attrtype, data, len);
+	a = __add_action(sfa, attrtype, data, len, log);
 	if (IS_ERR(a))
 		return PTR_ERR(a);
 
@@ -1407,12 +1432,12 @@ static int add_action(struct sw_flow_actions **sfa, int attrtype,
 }
 
 static inline int add_nested_action_start(struct sw_flow_actions **sfa,
-					  int attrtype)
+					  int attrtype, bool log)
 {
 	int used = (*sfa)->actions_len;
 	int err;
 
-	err = add_action(sfa, attrtype, NULL, 0);
+	err = add_action(sfa, attrtype, NULL, 0, log);
 	if (err)
 		return err;
 
@@ -1431,12 +1456,12 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa,
 static int __ovs_nla_copy_actions(const struct nlattr *attr,
 				  const struct sw_flow_key *key,
 				  int depth, struct sw_flow_actions **sfa,
-				  __be16 eth_type, __be16 vlan_tci);
+				  __be16 eth_type, __be16 vlan_tci, bool log);
 
 static int validate_and_copy_sample(const struct nlattr *attr,
 				    const struct sw_flow_key *key, int depth,
 				    struct sw_flow_actions **sfa,
-				    __be16 eth_type, __be16 vlan_tci)
+				    __be16 eth_type, __be16 vlan_tci, bool log)
 {
 	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
 	const struct nlattr *probability, *actions;
@@ -1462,19 +1487,19 @@ static int validate_and_copy_sample(const struct nlattr *attr,
 		return -EINVAL;
 
 	/* validation done, copy sample action. */
-	start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE);
+	start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE, log);
 	if (start < 0)
 		return start;
 	err = add_action(sfa, OVS_SAMPLE_ATTR_PROBABILITY,
-			 nla_data(probability), sizeof(u32));
+			 nla_data(probability), sizeof(u32), log);
 	if (err)
 		return err;
-	st_acts = add_nested_action_start(sfa, OVS_SAMPLE_ATTR_ACTIONS);
+	st_acts = add_nested_action_start(sfa, OVS_SAMPLE_ATTR_ACTIONS, log);
 	if (st_acts < 0)
 		return st_acts;
 
 	err = __ovs_nla_copy_actions(actions, key, depth + 1, sfa,
-				     eth_type, vlan_tci);
+				     eth_type, vlan_tci, log);
 	if (err)
 		return err;
 
@@ -1511,7 +1536,7 @@ void ovs_match_init(struct sw_flow_match *match,
 }
 
 static int validate_and_copy_set_tun(const struct nlattr *attr,
-				     struct sw_flow_actions **sfa)
+				     struct sw_flow_actions **sfa, bool log)
 {
 	struct sw_flow_match match;
 	struct sw_flow_key key;
@@ -1520,7 +1545,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	int err, start;
 
 	ovs_match_init(&match, &key, NULL);
-	err = ipv4_tun_from_nlattr(nla_data(attr), &match, false);
+	err = ipv4_tun_from_nlattr(nla_data(attr), &match, false, log);
 	if (err)
 		return err;
 
@@ -1549,12 +1574,12 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 		key.tun_key.tun_flags |= crit_opt ? TUNNEL_CRIT_OPT : 0;
 	};
 
-	start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET);
+	start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET, log);
 	if (start < 0)
 		return start;
 
 	a = __add_action(sfa, OVS_KEY_ATTR_TUNNEL_INFO, NULL,
-			 sizeof(*tun_info) + key.tun_opts_len);
+			 sizeof(*tun_info) + key.tun_opts_len, log);
 	if (IS_ERR(a))
 		return PTR_ERR(a);
 
@@ -1582,7 +1607,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 static int validate_set(const struct nlattr *a,
 			const struct sw_flow_key *flow_key,
 			struct sw_flow_actions **sfa,
-			bool *set_tun, __be16 eth_type)
+			bool *set_tun, __be16 eth_type, bool log)
 {
 	const struct nlattr *ovs_key = nla_data(a);
 	int key_type = nla_type(ovs_key);
@@ -1611,7 +1636,7 @@ static int validate_set(const struct nlattr *a,
 			return -EINVAL;
 
 		*set_tun = true;
-		err = validate_and_copy_set_tun(a, sfa);
+		err = validate_and_copy_set_tun(a, sfa, log);
 		if (err)
 			return err;
 		break;
@@ -1704,12 +1729,12 @@ static int validate_userspace(const struct nlattr *attr)
 }
 
 static int copy_action(const struct nlattr *from,
-		       struct sw_flow_actions **sfa)
+		       struct sw_flow_actions **sfa, bool log)
 {
 	int totlen = NLA_ALIGN(from->nla_len);
 	struct nlattr *to;
 
-	to = reserve_sfa_size(sfa, from->nla_len);
+	to = reserve_sfa_size(sfa, from->nla_len, log);
 	if (IS_ERR(to))
 		return PTR_ERR(to);
 
@@ -1720,7 +1745,7 @@ static int copy_action(const struct nlattr *from,
 static int __ovs_nla_copy_actions(const struct nlattr *attr,
 				  const struct sw_flow_key *key,
 				  int depth, struct sw_flow_actions **sfa,
-				  __be16 eth_type, __be16 vlan_tci)
+				  __be16 eth_type, __be16 vlan_tci, bool log)
 {
 	const struct nlattr *a;
 	bool out_tnl_port = false;
@@ -1843,7 +1868,7 @@ static int __ovs_nla_copy_actions(const struct nlattr *attr,
 
 		case OVS_ACTION_ATTR_SET:
 			err = validate_set(a, key, sfa,
-					   &out_tnl_port, eth_type);
+					   &out_tnl_port, eth_type, log);
 			if (err)
 				return err;
 
@@ -1852,18 +1877,18 @@ static int __ovs_nla_copy_actions(const struct nlattr *attr,
 
 		case OVS_ACTION_ATTR_SAMPLE:
 			err = validate_and_copy_sample(a, key, depth, sfa,
-						       eth_type, vlan_tci);
+						       eth_type, vlan_tci, log);
 			if (err)
 				return err;
 			skip_copy = true;
 			break;
 
 		default:
-			OVS_NLERR("Unknown tunnel attribute (%d).\n", type);
+			OVS_NLERR(log, "Unknown Action type %d", type);
 			return -EINVAL;
 		}
 		if (!skip_copy) {
-			err = copy_action(a, sfa);
+			err = copy_action(a, sfa, log);
 			if (err)
 				return err;
 		}
@@ -1877,16 +1902,16 @@ static int __ovs_nla_copy_actions(const struct nlattr *attr,
 
 int ovs_nla_copy_actions(const struct nlattr *attr,
 			 const struct sw_flow_key *key,
-			 struct sw_flow_actions **sfa)
+			 struct sw_flow_actions **sfa, bool log)
 {
 	int err;
 
-	*sfa = nla_alloc_flow_actions(nla_len(attr));
+	*sfa = nla_alloc_flow_actions(nla_len(attr), log);
 	if (IS_ERR(*sfa))
 		return PTR_ERR(*sfa);
 
 	err = __ovs_nla_copy_actions(attr, key, 0, sfa, key->eth.type,
-				     key->eth.tci);
+				     key->eth.tci, log);
 	if (err)
 		kfree(*sfa);
 
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 90bbe37..577f12b 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -45,17 +45,17 @@ void ovs_match_init(struct sw_flow_match *match,
 
 int ovs_nla_put_flow(const struct sw_flow_key *,
 		     const struct sw_flow_key *, struct sk_buff *);
-int ovs_nla_get_flow_metadata(const struct nlattr *, struct sw_flow_key *);
+int ovs_nla_get_flow_metadata(const struct nlattr *, struct sw_flow_key *,
+			      bool log);
 
-int ovs_nla_get_match(struct sw_flow_match *match,
-		      const struct nlattr *,
-		      const struct nlattr *);
+int ovs_nla_get_match(struct sw_flow_match *, const struct nlattr *key,
+		      const struct nlattr *mask, bool log);
 int ovs_nla_put_egress_tunnel_key(struct sk_buff *,
 				  const struct ovs_tunnel_info *);
 
 int ovs_nla_copy_actions(const struct nlattr *attr,
 			 const struct sw_flow_key *key,
-			 struct sw_flow_actions **sfa);
+			 struct sw_flow_actions **sfa, bool log);
 int ovs_nla_put_actions(const struct nlattr *attr,
 			int len, struct sk_buff *skb);
 
-- 
1.9.3

^ permalink raw reply related

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
From: Jamal Hadi Salim @ 2014-11-10  4:03 UTC (permalink / raw)
  To: Simon Horman
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl
In-Reply-To: <20141110034612.GB17246@vergenet.net>

Hi Simon,

On 11/09/14 22:46, Simon Horman wrote:
> Hi Jamal, Hi Jiri,
>
> On a somewhat related note I am also wondering what if any progress has
> been made regarding discussions of (and code for) the following:
>
> 1. Exposing flow tables to user-space
>     - I realise that this is Open vSwitch specific to some extent
>       but I am in no way implying that it should be done instead of
>       non-Open vSwitch specific work.
>     - Jiri, IIRC this was part ~v2 of your earlier offload patchset
>

I dont know what Rocker crowd is doing; however, I know
John F. has been doing some work which i have stared at
and I was hoping to join in with Ben's effort and show tc flow
offload on the realtek chip in my infinite spare time unles.
(for both Linux bridge and ports).
The priority is to merge the obvious bits first.

> 2. Describing Switch Hardware
>     - I see John Fastabend moving forwards on this in his git repository
>       https://github.com/jrfastab/flow-net-next
>
> The way that I see things is that both of the above could be exposed via
> netlink. And that the first at least could be backed by NDOs.  As such I
> see this work as complementary and perhaps applying on top of this
> patchset. If I am mistaken in this regards it would be good to know :)
>

You are correct - I will let John speak on his work, but
that is the intent.
The challenge is there are many schools of thoughts and i am hoping
it is not an arms race.

> I am of course also interested to know if the above are moving forwards.
> To be clear I am very interested in being able to use these APIs to
> perform Open vSwitch offloads and I am very happy to help.
> (Jamal: I'm also interested in non-Open vSwitch offloads :)
>

Hey, OVS should be able to use these APIs; i am just interested in 
making sure they are not just for OVS or OF. Then we are all happy;->

cheers,
jamal

^ permalink raw reply

* Re: [patch net-next v2 00/10] introduce rocker switch driver with hardware accelerated datapath api - phase 1: bridge fdb offload
From: Simon Horman @ 2014-11-10  4:58 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, alexander.h.duyck, john.ronciak, mleitner,
	shrijeet, gospo, bcrl
In-Reply-To: <5460391C.3010107@mojatatu.com>

On Sun, Nov 09, 2014 at 11:03:40PM -0500, Jamal Hadi Salim wrote:
> Hi Simon,
> 
> On 11/09/14 22:46, Simon Horman wrote:
> >Hi Jamal, Hi Jiri,
> >
> >On a somewhat related note I am also wondering what if any progress has
> >been made regarding discussions of (and code for) the following:
> >
> >1. Exposing flow tables to user-space
> >    - I realise that this is Open vSwitch specific to some extent
> >      but I am in no way implying that it should be done instead of
> >      non-Open vSwitch specific work.
> >    - Jiri, IIRC this was part ~v2 of your earlier offload patchset
> >
> 
> I dont know what Rocker crowd is doing; however, I know
> John F. has been doing some work which i have stared at
> and I was hoping to join in with Ben's effort and show tc flow
> offload on the realtek chip in my infinite spare time unles.
> (for both Linux bridge and ports).
> The priority is to merge the obvious bits first.

Merging the obvious bits first is quite fine my me.

> >2. Describing Switch Hardware
> >    - I see John Fastabend moving forwards on this in his git repository
> >      https://github.com/jrfastab/flow-net-next
> >
> >The way that I see things is that both of the above could be exposed via
> >netlink. And that the first at least could be backed by NDOs.  As such I
> >see this work as complementary and perhaps applying on top of this
> >patchset. If I am mistaken in this regards it would be good to know :)
> >
> 
> You are correct - I will let John speak on his work, but
> that is the intent.
> The challenge is there are many schools of thoughts and i am hoping
> it is not an arms race.

That is also my hope.

> >I am of course also interested to know if the above are moving forwards.
> >To be clear I am very interested in being able to use these APIs to
> >perform Open vSwitch offloads and I am very happy to help.
> >(Jamal: I'm also interested in non-Open vSwitch offloads :)
> >
> 
> Hey, OVS should be able to use these APIs; i am just interested in making
> sure they are not just for OVS or OF. Then we are all happy;->

I think we are all happy :)

^ permalink raw reply

* [PATCH v2] VNIC: Adding support for Cavium ThunderX network controller
From: Robert Richter @ 2014-11-10  5:14 UTC (permalink / raw)
  To: David S. Miller
  Cc: Stephen Hemminger, Stefan Assmann, linux-kernel, linux-arm-kernel,
	netdev, Sunil Goutham, Robert Richter

From: Sunil Goutham <sgoutham@cavium.com>

This patch adds support for the Cavium ThunderX network controller.
The driver is on the pci bus and thus requires the Thunder PCIe host
controller driver to be enabled.

v2:
 * non-generic module parameters removed
 * ethtool support added (nicvf_set_rxnfc())

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
---
 MAINTAINERS                                        |    7 +
 arch/arm64/Kconfig                                 |    1 +
 drivers/net/ethernet/Kconfig                       |    1 +
 drivers/net/ethernet/Makefile                      |    1 +
 drivers/net/ethernet/cavium/Kconfig                |   40 +
 drivers/net/ethernet/cavium/Makefile               |    5 +
 drivers/net/ethernet/cavium/thunder/Makefile       |   13 +
 drivers/net/ethernet/cavium/thunder/nic.h          |  434 +++++++
 drivers/net/ethernet/cavium/thunder/nic_main.c     |  807 ++++++++++++
 drivers/net/ethernet/cavium/thunder/nic_reg.h      |  214 ++++
 .../net/ethernet/cavium/thunder/nicvf_ethtool.c    |  559 +++++++++
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   | 1323 ++++++++++++++++++++
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 1302 +++++++++++++++++++
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h |  355 ++++++
 drivers/net/ethernet/cavium/thunder/q_struct.h     |  702 +++++++++++
 drivers/net/ethernet/cavium/thunder/thunder_bgx.c  |  386 ++++++
 drivers/net/ethernet/cavium/thunder/thunder_bgx.h  |   78 ++
 include/linux/pci_ids.h                            |    2 +
 18 files changed, 6230 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/Kconfig
 create mode 100644 drivers/net/ethernet/cavium/Makefile
 create mode 100644 drivers/net/ethernet/cavium/thunder/Makefile
 create mode 100644 drivers/net/ethernet/cavium/thunder/nic.h
 create mode 100644 drivers/net/ethernet/cavium/thunder/nic_main.c
 create mode 100644 drivers/net/ethernet/cavium/thunder/nic_reg.h
 create mode 100644 drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
 create mode 100644 drivers/net/ethernet/cavium/thunder/nicvf_main.c
 create mode 100644 drivers/net/ethernet/cavium/thunder/nicvf_queues.c
 create mode 100644 drivers/net/ethernet/cavium/thunder/nicvf_queues.h
 create mode 100644 drivers/net/ethernet/cavium/thunder/q_struct.h
 create mode 100644 drivers/net/ethernet/cavium/thunder/thunder_bgx.c
 create mode 100644 drivers/net/ethernet/cavium/thunder/thunder_bgx.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a20df9bf8ab0..9a36569d75df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -882,6 +882,13 @@ M:	Krzysztof Halasa <khalasa@piap.pl>
 S:	Maintained
 F:	arch/arm/mach-cns3xxx/
 
+ARM/CAVIUM THUNDER ARCHITECTURE
+M:	Sunil Goutham <sgoutham@cavium.com>
+M:	Robert Richter <rric@kernel.org>
+L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+S:	Supported
+F:	drivers/net/ethernet/cavium/
+
 ARM/CIRRUS LOGIC CLPS711X ARM ARCHITECTURE
 M:	Alexander Shiyan <shc_work@mail.ru>
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index ac9afde76dea..bc14b1a78ae2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -145,6 +145,7 @@ config ARCH_THUNDER
 	bool "Cavium Inc. Thunder SoC Family"
 	help
 	  This enables support for Cavium's Thunder Family of SoCs.
+	select NET_VENDOR_CAVIUM
 
 config ARCH_VEXPRESS
 	bool "ARMv8 software model (Versatile Express)"
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 1ed1fbba5d58..7c872ff82aed 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -34,6 +34,7 @@ source "drivers/net/ethernet/adi/Kconfig"
 source "drivers/net/ethernet/broadcom/Kconfig"
 source "drivers/net/ethernet/brocade/Kconfig"
 source "drivers/net/ethernet/calxeda/Kconfig"
+source "drivers/net/ethernet/cavium/Kconfig"
 source "drivers/net/ethernet/chelsio/Kconfig"
 source "drivers/net/ethernet/cirrus/Kconfig"
 source "drivers/net/ethernet/cisco/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 6e0b629e9859..7a9f7d7b55bf 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_NET_BFIN) += adi/
 obj-$(CONFIG_NET_VENDOR_BROADCOM) += broadcom/
 obj-$(CONFIG_NET_VENDOR_BROCADE) += brocade/
 obj-$(CONFIG_NET_CALXEDA_XGMAC) += calxeda/
+obj-$(CONFIG_NET_VENDOR_CAVIUM) += cavium/
 obj-$(CONFIG_NET_VENDOR_CHELSIO) += chelsio/
 obj-$(CONFIG_NET_VENDOR_CIRRUS) += cirrus/
 obj-$(CONFIG_NET_VENDOR_CISCO) += cisco/
diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig
new file mode 100644
index 000000000000..6365fb4242be
--- /dev/null
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -0,0 +1,40 @@
+#
+# Cavium ethernet device configuration
+#
+
+config NET_VENDOR_CAVIUM
+	tristate "Cavium ethernet drivers"
+	depends on PCI
+	---help---
+	  Enable support for the Cavium ThunderX Network Interface
+	  Controller (NIC). The NIC provides the controller and DMA
+	  engines to move network traffic to/from the memory. The NIC
+	  works closely with TNS, BGX and SerDes to implement the
+	  functions replacing and virtualizing those of a typical
+	  standalone PCIe NIC chip.
+
+	  If you have a Cavium Thunder board, say Y.
+
+if NET_VENDOR_CAVIUM
+
+config THUNDER_NIC_PF
+	tristate "Thunder Physical function driver"
+	default NET_VENDOR_CAVIUM
+	select THUNDER_NIC_BGX
+	---help---
+	  This driver supports Thunder's NIC physical function.
+
+config THUNDER_NIC_VF
+	tristate "Thunder Virtual function driver"
+	default NET_VENDOR_CAVIUM
+	---help---
+	  This driver supports Thunder's NIC virtual function
+
+config	THUNDER_NIC_BGX
+	tristate "Thunder MAC interface driver (BGX)"
+	default NET_VENDOR_CAVIUM
+	---help---
+	  This driver supports programming and controlling of MAC
+	  interface from NIC physical function driver.
+
+endif # NET_VENDOR_CAVIUM
diff --git a/drivers/net/ethernet/cavium/Makefile b/drivers/net/ethernet/cavium/Makefile
new file mode 100644
index 000000000000..7aac4780d050
--- /dev/null
+++ b/drivers/net/ethernet/cavium/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Cavium ethernet device drivers.
+#
+
+obj-$(CONFIG_NET_VENDOR_CAVIUM) += thunder/
diff --git a/drivers/net/ethernet/cavium/thunder/Makefile b/drivers/net/ethernet/cavium/thunder/Makefile
new file mode 100644
index 000000000000..8ee6e043f8ff
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/Makefile
@@ -0,0 +1,13 @@
+#
+# Makefile for Cavium's Thunder ethernet device
+#
+
+
+# Don't change the order, NICPF driver is dependent on BGX driver init
+obj-$(CONFIG_THUNDER_NIC_BGX) += thunder_bgx.o
+obj-$(CONFIG_THUNDER_NIC_PF) += nicpf.o
+obj-$(CONFIG_THUNDER_NIC_VF) += nicvf.o
+
+nicpf-y := nic_main.o
+nicvf-y := nicvf_main.o nicvf_queues.o
+nicvf-y += nicvf_ethtool.o
diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
new file mode 100644
index 000000000000..be3530ca9cf2
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -0,0 +1,434 @@
+/*
+ * Copyright (C) 2014 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#ifndef NIC_H
+#define	NIC_H
+
+#include <linux/netdevice.h>
+#include <linux/interrupt.h>
+#include "thunder_bgx.h"
+
+/* PCI device IDs */
+#define	PCI_DEVICE_ID_THUNDER_NIC_PF	0xA01E
+#define	PCI_DEVICE_ID_THUNDER_NIC_VF	0x0011
+#define	PCI_DEVICE_ID_THUNDER_BGX	0xA026
+
+/* PCI BAR nos */
+#define	PCI_CFG_REG_BAR_NUM		0
+#define	PCI_MSIX_REG_BAR_NUM		4
+
+/* NIC SRIOV VF count */
+#define	MAX_NUM_VFS_SUPPORTED		128
+#define	DEFAULT_NUM_VF_ENABLED		8
+
+#define	NIC_TNS_BYPASS_MODE		0
+#define	NIC_TNS_MODE			1
+
+/* NIC priv flags */
+#define	NIC_SRIOV_ENABLED		(1 << 0)
+#define	NIC_TNS_ENABLED			(1 << 1)
+
+/* VNIC HW optimiation features */
+#undef	VNIC_RX_CSUM_OFFLOAD_SUPPORT
+#undef	VNIC_TX_CSUM_OFFLOAD_SUPPORT
+#define	VNIC_SG_SUPPORT
+#define	VNIC_TSO_SUPPORT
+#undef	VNIC_LRO_SUPPORT
+#undef  VNIC_RSS_SUPPORT
+
+/* TSO not supported in Thunder pass1 */
+#ifdef	VNIC_TSO_SUPPORT
+#define	VNIC_SW_TSO_SUPPORT
+#undef	VNIC_HW_TSO_SUPPORT
+#endif
+
+/* ETHTOOL enable or disable, undef this to disable */
+#define	NICVF_ETHTOOL_ENABLE
+
+/* Min/Max packet size */
+#define	NIC_HW_MIN_FRS			64
+#define	NIC_HW_MAX_FRS			9194 /* 9216 max packet including FCS */
+
+/* Max pkinds */
+#define	NIC_MAX_PKIND			16
+
+/* Rx Channels */
+/* Receive channel configuration in TNS bypass mode
+ * Below is configuration in TNS bypass mode
+ * BGX0-LMAC0-CHAN0 - VNIC CHAN0
+ * BGX0-LMAC1-CHAN0 - VNIC CHAN16
+ * ...
+ * BGX1-LMAC0-CHAN0 - VNIC CHAN128
+ * ...
+ * BGX1-LMAC3-CHAN0 - VNIC CHAN174
+ */
+#define	NIC_INF_COUNT			2  /* No of interfaces */
+#define	NIC_CHANS_PER_INF		128
+#define	NIC_MAX_CHANS			(NIC_INF_COUNT * NIC_CHANS_PER_INF)
+#define	NIC_CPI_COUNT			2048 /* No of channel parse indices */
+
+/* TNS bypass mode: 1-1 mapping between VNIC and BGX:LMAC */
+#define NIC_MAX_BGX			MAX_BGX_PER_CN88XX
+#define	NIC_CPI_PER_BGX			(NIC_CPI_COUNT / NIC_MAX_BGX)
+#define	NIC_MAX_CPI_PER_LMAC		64 /* Max when CPI_ALG is IP diffserv */
+#define	NIC_RSSI_PER_BGX		(NIC_RSSI_COUNT / NIC_MAX_BGX)
+
+/* Tx scheduling */
+#define	NIC_MAX_TL4			1024
+#define	NIC_MAX_TL4_SHAPERS		256 /* 1 shaper for 4 TL4s */
+#define	NIC_MAX_TL3			256
+#define	NIC_MAX_TL3_SHAPERS		64  /* 1 shaper for 4 TL3s */
+#define	NIC_MAX_TL2			64
+#define	NIC_MAX_TL2_SHAPERS		2  /* 1 shaper for 32 TL2s */
+#define	NIC_MAX_TL1			2
+
+/* TNS bypass mode */
+#define	NIC_TL4_PER_BGX			(NIC_MAX_TL4 / NIC_MAX_BGX)
+#define	NIC_TL4_PER_LMAC		(NIC_MAX_TL4 / NIC_CHANS_PER_INF)
+
+/* NIC VF Interrupts */
+#define	NICVF_INTR_CQ			0
+#define	NICVF_INTR_SQ			1
+#define	NICVF_INTR_RBDR			2
+#define	NICVF_INTR_PKT_DROP		3
+#define	NICVF_INTR_TCP_TIMER		4
+#define	NICVF_INTR_MBOX			5
+#define	NICVF_INTR_QS_ERR		6
+
+#define	NICVF_INTR_CQ_SHIFT		0
+#define	NICVF_INTR_SQ_SHIFT		8
+#define	NICVF_INTR_RBDR_SHIFT		16
+#define	NICVF_INTR_PKT_DROP_SHIFT	20
+#define	NICVF_INTR_TCP_TIMER_SHIFT	21
+#define	NICVF_INTR_MBOX_SHIFT		22
+#define	NICVF_INTR_QS_ERR_SHIFT		23
+
+#define	NICVF_INTR_CQ_MASK		(0xFF << NICVF_INTR_CQ_SHIFT)
+#define	NICVF_INTR_SQ_MASK		(0xFF << NICVF_INTR_SQ_SHIFT)
+#define	NICVF_INTR_RBDR_MASK		(0x03 << NICVF_INTR_RBDR_SHIFT)
+#define	NICVF_INTR_PKT_DROP_MASK	(1 << NICVF_INTR_PKT_DROP_SHIFT)
+#define	NICVF_INTR_TCP_TIMER_MASK	(1 << NICVF_INTR_TCP_TIMER_SHIFT)
+#define	NICVF_INTR_MBOX_MASK		(1 << NICVF_INTR_MBOX_SHIFT)
+#define	NICVF_INTR_QS_ERR_MASK		(1 << NICVF_INTR_QS_ERR_SHIFT)
+
+/* MSI-X interrupts */
+#define	NIC_PF_MSIX_VECTORS		10
+#define	NIC_VF_MSIX_VECTORS		20
+
+#define NIC_PF_INTR_ID_ECC0_SBE		0
+#define NIC_PF_INTR_ID_ECC0_DBE		1
+#define NIC_PF_INTR_ID_ECC1_SBE		2
+#define NIC_PF_INTR_ID_ECC1_DBE		3
+#define NIC_PF_INTR_ID_ECC2_SBE		4
+#define NIC_PF_INTR_ID_ECC2_DBE		5
+#define NIC_PF_INTR_ID_ECC3_SBE		6
+#define NIC_PF_INTR_ID_ECC3_DBE		7
+#define NIC_PF_INTR_ID_MBOX0		8
+#define NIC_PF_INTR_ID_MBOX1		9
+
+/* For CQ timer threshold interrupt */
+#define NIC_NS_PER_100_SYETEM_CLK	125
+#define NICPF_CLK_PER_INT_TICK		100
+
+struct nicvf_cq_poll {
+	uint8_t	cq_idx;		/* Completion queue index */
+	struct napi_struct napi;
+};
+
+#define	NIC_RSSI_COUNT			4096 /* Total no of RSS indices */
+#define NIC_MAX_RSS_HASH_BITS		8
+#define NIC_MAX_RSS_IDR_TBL_SIZE	(1 << NIC_MAX_RSS_HASH_BITS)
+#define RSS_HASH_KEY_SIZE		5 /* 320 bit key */
+
+#ifdef VNIC_RSS_SUPPORT
+struct nicvf_rss_info {
+	bool enable;
+#define	RSS_L2_EXTENDED_HASH_ENA	(1 << 0)
+#define	RSS_IP_HASH_ENA			(1 << 1)
+#define	RSS_TCP_HASH_ENA		(1 << 2)
+#define	RSS_TCP_SYN_DIS			(1 << 3)
+#define	RSS_UDP_HASH_ENA		(1 << 4)
+#define RSS_L4_EXTENDED_HASH_ENA	(1 << 5)
+#define	RSS_ROCE_ENA			(1 << 6)
+#define	RSS_L3_BI_DIRECTION_ENA		(1 << 7)
+#define	RSS_L4_BI_DIRECTION_ENA		(1 << 8)
+	uint64_t cfg;
+	uint8_t  hash_bits;
+	uint16_t rss_size;
+	uint8_t  ind_tbl[NIC_MAX_RSS_IDR_TBL_SIZE];
+	uint64_t key[RSS_HASH_KEY_SIZE];
+};
+#endif
+
+enum rx_stats_reg_offset {
+	RX_OCTS = 0x0,
+	RX_UCAST = 0x1,
+	RX_BCAST = 0x2,
+	RX_MCAST = 0x3,
+	RX_RED = 0x4,
+	RX_RED_OCTS = 0x5,
+	RX_ORUN = 0x6,
+	RX_ORUN_OCTS = 0x7,
+	RX_FCS = 0x8,
+	RX_L2ERR = 0x9,
+	RX_DRP_BCAST = 0xa,
+	RX_DRP_MCAST = 0xb,
+	RX_DRP_L3BCAST = 0xc,
+	RX_DRP_L3MCAST = 0xd,
+	RX_STATS_ENUM_LAST,
+};
+
+enum tx_stats_reg_offset {
+	TX_OCTS = 0x0,
+	TX_UCAST = 0x1,
+	TX_BCAST = 0x2,
+	TX_MCAST = 0x3,
+	TX_DROP = 0x4,
+	TX_STATS_ENUM_LAST,
+};
+
+struct nicvf_hw_stats {
+	u64 rx_bytes_ok;
+	u64 rx_ucast_frames_ok;
+	u64 rx_bcast_frames_ok;
+	u64 rx_mcast_frames_ok;
+	u64 rx_fcs_errors;
+	u64 rx_l2_errors;
+	u64 rx_drop_red;
+	u64 rx_drop_red_bytes;
+	u64 rx_drop_overrun;
+	u64 rx_drop_overrun_bytes;
+	u64 rx_drop_bcast;
+	u64 rx_drop_mcast;
+	u64 rx_drop_l3_bcast;
+	u64 rx_drop_l3_mcast;
+	u64 tx_bytes_ok;
+	u64 tx_ucast_frames_ok;
+	u64 tx_bcast_frames_ok;
+	u64 tx_mcast_frames_ok;
+	u64 tx_drops;
+};
+
+struct nicvf_drv_stats {
+	/* Rx */
+	u64 rx_frames_ok;
+	u64 rx_frames_64;
+	u64 rx_frames_127;
+	u64 rx_frames_255;
+	u64 rx_frames_511;
+	u64 rx_frames_1023;
+	u64 rx_frames_1518;
+	u64 rx_frames_jumbo;
+	u64 rx_drops;
+	/* Tx */
+	u64 tx_frames_ok;
+	u64 tx_drops;
+	u64 tx_busy;
+	u64 tx_tso;
+};
+
+struct nicvf {
+	struct net_device	*netdev;
+	struct pci_dev		*pdev;
+	uint8_t			vf_id;
+	uint8_t			tns_mode;
+	uint16_t		mtu;
+	struct queue_set	*qs;
+	uint8_t			num_qs;
+	void			*addnl_qs;
+	uint16_t		vf_mtu;
+	uint64_t		reg_base;
+	struct tasklet_struct	rbdr_task;
+	struct tasklet_struct	qs_err_task;
+	struct nicvf_cq_poll	*napi[8];
+#ifdef VNIC_RSS_SUPPORT
+	struct nicvf_rss_info	rss_info;
+#endif
+	uint8_t			cpi_alg;
+
+	struct nicvf_hw_stats   stats;
+	struct nicvf_drv_stats  drv_stats;
+	struct work_struct	reset_task;
+
+	/* MSI-X  */
+	bool			msix_enabled;
+	uint16_t		num_vec;
+	struct msix_entry	msix_entries[NIC_VF_MSIX_VECTORS];
+	char			irq_name[NIC_VF_MSIX_VECTORS][20];
+	uint8_t			irq_allocated[NIC_VF_MSIX_VECTORS];
+};
+
+struct nicpf {
+	struct net_device	*netdev;
+	struct pci_dev		*pdev;
+#define NIC_NODE_ID_MASK	0x300000000000
+#define NIC_NODE_ID(x)		((x & NODE_ID_MASK) >> 44)
+	uint8_t			node;
+	unsigned int		flags;
+	uint16_t		total_vf_cnt;   /* Total num of VF supported */
+	uint16_t		num_vf_en;      /* No of VF enabled */
+	uint64_t		reg_base;       /* Register start address */
+	struct pkind_cfg	pkind;
+	uint8_t			bgx_cnt;
+#define	NIC_SET_VF_LMAC_MAP(bgx, lmac)	(((bgx & 0xF) << 4) | (lmac & 0xF))
+#define	NIC_GET_BGX_FROM_VF_LMAC_MAP(map)	((map >> 4) & 0xF)
+#define	NIC_GET_LMAC_FROM_VF_LMAC_MAP(map)	(map & 0xF)
+	uint8_t			vf_lmac_map[MAX_LMAC];
+	uint16_t		cpi_base[MAX_NUM_VFS_SUPPORTED];
+	uint16_t		rss_ind_tbl_size;
+
+	/* MSI-X */
+	bool			msix_enabled;
+	uint16_t		num_vec;
+	struct msix_entry	msix_entries[NIC_PF_MSIX_VECTORS];
+	uint8_t			irq_allocated[NIC_PF_MSIX_VECTORS];
+};
+
+/* PF <--> VF Mailbox communication
+ * Eight 64bit registers are shared between PF and VF.
+ * Separate set for each VF.
+ * Writing '1' into last register mbx7 means end of message.
+ */
+
+/* PF <--> VF mailbox communication */
+#define	NIC_PF_VF_MAILBOX_SIZE		8
+#define	NIC_PF_VF_MBX_TIMEOUT		5000 /* ms */
+
+/* Mailbox message types */
+#define	NIC_PF_VF_MSG_READY		0x01	/* Is PF ready to rcv msgs */
+#define	NIC_PF_VF_MSG_ACK		0x02	/* ACK the message received */
+#define	NIC_PF_VF_MSG_NACK		0x03	/* NACK the message received */
+#define	NIC_PF_VF_MSG_QS_CFG		0x04	/* Configure Qset */
+#define	NIC_PF_VF_MSG_RQ_CFG		0x05	/* Configure receive queue */
+#define	NIC_PF_VF_MSG_SQ_CFG		0x06	/* Configure Send queue */
+#define	NIC_PF_VF_MSG_RQ_DROP_CFG	0x07	/* Configure receive queue */
+#define	NIC_PF_VF_MSG_SET_MAC		0x08	/* Add MAC ID to DMAC filter */
+#define	NIC_PF_VF_MSG_SET_MAX_FRS	0x09	/* Set max frame size */
+#define	NIC_PF_VF_MSG_CPI_CFG		0x0A	/* Config CPI, RSSI */
+#define	NIC_PF_VF_MSG_RSS_SIZE		0x0B	/* Get RSS indir_tbl size */
+#define	NIC_PF_VF_MSG_RSS_CFG		0x0C	/* Config RSS table */
+#define	NIC_PF_VF_MSG_RSS_CFG_CONT	0x0D	/* RSS config continuation */
+
+struct nic_cfg_msg {
+	uint64_t   vf_id;
+	uint64_t   tns_mode;
+	uint64_t   mac_addr;
+};
+
+/* Qset configuration */
+struct qs_cfg_msg {
+	uint64_t   num;
+	uint64_t   cfg;
+};
+
+/* Receive queue configuration */
+struct rq_cfg_msg {
+	uint64_t   qs_num;
+	uint64_t   rq_num;
+	uint64_t   cfg;
+};
+
+/* Send queue configuration */
+struct sq_cfg_msg {
+	uint64_t   qs_num;
+	uint64_t   sq_num;
+	uint64_t   cfg;
+};
+
+/* Set VF's MAC address */
+struct set_mac_msg {
+	uint64_t   vf_id;
+	uint64_t   addr;
+};
+
+/* Set Maximum frame size */
+struct set_frs_msg {
+	uint64_t   vf_id;
+	uint64_t   max_frs;
+};
+
+/* Set CPI algorithm type */
+struct cpi_cfg_msg {
+	uint64_t   vf_id;
+	uint64_t   rq_cnt;
+	uint64_t   cpi_alg;
+};
+
+#ifdef VNIC_RSS_SUPPORT
+/* Get RSS table size */
+struct rss_sz_msg {
+	uint64_t   vf_id;
+	uint64_t   ind_tbl_size;
+};
+
+/* Set RSS configuration */
+struct rss_cfg_msg {
+	uint8_t   vf_id;
+	uint8_t   hash_bits;
+	uint16_t  tbl_len;
+	uint16_t  tbl_offset;
+#define RSS_IND_TBL_LEN_PER_MBX_MSG	42
+	uint8_t   ind_tbl[RSS_IND_TBL_LEN_PER_MBX_MSG];
+};
+#endif
+
+/* Maximum 8 64bit locations */
+struct nic_mbx {
+#define	NIC_PF_VF_MBX_MSG_MASK		0xFFFF
+	uint16_t	   msg;
+#define	NIC_PF_VF_MBX_LOCK_OFFSET	0
+#define	NIC_PF_VF_MBX_LOCK_VAL(x)	((x >> 16) & 0xFFFF)
+#define	NIC_PF_VF_MBX_LOCK_CLEAR(x)	(x & ~(0xFFFF0000))
+#define	NIC_PF_VF_MBX_LOCK_SET(x)\
+	(NIC_PF_VF_MBX_LOCK_CLEAR(x) | (1 << 16))
+	uint16_t	   mbx_lock;
+	uint32_t	   unused;
+	union	{
+		struct nic_cfg_msg	nic_cfg;
+		struct qs_cfg_msg	qs;
+		struct rq_cfg_msg	rq;
+		struct sq_cfg_msg	sq;
+		struct set_mac_msg	mac;
+		struct set_frs_msg	frs;
+		struct cpi_cfg_msg	cpi_cfg;
+#ifdef VNIC_RSS_SUPPORT
+		struct rss_sz_msg	rss_size;
+		struct rss_cfg_msg	rss_cfg;
+#endif
+		uint64_t		rsvd[6];
+	} data;
+	uint64_t	   mbx_trigger_intr;
+};
+
+int nicvf_set_real_num_queues(struct net_device *netdev,
+			      int tx_queues, int rx_queues);
+int nicvf_open(struct net_device *netdev);
+int nicvf_stop(struct net_device *netdev);
+int nicvf_send_msg_to_pf(struct nicvf *vf, struct nic_mbx *mbx);
+void nicvf_config_cpi(struct nicvf *nic);
+#ifdef VNIC_RSS_SUPPORT
+void nicvf_config_rss(struct nicvf *nic);
+#endif
+void nicvf_free_skb(struct nicvf *nic, struct sk_buff *skb);
+#ifdef NICVF_ETHTOOL_ENABLE
+void nicvf_set_ethtool_ops(struct net_device *netdev);
+#endif
+void nicvf_update_stats(struct nicvf *nic);
+
+/* Debug */
+#undef	NIC_DEBUG
+
+#ifdef	NIC_DEBUG
+#define	nic_dbg(dev, fmt, arg...) \
+		dev_info(dev, fmt, ##arg)
+#else
+#define	nic_dbg(dev, fmt, arg...) do {} while (0)
+#endif
+
+#endif /* NIC_H */
diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
new file mode 100644
index 000000000000..ce43deec5014
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -0,0 +1,807 @@
+/*
+ * Copyright (C) 2014 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/pci.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+
+#include "nic_reg.h"
+#include "nic.h"
+#include "q_struct.h"
+#include "thunder_bgx.h"
+
+#define DRV_NAME	"thunder-nic"
+#define DRV_VERSION	"1.0"
+
+static void nic_config_cpi(struct nicpf *nic, struct cpi_cfg_msg *cfg);
+#ifdef VNIC_RSS_SUPPORT
+static void nic_send_rss_size(struct nicpf *nic, int vf);
+static void nic_config_rss(struct nicpf *nic, struct rss_cfg_msg *cfg);
+#endif
+static void nic_tx_channel_cfg(struct nicpf *nic, int vnic, int sq_idx);
+static int nic_update_hw_frs(struct nicpf *nic, int new_frs, int vf);
+
+/* Supported devices */
+static const struct pci_device_id nic_id_table[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVICE_ID_THUNDER_NIC_PF) },
+	{ 0, }  /* end of table */
+};
+
+MODULE_AUTHOR("Sunil Goutham");
+MODULE_DESCRIPTION("Cavium Thunder NIC Physical Function Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
+MODULE_DEVICE_TABLE(pci, nic_id_table);
+
+/* Register read/write APIs */
+static void nic_reg_write(struct nicpf *nic, uint64_t offset, uint64_t val)
+{
+	uint64_t addr = nic->reg_base + offset;
+
+	writeq_relaxed(val, (void *)addr);
+}
+
+static uint64_t nic_reg_read(struct nicpf *nic, uint64_t offset)
+{
+	uint64_t addr = nic->reg_base + offset;
+
+	return readq_relaxed((void *)addr);
+}
+
+/* PF -> VF mailbox communication APIs */
+static void nic_enable_mbx_intr(struct nicpf *nic)
+{
+	/* Enable mailbox interrupt for all 128 VFs */
+	nic_reg_write(nic, NIC_PF_MAILBOX_ENA_W1S, ~0x00ull);
+	nic_reg_write(nic, NIC_PF_MAILBOX_ENA_W1S + (1 << 3), ~0x00ull);
+}
+
+static uint64_t nic_get_mbx_intr_status(struct nicpf *nic, int mbx_reg)
+{
+	return nic_reg_read(nic, NIC_PF_MAILBOX_INT + (mbx_reg << 3));
+}
+
+static void nic_clear_mbx_intr(struct nicpf *nic, int vf, int mbx_reg)
+{
+	nic_reg_write(nic, NIC_PF_MAILBOX_INT + (mbx_reg << 3), (1ULL << vf));
+}
+
+static uint64_t nic_get_mbx_addr(int vf)
+{
+	return NIC_PF_VF_0_127_MAILBOX_0_7 + (vf << NIC_VF_NUM_SHIFT);
+}
+
+static int nic_lock_mbox(struct nicpf *nic, int vf)
+{
+	int timeout = NIC_PF_VF_MBX_TIMEOUT;
+	int sleep = 10;
+	uint64_t lock, mbx_addr;
+
+	mbx_addr = nic_get_mbx_addr(vf) + NIC_PF_VF_MBX_LOCK_OFFSET;
+	lock = NIC_PF_VF_MBX_LOCK_VAL(nic_reg_read(nic, mbx_addr));
+	while (lock) {
+		msleep(sleep);
+		lock = NIC_PF_VF_MBX_LOCK_VAL(nic_reg_read(nic, mbx_addr));
+		timeout -= sleep;
+		if (!timeout) {
+			netdev_err(nic->netdev, "PF couldn't lock mailbox\n");
+			return 0;
+		}
+	}
+	lock = nic_reg_read(nic, mbx_addr);
+	nic_reg_write(nic, mbx_addr, NIC_PF_VF_MBX_LOCK_SET(lock));
+	return 1;
+}
+
+void nic_release_mbx(struct nicpf *nic, int vf)
+{
+	uint64_t mbx_addr, lock;
+
+	mbx_addr = nic_get_mbx_addr(vf) + NIC_PF_VF_MBX_LOCK_OFFSET;
+	lock = nic_reg_read(nic, mbx_addr);
+	nic_reg_write(nic, mbx_addr, NIC_PF_VF_MBX_LOCK_CLEAR(lock));
+}
+
+static int nic_send_msg_to_vf(struct nicpf *nic, int vf,
+			      struct nic_mbx *mbx, bool lock_needed)
+{
+	int i;
+	uint64_t *msg;
+	uint64_t mbx_addr;
+
+	if (lock_needed && (!nic_lock_mbox(nic, vf)))
+		return -EBUSY;
+
+	mbx->mbx_trigger_intr = 1;
+	msg = (uint64_t *)mbx;
+	mbx_addr = nic->reg_base + nic_get_mbx_addr(vf);
+
+	for (i = 0; i < NIC_PF_VF_MAILBOX_SIZE; i++)
+		writeq_relaxed(*(msg + i), (void *)(mbx_addr + (i * 8)));
+
+	if (lock_needed)
+		nic_release_mbx(nic, vf);
+	return 0;
+}
+
+static void nic_mbx_send_ready(struct nicpf *nic, int vf)
+{
+	struct nic_mbx mbx = {};
+
+	/* Respond with VNIC ID */
+	mbx.msg = NIC_PF_VF_MSG_READY;
+	mbx.data.nic_cfg.vf_id = vf;
+
+	if (nic->flags & NIC_TNS_ENABLED)
+		mbx.data.nic_cfg.tns_mode = NIC_TNS_MODE;
+	else
+		mbx.data.nic_cfg.tns_mode = NIC_TNS_BYPASS_MODE;
+
+	nic_send_msg_to_vf(nic, vf, &mbx, false);
+}
+
+static void nic_mbx_send_ack(struct nicpf *nic, int vf)
+{
+	struct nic_mbx mbx = {};
+
+	mbx.msg = NIC_PF_VF_MSG_ACK;
+	nic_send_msg_to_vf(nic, vf, &mbx, false);
+}
+
+static void nic_mbx_send_nack(struct nicpf *nic, int vf)
+{
+	struct nic_mbx mbx = {};
+
+	mbx.msg = NIC_PF_VF_MSG_NACK;
+	nic_send_msg_to_vf(nic, vf, &mbx, false);
+}
+
+/* Handle Mailbox messages from VF and ack the message. */
+static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
+{
+	struct nic_mbx mbx = {};
+	uint64_t *mbx_data;
+	uint64_t mbx_addr;
+	uint64_t reg_addr;
+	int bgx, lmac;
+	int i;
+	int ret = 0;
+
+	mbx_addr = nic_get_mbx_addr(vf);
+	mbx_data = (uint64_t *)&mbx;
+
+	for (i = 0; i < NIC_PF_VF_MAILBOX_SIZE; i++) {
+		*mbx_data = nic_reg_read(nic, mbx_addr);
+		mbx_data++;
+		mbx_addr += NIC_PF_VF_MAILBOX_SIZE;
+	}
+
+	mbx.msg &= NIC_PF_VF_MBX_MSG_MASK;
+	nic_dbg(&nic->pdev->dev, "%s: Mailbox msg %d from VF%d\n",
+		__func__, mbx.msg, vf);
+	switch (mbx.msg) {
+	case NIC_PF_VF_MSG_READY:
+		nic_mbx_send_ready(nic, vf);
+		ret = 1;
+		break;
+	case NIC_PF_VF_MSG_QS_CFG:
+		reg_addr = NIC_PF_QSET_0_127_CFG |
+			   (mbx.data.qs.num << NIC_QS_ID_SHIFT);
+		nic_reg_write(nic, reg_addr, mbx.data.qs.cfg);
+		break;
+	case NIC_PF_VF_MSG_RQ_CFG:
+		reg_addr = NIC_PF_QSET_0_127_RQ_0_7_CFG |
+			   (mbx.data.rq.qs_num << NIC_QS_ID_SHIFT) |
+			   (mbx.data.rq.rq_num << NIC_Q_NUM_SHIFT);
+		nic_reg_write(nic, reg_addr, mbx.data.rq.cfg);
+		break;
+	case NIC_PF_VF_MSG_RQ_DROP_CFG:
+		reg_addr = NIC_PF_QSET_0_127_RQ_0_7_DROP_CFG |
+			   (mbx.data.rq.qs_num << NIC_QS_ID_SHIFT) |
+			   (mbx.data.rq.rq_num << NIC_Q_NUM_SHIFT);
+		nic_reg_write(nic, reg_addr, mbx.data.rq.cfg);
+		break;
+	case NIC_PF_VF_MSG_SQ_CFG:
+		reg_addr = NIC_PF_QSET_0_127_SQ_0_7_CFG |
+			   (mbx.data.sq.qs_num << NIC_QS_ID_SHIFT) |
+			   (mbx.data.sq.sq_num << NIC_Q_NUM_SHIFT);
+		nic_reg_write(nic, reg_addr, mbx.data.sq.cfg);
+		nic_tx_channel_cfg(nic, mbx.data.qs.num, mbx.data.sq.sq_num);
+		break;
+	case NIC_PF_VF_MSG_SET_MAC:
+		lmac = mbx.data.mac.vf_id;
+		bgx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[lmac]);
+		lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[lmac]);
+		bgx_add_dmac_addr(mbx.data.mac.addr, nic->node, bgx, lmac);
+		break;
+	case NIC_PF_VF_MSG_SET_MAX_FRS:
+		ret = nic_update_hw_frs(nic, mbx.data.frs.max_frs,
+					mbx.data.frs.vf_id);
+		break;
+	case NIC_PF_VF_MSG_CPI_CFG:
+		nic_config_cpi(nic, &mbx.data.cpi_cfg);
+		break;
+#ifdef VNIC_RSS_SUPPORT
+	case NIC_PF_VF_MSG_RSS_SIZE:
+		nic_send_rss_size(nic, vf);
+		break;
+	case NIC_PF_VF_MSG_RSS_CFG:
+	case NIC_PF_VF_MSG_RSS_CFG_CONT:
+		nic_config_rss(nic, &mbx.data.rss_cfg);
+		break;
+#endif
+	default:
+		netdev_err(nic->netdev,
+			   "Invalid msg from VF%d, msg 0x%x\n", vf, mbx.msg);
+		break;
+	}
+
+	if (!ret)
+		nic_mbx_send_ack(nic, vf);
+	else if (mbx.msg != NIC_PF_VF_MSG_READY)
+		nic_mbx_send_nack(nic, vf);
+}
+
+static int nic_update_hw_frs(struct nicpf *nic, int new_frs, int vf)
+{
+	if ((new_frs > NIC_HW_MAX_FRS) || (new_frs < NIC_HW_MIN_FRS)) {
+		netdev_err(nic->netdev,
+			   "Invalid MTU setting from VF%d rejected, should be between %d and %d\n",
+			   vf, NIC_HW_MIN_FRS, NIC_HW_MAX_FRS);
+		return 1;
+	}
+	new_frs += ETH_HLEN;
+	if (new_frs <= nic->pkind.maxlen)
+		return 0;
+
+	nic->pkind.maxlen = new_frs;
+	nic_reg_write(nic, NIC_PF_PKIND_0_15_CFG, *(uint64_t *)&nic->pkind);
+	return 0;
+}
+
+/* Set minimum transmit packet size */
+static void nic_set_tx_pkt_pad(struct nicpf *nic, int size)
+{
+	int lmac;
+	uint64_t lmac_cfg;
+
+	/* Max value that can be set is 60 */
+	if (size > 60)
+		size = 60;
+
+	for (lmac = 0; lmac < (MAX_BGX_PER_CN88XX * MAX_LMAC_PER_BGX); lmac++) {
+		lmac_cfg = nic_reg_read(nic, NIC_PF_LMAC_0_7_CFG | (lmac << 3));
+		lmac_cfg &= ~(0xF << 2);
+		lmac_cfg |= ((size / 4) << 2);
+		nic_reg_write(nic, NIC_PF_LMAC_0_7_CFG | (lmac << 3), lmac_cfg);
+	}
+}
+
+/* Function to check number of LMACs present and set VF to LMAC mapping.
+ * Mapping will be used while initializing channels.
+ */
+static void nic_set_lmac_vf_mapping(struct nicpf *nic)
+{
+	int bgx, bgx_count, next_bgx_lmac = 0;
+	int lmac, lmac_cnt = 0;
+
+	nic->num_vf_en = 0;
+	if (nic->flags & NIC_TNS_ENABLED) {
+		nic->num_vf_en = DEFAULT_NUM_VF_ENABLED;
+		return;
+	}
+
+	bgx_get_count(nic->node, &bgx_count);
+	for (bgx = 0; bgx < NIC_MAX_BGX; bgx++) {
+		if (!(bgx_count & (1 << bgx)))
+			continue;
+		nic->bgx_cnt++;
+		lmac_cnt = bgx_get_lmac_count(nic->node, bgx);
+		for (lmac = 0; lmac < lmac_cnt; lmac++)
+			nic->vf_lmac_map[next_bgx_lmac++] =
+						NIC_SET_VF_LMAC_MAP(bgx, lmac);
+		nic->num_vf_en += lmac_cnt;
+	}
+}
+
+static void nic_init_hw(struct nicpf *nic)
+{
+	int i;
+	uint64_t reg;
+
+	/* Reset NIC, incase if driver is repeatedly inserted and removed */
+	nic_reg_write(nic, NIC_PF_SOFT_RESET, 1);
+
+	/* Enable NIC HW block */
+	nic_reg_write(nic, NIC_PF_CFG, 1);
+
+	if (nic->flags & NIC_TNS_ENABLED) {
+		reg = NIC_TNS_MODE << 7;
+		reg |= 0x06;
+		nic_reg_write(nic, NIC_PF_INTF_0_1_SEND_CFG, reg);
+		reg &= ~0xFull;
+		reg |= 0x07;
+		nic_reg_write(nic, NIC_PF_INTF_0_1_SEND_CFG | (1 << 8), reg);
+	} else {
+		/* Disable TNS mode on both interfaces */
+		reg = NIC_TNS_BYPASS_MODE << 7;
+		reg |= 0x08; /* Block identifier */
+		nic_reg_write(nic, NIC_PF_INTF_0_1_SEND_CFG, reg);
+		reg &= ~0xFull;
+		reg |= 0x09;
+		nic_reg_write(nic, NIC_PF_INTF_0_1_SEND_CFG | (1 << 8), reg);
+	}
+
+	/* PKIND configuration */
+	nic->pkind.minlen = 0;
+	nic->pkind.maxlen = NIC_HW_MAX_FRS + ETH_HLEN;
+	nic->pkind.lenerr_en = 1;
+	nic->pkind.rx_hdr = 0;
+	nic->pkind.hdr_sl = 0;
+
+	for (i = 0; i < NIC_MAX_PKIND; i++)
+		nic_reg_write(nic, NIC_PF_PKIND_0_15_CFG | (i << 3),
+			      *(uint64_t *)&nic->pkind);
+
+	nic_set_tx_pkt_pad(nic, NIC_HW_MIN_FRS);
+
+	/* Disable backpressure for now */
+	for (i = 0; i < NIC_MAX_CHANS; i++)
+		nic_reg_write(nic, NIC_PF_CHAN_0_255_TX_CFG | (i << 3), 0);
+
+	/* Timer config */
+	nic_reg_write(nic, NIC_PF_INTR_TIMER_CFG, NICPF_CLK_PER_INT_TICK);
+}
+
+static void nic_config_cpi(struct nicpf *nic, struct cpi_cfg_msg *cfg)
+{
+	uint32_t vnic, bgx, lmac, chan;
+	uint32_t padd, cpi_count = 0;
+	uint64_t cpi_base, cpi, rssi_base, rssi;
+	uint8_t qset, rq_idx = 0;
+
+	vnic = cfg->vf_id;
+	bgx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vnic]);
+	lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vnic]);
+
+	chan = (lmac * MAX_BGX_CHANS_PER_LMAC) + (bgx * NIC_CHANS_PER_INF);
+	cpi_base = (lmac * NIC_MAX_CPI_PER_LMAC) + (bgx * NIC_CPI_PER_BGX);
+	rssi_base = (lmac * nic->rss_ind_tbl_size) + (bgx * NIC_RSSI_PER_BGX);
+
+	/* Rx channel configuration */
+	nic_reg_write(nic, NIC_PF_CHAN_0_255_RX_CFG | (chan << 3),
+		      (cfg->cpi_alg << 62) | (cpi_base << 48));
+
+	if (cfg->cpi_alg == CPI_ALG_NONE)
+		cpi_count = 1;
+	else if (cfg->cpi_alg == CPI_ALG_VLAN) /* 3 bits of PCP */
+		cpi_count = 8;
+	else if (cfg->cpi_alg == CPI_ALG_VLAN16) /* 3 bits PCP + DEI */
+		cpi_count = 16;
+	else if (cfg->cpi_alg == CPI_ALG_DIFF) /* 6bits DSCP */
+		cpi_count = NIC_MAX_CPI_PER_LMAC;
+
+	/* RSS Qset, Qidx mapping */
+	qset = cfg->vf_id;
+	rssi = rssi_base;
+	for (; rssi < (rssi_base + cfg->rq_cnt); rssi++) {
+		nic_reg_write(nic, NIC_PF_RSSI_0_4097_RQ | (rssi << 3),
+			      (qset << 3) | rq_idx);
+		rq_idx++;
+	}
+
+	rssi = 0;
+	cpi = cpi_base;
+	for (; cpi < (cpi_base + cpi_count); cpi++) {
+		/* Determine port to channel adder */
+		if (cfg->cpi_alg != CPI_ALG_DIFF)
+			padd = cpi % cpi_count;
+		else
+			padd = cpi % 8; /* 3 bits CS out of 6bits DSCP */
+
+		/* Leave RSS_SIZE as '0' to disable RSS */
+		nic_reg_write(nic, NIC_PF_CPI_0_2047_CFG | (cpi << 3),
+			      (vnic << 24) | (padd << 16) | (rssi_base + rssi));
+
+		if ((rssi + 1) >= cfg->rq_cnt)
+			continue;
+
+		if (cfg->cpi_alg == CPI_ALG_VLAN)
+			rssi++;
+		else if (cfg->cpi_alg == CPI_ALG_VLAN16)
+			rssi = ((cpi - cpi_base) & 0xe) >> 1;
+		else if (cfg->cpi_alg == CPI_ALG_DIFF)
+			rssi = ((cpi - cpi_base) & 0x38) >> 3;
+	}
+	nic->cpi_base[cfg->vf_id] = cpi_base;
+}
+
+#ifdef VNIC_RSS_SUPPORT
+static void nic_send_rss_size(struct nicpf *nic, int vf)
+{
+	struct nic_mbx mbx = {};
+
+	mbx.msg = NIC_PF_VF_MSG_RSS_SIZE;
+	mbx.data.rss_size.ind_tbl_size = nic->rss_ind_tbl_size;
+	nic_send_msg_to_vf(nic, vf, &mbx, false);
+}
+
+static void nic_config_rss(struct nicpf *nic, struct rss_cfg_msg *cfg)
+{
+	uint64_t cpi_cfg, cpi_base, rssi_base, rssi;
+	uint8_t qset, idx = 0;
+
+	cpi_base = nic->cpi_base[cfg->vf_id];
+	cpi_cfg = nic_reg_read(nic, NIC_PF_CPI_0_2047_CFG | (cpi_base << 3));
+	rssi_base = cpi_cfg & 0x0FFF;
+
+	rssi = rssi_base + cfg->tbl_offset;
+	qset = cfg->vf_id;
+
+	for (; rssi < (rssi_base + cfg->tbl_len); rssi++) {
+		nic_reg_write(nic, NIC_PF_RSSI_0_4097_RQ | (rssi << 3),
+			      (qset << 3) | cfg->ind_tbl[idx]);
+		idx++;
+	}
+
+	cpi_cfg &= ~(0xFULL << 20);
+	cpi_cfg |= (cfg->hash_bits << 20);
+	nic_reg_write(nic, NIC_PF_CPI_0_2047_CFG | (cpi_base << 3), cpi_cfg);
+}
+#endif
+
+/* Transmit channel configuration (TL4 -> TL3 -> Chan)
+ * VNIC0-SQ0 -> TL4(0)  -> TL4A(0) -> TL3[0] -> BGX0/LMAC0/Chan0
+ * VNIC1-SQ0 -> TL4(8)  -> TL4A(2) -> TL3[2] -> BGX0/LMAC1/Chan0
+ * VNIC2-SQ0 -> TL4(16) -> TL4A(4) -> TL3[4] -> BGX0/LMAC2/Chan0
+ * VNIC3-SQ0 -> TL4(32) -> TL4A(6) -> TL3[6] -> BGX0/LMAC3/Chan0
+ * VNIC4-SQ0 -> TL4(512)  -> TL4A(128) -> TL3[128] -> BGX1/LMAC0/Chan0
+ * VNIC5-SQ0 -> TL4(520)  -> TL4A(130) -> TL3[130] -> BGX1/LMAC1/Chan0
+ * VNIC6-SQ0 -> TL4(528)  -> TL4A(132) -> TL3[132] -> BGX1/LMAC2/Chan0
+ * VNIC7-SQ0 -> TL4(536)  -> TL4A(134) -> TL3[134] -> BGX1/LMAC3/Chan0
+ */
+static void nic_tx_channel_cfg(struct nicpf *nic, int vnic, int sq_idx)
+{
+	uint32_t bgx, lmac, tl3, tl4;
+
+	bgx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vnic]);
+	lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vnic]);
+
+	tl4 = (lmac * NIC_TL4_PER_LMAC) + (bgx * NIC_TL4_PER_BGX);
+	tl4 += sq_idx;
+
+	tl3 = tl4 / (NIC_MAX_TL4 / NIC_MAX_TL3);
+	nic_reg_write(nic, NIC_PF_QSET_0_127_SQ_0_7_CFG2 |
+				(vnic << NIC_QS_ID_SHIFT) |
+				(sq_idx << NIC_Q_NUM_SHIFT), tl4);
+	nic_reg_write(nic, NIC_PF_TL4_0_1023_CFG | (tl4 << 3),
+		      (vnic << 27) | (sq_idx << 24));
+	nic_reg_write(nic, NIC_PF_TL4A_0_255_CFG | (tl3 << 3), tl3);
+	nic_reg_write(nic, NIC_PF_TL3_0_255_CHAN | (tl3 << 3), lmac << 4);
+}
+
+static irqreturn_t nic_mbx0_intr_handler (int irq, void *nic_irq)
+{
+	int vf;
+	uint16_t vf_per_mbx_reg = 64;
+	uint64_t intr;
+	struct nicpf *nic = (struct nicpf *)nic_irq;
+
+	intr = nic_get_mbx_intr_status(nic, 0);
+	nic_dbg(&nic->pdev->dev, "PF MSIX interrupt Mbox0 0x%llx\n", intr);
+	for (vf = 0; vf < min(nic->num_vf_en, vf_per_mbx_reg); vf++) {
+		if (intr & (1ULL << vf)) {
+			nic_dbg(&nic->pdev->dev, "Intr from VF %d\n", vf);
+			nic_handle_mbx_intr(nic, vf);
+			nic_clear_mbx_intr(nic, vf, 0);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t nic_mbx1_intr_handler (int irq, void *nic_irq)
+{
+	int vf;
+	uint16_t vf_per_mbx_reg = 64;
+	uint64_t intr;
+	struct nicpf *nic = (struct nicpf *)nic_irq;
+
+	if (nic->num_vf_en <= vf_per_mbx_reg)
+		return IRQ_HANDLED;
+
+	intr = nic_get_mbx_intr_status(nic, 1);
+	nic_dbg(&nic->pdev->dev, "PF MSIX interrupt Mbox1 0x%llx\n", intr);
+	for (vf = 0; vf < (nic->num_vf_en - vf_per_mbx_reg); vf++) {
+		if (intr & (1ULL << vf)) {
+			nic_dbg(&nic->pdev->dev,
+				"Intr from VF %d\n", vf + vf_per_mbx_reg);
+			nic_handle_mbx_intr(nic, vf + vf_per_mbx_reg);
+			nic_clear_mbx_intr(nic, vf, 1);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static int nic_enable_msix(struct nicpf *nic)
+{
+	int i, ret;
+
+	nic->num_vec = NIC_PF_MSIX_VECTORS;
+
+	for (i = 0; i < nic->num_vec; i++)
+		nic->msix_entries[i].entry = i;
+
+	ret = pci_enable_msix(nic->pdev, nic->msix_entries, nic->num_vec);
+	if (ret) {
+		netdev_err(nic->netdev,
+			   "Request for #%d msix vectors failed\n",
+			   nic->num_vec);
+		return 0;
+	}
+
+	nic->msix_enabled = 1;
+	return 1;
+}
+
+static void nic_disable_msix(struct nicpf *nic)
+{
+	if (nic->msix_enabled) {
+		pci_disable_msix(nic->pdev);
+		nic->msix_enabled = 0;
+		nic->num_vec = 0;
+	}
+}
+
+static int nic_register_interrupts(struct nicpf *nic)
+{
+	int irq, ret = 0;
+
+	/* Enable MSI-X */
+	if (!nic_enable_msix(nic))
+		return 1;
+
+	/* Register mailbox interrupt handlers */
+	ret = request_irq(nic->msix_entries[NIC_PF_INTR_ID_MBOX0].vector,
+			  nic_mbx0_intr_handler, 0 , "NIC Mbox0", nic);
+	if (ret)
+		goto fail;
+	else
+		nic->irq_allocated[NIC_PF_INTR_ID_MBOX0] = 1;
+
+	ret = request_irq(nic->msix_entries[NIC_PF_INTR_ID_MBOX1].vector,
+			  nic_mbx1_intr_handler, 0 , "NIC Mbox1", nic);
+	if (ret)
+		goto fail;
+	else
+		nic->irq_allocated[NIC_PF_INTR_ID_MBOX1] = 1;
+
+	/* Enable mailbox interrupt */
+	nic_enable_mbx_intr(nic);
+	return 0;
+fail:
+	netdev_err(nic->netdev, "Request irq failed\n");
+	for (irq = 0; irq < nic->num_vec; irq++) {
+		if (nic->irq_allocated[irq])
+			free_irq(nic->msix_entries[irq].vector, nic);
+		nic->irq_allocated[irq] = 0;
+	}
+	return 1;
+}
+
+static void nic_unregister_interrupts(struct nicpf *nic)
+{
+	int irq;
+
+	/* Free registered interrupts */
+	for (irq = 0; irq < nic->num_vec; irq++) {
+		if (nic->irq_allocated[irq])
+			free_irq(nic->msix_entries[irq].vector, nic);
+		nic->irq_allocated[irq] = 0;
+	}
+
+	/* Disable MSI-X */
+	nic_disable_msix(nic);
+}
+
+int nic_sriov_configure(struct pci_dev *pdev, int num_vfs_requested)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nicpf *nic = netdev_priv(netdev);
+	int err;
+
+	if (nic->num_vf_en == num_vfs_requested)
+		return num_vfs_requested;
+
+	if (nic->flags & NIC_SRIOV_ENABLED) {
+		pci_disable_sriov(pdev);
+		nic->flags &= ~NIC_SRIOV_ENABLED;
+	}
+
+	nic->num_vf_en = 0;
+	if (num_vfs_requested > MAX_NUM_VFS_SUPPORTED)
+		return -EPERM;
+
+	if (num_vfs_requested) {
+		err = pci_enable_sriov(pdev, num_vfs_requested);
+		if (err) {
+			dev_err(&pdev->dev, "SRIOV, Failed to enable %d VFs\n",
+				num_vfs_requested);
+			return err;
+		}
+		nic->num_vf_en = num_vfs_requested;
+		nic->flags |= NIC_SRIOV_ENABLED;
+	}
+
+	return num_vfs_requested;
+}
+
+static int nic_sriov_init(struct pci_dev *pdev, struct nicpf *nic)
+{
+	int    pos = 0;
+
+	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
+	if (!pos) {
+		dev_err(&pdev->dev, "SRIOV capability is not found in PCIe config space\n");
+		return 0;
+	}
+
+	pci_read_config_word(pdev, (pos + PCI_SRIOV_TOTAL_VF),
+			     &nic->total_vf_cnt);
+	if (nic->total_vf_cnt < nic->num_vf_en)
+		nic->num_vf_en = nic->total_vf_cnt;
+
+	if (nic->total_vf_cnt && pci_enable_sriov(pdev, nic->num_vf_en)) {
+		dev_err(&pdev->dev, "SRIOV enable failed, num VF is %d\n",
+			nic->num_vf_en);
+		nic->num_vf_en = 0;
+		return 0;
+	}
+	dev_info(&pdev->dev, "SRIOV enabled, numer of VF available %d\n",
+		 nic->num_vf_en);
+
+	nic->flags |= NIC_SRIOV_ENABLED;
+	return 1;
+}
+
+static int nic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	struct device *dev = &pdev->dev;
+	struct net_device *netdev;
+	struct nicpf *nic;
+	int    err;
+
+	netdev = alloc_etherdev(sizeof(struct nicpf));
+	if (!netdev)
+		return -ENOMEM;
+
+	pci_set_drvdata(pdev, netdev);
+
+	SET_NETDEV_DEV(netdev, &pdev->dev);
+
+	nic = netdev_priv(netdev);
+	nic->netdev = netdev;
+	nic->pdev = pdev;
+
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(dev, "Failed to enable PCI device\n");
+		goto exit;
+	}
+
+	err = pci_request_regions(pdev, DRV_NAME);
+	if (err) {
+		dev_err(dev, "PCI request regions failed 0x%x\n", err);
+		goto err_disable_device;
+	}
+
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(48));
+	if (err) {
+		dev_err(dev, "Unable to get usable DMA configuration\n");
+		goto err_release_regions;
+	}
+
+	err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(48));
+	if (err) {
+		dev_err(dev, "unable to get 48-bit DMA for consistent allocations\n");
+		goto err_release_regions;
+	}
+
+	/* MAP PF's configuration registers */
+	nic->reg_base = (uint64_t)pci_ioremap_bar(pdev, PCI_CFG_REG_BAR_NUM);
+	if (!nic->reg_base) {
+		dev_err(dev, "Cannot map config register space, aborting\n");
+		err = -ENOMEM;
+		goto err_release_regions;
+	}
+	nic->node = NIC_NODE_ID(pci_resource_start(pdev, PCI_CFG_REG_BAR_NUM));
+
+	/* By default set NIC in TNS bypass mode */
+	nic->flags &= ~NIC_TNS_ENABLED;
+	nic_set_lmac_vf_mapping(nic);
+
+	/* Initialize hardware */
+	nic_init_hw(nic);
+
+	/* Set RSS TBL size for each VF */
+	nic->rss_ind_tbl_size = max((NIC_RSSI_COUNT / nic->num_vf_en),
+				    NIC_MAX_RSS_IDR_TBL_SIZE);
+	nic->rss_ind_tbl_size = rounddown_pow_of_two(nic->rss_ind_tbl_size);
+
+	/* Register interrupts */
+	if (nic_register_interrupts(nic))
+		goto err_unmap_resources;
+
+	/* Configure SRIOV */
+	if (!nic_sriov_init(pdev, nic))
+		goto err_unmap_resources;
+
+	goto exit;
+
+err_unmap_resources:
+	if (nic->reg_base)
+		iounmap((void *)nic->reg_base);
+err_release_regions:
+	pci_release_regions(pdev);
+err_disable_device:
+	pci_disable_device(pdev);
+exit:
+	return err;
+}
+
+static void nic_remove(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nicpf *nic;
+
+	if (!netdev)
+		return;
+
+	nic = netdev_priv(netdev);
+
+	nic_unregister_interrupts(nic);
+
+	if (nic->flags & NIC_SRIOV_ENABLED)
+		pci_disable_sriov(pdev);
+
+	pci_set_drvdata(pdev, NULL);
+
+	if (nic->reg_base)
+		iounmap((void *)nic->reg_base);
+
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+	free_netdev(netdev);
+}
+
+static struct pci_driver nic_driver = {
+	.name = DRV_NAME,
+	.id_table = nic_id_table,
+	.probe = nic_probe,
+	.remove = nic_remove,
+	.sriov_configure = nic_sriov_configure,
+};
+
+static int __init nic_init_module(void)
+{
+	pr_info("%s, ver %s\n", DRV_NAME, DRV_VERSION);
+
+	return pci_register_driver(&nic_driver);
+}
+
+static void __exit nic_cleanup_module(void)
+{
+	pci_unregister_driver(&nic_driver);
+}
+
+module_init(nic_init_module);
+module_exit(nic_cleanup_module);
diff --git a/drivers/net/ethernet/cavium/thunder/nic_reg.h b/drivers/net/ethernet/cavium/thunder/nic_reg.h
new file mode 100644
index 000000000000..cbf1c9f73ef1
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/nic_reg.h
@@ -0,0 +1,214 @@
+/*
+ * Copyright (C) 2014 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#ifndef NIC_REG_H
+#define NIC_REG_H
+
+#define   NIC_PF_REG_COUNT			29573
+#define   NIC_VF_REG_COUNT			249
+
+/* Physical function register offsets */
+#define   NIC_PF_CFG				(0x0000)
+#define   NIC_PF_STATUS				(0x0010)
+#define   NIC_PF_INTR_TIMER_CFG			(0x0030)
+#define   NIC_PF_BIST_STATUS			(0x0040)
+#define   NIC_PF_SOFT_RESET			(0x0050)
+#define   NIC_PF_TCP_TIMER			(0x0060)
+#define   NIC_PF_BP_CFG				(0x0080)
+#define   NIC_PF_RRM_CFG			(0x0088)
+#define   NIC_PF_CQM_CF				(0x00A0)
+#define   NIC_PF_CNM_CF				(0x00A8)
+#define   NIC_PF_CNM_STATUS			(0x00B0)
+#define   NIC_PF_CQ_AVG_CFG			(0x00C0)
+#define   NIC_PF_RRM_AVG_CFG			(0x00C8)
+#define   NIC_PF_INTF_0_1_SEND_CFG		(0x0200)
+#define   NIC_PF_INTF_0_1_BP_CFG		(0x0208)
+#define   NIC_PF_INTF_0_1_BP_DIS_0_1		(0x0210)
+#define   NIC_PF_INTF_0_1_BP_SW_0_1		(0x0220)
+#define   NIC_PF_RBDR_BP_STATE_0_3		(0x0240)
+#define   NIC_PF_MAILBOX_INT			(0x0410)
+#define   NIC_PF_MAILBOX_INT_W1S		(0x0430)
+#define   NIC_PF_MAILBOX_ENA_W1C		(0x0450)
+#define   NIC_PF_MAILBOX_ENA_W1S		(0x0470)
+#define   NIC_PF_RX_ETYPE_0_7			(0x0500)
+#define   NIC_PF_PKIND_0_15_CFG			(0x0600)
+#define   NIC_PF_ECC0_FLIP0			(0x1000)
+#define   NIC_PF_ECC1_FLIP0			(0x1008)
+#define   NIC_PF_ECC2_FLIP0			(0x1010)
+#define   NIC_PF_ECC3_FLIP0			(0x1018)
+#define   NIC_PF_ECC0_FLIP1			(0x1080)
+#define   NIC_PF_ECC1_FLIP1			(0x1088)
+#define   NIC_PF_ECC2_FLIP1			(0x1090)
+#define   NIC_PF_ECC3_FLIP1			(0x1098)
+#define   NIC_PF_ECC0_CDIS			(0x1100)
+#define   NIC_PF_ECC1_CDIS			(0x1108)
+#define   NIC_PF_ECC2_CDIS			(0x1110)
+#define   NIC_PF_ECC3_CDIS			(0x1118)
+#define   NIC_PF_BIST0_STATUS			(0x1280)
+#define   NIC_PF_BIST1_STATUS			(0x1288)
+#define   NIC_PF_BIST2_STATUS			(0x1290)
+#define   NIC_PF_BIST3_STATUS			(0x1298)
+#define   NIC_PF_ECC0_SBE_INT			(0x2000)
+#define   NIC_PF_ECC0_SBE_INT_W1S		(0x2008)
+#define   NIC_PF_ECC0_SBE_ENA_W1C		(0x2010)
+#define   NIC_PF_ECC0_SBE_ENA_W1S		(0x2018)
+#define   NIC_PF_ECC0_DBE_INT			(0x2100)
+#define   NIC_PF_ECC0_DBE_INT_W1S		(0x2108)
+#define   NIC_PF_ECC0_DBE_ENA_W1C		(0x2110)
+#define   NIC_PF_ECC0_DBE_ENA_W1S		(0x2118)
+#define   NIC_PF_ECC1_SBE_INT			(0x2200)
+#define   NIC_PF_ECC1_SBE_INT_W1S		(0x2208)
+#define   NIC_PF_ECC1_SBE_ENA_W1C		(0x2210)
+#define   NIC_PF_ECC1_SBE_ENA_W1S		(0x2218)
+#define   NIC_PF_ECC1_DBE_INT			(0x2300)
+#define   NIC_PF_ECC1_DBE_INT_W1S		(0x2308)
+#define   NIC_PF_ECC1_DBE_ENA_W1C		(0x2310)
+#define   NIC_PF_ECC1_DBE_ENA_W1S		(0x2318)
+#define   NIC_PF_ECC2_SBE_INT			(0x2400)
+#define   NIC_PF_ECC2_SBE_INT_W1S		(0x2408)
+#define   NIC_PF_ECC2_SBE_ENA_W1C		(0x2410)
+#define   NIC_PF_ECC2_SBE_ENA_W1S		(0x2418)
+#define   NIC_PF_ECC2_DBE_INT			(0x2500)
+#define   NIC_PF_ECC2_DBE_INT_W1S		(0x2508)
+#define   NIC_PF_ECC2_DBE_ENA_W1C		(0x2510)
+#define   NIC_PF_ECC2_DBE_ENA_W1S		(0x2518)
+#define   NIC_PF_ECC3_SBE_INT			(0x2600)
+#define   NIC_PF_ECC3_SBE_INT_W1S		(0x2608)
+#define   NIC_PF_ECC3_SBE_ENA_W1C		(0x2610)
+#define   NIC_PF_ECC3_SBE_ENA_W1S		(0x2618)
+#define   NIC_PF_ECC3_DBE_INT			(0x2700)
+#define   NIC_PF_ECC3_DBE_INT_W1S		(0x2708)
+#define   NIC_PF_ECC3_DBE_ENA_W1C		(0x2710)
+#define   NIC_PF_ECC3_DBE_ENA_W1S		(0x2718)
+#define   NIC_PF_CPI_0_2047_CFG			(0x200000)
+#define   NIC_PF_RSSI_0_4097_RQ			(0x220000)
+#define   NIC_PF_LMAC_0_7_CFG			(0x240000)
+#define   NIC_PF_LMAC_0_7_SW_XOFF		(0x242000)
+#define   NIC_PF_LMAC_0_7_CREDIT		(0x244000)
+#define   NIC_PF_CHAN_0_255_TX_CFG		(0x400000)
+#define   NIC_PF_CHAN_0_255_RX_CFG		(0x420000)
+#define   NIC_PF_CHAN_0_255_SW_XOFF		(0x440000)
+#define   NIC_PF_CHAN_0_255_CREDIT		(0x460000)
+#define   NIC_PF_CHAN_0_255_RX_BP_CFG		(0x480000)
+#define   NIC_PF_SW_SYNC_RX			(0x490000)
+#define   NIC_PF_SW_SYNC_RX_DONE		(0x490008)
+#define   NIC_PF_TL2_0_63_CFG			(0x500000)
+#define   NIC_PF_TL2_0_63_PRI			(0x520000)
+#define   NIC_PF_TL2_0_63_SH_STATUS		(0x580000)
+#define   NIC_PF_TL3A_0_63_CFG			(0x5F0000)
+#define   NIC_PF_TL3_0_255_CFG			(0x600000)
+#define   NIC_PF_TL3_0_255_CHAN			(0x620000)
+#define   NIC_PF_TL3_0_255_PIR			(0x640000)
+#define   NIC_PF_TL3_0_255_SW_XOFF		(0x660000)
+#define   NIC_PF_TL3_0_255_CNM_RATE		(0x680000)
+#define   NIC_PF_TL3_0_255_SH_STATUS		(0x6A0000)
+#define   NIC_PF_TL4A_0_255_CFG			(0x6F0000)
+#define   NIC_PF_TL4_0_1023_CFG			(0x800000)
+#define   NIC_PF_TL4_0_1023_SW_XOFF		(0x820000)
+#define   NIC_PF_TL4_0_1023_SH_STATUS		(0x840000)
+#define   NIC_PF_TL4A_0_1023_CNM_RATE		(0x880000)
+#define   NIC_PF_TL4A_0_1023_CNM_STATUS		(0x8A0000)
+#define   NIC_PF_VF_0_127_MAILBOX_0_7		(0x20002000)
+#define   NIC_PF_VNIC_0_127_TX_STAT_0_4		(0x20004000)
+#define   NIC_PF_VNIC_0_127_RX_STAT_0_13	(0x20004100)
+#define   NIC_PF_QSET_0_127_LOCK_0_15		(0x20006000)
+#define   NIC_PF_QSET_0_127_CFG			(0x20010000)
+#define   NIC_PF_QSET_0_127_RQ_0_7_CFG		(0x20010400)
+#define   NIC_PF_QSET_0_127_RQ_0_7_DROP_CFG	(0x20010420)
+#define   NIC_PF_QSET_0_127_RQ_0_7_BP_CFG	(0x20010500)
+#define   NIC_PF_QSET_0_127_RQ_0_7_STAT_0_1	(0x20010600)
+#define   NIC_PF_QSET_0_127_SQ_0_7_CFG		(0x20010C00)
+#define   NIC_PF_QSET_0_127_SQ_0_7_CFG2		(0x20010C08)
+#define   NIC_PF_QSET_0_127_SQ_0_7_STAT_0_1	(0x20010D00)
+
+#define   NIC_PF_MSIX_VEC_0_18_ADDR		(0x000000)
+#define   NIC_PF_MSIX_VEC_0_CTL			(0x000008)
+#define   NIC_PF_MSIX_PBA_0			(0x0F0000)
+
+/* Virtual function register offsets */
+#define   NIC_VNIC_CFG				(0x000020)
+#define   NIC_VF_PF_MAILBOX_0_7			(0x000100)
+#define   NIC_VF_INT				(0x000200)
+#define   NIC_VF_INT_W1S			(0x000220)
+#define   NIC_VF_ENA_W1C			(0x000240)
+#define   NIC_VF_ENA_W1S			(0x000260)
+
+#define   NIC_VNIC_RSS_CFG			(0x0020E0)
+#define   NIC_VNIC_RSS_KEY_0_4			(0x002200)
+#define   NIC_VNIC_TX_STAT_0_4			(0x004000)
+#define   NIC_VNIC_RX_STAT_0_13			(0x004100)
+#define   NIC_QSET_RQ_GEN_CFG			(0x010010)
+
+#define   NIC_QSET_CQ_0_7_CFG			(0x010400)
+#define   NIC_QSET_CQ_0_7_CFG2			(0x010408)
+#define   NIC_QSET_CQ_0_7_THRESH		(0x010410)
+#define   NIC_QSET_CQ_0_7_BASE			(0x010420)
+#define   NIC_QSET_CQ_0_7_HEAD			(0x010428)
+#define   NIC_QSET_CQ_0_7_TAIL			(0x010430)
+#define   NIC_QSET_CQ_0_7_DOOR			(0x010438)
+#define   NIC_QSET_CQ_0_7_STATUS		(0x010440)
+#define   NIC_QSET_CQ_0_7_STATUS2		(0x010448)
+#define   NIC_QSET_CQ_0_7_DEBUG			(0x010450)
+
+#define   NIC_QSET_RQ_0_7_CFG			(0x010600)
+#define   NIC_QSET_RQ_0_7_STAT_0_1		(0x010700)
+
+#define   NIC_QSET_SQ_0_7_CFG			(0x010800)
+#define   NIC_QSET_SQ_0_7_THRESH		(0x010810)
+#define   NIC_QSET_SQ_0_7_BASE			(0x010820)
+#define   NIC_QSET_SQ_0_7_HEAD			(0x010828)
+#define   NIC_QSET_SQ_0_7_TAIL			(0x010830)
+#define   NIC_QSET_SQ_0_7_DOOR			(0x010838)
+#define   NIC_QSET_SQ_0_7_STATUS		(0x010840)
+#define   NIC_QSET_SQ_0_7_DEBUG			(0x010848)
+#define   NIC_QSET_SQ_0_7_CNM_CHG		(0x010860)
+#define   NIC_QSET_SQ_0_7_STAT_0_1		(0x010900)
+
+#define   NIC_QSET_RBDR_0_1_CFG			(0x010C00)
+#define   NIC_QSET_RBDR_0_1_THRESH		(0x010C10)
+#define   NIC_QSET_RBDR_0_1_BASE		(0x010C20)
+#define   NIC_QSET_RBDR_0_1_HEAD		(0x010C28)
+#define   NIC_QSET_RBDR_0_1_TAIL		(0x010C30)
+#define   NIC_QSET_RBDR_0_1_DOOR		(0x010C38)
+#define   NIC_QSET_RBDR_0_1_STATUS0		(0x010C40)
+#define   NIC_QSET_RBDR_0_1_STATUS1		(0x010C48)
+#define   NIC_QSET_RBDR_0_1_PREFETCH_STATUS	(0x010C50)
+
+#define   NIC_VF_MSIX_VECTOR_0_19_ADDR		(0x000000)
+#define   NIC_VF_MSIX_VECTOR_0_19_CTL		(0x000008)
+#define   NIC_VF_MSIX_PBA			(0x0F0000)
+
+/* Offsets within registers */
+#define   NIC_MSIX_VEC_SHIFT			4
+#define   NIC_Q_NUM_SHIFT			18
+#define   NIC_QS_ID_SHIFT			21
+#define   NIC_VF_NUM_SHIFT			21
+
+/* Port kind configuration register */
+struct pkind_cfg {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t reserved_42_63:22;
+	uint64_t hdr_sl:5;	/* Header skip length */
+	uint64_t rx_hdr:3;	/* TNS Receive header present */
+	uint64_t lenerr_en:1;	/* L2 length error check enable */
+	uint64_t reserved_32_32:1;
+	uint64_t maxlen:16;	/* Max frame size */
+	uint64_t minlen:16;	/* Min frame size */
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t minlen:16;
+	uint64_t maxlen:16;
+	uint64_t reserved_32_32:1;
+	uint64_t lenerr_en:1;
+	uint64_t rx_hdr:3;
+	uint64_t hdr_sl:5;
+	uint64_t reserved_42_63:22;
+#endif
+};
+
+#endif /* NIC_REG_H */
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
new file mode 100644
index 000000000000..14a9ebaf043a
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
@@ -0,0 +1,559 @@
+/*
+ * Copyright (C) 2014 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+/* ETHTOOL Support for VNIC_VF Device*/
+
+#include <linux/pci.h>
+
+#include "nic_reg.h"
+#include "nic.h"
+#include "nicvf_queues.h"
+#include "q_struct.h"
+
+#define DRV_NAME	"thunder-nicvf"
+#define DRV_VERSION     "1.0"
+
+struct nicvf_stat {
+	char name[ETH_GSTRING_LEN];
+	unsigned int index;
+};
+
+#define NICVF_HW_STAT(stat) { \
+	.name = #stat, \
+	.index = offsetof(struct nicvf_hw_stats, stat) / sizeof(u64), \
+}
+
+#define NICVF_DRV_STAT(stat) { \
+	.name = #stat, \
+	.index = offsetof(struct nicvf_drv_stats, stat) / sizeof(u64), \
+}
+
+static const struct nicvf_stat nicvf_hw_stats[] = {
+	NICVF_HW_STAT(rx_bytes_ok),
+	NICVF_HW_STAT(rx_ucast_frames_ok),
+	NICVF_HW_STAT(rx_bcast_frames_ok),
+	NICVF_HW_STAT(rx_mcast_frames_ok),
+	NICVF_HW_STAT(rx_fcs_errors),
+	NICVF_HW_STAT(rx_l2_errors),
+	NICVF_HW_STAT(rx_drop_red),
+	NICVF_HW_STAT(rx_drop_red_bytes),
+	NICVF_HW_STAT(rx_drop_overrun),
+	NICVF_HW_STAT(rx_drop_overrun_bytes),
+	NICVF_HW_STAT(rx_drop_bcast),
+	NICVF_HW_STAT(rx_drop_mcast),
+	NICVF_HW_STAT(rx_drop_l3_bcast),
+	NICVF_HW_STAT(rx_drop_l3_mcast),
+	NICVF_HW_STAT(tx_bytes_ok),
+	NICVF_HW_STAT(tx_ucast_frames_ok),
+	NICVF_HW_STAT(tx_bcast_frames_ok),
+	NICVF_HW_STAT(tx_mcast_frames_ok),
+};
+
+static const struct nicvf_stat nicvf_drv_stats[] = {
+	NICVF_DRV_STAT(rx_frames_ok),
+	NICVF_DRV_STAT(rx_frames_64),
+	NICVF_DRV_STAT(rx_frames_127),
+	NICVF_DRV_STAT(rx_frames_255),
+	NICVF_DRV_STAT(rx_frames_511),
+	NICVF_DRV_STAT(rx_frames_1023),
+	NICVF_DRV_STAT(rx_frames_1518),
+	NICVF_DRV_STAT(rx_frames_jumbo),
+	NICVF_DRV_STAT(rx_drops),
+	NICVF_DRV_STAT(tx_frames_ok),
+	NICVF_DRV_STAT(tx_busy),
+	NICVF_DRV_STAT(tx_tso),
+	NICVF_DRV_STAT(tx_drops),
+};
+
+static const struct nicvf_stat nicvf_queue_stats[] = {
+	{ "bytes", 0 },
+	{ "frames", 1 },
+};
+
+static const unsigned int nicvf_n_hw_stats = ARRAY_SIZE(nicvf_hw_stats);
+static const unsigned int nicvf_n_drv_stats = ARRAY_SIZE(nicvf_drv_stats);
+static const unsigned int nicvf_n_queue_stats = ARRAY_SIZE(nicvf_queue_stats);
+
+static int nicvf_get_settings(struct net_device *netdev,
+			      struct ethtool_cmd *cmd)
+{
+	cmd->supported = (SUPPORTED_1000baseT_Full |
+			SUPPORTED_100baseT_Full |
+			SUPPORTED_10baseT_Full |
+			SUPPORTED_10000baseT_Full | SUPPORTED_FIBRE);
+
+	cmd->advertising = (ADVERTISED_1000baseT_Full |
+			ADVERTISED_100baseT_Full |
+			ADVERTISED_10baseT_Full |
+			ADVERTISED_10000baseT_Full | ADVERTISED_FIBRE);
+
+	cmd->port = PORT_FIBRE;
+	cmd->transceiver = XCVR_EXTERNAL;
+	if (netif_carrier_ok(netdev)) {
+		ethtool_cmd_speed_set(cmd, SPEED_10000);
+		cmd->duplex = DUPLEX_FULL;
+	} else {
+		ethtool_cmd_speed_set(cmd, -1);
+		cmd->duplex = -1;
+	}
+
+	cmd->autoneg = AUTONEG_DISABLE;
+	ethtool_cmd_speed_set(cmd, SPEED_1000);
+	return 0;
+}
+
+static int nicvf_set_settings(struct net_device *netdev,
+			      struct ethtool_cmd *cmd)
+{
+	return -EOPNOTSUPP;
+
+	/* 10G full duplex setting supported only */
+	if (cmd->autoneg == AUTONEG_ENABLE)
+		return -EOPNOTSUPP;
+
+	if (ethtool_cmd_speed(cmd) != SPEED_10000)
+		return -EOPNOTSUPP;
+
+	if (cmd->duplex != DUPLEX_FULL)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+static void nicvf_get_drvinfo(struct net_device *netdev,
+			      struct ethtool_drvinfo *info)
+{
+	struct nicvf *nic = netdev_priv(netdev);
+
+	strlcpy(info->driver, DRV_NAME, sizeof(info->driver));
+	strlcpy(info->version, DRV_VERSION, sizeof(info->version));
+	strlcpy(info->bus_info, pci_name(nic->pdev), sizeof(info->bus_info));
+}
+
+static void nicvf_get_strings(struct net_device *netdev, u32 sset, u8 *data)
+{
+	int stats, qidx;
+
+	if (sset != ETH_SS_STATS)
+		return;
+
+	for (stats = 0; stats < nicvf_n_hw_stats; stats++) {
+		memcpy(data, nicvf_hw_stats[stats].name, ETH_GSTRING_LEN);
+		data += ETH_GSTRING_LEN;
+	}
+
+	for (stats = 0; stats < nicvf_n_drv_stats; stats++) {
+		memcpy(data, nicvf_drv_stats[stats].name, ETH_GSTRING_LEN);
+		data += ETH_GSTRING_LEN;
+	}
+
+	for (qidx = 0; qidx < MAX_RCV_QUEUES_PER_QS; qidx++) {
+		for (stats = 0; stats < nicvf_n_queue_stats; stats++) {
+			sprintf(data, "rxq%d: %s", qidx,
+				nicvf_queue_stats[stats].name);
+			data += ETH_GSTRING_LEN;
+		}
+	}
+
+	for (qidx = 0; qidx < MAX_SND_QUEUES_PER_QS; qidx++) {
+		for (stats = 0; stats < nicvf_n_queue_stats; stats++) {
+			sprintf(data, "txq%d: %s", qidx,
+				nicvf_queue_stats[stats].name);
+			data += ETH_GSTRING_LEN;
+		}
+	}
+}
+
+static int nicvf_get_sset_count(struct net_device *netdev, int sset)
+{
+	if (sset != ETH_SS_STATS)
+		return -EINVAL;
+
+	return nicvf_n_hw_stats + nicvf_n_drv_stats +
+		(nicvf_n_queue_stats *
+		 (MAX_RCV_QUEUES_PER_QS + MAX_SND_QUEUES_PER_QS));
+}
+
+static void nicvf_get_ethtool_stats(struct net_device *netdev,
+				    struct ethtool_stats *stats, u64 *data)
+{
+	struct nicvf *nic = netdev_priv(netdev);
+	int stat, qidx;
+
+	nicvf_update_stats(nic);
+
+	for (stat = 0; stat < nicvf_n_hw_stats; stat++)
+		*(data++) = ((u64 *)&nic->stats)
+				[nicvf_hw_stats[stat].index];
+	for (stat = 0; stat < nicvf_n_drv_stats; stat++)
+		*(data++) = ((u64 *)&nic->drv_stats)
+				[nicvf_drv_stats[stat].index];
+
+	for (qidx = 0; qidx < MAX_RCV_QUEUES_PER_QS; qidx++) {
+		for (stat = 0; stat < nicvf_n_queue_stats; stat++)
+			*(data++) = ((u64 *)&nic->qs->rq[qidx].stats)
+					[nicvf_queue_stats[stat].index];
+	}
+
+	for (qidx = 0; qidx < MAX_SND_QUEUES_PER_QS; qidx++) {
+		for (stat = 0; stat < nicvf_n_queue_stats; stat++)
+			*(data++) = ((u64 *)&nic->qs->sq[qidx].stats)
+					[nicvf_queue_stats[stat].index];
+	}
+}
+
+static int nicvf_get_regs_len(struct net_device *dev)
+{
+	return sizeof(uint64_t) * NIC_VF_REG_COUNT;
+}
+
+static void nicvf_get_regs(struct net_device *dev,
+			   struct ethtool_regs *regs, void *reg)
+{
+	struct nicvf *nic = netdev_priv(dev);
+	uint64_t *p = (uint64_t *)reg;
+	uint64_t reg_offset;
+	int mbox, key, stat, q;
+	int i = 0;
+
+	regs->version = 0;
+	memset(p, 0, NIC_VF_REG_COUNT);
+
+	p[i++] = nicvf_reg_read(nic, NIC_VNIC_CFG);
+	/* Mailbox registers */
+	for (mbox = 0; mbox < NIC_PF_VF_MAILBOX_SIZE; mbox++)
+		p[i++] = nicvf_reg_read(nic,
+					NIC_VF_PF_MAILBOX_0_7 | (mbox << 3));
+
+	p[i++] = nicvf_reg_read(nic, NIC_VF_INT);
+	p[i++] = nicvf_reg_read(nic, NIC_VF_INT_W1S);
+	p[i++] = nicvf_reg_read(nic, NIC_VF_ENA_W1C);
+	p[i++] = nicvf_reg_read(nic, NIC_VF_ENA_W1S);
+	p[i++] = nicvf_reg_read(nic, NIC_VNIC_RSS_CFG);
+
+	for (key = 0; key < RSS_HASH_KEY_SIZE; key++)
+		p[i++] = nicvf_reg_read(nic, NIC_VNIC_RSS_KEY_0_4 | (key << 3));
+
+	/* Tx/Rx statistics */
+	for (stat = 0; stat < TX_STATS_ENUM_LAST; stat++)
+		p[i++] = nicvf_reg_read(nic,
+					NIC_VNIC_TX_STAT_0_4 | (stat << 3));
+
+	for (i = 0; i < RX_STATS_ENUM_LAST; i++)
+		p[i++] = nicvf_reg_read(nic,
+					NIC_VNIC_RX_STAT_0_13 | (stat << 3));
+
+	p[i++] = nicvf_reg_read(nic, NIC_QSET_RQ_GEN_CFG);
+
+	/* All completion queue's registers */
+	for (q = 0; q < MAX_CMP_QUEUES_PER_QS; q++) {
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_CFG, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_CFG2, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_THRESH, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_BASE, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_HEAD, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_TAIL, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_DOOR, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_STATUS, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_STATUS2, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_DEBUG, q);
+	}
+
+	/* All receive queue's registers */
+	for (q = 0; q < MAX_RCV_QUEUES_PER_QS; q++) {
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_RQ_0_7_CFG, q);
+		p[i++] = nicvf_queue_reg_read(nic,
+						  NIC_QSET_RQ_0_7_STAT_0_1, q);
+		reg_offset = NIC_QSET_RQ_0_7_STAT_0_1 | (1 << 3);
+		p[i++] = nicvf_queue_reg_read(nic, reg_offset, q);
+	}
+
+	for (q = 0; q < MAX_SND_QUEUES_PER_QS; q++) {
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_CFG, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_THRESH, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_BASE, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_HEAD, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_TAIL, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_DOOR, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_STATUS, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_DEBUG, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_CNM_CHG, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_STAT_0_1, q);
+		reg_offset = NIC_QSET_SQ_0_7_STAT_0_1 | (1 << 3);
+		p[i++] = nicvf_queue_reg_read(nic, reg_offset, q);
+	}
+
+	for (q = 0; q < MAX_RCV_BUF_DESC_RINGS_PER_QS; q++) {
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_RBDR_0_1_CFG, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_RBDR_0_1_THRESH, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_RBDR_0_1_BASE, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_RBDR_0_1_HEAD, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_RBDR_0_1_TAIL, q);
+		p[i++] = nicvf_queue_reg_read(nic, NIC_QSET_RBDR_0_1_DOOR, q);
+		p[i++] = nicvf_queue_reg_read(nic,
+					      NIC_QSET_RBDR_0_1_STATUS0, q);
+		p[i++] = nicvf_queue_reg_read(nic,
+					      NIC_QSET_RBDR_0_1_STATUS1, q);
+		reg_offset = NIC_QSET_RBDR_0_1_PREFETCH_STATUS;
+		p[i++] = nicvf_queue_reg_read(nic, reg_offset, q);
+	}
+}
+
+#ifdef VNIC_RSS_SUPPORT
+static int nicvf_get_rss_hash_opts(struct nicvf *nic,
+				   struct ethtool_rxnfc *info)
+{
+	info->data = 0;
+
+	switch (info->flow_type) {
+	case TCP_V4_FLOW:
+	case TCP_V6_FLOW:
+	case UDP_V4_FLOW:
+	case UDP_V6_FLOW:
+	case SCTP_V4_FLOW:
+	case SCTP_V6_FLOW:
+		info->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+	case IPV4_FLOW:
+	case IPV6_FLOW:
+		info->data |= RXH_IP_SRC | RXH_IP_DST;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int nicvf_get_rxnfc(struct net_device *dev,
+			   struct ethtool_rxnfc *info, uint32_t *rules)
+{
+	struct nicvf *nic = netdev_priv(dev);
+	int ret = -EOPNOTSUPP;
+
+	switch (info->cmd) {
+	case ETHTOOL_GRXRINGS:
+		info->data = nic->qs->rq_cnt;
+		ret = 0;
+		break;
+	case ETHTOOL_GRXFH:
+		return nicvf_get_rss_hash_opts(nic, info);
+	default:
+		break;
+	}
+	return ret;
+}
+
+static int nicvf_set_rss_hash_opts(struct nicvf *nic,
+				   struct ethtool_rxnfc *info)
+{
+	struct nicvf_rss_info *rss = &nic->rss_info;
+	uint64_t rss_cfg = nicvf_reg_read(nic, NIC_VNIC_RSS_CFG);
+
+	if (!rss->enable)
+		netdev_err(dev, "RSS is disabled, hash cannot be set\n");
+
+	netdev_info(dev, "Set RSS flow type = %d, data = %d\n",
+		    info->flow_type, info->data);
+
+	if (!(info->data & RXH_IP_SRC) || !(info->data & RXH_IP_DST))
+		return -EINVAL;
+
+	switch (cmd->flow_type) {
+	case TCP_V4_FLOW:
+	case TCP_V6_FLOW:
+		switch (info->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
+		case 0:
+			rss_cfg &= ~(1ULL << RSS_HASH_TCP);
+			break;
+		case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+			rss_cfg |= (1ULL << RSS_HASH_TCP);
+			break;
+		default:
+			return -EINVAL;
+		}
+		break;
+	case UDP_V4_FLOW:
+	case UDP_V6_FLOW:
+		switch (info->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
+		case 0:
+			rss_cfg &= ~(1ULL << RSS_HASH_UDP);
+			break;
+		case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+			rss_cfg |= (1ULL << RSS_HASH_UDP);
+			break;
+		default:
+			return -EINVAL;
+		}
+		break;
+	case SCTP_V4_FLOW:
+	case SCTP_V6_FLOW:
+		switch (info->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
+		case 0:
+			rss_cfg &= ~(1ULL << RSS_HASH_L4ETC);
+			break;
+		case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+			rss_cfg |= (1ULL << RSS_HASH_L4ETC);
+			break;
+		default:
+			return -EINVAL;
+		}
+		break;
+	case IPV4_FLOW:
+	case IPV6_FLOW:
+		rss_cfg = RSS_HASH_IP;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	nicvf_reg_read(nic, NIC_VNIC_RSS_CFG, rss_cfg);
+	return 0;
+}
+
+static int nicvf_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info)
+{
+	struct nicvf *nic = netdev_priv(dev);
+
+	switch (info->cmd) {
+	case ETHTOOL_SRXFH:
+		return nicvf_set_rss_hash_opts(nic, info);
+	default:
+		break;
+	}
+	return -EOPNOTSUPP;
+}
+
+static u32 nicvf_get_rxfh_key_size(struct net_device *netdev)
+{
+	return RSS_HASH_KEY_SIZE;
+}
+
+static u32 nicvf_get_rxfh_indir_size(struct net_device *dev)
+{
+	struct nicvf *nic = netdev_priv(dev);
+
+	return nic->rss_info.rss_size;
+}
+
+static int nicvf_get_rxfh(struct net_device *dev, u32 *indir, u8 *hkey)
+{
+	struct nicvf *nic = netdev_priv(dev);
+	struct nicvf_rss_info *rss = &nic->rss_info;
+	int idx;
+
+	if (indir) {
+		for (idx = 0; idx < rss->rss_size; idx++)
+			indir[idx] = rss->ind_tbl[idx];
+	}
+
+	if (hkey)
+		memcpy(hkey, rss->key, RSS_HASH_KEY_SIZE);
+
+	return 0;
+}
+
+static int nicvf_set_rxfh(struct net_device *dev, const u32 *indir,
+			  const u8 *hkey)
+{
+	struct nicvf *nic = netdev_priv(dev);
+	struct nicvf_rss_info *rss = &nic->rss_info;
+	int idx;
+
+	if ((nic->qs->rq_cnt <= 1) || (nic->cpi_alg != CPI_ALG_NONE)) {
+		rss->enable = false;
+		rss->hash_bits = 0;
+		return -EIO;
+	}
+
+	rss->enable = true;
+	if (indir) {
+		for (idx = 0; idx < rss->rss_size; idx++)
+			rss->ind_tbl[idx] = indir[idx];
+	}
+
+	nicvf_config_rss(nic);
+	return 0;
+}
+#endif
+
+/* Get no of queues device supports and current queue count */
+static void nicvf_get_channels(struct net_device *dev,
+			       struct ethtool_channels *channel)
+{
+	struct nicvf *nic = netdev_priv(dev);
+
+	memset(channel, 0, sizeof(*channel));
+
+	channel->max_rx = MAX_RCV_QUEUES_PER_QS;
+	channel->max_tx = MAX_SND_QUEUES_PER_QS;
+
+	channel->rx_count = nic->qs->rq_cnt;
+	channel->tx_count = nic->qs->sq_cnt;
+}
+
+/* Set no of Tx, Rx queues to be used */
+static int nicvf_set_channels(struct net_device *dev,
+			      struct ethtool_channels *channel)
+{
+	struct nicvf *nic = netdev_priv(dev);
+	int err = 0;
+
+	if (!channel->rx_count || !channel->tx_count)
+		return -EINVAL;
+	if (channel->rx_count > MAX_RCV_QUEUES_PER_QS)
+		return -EINVAL;
+	if (channel->tx_count > MAX_SND_QUEUES_PER_QS)
+		return -EINVAL;
+
+	nic->qs->rq_cnt = channel->rx_count;
+	nic->qs->sq_cnt = channel->tx_count;
+	nic->qs->cq_cnt = nic->qs->sq_cnt;
+
+	err = nicvf_set_real_num_queues(dev, nic->qs->sq_cnt, nic->qs->rq_cnt);
+	if (err)
+		return err;
+
+	if (!netif_running(dev))
+		return err;
+
+	nicvf_stop(dev);
+	nicvf_open(dev);
+	netdev_info(dev, "Setting num Tx rings to %d, Rx rings to %d success\n",
+		    nic->qs->sq_cnt, nic->qs->rq_cnt);
+
+	return err;
+}
+
+static const struct ethtool_ops nicvf_ethtool_ops = {
+	.get_settings		= nicvf_get_settings,
+	.set_settings		= nicvf_set_settings,
+	.get_link		= ethtool_op_get_link,
+	.get_drvinfo		= nicvf_get_drvinfo,
+	.get_strings		= nicvf_get_strings,
+	.get_sset_count		= nicvf_get_sset_count,
+	.get_ethtool_stats	= nicvf_get_ethtool_stats,
+	.get_regs_len		= nicvf_get_regs_len,
+	.get_regs		= nicvf_get_regs,
+#ifdef VNIC_RSS_SUPPORT
+	.get_rxnfc		= nicvf_get_rxnfc,
+	.set_rxnfc		= nicvf_set_rxnfc,
+	.get_rxfh_key_size	= nicvf_get_rxfh_key_size,
+	.get_rxfh_indir_size	= nicvf_get_rxfh_indir_size,
+	.get_rxfh		= nicvf_get_rxfh,
+	.set_rxfh		= nicvf_set_rxfh,
+#endif
+	.get_channels		= nicvf_get_channels,
+	.set_channels		= nicvf_set_channels,
+	.get_ts_info		= ethtool_op_get_ts_info,
+};
+
+void nicvf_set_ethtool_ops(struct net_device *netdev)
+{
+	netdev->ethtool_ops = &nicvf_ethtool_ops;
+}
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
new file mode 100644
index 000000000000..b920812ec2dd
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -0,0 +1,1323 @@
+/*
+ * Copyright (C) 2014 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/pci.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/log2.h>
+
+#include "nic_reg.h"
+#include "nic.h"
+#include "nicvf_queues.h"
+
+#define DRV_NAME	"thunder-nicvf"
+#define DRV_VERSION	"1.0"
+
+/* Supported devices */
+static const struct pci_device_id nicvf_id_table[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVICE_ID_THUNDER_NIC_VF) },
+	{ 0, }  /* end of table */
+};
+
+MODULE_AUTHOR("Sunil Goutham");
+MODULE_DESCRIPTION("Cavium Thunder NIC Virtual Function Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
+MODULE_DEVICE_TABLE(pci, nicvf_id_table);
+
+static int cpi_alg = CPI_ALG_NONE;
+module_param(cpi_alg, int, S_IRUGO);
+MODULE_PARM_DESC(cpi_alg,
+		 "PFC algorithm (0=none, 1=VLAN, 2=VLAN16, 3=IP Diffserv)");
+#ifdef	VNIC_RSS_SUPPORT
+static int rss_config = RSS_IP_HASH_ENA | RSS_TCP_HASH_ENA | RSS_UDP_HASH_ENA;
+#endif
+
+static int nicvf_enable_msix(struct nicvf *nic);
+static netdev_tx_t nicvf_xmit(struct sk_buff *skb, struct net_device *netdev);
+
+static void nicvf_dump_packet(struct sk_buff *skb)
+{
+#ifdef NICVF_DUMP_PACKET
+	int i;
+
+	for (i = 0; i < skb->len; i++) {
+		if (!(i % 16))
+			pr_cont("\n");
+		pr_debug("%02x ", (u_char)skb->data[i]);
+	}
+#endif
+}
+
+#ifdef NICVF_ETHTOOL_ENABLE
+static void nicvf_set_rx_frame_cnt(struct nicvf *nic, struct sk_buff *skb)
+{
+	if (skb->len <= 64)
+		nic->drv_stats.rx_frames_64++;
+	else if ((skb->len > 64) && (skb->len <= 127))
+		nic->drv_stats.rx_frames_127++;
+	else if ((skb->len > 127) && (skb->len <= 255))
+		nic->drv_stats.rx_frames_255++;
+	else if ((skb->len > 255) && (skb->len <= 511))
+		nic->drv_stats.rx_frames_511++;
+	else if ((skb->len > 511) && (skb->len <= 1023))
+		nic->drv_stats.rx_frames_1023++;
+	else if ((skb->len > 1023) && (skb->len <= 1518))
+		nic->drv_stats.rx_frames_1518++;
+	else if (skb->len > 1518)
+		nic->drv_stats.rx_frames_jumbo++;
+}
+#endif
+
+/* Register read/write APIs */
+void nicvf_reg_write(struct nicvf *nic, uint64_t offset, uint64_t val)
+{
+	uint64_t addr = nic->reg_base + offset;
+
+	writeq_relaxed(val, (void *)addr);
+}
+
+uint64_t nicvf_reg_read(struct nicvf *nic, uint64_t offset)
+{
+	uint64_t addr = nic->reg_base + offset;
+
+	return readq_relaxed((void *)addr);
+}
+
+void nicvf_queue_reg_write(struct nicvf *nic, uint64_t offset,
+			   uint64_t qidx, uint64_t val)
+{
+	uint64_t addr = nic->reg_base + offset;
+
+	writeq_relaxed(val, (void *)(addr + (qidx << NIC_Q_NUM_SHIFT)));
+}
+
+uint64_t nicvf_queue_reg_read(struct nicvf *nic, uint64_t offset, uint64_t qidx)
+{
+	uint64_t addr = nic->reg_base + offset;
+
+	return readq_relaxed((void *)(addr + (qidx << NIC_Q_NUM_SHIFT)));
+}
+
+/* VF -> PF mailbox communication */
+static bool pf_ready_to_rcv_msg;
+static bool pf_acked;
+static bool pf_nacked;
+
+int nicvf_lock_mbox(struct nicvf *nic)
+{
+	int timeout = NIC_PF_VF_MBX_TIMEOUT;
+	int sleep = 10;
+	uint64_t lock, mbx_addr;
+
+	mbx_addr = NIC_VF_PF_MAILBOX_0_7 + NIC_PF_VF_MBX_LOCK_OFFSET;
+	lock = NIC_PF_VF_MBX_LOCK_VAL(nicvf_reg_read(nic, mbx_addr));
+	while (lock) {
+		msleep(sleep);
+		lock = NIC_PF_VF_MBX_LOCK_VAL(nicvf_reg_read(nic, mbx_addr));
+		timeout -= sleep;
+		if (!timeout) {
+			netdev_err(nic->netdev,
+				   "VF%d Couldn't lock mailbox\n", nic->vf_id);
+			return 0;
+		}
+	}
+	nicvf_reg_write(nic, mbx_addr, NIC_PF_VF_MBX_LOCK_SET(lock));
+	return 1;
+}
+
+void nicvf_release_mbx(struct nicvf *nic)
+{
+	uint64_t mbx_addr, lock;
+
+	mbx_addr = NIC_VF_PF_MAILBOX_0_7 + NIC_PF_VF_MBX_LOCK_OFFSET;
+	lock = nicvf_reg_read(nic, mbx_addr);
+	nicvf_reg_write(nic, mbx_addr, NIC_PF_VF_MBX_LOCK_CLEAR(lock));
+}
+
+int nicvf_send_msg_to_pf(struct nicvf *nic, struct nic_mbx *mbx)
+{
+	int i, timeout = NIC_PF_VF_MBX_TIMEOUT;
+	int sleep = 10;
+	uint64_t *msg;
+	uint64_t mbx_addr;
+
+	if (!nicvf_lock_mbox(nic))
+		return -EBUSY;
+
+	pf_acked = false;
+	pf_nacked = false;
+	mbx->mbx_trigger_intr = 1;
+	msg = (uint64_t *)mbx;
+	mbx_addr = nic->reg_base + NIC_VF_PF_MAILBOX_0_7;
+
+	for (i = 0; i < NIC_PF_VF_MAILBOX_SIZE; i++)
+		writeq_relaxed(*(msg + i), (void *)(mbx_addr + (i * 8)));
+	nicvf_release_mbx(nic);
+
+	/* Wait for previous message to be acked, timeout 5sec */
+	while (!pf_acked) {
+		if (pf_nacked)
+			return -EINVAL;
+		msleep(sleep);
+		if (pf_acked)
+			break;
+		timeout -= sleep;
+		if (!timeout) {
+			netdev_err(nic->netdev,
+				   "PF didn't ack to mbox msg %d from VF%d\n",
+				   (mbx->msg & 0xFF), nic->vf_id);
+			return -EBUSY;
+		}
+	}
+	return 0;
+}
+
+/* Checks if VF is able to comminicate with PF
+* and also gets the VNIC number this VF is associated to.
+*/
+static int nicvf_check_pf_ready(struct nicvf *nic)
+{
+	int timeout = 5000, sleep = 20;
+	uint64_t mbx_addr = NIC_VF_PF_MAILBOX_0_7;
+
+	pf_ready_to_rcv_msg = false;
+
+	nicvf_reg_write(nic, mbx_addr, NIC_PF_VF_MSG_READY);
+
+	mbx_addr += (NIC_PF_VF_MAILBOX_SIZE - 1) * 8;
+	nicvf_reg_write(nic, mbx_addr, 1ULL);
+
+	while (!pf_ready_to_rcv_msg) {
+		msleep(sleep);
+		if (pf_ready_to_rcv_msg)
+			break;
+		timeout -= sleep;
+		if (!timeout) {
+			netdev_err(nic->netdev,
+				   "PF didn't respond to READY msg\n");
+			return 0;
+		}
+	}
+	return 1;
+}
+
+static void  nicvf_handle_mbx_intr(struct nicvf *nic)
+{
+	struct nic_mbx mbx = {};
+	uint64_t *mbx_data;
+	uint64_t mbx_addr;
+	int i;
+
+	mbx_addr = NIC_VF_PF_MAILBOX_0_7;
+	mbx_data = (uint64_t *)&mbx;
+
+	for (i = 0; i < NIC_PF_VF_MAILBOX_SIZE; i++) {
+		*mbx_data = nicvf_reg_read(nic, mbx_addr);
+		mbx_data++;
+		mbx_addr += NIC_PF_VF_MAILBOX_SIZE;
+	}
+
+	switch (mbx.msg & NIC_PF_VF_MBX_MSG_MASK) {
+	case NIC_PF_VF_MSG_READY:
+		pf_ready_to_rcv_msg = true;
+		nic->vf_id = mbx.data.nic_cfg.vf_id & 0x7F;
+		nic->tns_mode = mbx.data.nic_cfg.tns_mode & 0x7F;
+		break;
+	case NIC_PF_VF_MSG_ACK:
+		pf_acked = true;
+		break;
+	case NIC_PF_VF_MSG_NACK:
+		pf_nacked = true;
+		break;
+#ifdef VNIC_RSS_SUPPORT
+	case NIC_PF_VF_MSG_RSS_SIZE:
+		nic->rss_info.rss_size = mbx.data.rss_size.ind_tbl_size;
+		break;
+#endif
+	default:
+		netdev_err(nic->netdev,
+			   "Invalid message from PF, msg 0x%x\n", mbx.msg);
+		break;
+	}
+	nicvf_clear_intr(nic, NICVF_INTR_MBOX, 0);
+}
+
+static int nicvf_hw_set_mac_addr(struct nicvf *nic, struct net_device *netdev)
+{
+	struct nic_mbx mbx = {};
+	int i;
+
+	mbx.msg = NIC_PF_VF_MSG_SET_MAC;
+	mbx.data.mac.vf_id = nic->vf_id;
+	for (i = 0; i < ETH_ALEN; i++)
+		mbx.data.mac.addr = (mbx.data.mac.addr << 8) |
+				     netdev->dev_addr[i];
+
+	return nicvf_send_msg_to_pf(nic, &mbx);
+}
+
+static int nicvf_is_link_active(struct nicvf *nic)
+{
+	return 1;
+}
+
+void nicvf_config_cpi(struct nicvf *nic)
+{
+	struct nic_mbx mbx = {};
+
+	mbx.msg = NIC_PF_VF_MSG_CPI_CFG;
+	mbx.data.cpi_cfg.vf_id = nic->vf_id;
+	mbx.data.cpi_cfg.cpi_alg = nic->cpi_alg;
+	mbx.data.cpi_cfg.rq_cnt = nic->qs->rq_cnt;
+
+	nicvf_send_msg_to_pf(nic, &mbx);
+}
+
+#ifdef	VNIC_RSS_SUPPORT
+void nicvf_get_rss_size(struct nicvf *nic)
+{
+	struct nic_mbx mbx = {};
+
+	mbx.msg = NIC_PF_VF_MSG_RSS_SIZE;
+	mbx.data.rss_size.vf_id = nic->vf_id;
+	nicvf_send_msg_to_pf(nic, &mbx);
+}
+
+void nicvf_config_rss(struct nicvf *nic)
+{
+	struct nic_mbx mbx = {};
+	struct nicvf_rss_info *rss = &nic->rss_info;
+	int ind_tbl_len = rss->rss_size;
+	bool cont = false;
+	int i;
+
+	mbx.msg = NIC_PF_VF_MSG_RSS_CFG;
+	mbx.data.rss_cfg.vf_id = nic->vf_id;
+	mbx.data.rss_cfg.hash_bits = rss->hash_bits;
+	mbx.data.rss_cfg.tbl_len = 0;
+	mbx.data.rss_cfg.tbl_offset = 0;
+	while (ind_tbl_len) {
+		mbx.data.rss_cfg.tbl_offset += mbx.data.rss_cfg.tbl_len;
+		if (ind_tbl_len > RSS_IND_TBL_LEN_PER_MBX_MSG)
+			mbx.data.rss_cfg.tbl_len = RSS_IND_TBL_LEN_PER_MBX_MSG;
+		else
+			mbx.data.rss_cfg.tbl_len = ind_tbl_len;
+
+		for (i = mbx.data.rss_cfg.tbl_offset;
+		     i < (mbx.data.rss_cfg.tbl_offset +
+		     mbx.data.rss_cfg.tbl_len); i++)
+			mbx.data.rss_cfg.ind_tbl[i] = rss->ind_tbl[i];
+
+		if (cont)
+			mbx.msg = NIC_PF_VF_MSG_RSS_CFG_CONT;
+
+		nicvf_send_msg_to_pf(nic, &mbx);
+
+		ind_tbl_len -= mbx.data.rss_cfg.tbl_len;
+		if (!ind_tbl_len)
+			return;
+		cont = true;
+	}
+}
+
+static void nicvf_set_rss_key(struct nicvf *nic)
+{
+	struct nicvf_rss_info *rss = &nic->rss_info;
+	int idx;
+
+	for (idx = 0; idx < RSS_HASH_KEY_SIZE; idx++)
+		nicvf_reg_write(nic, NIC_VNIC_RSS_KEY_0_4, rss->key[idx]);
+}
+
+static int nicvf_rss_init(struct nicvf *nic)
+{
+	struct nicvf_rss_info *rss = &nic->rss_info;
+	int idx;
+
+	nicvf_get_rss_size(nic);
+
+	if ((nic->qs->rq_cnt <= 1) || (cpi_alg != CPI_ALG_NONE)) {
+		rss->enable = false;
+		rss->hash_bits = 0;
+		return 0;
+	}
+
+	rss->enable = true;
+
+	/* Using the HW reset value for now */
+	rss->key[0] = 0xFEED0BADFEED0BAD;
+	rss->key[1] = 0xFEED0BADFEED0BAD;
+	rss->key[2] = 0xFEED0BADFEED0BAD;
+	rss->key[3] = 0xFEED0BADFEED0BAD;
+	rss->key[4] = 0xFEED0BADFEED0BAD;
+
+	nicvf_set_rss_key(nic);
+
+	rss->cfg = rss_config;
+	nicvf_reg_write(nic, NIC_VNIC_RSS_CFG, rss->cfg);
+
+	rss->hash_bits =  ilog2(rounddown_pow_of_two(rss->rss_size));
+
+	for (idx = 0; idx < rss->rss_size; idx++)
+		rss->ind_tbl[idx] = ethtool_rxfh_indir_default(idx,
+							       nic->qs->rq_cnt);
+	nicvf_config_rss(nic);
+	return 1;
+}
+#endif
+
+int nicvf_set_real_num_queues(struct net_device *netdev,
+			      int tx_queues, int rx_queues)
+{
+	int err = 0;
+
+	err = netif_set_real_num_tx_queues(netdev, tx_queues);
+	if (err) {
+		netdev_err(netdev,
+			   "Failed to set no of Tx queues: %d\n", tx_queues);
+		return err;
+	}
+
+	err = netif_set_real_num_rx_queues(netdev, rx_queues);
+	if (err)
+		netdev_err(netdev,
+			   "Failed to set no of Rx queues: %d\n", rx_queues);
+	return err;
+}
+
+static int nicvf_init_resources(struct nicvf *nic)
+{
+	int err;
+
+	nic->num_qs = 1;
+
+	/* Initialize queues and HW for data transfer */
+	err = nicvf_config_data_transfer(nic, true);
+	if (err) {
+		netdev_err(nic->netdev,
+			   "Failed to alloc/config VF's QSet resources\n");
+		return err;
+	}
+	/* Enable Qset */
+	nicvf_qset_config(nic, true);
+
+	return 0;
+}
+
+void nicvf_free_skb(struct nicvf *nic, struct sk_buff *skb)
+{
+	int i;
+
+	if (!skb_shinfo(skb)->nr_frags)
+		goto free_skb;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		const struct skb_frag_struct *frag;
+
+		frag = &skb_shinfo(skb)->frags[i];
+		pci_unmap_single(nic->pdev, (dma_addr_t)skb_frag_address(frag),
+				 skb_frag_size(frag), PCI_DMA_TODEVICE);
+	}
+free_skb:
+	pci_unmap_single(nic->pdev, (dma_addr_t)skb->data,
+			 skb_headlen(skb), PCI_DMA_TODEVICE);
+	dev_kfree_skb_any(skb);
+}
+
+static void nicvf_snd_pkt_handler(struct net_device *netdev,
+				  struct cmp_queue *cq,
+				  void *cq_desc, int cqe_type)
+{
+	struct sk_buff *skb = NULL;
+	struct cqe_send_t *cqe_tx;
+	struct nicvf *nic = netdev_priv(netdev);
+	struct snd_queue *sq;
+	struct sq_hdr_subdesc *hdr;
+
+	cqe_tx = (struct cqe_send_t *)cq_desc;
+	sq = &nic->qs->sq[cqe_tx->sq_idx];
+
+	hdr = (struct sq_hdr_subdesc *)GET_SQ_DESC(sq, cqe_tx->sqe_ptr);
+	if (hdr->subdesc_type != SQ_DESC_TYPE_HEADER)
+		return;
+
+	nic_dbg(&nic->pdev->dev,
+		"%s Qset #%d SQ #%d SQ ptr #%d subdesc count %d\n",
+		__func__, cqe_tx->sq_qs, cqe_tx->sq_idx,
+		cqe_tx->sqe_ptr, hdr->subdesc_cnt);
+
+	skb = (struct sk_buff *)sq->skbuff[cqe_tx->sqe_ptr];
+	nicvf_free_skb(nic, skb);
+	nicvf_put_sq_desc(sq, hdr->subdesc_cnt + 1);
+	nicvf_check_cqe_tx_errs(nic, cq, cq_desc);
+}
+
+static void nicvf_rcv_pkt_handler(struct net_device *netdev,
+				  struct napi_struct *napi,
+				  struct cmp_queue *cq,
+				  void *cq_desc, int cqe_type)
+{
+	struct sk_buff *skb;
+	struct nicvf *nic = netdev_priv(netdev);
+	struct cqe_rx_t *cqe_rx = (struct cqe_rx_t *)cq_desc;
+	int err = 0;
+
+	/* Check for errors */
+	err = nicvf_check_cqe_rx_errs(nic, cq, cq_desc);
+	if (err && !cqe_rx->rb_cnt)
+		return;
+
+	skb = nicvf_get_rcv_skb(nic, cq_desc);
+	if (!skb) {
+		nic_dbg(&nic->pdev->dev, "Packet not received\n");
+		return;
+	}
+
+	nicvf_dump_packet(skb);
+
+#ifdef NICVF_ETHTOOL_ENABLE
+	nicvf_set_rx_frame_cnt(nic, skb);
+#endif
+
+	skb->protocol = eth_type_trans(skb, netdev);
+
+#ifdef VNIC_RX_CHKSUM_SUPPORTED
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
+#else
+	skb_checksum_none_assert(skb);
+#endif
+
+	if (napi && (netdev->features & NETIF_F_GRO))
+		napi_gro_receive(napi, skb);
+	else
+		netif_receive_skb(skb);
+}
+
+static int nicvf_cq_intr_handler(struct net_device *netdev, uint8_t cq_idx,
+				 struct napi_struct *napi, int budget)
+{
+	int processed_cqe = 0, work_done = 0;
+	int cqe_count, cqe_head;
+	struct nicvf *nic = netdev_priv(netdev);
+	struct queue_set *qs = nic->qs;
+	struct cmp_queue *cq = &qs->cq[cq_idx];
+	struct cqe_rx_t *cq_desc;
+
+	spin_lock(&cq->cq_lock);
+	/* Get no of valid CQ entries to process */
+	cqe_count = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_STATUS, cq_idx);
+	cqe_count &= CQ_CQE_COUNT;
+	if (!cqe_count)
+		goto done;
+
+	/* Get head of the valid CQ entries */
+	cqe_head = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_HEAD, cq_idx) >> 9;
+	cqe_head &= 0xFFFF;
+
+	nic_dbg(&nic->pdev->dev, "%s cqe_count %d cqe_head %d\n",
+		__func__, cqe_count, cqe_head);
+	while (processed_cqe < cqe_count) {
+		/* Get the CQ descriptor */
+		cq_desc = (struct cqe_rx_t *)GET_CQ_DESC(cq, cqe_head);
+
+		if (napi && (work_done >= budget) &&
+		    (cq_desc->cqe_type != CQE_TYPE_SEND)) {
+			break;
+		}
+
+		nic_dbg(&nic->pdev->dev, "cq_desc->cqe_type %d\n",
+			cq_desc->cqe_type);
+		switch (cq_desc->cqe_type) {
+		case CQE_TYPE_RX:
+			nicvf_rcv_pkt_handler(netdev, napi, cq,
+					      cq_desc, CQE_TYPE_RX);
+			work_done++;
+		break;
+		case CQE_TYPE_SEND:
+			nicvf_snd_pkt_handler(netdev, cq,
+					      cq_desc, CQE_TYPE_SEND);
+		break;
+		case CQE_TYPE_INVALID:
+		case CQE_TYPE_RX_SPLIT:
+		case CQE_TYPE_RX_TCP:
+		case CQE_TYPE_SEND_PTP:
+			/* Ignore for now */
+		break;
+		}
+		cq_desc->cqe_type = CQE_TYPE_INVALID;
+		processed_cqe++;
+		cqe_head++;
+		cqe_head &= (cq->dmem.q_len - 1);
+		memset(cq_desc, 0, sizeof(struct cqe_rx_t));
+	}
+	nic_dbg(&nic->pdev->dev, "%s processed_cqe %d work_done %d budget %d\n",
+		__func__, processed_cqe, work_done, budget);
+
+	/* Ring doorbell to inform H/W to reuse processed CQEs */
+	nicvf_queue_reg_write(nic, NIC_QSET_CQ_0_7_DOOR,
+			      cq_idx, processed_cqe);
+done:
+	spin_unlock(&cq->cq_lock);
+	return work_done;
+}
+
+static int nicvf_poll(struct napi_struct *napi, int budget)
+{
+	int  work_done = 0;
+	struct net_device *netdev = napi->dev;
+	struct nicvf *nic = netdev_priv(netdev);
+	struct nicvf_cq_poll *cq;
+	struct netdev_queue *txq;
+
+	cq = container_of(napi, struct nicvf_cq_poll, napi);
+	work_done = nicvf_cq_intr_handler(netdev, cq->cq_idx, napi, budget);
+
+	txq = netdev_get_tx_queue(netdev, cq->cq_idx);
+	if (netif_tx_queue_stopped(txq))
+		netif_tx_wake_queue(txq);
+
+	if (work_done < budget) {
+		/* Slow packet rate, exit polling */
+		napi_complete(napi);
+		/* Re-enable interrupts */
+		nicvf_enable_intr(nic, NICVF_INTR_CQ, cq->cq_idx);
+	}
+	return work_done;
+}
+
+/* Qset error interrupt handler
+ *
+ * As of now only CQ errors are handled
+ */
+void nicvf_handle_qs_err(unsigned long data)
+{
+	struct nicvf *nic = (struct nicvf *)data;
+	struct queue_set *qs = nic->qs;
+	int qidx;
+	uint64_t status;
+
+	netif_tx_disable(nic->netdev);
+
+	/* Check if it is CQ err */
+	for (qidx = 0; qidx < qs->cq_cnt; qidx++) {
+		status = nicvf_queue_reg_read(nic, NIC_QSET_CQ_0_7_STATUS,
+					      qidx);
+		if (!(status & CQ_ERR_MASK))
+			continue;
+		/* Process already queued CQEs and reconfig CQ */
+		nicvf_disable_intr(nic, NICVF_INTR_CQ, qidx);
+		nicvf_sq_disable(nic, qidx);
+		nicvf_cq_intr_handler(nic->netdev, qidx, NULL, 0);
+		nicvf_cmp_queue_config(nic, qs, qidx, true);
+		nicvf_sq_free_used_descs(nic->netdev, &qs->sq[qidx], qidx);
+		nicvf_sq_enable(nic, &qs->sq[qidx], qidx);
+
+		nicvf_enable_intr(nic, NICVF_INTR_CQ, qidx);
+	}
+
+	netif_tx_start_all_queues(nic->netdev);
+	/* Re-enable Qset error interrupt */
+	nicvf_enable_intr(nic, NICVF_INTR_QS_ERR, 0);
+}
+
+static irqreturn_t nicvf_misc_intr_handler(int irq, void *nicvf_irq)
+{
+	struct nicvf *nic = (struct nicvf *)nicvf_irq;
+	uint64_t intr;
+
+	intr = nicvf_reg_read(nic, NIC_VF_INT);
+	/* Check for spurious interrupt */
+	if (!(intr & NICVF_INTR_MBOX_MASK))
+		return IRQ_HANDLED;
+
+	nicvf_handle_mbx_intr(nic);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t nicvf_intr_handler(int irq, void *nicvf_irq)
+{
+	uint64_t qidx, intr;
+	uint64_t cq_intr, rbdr_intr, qs_err_intr;
+	struct nicvf *nic = (struct nicvf *)nicvf_irq;
+	struct queue_set *qs = nic->qs;
+	struct nicvf_cq_poll *cq_poll = NULL;
+
+	intr = nicvf_reg_read(nic, NIC_VF_INT);
+	nic_dbg(&nic->pdev->dev, "%s intr status 0x%llx\n", __func__, intr);
+
+	cq_intr = (intr & NICVF_INTR_CQ_MASK) >> NICVF_INTR_CQ_SHIFT;
+	qs_err_intr = intr & NICVF_INTR_QS_ERR_MASK;
+	if (qs_err_intr) {
+		/* Disable Qset err interrupt and schedule softirq */
+		nicvf_disable_intr(nic, NICVF_INTR_QS_ERR, 0);
+		tasklet_hi_schedule(&nic->qs_err_task);
+	}
+
+	/* Disable interrupts and start polling */
+	for (qidx = 0; qidx < qs->cq_cnt; qidx++) {
+		if (!(cq_intr & (1 << qidx)))
+			continue;
+		if (!nicvf_is_intr_enabled(nic, NICVF_INTR_CQ, qidx))
+			continue;
+		nicvf_disable_intr(nic, NICVF_INTR_CQ, qidx);
+		cq_poll = nic->napi[qidx];
+		/* Schedule NAPI */
+		napi_schedule(&cq_poll->napi);
+	}
+
+	/* Handle RBDR interrupts */
+	rbdr_intr = (intr & NICVF_INTR_RBDR_MASK) >> NICVF_INTR_RBDR_SHIFT;
+	if (rbdr_intr) {
+		/* Disable RBDR interrupt and schedule softirq */
+		for (qidx = 0; qidx < qs->rbdr_cnt; qidx++)
+			nicvf_disable_intr(nic, NICVF_INTR_RBDR, qidx);
+
+		tasklet_hi_schedule(&nic->rbdr_task);
+	}
+
+	/* Clear interrupts */
+	nicvf_reg_write(nic, NIC_VF_INT, intr);
+	return IRQ_HANDLED;
+}
+
+static int nicvf_enable_msix(struct nicvf *nic)
+{
+	int i, ret, vec;
+	struct queue_set *qs = nic->qs;
+
+	nic->num_vec = NIC_VF_MSIX_VECTORS;
+	vec = qs->cq_cnt + qs->rbdr_cnt + qs->sq_cnt;
+	vec = NIC_VF_MSIX_VECTORS;
+	if (vec > NIC_VF_MSIX_VECTORS)
+		nic->num_vec = vec;
+
+	for (i = 0; i < nic->num_vec; i++)
+		nic->msix_entries[i].entry = i;
+
+	ret = pci_enable_msix(nic->pdev, nic->msix_entries, nic->num_vec);
+	if (ret) {
+		netdev_err(nic->netdev,
+			   "Req for #%d msix vectors failed\n", nic->num_vec);
+		return 0;
+	}
+	nic->msix_enabled = 1;
+	return 1;
+}
+
+static void nicvf_disable_msix(struct nicvf *nic)
+{
+	if (nic->msix_enabled) {
+		pci_disable_msix(nic->pdev);
+		nic->msix_enabled = 0;
+		nic->num_vec = 0;
+	}
+}
+
+static int nicvf_register_interrupts(struct nicvf *nic)
+{
+	int irq, free, ret = 0;
+	int vector;
+
+	for_each_cq_irq(irq)
+		sprintf(nic->irq_name[irq], "%s%d CQ%d", "NICVF",
+			nic->vf_id, irq);
+
+	for_each_sq_irq(irq)
+		sprintf(nic->irq_name[irq], "%s%d SQ%d", "NICVF",
+			nic->vf_id, irq - NICVF_INTR_ID_SQ);
+
+	for_each_rbdr_irq(irq)
+		sprintf(nic->irq_name[irq], "%s%d RBDR%d", "NICVF",
+			nic->vf_id, irq - NICVF_INTR_ID_RBDR);
+
+	/* Register all interrupts except mailbox */
+	for (irq = 0; irq < NICVF_INTR_ID_MISC; irq++) {
+		vector = nic->msix_entries[irq].vector;
+		ret = request_irq(vector, nicvf_intr_handler,
+				  0, nic->irq_name[irq], nic);
+		if (ret)
+			break;
+		nic->irq_allocated[irq] = 1;
+	}
+
+	sprintf(nic->irq_name[NICVF_INTR_ID_QS_ERR],
+		"%s%d Qset error", "NICVF", nic->vf_id);
+	if (!ret) {
+		vector = nic->msix_entries[NICVF_INTR_ID_QS_ERR].vector;
+		irq = NICVF_INTR_ID_QS_ERR;
+		ret = request_irq(vector, nicvf_intr_handler,
+				  0, nic->irq_name[irq], nic);
+		if (!ret)
+			nic->irq_allocated[irq] = 1;
+	}
+
+	if (ret) {
+		netdev_err(nic->netdev, "Request irq failed\n");
+		for (free = 0; free < irq; free++)
+			free_irq(nic->msix_entries[free].vector, nic);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void nicvf_unregister_interrupts(struct nicvf *nic)
+{
+	int irq;
+
+	/* Free registered interrupts */
+	for (irq = 0; irq < nic->num_vec; irq++) {
+		if (nic->irq_allocated[irq])
+			free_irq(nic->msix_entries[irq].vector, nic);
+		nic->irq_allocated[irq] = 0;
+	}
+
+	/* Disable MSI-X */
+	nicvf_disable_msix(nic);
+}
+
+static int nicvf_register_misc_interrupt(struct nicvf *nic)
+{
+	int ret = 0;
+	int irq = NICVF_INTR_ID_MISC;
+
+	/* Enable MSI-X */
+	if (!nicvf_enable_msix(nic))
+		return 1;
+
+	sprintf(nic->irq_name[irq], "%s Mbox", "NICVF");
+	/* Register Misc interrupt */
+	ret = request_irq(nic->msix_entries[irq].vector,
+			  nicvf_misc_intr_handler, 0, nic->irq_name[irq], nic);
+
+	if (ret)
+		return ret;
+	nic->irq_allocated[irq] = 1;
+
+	/* Enable mailbox interrupt */
+	nicvf_enable_intr(nic, NICVF_INTR_MBOX, 0);
+
+	/* Check if VF is able to communicate with PF */
+	if (!nicvf_check_pf_ready(nic)) {
+		nicvf_disable_intr(nic, NICVF_INTR_MBOX, 0);
+		nicvf_unregister_interrupts(nic);
+		return 1;
+	}
+
+	return 0;
+}
+
+#ifdef VNIC_SW_TSO_SUPPORT
+static int nicvf_sw_tso(struct sk_buff *skb, struct net_device *netdev)
+{
+	struct sk_buff *segs, *nskb;
+	struct nicvf *nic = netdev_priv(netdev);
+
+	if (!skb_shinfo(skb)->gso_size)
+		return 1;
+
+	nic->drv_stats.tx_tso++;
+	/* Segment the large frame */
+	segs = skb_gso_segment(skb, netdev->features & ~NETIF_F_TSO);
+	if (IS_ERR(segs))
+		goto gso_err;
+
+	do {
+		nskb = segs;
+		segs = segs->next;
+		nskb->next = NULL;
+		nicvf_xmit(nskb, netdev);
+	} while (segs);
+
+gso_err:
+	dev_kfree_skb(skb);
+
+	return NETDEV_TX_OK;
+}
+#endif
+
+static netdev_tx_t nicvf_xmit(struct sk_buff *skb, struct net_device *netdev)
+{
+	struct nicvf *nic = netdev_priv(netdev);
+	int qid = skb_get_queue_mapping(skb);
+	struct netdev_queue *txq = netdev_get_tx_queue(netdev, qid);
+	int ret = 1;
+
+	/* Check for minimum packet length */
+	if (skb->len <= ETH_HLEN) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+#ifdef VNIC_SW_TSO_SUPPORT
+	if (netdev->features & NETIF_F_TSO)
+		ret = nicvf_sw_tso(skb, netdev);
+#endif
+	if (ret == NETDEV_TX_OK)
+		return NETDEV_TX_OK;
+
+#ifndef VNIC_TX_CSUM_OFFLOAD_SUPPORT
+	/* Calculate checksum in software */
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		if (unlikely(skb_checksum_help(skb))) {
+			netdev_dbg(netdev, "unable to do checksum\n");
+			dev_kfree_skb_any(skb);
+			return NETDEV_TX_OK;
+		}
+	}
+#endif
+#ifdef VNIC_HW_TSO_SUPPORT
+	if (skb_shinfo(skb)->gso_size && (skb->protocol == ETH_P_IP) &&
+	    (ip_hdr(skb)->protocol != IPPROTO_TCP)) {
+		netdev_dbg(netdev,
+			   "Only TCP segmentation is supported, dropping packet\n");
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+#endif
+	if (!nicvf_sq_append_skb(nic, skb) && !netif_tx_queue_stopped(txq)) {
+		netif_tx_stop_queue(txq);
+		nic->drv_stats.tx_busy++;
+		netdev_err(netdev, "VF%d: TX ring full, stopping queue%d\n",
+			   nic->vf_id, qid);
+		return NETDEV_TX_BUSY;
+	}
+
+	return NETDEV_TX_OK;
+}
+
+int nicvf_stop(struct net_device *netdev)
+{
+	int qidx;
+	struct nicvf *nic = netdev_priv(netdev);
+	struct queue_set *qs = nic->qs;
+
+	netif_carrier_off(netdev);
+	netif_tx_disable(netdev);
+
+	/* Disable HW Qset, to stop receiving packets */
+	nicvf_qset_config(nic, false);
+
+	/* disable mailbox interrupt */
+	nicvf_disable_intr(nic, NICVF_INTR_MBOX, 0);
+
+	/* Disable interrupts */
+	for (qidx = 0; qidx < qs->cq_cnt; qidx++)
+		nicvf_disable_intr(nic, NICVF_INTR_CQ, qidx);
+	for (qidx = 0; qidx < qs->rbdr_cnt; qidx++)
+		nicvf_disable_intr(nic, NICVF_INTR_RBDR, qidx);
+	nicvf_disable_intr(nic, NICVF_INTR_QS_ERR, 0);
+
+	tasklet_kill(&nic->rbdr_task);
+	tasklet_kill(&nic->qs_err_task);
+
+	nicvf_unregister_interrupts(nic);
+	for (qidx = 0; qidx < nic->qs->cq_cnt; qidx++) {
+		napi_synchronize(&nic->napi[qidx]->napi);
+		napi_disable(&nic->napi[qidx]->napi);
+		netif_napi_del(&nic->napi[qidx]->napi);
+		kfree(nic->napi[qidx]);
+		nic->napi[qidx] = NULL;
+	}
+	/* Free resources */
+	nicvf_config_data_transfer(nic, false);
+
+	return 0;
+}
+
+int nicvf_open(struct net_device *netdev)
+{
+	int err, qidx;
+	struct nicvf *nic = netdev_priv(netdev);
+	struct queue_set *qs = nic->qs;
+	struct nicvf_cq_poll *cq_poll = NULL;
+
+	nic->mtu = netdev->mtu;
+
+	netif_carrier_off(netdev);
+
+	err = nicvf_register_misc_interrupt(nic);
+	if (err)
+		return err;
+
+	err = nicvf_init_resources(nic);
+	if (err) {
+		nicvf_disable_intr(nic, NICVF_INTR_MBOX, 0);
+		nicvf_unregister_interrupts(nic);
+		return err;
+	}
+
+	err = nicvf_register_interrupts(nic);
+	if (err) {
+		nicvf_stop(netdev);
+		return err;
+	}
+
+	for (qidx = 0; qidx < qs->cq_cnt; qidx++) {
+		cq_poll = kzalloc(sizeof(*cq_poll), GFP_KERNEL);
+		if (!cq_poll)
+			goto napi_del;
+		cq_poll->cq_idx = qidx;
+		netif_napi_add(netdev, &cq_poll->napi, nicvf_poll,
+			       NAPI_POLL_WEIGHT);
+		napi_enable(&cq_poll->napi);
+		nic->napi[qidx] = cq_poll;
+	}
+	goto no_err;
+napi_del:
+	while (qidx) {
+		qidx--;
+		cq_poll = nic->napi[qidx];
+		napi_disable(&cq_poll->napi);
+		netif_napi_del(&cq_poll->napi);
+		kfree(cq_poll);
+		nic->napi[qidx] = NULL;
+	}
+	return -ENOMEM;
+no_err:
+	/* Set MAC-ID */
+	if (is_zero_ether_addr(netdev->dev_addr))
+		eth_hw_addr_random(netdev);
+
+	nicvf_hw_set_mac_addr(nic, netdev);
+
+	/* Init tasklet for handling Qset err interrupt */
+	tasklet_init(&nic->qs_err_task, nicvf_handle_qs_err,
+		     (unsigned long)nic);
+
+	/* Enable Qset err interrupt */
+	nicvf_enable_intr(nic, NICVF_INTR_QS_ERR, 0);
+
+	/* Enable completion queue interrupt */
+	for (qidx = 0; qidx < qs->cq_cnt; qidx++)
+		nicvf_enable_intr(nic, NICVF_INTR_CQ, qidx);
+
+	/* Init RBDR tasklet and enable RBDR threshold interrupt */
+	tasklet_init(&nic->rbdr_task, nicvf_refill_rbdr,
+		     (unsigned long)nic);
+
+	for (qidx = 0; qidx < qs->rbdr_cnt; qidx++)
+		nicvf_enable_intr(nic, NICVF_INTR_RBDR, qidx);
+
+	/* Configure CPI alorithm */
+	nic->cpi_alg = cpi_alg;
+	nicvf_config_cpi(nic);
+
+#ifdef	VNIC_RSS_SUPPORT
+	/* Configure receive side scaling */
+	nicvf_rss_init(nic);
+#endif
+
+	if (nicvf_is_link_active(nic)) {
+		netif_carrier_on(netdev);
+		netif_tx_start_all_queues(netdev);
+	}
+
+	return 0;
+}
+
+static int nicvf_update_hw_max_frs(struct nicvf *nic, int mtu)
+{
+	struct nic_mbx mbx = {};
+
+	mbx.msg = NIC_PF_VF_MSG_SET_MAX_FRS;
+	mbx.data.frs.max_frs = mtu;
+	mbx.data.frs.vf_id = nic->vf_id;
+
+	return nicvf_send_msg_to_pf(nic, &mbx);
+}
+
+static int nicvf_change_mtu(struct net_device *netdev, int new_mtu)
+{
+	struct nicvf *nic = netdev_priv(netdev);
+
+	if (new_mtu > NIC_HW_MAX_FRS)
+		return -EINVAL;
+
+	if (new_mtu < NIC_HW_MIN_FRS)
+		return -EINVAL;
+
+	if (nicvf_update_hw_max_frs(nic, new_mtu))
+		return -EINVAL;
+	netdev->mtu = new_mtu;
+
+	return 0;
+}
+
+static int nicvf_set_mac_address(struct net_device *netdev, void *p)
+{
+	struct sockaddr *addr = p;
+	struct nicvf *nic = netdev_priv(netdev);
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	if (nicvf_hw_set_mac_addr(nic, netdev))
+		return -EBUSY;
+
+	memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len);
+
+	return 0;
+}
+
+void nicvf_update_stats(struct nicvf *nic)
+{
+	int qidx;
+	struct nicvf_hw_stats *stats = &nic->stats;
+	struct nicvf_drv_stats *drv_stats = &nic->drv_stats;
+	struct queue_set *qs = nic->qs;
+
+#define GET_RX_STATS(reg) \
+	nicvf_reg_read(nic, NIC_VNIC_RX_STAT_0_13 | (reg << 3))
+#define GET_TX_STATS(reg) \
+	nicvf_reg_read(nic, NIC_VNIC_TX_STAT_0_4 | (reg << 3))
+
+	stats->rx_bytes_ok = GET_RX_STATS(RX_OCTS);
+	stats->rx_ucast_frames_ok = GET_RX_STATS(RX_UCAST);
+	stats->rx_bcast_frames_ok = GET_RX_STATS(RX_BCAST);
+	stats->rx_mcast_frames_ok = GET_RX_STATS(RX_MCAST);
+	stats->rx_fcs_errors = GET_RX_STATS(RX_FCS);
+	stats->rx_l2_errors = GET_RX_STATS(RX_L2ERR);
+	stats->rx_drop_red = GET_RX_STATS(RX_RED);
+	stats->rx_drop_overrun = GET_RX_STATS(RX_ORUN);
+	stats->rx_drop_bcast = GET_RX_STATS(RX_DRP_BCAST);
+	stats->rx_drop_mcast = GET_RX_STATS(RX_DRP_MCAST);
+	stats->rx_drop_l3_bcast = GET_RX_STATS(RX_DRP_L3BCAST);
+	stats->rx_drop_l3_mcast = GET_RX_STATS(RX_DRP_L3MCAST);
+
+	stats->tx_bytes_ok = GET_TX_STATS(TX_OCTS);
+	stats->tx_ucast_frames_ok = GET_TX_STATS(TX_UCAST);
+	stats->tx_bcast_frames_ok = GET_TX_STATS(TX_BCAST);
+	stats->tx_mcast_frames_ok = GET_TX_STATS(TX_MCAST);
+	stats->tx_drops = GET_RX_STATS(TX_DROP);
+
+	drv_stats->rx_frames_ok = stats->rx_ucast_frames_ok +
+				  stats->rx_bcast_frames_ok +
+				  stats->rx_mcast_frames_ok;
+	drv_stats->tx_frames_ok = stats->tx_ucast_frames_ok +
+				  stats->tx_bcast_frames_ok +
+				  stats->tx_mcast_frames_ok;
+	drv_stats->rx_drops = stats->rx_drop_red +
+			      stats->rx_drop_overrun;
+	drv_stats->rx_drops = stats->tx_drops;
+
+	/* Update RQ and SQ stats */
+	for (qidx = 0; qidx < qs->rq_cnt; qidx++)
+		nicvf_update_rq_stats(nic, qidx);
+	for (qidx = 0; qidx < qs->sq_cnt; qidx++)
+		nicvf_update_sq_stats(nic, qidx);
+}
+
+struct rtnl_link_stats64 *nicvf_get_stats64(struct net_device *netdev,
+					    struct rtnl_link_stats64 *stats)
+{
+	struct nicvf *nic = netdev_priv(netdev);
+	struct nicvf_hw_stats *hw_stats = &nic->stats;
+	struct nicvf_drv_stats *drv_stats = &nic->drv_stats;
+
+	nicvf_update_stats(nic);
+
+	stats->rx_bytes = hw_stats->rx_bytes_ok;
+	stats->rx_packets = drv_stats->rx_frames_ok;
+	stats->rx_dropped = drv_stats->rx_drops;
+
+	stats->tx_bytes = hw_stats->tx_bytes_ok;
+	stats->tx_packets = drv_stats->tx_frames_ok;
+	stats->tx_dropped = drv_stats->tx_drops;
+
+	return stats;
+}
+
+static void nicvf_tx_timeout(struct net_device *dev)
+{
+	struct nicvf *nic = netdev_priv(dev);
+
+	netdev_info(dev, "tx timeout\n");
+
+	schedule_work(&nic->reset_task);
+}
+
+static void nicvf_reset_task(struct work_struct *work)
+{
+	struct nicvf *nic;
+
+	nic = container_of(work, struct nicvf, reset_task);
+
+	if (!netif_running(nic->netdev))
+		return;
+
+	nicvf_stop(nic->netdev);
+	nicvf_open(nic->netdev);
+}
+
+static const struct net_device_ops nicvf_netdev_ops = {
+	.ndo_open		= nicvf_open,
+	.ndo_stop		= nicvf_stop,
+	.ndo_start_xmit		= nicvf_xmit,
+	.ndo_change_mtu		= nicvf_change_mtu,
+	.ndo_set_mac_address	= nicvf_set_mac_address,
+	.ndo_get_stats64	= nicvf_get_stats64,
+	.ndo_tx_timeout         = nicvf_tx_timeout,
+};
+
+static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	struct device *dev = &pdev->dev;
+	struct net_device *netdev;
+	struct nicvf *nic;
+	struct queue_set *qs;
+	int    err;
+
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(dev, "Failed to enable PCI device\n");
+		goto exit;
+	}
+
+	err = pci_request_regions(pdev, DRV_NAME);
+	if (err) {
+		dev_err(dev, "PCI request regions failed 0x%x\n", err);
+		goto err_disable_device;
+	}
+
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(48));
+	if (err) {
+		dev_err(dev, "Unable to get usable DMA configuration\n");
+		goto err_release_regions;
+	}
+
+	err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(48));
+	if (err) {
+		dev_err(dev, "unable to get 48-bit DMA for consistent allocations\n");
+		goto err_release_regions;
+	}
+
+	netdev = alloc_etherdev_mqs(sizeof(struct nicvf),
+				    MAX_RCV_QUEUES_PER_QS,
+				    MAX_SND_QUEUES_PER_QS);
+	if (!netdev) {
+		err = -ENOMEM;
+		goto err_release_regions;
+	}
+
+	pci_set_drvdata(pdev, netdev);
+
+	SET_NETDEV_DEV(netdev, &pdev->dev);
+
+	nic = netdev_priv(netdev);
+	nic->netdev = netdev;
+	nic->pdev = pdev;
+
+	/* MAP VF's configuration registers */
+	nic->reg_base = (uint64_t)pci_ioremap_bar(pdev, PCI_CFG_REG_BAR_NUM);
+	if (!nic->reg_base) {
+		dev_err(dev, "Cannot map config register space, aborting\n");
+		err = -ENOMEM;
+		goto err_release_regions;
+	}
+
+	err = nicvf_set_qset_resources(nic);
+	if (err)
+		goto err_unmap_resources;
+
+	qs = nic->qs;
+
+	err = nicvf_set_real_num_queues(netdev, qs->sq_cnt, qs->rq_cnt);
+	if (err)
+		goto err_unmap_resources;
+
+#ifdef VNIC_RX_CSUM_OFFLOAD_SUPPORT
+	netdev->hw_features |= NETIF_F_RXCSUM;
+#endif
+#ifdef VNIC_TX_CSUM_OFFLOAD_SUPPORT
+	netdev->hw_features |= NETIF_F_IP_CSUM;
+#endif
+#ifdef VNIC_SG_SUPPORT
+	netdev->hw_features |= NETIF_F_SG;
+#endif
+#ifdef VNIC_TSO_SUPPORT
+	netdev->hw_features |= NETIF_F_TSO | NETIF_F_SG | NETIF_F_IP_CSUM;
+#endif
+	netdev->features |= netdev->hw_features;
+	netdev->netdev_ops = &nicvf_netdev_ops;
+
+	INIT_WORK(&nic->reset_task, nicvf_reset_task);
+
+	err = register_netdev(netdev);
+	if (err) {
+		dev_err(dev, "Failed to register netdevice\n");
+		goto err_unmap_resources;
+	}
+
+#ifdef NICVF_ETHTOOL_ENABLE
+	nicvf_set_ethtool_ops(netdev);
+#endif
+	goto exit;
+
+err_unmap_resources:
+	if (nic->reg_base)
+		iounmap((void *)nic->reg_base);
+err_release_regions:
+	pci_release_regions(pdev);
+err_disable_device:
+	pci_disable_device(pdev);
+exit:
+	return err;
+}
+
+static void nicvf_remove(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	struct nicvf *nic;
+
+	if (!netdev)
+		return;
+
+	nic = netdev_priv(netdev);
+	unregister_netdev(netdev);
+
+	pci_set_drvdata(pdev, NULL);
+
+	if (nic->reg_base)
+		iounmap((void *)nic->reg_base);
+
+	/* Free Qset */
+	kfree(nic->qs);
+
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+	free_netdev(netdev);
+}
+
+static struct pci_driver nicvf_driver = {
+	.name = DRV_NAME,
+	.id_table = nicvf_id_table,
+	.probe = nicvf_probe,
+	.remove = nicvf_remove,
+};
+
+static int __init nicvf_init_module(void)
+{
+	pr_info("%s, ver %s\n", DRV_NAME, DRV_VERSION);
+
+	return pci_register_driver(&nicvf_driver);
+}
+
+static void __exit nicvf_cleanup_module(void)
+{
+	pci_unregister_driver(&nicvf_driver);
+}
+
+module_init(nicvf_init_module);
+module_exit(nicvf_cleanup_module);
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
new file mode 100644
index 000000000000..ffbfe7425c61
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -0,0 +1,1302 @@
+/*
+ * Copyright (C) 2014 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#include <linux/pci.h>
+#include <linux/netdevice.h>
+
+#include "nic_reg.h"
+#include "nic.h"
+#include "q_struct.h"
+#include "nicvf_queues.h"
+
+#define  MIN_SND_QUEUE_DESC_FOR_PKT_XMIT 2
+
+static int nicvf_alloc_q_desc_mem(struct nicvf *nic, struct q_desc_mem *dmem,
+				  int q_len, int desc_size, int align_bytes)
+{
+	dmem->q_len = q_len;
+	dmem->size = (desc_size * q_len) + align_bytes;
+	dmem->unalign_base = dma_alloc_coherent(&nic->pdev->dev, dmem->size,
+						&dmem->dma, GFP_KERNEL);
+	if (!dmem->unalign_base)
+		return -1;
+
+	dmem->phys_base = NICVF_ALIGNED_ADDR((uint64_t)dmem->dma, align_bytes);
+	dmem->base = (void *)((u8 *)dmem->unalign_base +
+			      (dmem->phys_base - dmem->dma));
+	return 0;
+}
+
+static void nicvf_free_q_desc_mem(struct nicvf *nic, struct q_desc_mem *dmem)
+{
+	if (!dmem)
+		return;
+
+	dma_free_coherent(&nic->pdev->dev, dmem->size,
+			  dmem->unalign_base, dmem->dma);
+	dmem->unalign_base = NULL;
+	dmem->base = NULL;
+}
+
+static int nicvf_alloc_rcv_buffer(struct nicvf *nic,
+				  uint64_t buf_len, unsigned char **rbuf)
+{
+	struct sk_buff *skb = NULL;
+
+	buf_len += NICVF_RCV_BUF_ALIGN_BYTES + sizeof(void *);
+
+	skb = netdev_alloc_skb(nic->netdev, buf_len);
+	if (!skb) {
+		netdev_err(nic->netdev, "Failed to allocate new rcv buffer\n");
+		return -ENOMEM;
+	}
+
+	/* Reserve bytes for storing skb address */
+	skb_reserve(skb, sizeof(void *));
+	/* Align buffer addr to cache line i.e 128 bytes */
+	skb_reserve(skb, NICVF_RCV_BUF_ALIGN_LEN((uint64_t)skb->data));
+
+	/* Store skb address */
+	*(struct sk_buff **)(skb->data - sizeof(void *)) = skb;
+
+	/* Return buffer address */
+	*rbuf = skb->data;
+	return 0;
+}
+
+static struct sk_buff *nicvf_rb_ptr_to_skb(struct nicvf *nic, uint64_t rb_ptr)
+{
+	struct sk_buff *skb;
+
+	rb_ptr = (uint64_t)phys_to_virt(dma_to_phys(&nic->pdev->dev, rb_ptr));
+	skb = (struct sk_buff *)*(uint64_t *)(rb_ptr - sizeof(void *));
+	return skb;
+}
+
+static int  nicvf_init_rbdr(struct nicvf *nic, struct rbdr *rbdr,
+			    int ring_len, int buf_size)
+{
+	int idx;
+	unsigned char *rbuf;
+	struct rbdr_entry_t *desc;
+
+	if (nicvf_alloc_q_desc_mem(nic, &rbdr->dmem, ring_len,
+				   sizeof(struct rbdr_entry_t),
+				   NICVF_RCV_BUF_ALIGN_BYTES)) {
+		netdev_err(nic->netdev,
+			   "Unable to allocate memory for rcv buffer ring\n");
+		return -ENOMEM;
+	}
+
+	rbdr->desc = rbdr->dmem.base;
+	/* Buffer size has to be in multiples of 128 bytes */
+	rbdr->buf_size = buf_size;
+	rbdr->enable = true;
+	rbdr->thresh = RBDR_THRESH;
+
+	for (idx = 0; idx < ring_len; idx++) {
+		if (nicvf_alloc_rcv_buffer(nic, rbdr->buf_size, &rbuf))
+			return -ENOMEM;
+
+		desc = GET_RBDR_DESC(rbdr, idx);
+		desc->buf_addr = pci_map_single(nic->pdev, rbuf,
+						rbdr->buf_size,
+						PCI_DMA_FROMDEVICE) >>
+						NICVF_RCV_BUF_ALIGN;
+	}
+	return 0;
+}
+
+static void nicvf_free_rbdr(struct nicvf *nic, struct rbdr *rbdr, int rbdr_qidx)
+{
+	int head, tail;
+	struct sk_buff *skb;
+	uint64_t buf_addr;
+	struct rbdr_entry_t *desc;
+
+	if (!rbdr)
+		return;
+
+	rbdr->enable = false;
+	if (!rbdr->dmem.base)
+		return;
+
+	head = nicvf_queue_reg_read(nic,
+				    NIC_QSET_RBDR_0_1_HEAD, rbdr_qidx) >> 3;
+	tail = nicvf_queue_reg_read(nic,
+				    NIC_QSET_RBDR_0_1_TAIL, rbdr_qidx) >> 3;
+	/* Free SKBs */
+	while (head != tail) {
+		desc = GET_RBDR_DESC(rbdr, head);
+		buf_addr = desc->buf_addr << NICVF_RCV_BUF_ALIGN;
+		skb = nicvf_rb_ptr_to_skb(nic, buf_addr);
+		pci_unmap_single(nic->pdev, (dma_addr_t)buf_addr,
+				 rbdr->buf_size, PCI_DMA_FROMDEVICE);
+		dev_kfree_skb(skb);
+		head++;
+		head &= (rbdr->dmem.q_len - 1);
+	}
+	/* Free SKB of tail desc */
+	desc = GET_RBDR_DESC(rbdr, tail);
+	buf_addr = desc->buf_addr << NICVF_RCV_BUF_ALIGN;
+	skb = nicvf_rb_ptr_to_skb(nic, buf_addr);
+	pci_unmap_single(nic->pdev, (dma_addr_t)buf_addr,
+			 rbdr->buf_size, PCI_DMA_FROMDEVICE);
+	dev_kfree_skb(skb);
+
+	/* Free RBDR ring */
+	nicvf_free_q_desc_mem(nic, &rbdr->dmem);
+}
+
+/* Refill receive buffer descriptors with new buffers.
+ * This runs in softirq context .
+ */
+void nicvf_refill_rbdr(unsigned long data)
+{
+	struct nicvf *nic = (struct nicvf *)data;
+	struct queue_set *qs = nic->qs;
+	int rbdr_idx = qs->rbdr_cnt;
+	int tail, qcount;
+	int refill_rb_cnt;
+	struct rbdr *rbdr;
+	unsigned char *rbuf;
+	struct rbdr_entry_t *desc;
+
+refill:
+	if (!rbdr_idx)
+		return;
+	rbdr_idx--;
+	rbdr = &qs->rbdr[rbdr_idx];
+	/* Check if it's enabled */
+	if (!rbdr->enable)
+		goto next_rbdr;
+
+	/* check if valid descs reached or crossed threshold level */
+	qcount = nicvf_queue_reg_read(nic, NIC_QSET_RBDR_0_1_STATUS0, rbdr_idx);
+	qcount &= 0x7FFFF;
+	if (qcount > rbdr->thresh)
+		goto next_rbdr;
+
+	/* Get no of desc's to be refilled */
+	refill_rb_cnt = rbdr->thresh;
+
+	/* Start filling descs from tail */
+	tail = nicvf_queue_reg_read(nic, NIC_QSET_RBDR_0_1_TAIL, rbdr_idx) >> 3;
+	while (refill_rb_cnt) {
+		tail++;
+		tail &= (rbdr->dmem.q_len - 1);
+
+		if (nicvf_alloc_rcv_buffer(nic, rbdr->buf_size, &rbuf))
+			break;
+
+		desc = GET_RBDR_DESC(rbdr, tail);
+		desc->buf_addr = pci_map_single(nic->pdev, rbuf, rbdr->buf_size,
+						PCI_DMA_FROMDEVICE) >>
+						NICVF_RCV_BUF_ALIGN;
+		refill_rb_cnt--;
+	}
+	/* Notify HW */
+	nicvf_queue_reg_write(nic, NIC_QSET_RBDR_0_1_DOOR,
+			      rbdr_idx, rbdr->thresh);
+next_rbdr:
+	if (rbdr_idx)
+		goto refill;
+
+	/* Re-enable RBDR interrupts */
+	for (rbdr_idx = 0; rbdr_idx < qs->rbdr_cnt; rbdr_idx++)
+		nicvf_enable_intr(nic, NICVF_INTR_RBDR, rbdr_idx);
+}
+
+/* TBD: how to handle full packets received in CQ
+ * i.e conversion of buffers into SKBs
+ */
+static int nicvf_init_cmp_queue(struct nicvf *nic,
+				struct cmp_queue *cq, int q_len)
+{
+	int time;
+
+	if (nicvf_alloc_q_desc_mem(nic, &cq->dmem, q_len,
+				   CMP_QUEUE_DESC_SIZE,
+				   NICVF_CQ_BASE_ALIGN_BYTES)) {
+		netdev_err(nic->netdev,
+			   "Unable to allocate memory for completion queue\n");
+		return -ENOMEM;
+	}
+	cq->desc = cq->dmem.base;
+	cq->thresh = CMP_QUEUE_CQE_THRESH;
+
+	time = NIC_NS_PER_100_SYETEM_CLK / 100;
+	time = CMP_QUEUE_TIMER_THRESH / (NICPF_CLK_PER_INT_TICK * time);
+	cq->intr_timer_thresh = time;
+
+	return 0;
+}
+
+static void nicvf_free_cmp_queue(struct nicvf *nic, struct cmp_queue *cq)
+{
+	if (!cq)
+		return;
+	if (!cq->dmem.base)
+		return;
+
+	nicvf_free_q_desc_mem(nic, &cq->dmem);
+}
+
+static int nicvf_init_snd_queue(struct nicvf *nic,
+				struct snd_queue *sq, int q_len)
+{
+	if (nicvf_alloc_q_desc_mem(nic, &sq->dmem, q_len,
+				   SND_QUEUE_DESC_SIZE,
+				   NICVF_SQ_BASE_ALIGN_BYTES)) {
+		netdev_err(nic->netdev,
+			   "Unable to allocate memory for send queue\n");
+		return -ENOMEM;
+	}
+
+	sq->desc = sq->dmem.base;
+	sq->skbuff = kcalloc(q_len, sizeof(uint64_t), GFP_ATOMIC);
+	sq->head = 0;
+	sq->tail = 0;
+	sq->free_cnt = q_len;
+	sq->thresh = SND_QUEUE_THRESH;
+
+	return 0;
+}
+
+static void nicvf_free_snd_queue(struct nicvf *nic, struct snd_queue *sq)
+{
+	if (!sq)
+		return;
+	if (!sq->dmem.base)
+		return;
+
+	kfree(sq->skbuff);
+	nicvf_free_q_desc_mem(nic, &sq->dmem);
+}
+
+static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,
+				   int qidx, bool enable)
+{
+	struct nic_mbx mbx = {};
+	struct rcv_queue *rq;
+	struct rq_cfg rq_cfg;
+
+	rq = &qs->rq[qidx];
+	rq->enable = enable;
+
+	if (!rq->enable) {
+		/* Disable receive queue */
+		nicvf_queue_reg_write(nic, NIC_QSET_RQ_0_7_CFG, qidx, 0);
+		return;
+	}
+
+	rq->cq_qs = qs->vnic_id;
+	rq->cq_idx = qidx;
+	rq->start_rbdr_qs = qs->vnic_id;
+	rq->start_qs_rbdr_idx = qs->rbdr_cnt - 1;
+	rq->cont_rbdr_qs = qs->vnic_id;
+	rq->cont_qs_rbdr_idx = qs->rbdr_cnt - 1;
+	rq->caching = 1;
+
+	/* Send a mailbox msg to PF to config RQ */
+	mbx.msg = NIC_PF_VF_MSG_RQ_CFG;
+	mbx.data.rq.qs_num = qs->vnic_id;
+	mbx.data.rq.rq_num = qidx;
+	mbx.data.rq.cfg = (rq->caching << 26) | (rq->cq_qs << 19) |
+			  (rq->cq_idx << 16) | (rq->cont_rbdr_qs << 9) |
+			  (rq->cont_qs_rbdr_idx << 8) |
+			  (rq->start_rbdr_qs << 1) | (rq->start_qs_rbdr_idx);
+	nicvf_send_msg_to_pf(nic, &mbx);
+
+	/* RQ drop config
+	 * Enable CQ drop to reserve sufficient CQEs for all tx packets
+	 */
+	mbx.msg = NIC_PF_VF_MSG_RQ_DROP_CFG;
+	mbx.data.rq.cfg = (1ULL << 62) | (RQ_CQ_DROP << 8);
+	nicvf_send_msg_to_pf(nic, &mbx);
+
+	nicvf_queue_reg_write(nic, NIC_QSET_RQ_GEN_CFG, qidx, 0x00);
+
+	/* Enable Receive queue */
+	rq_cfg.ena = 1;
+	rq_cfg.tcp_ena = 0;
+	nicvf_queue_reg_write(nic, NIC_QSET_RQ_0_7_CFG, qidx, *(u64 *)&rq_cfg);
+}
+
+void nicvf_cmp_queue_config(struct nicvf *nic, struct queue_set *qs,
+			    int qidx, bool enable)
+{
+	struct cmp_queue *cq;
+	struct cq_cfg cq_cfg;
+
+	cq = &qs->cq[qidx];
+	cq->enable = enable;
+	if (!cq->enable) {
+		/* Disable completion queue */
+		nicvf_queue_reg_write(nic, NIC_QSET_CQ_0_7_CFG, qidx, 0);
+		return;
+	}
+
+	/* Reset completion queue */
+	nicvf_queue_reg_write(nic, NIC_QSET_CQ_0_7_CFG, qidx, NICVF_CQ_RESET);
+
+	/* Set completion queue base address */
+	nicvf_queue_reg_write(nic, NIC_QSET_CQ_0_7_BASE,
+			      qidx, (uint64_t)(cq->dmem.phys_base));
+
+	/* Enable Completion queue */
+	cq_cfg.ena = 1;
+	cq_cfg.reset = 0;
+	cq_cfg.caching = 1;
+	cq_cfg.qsize = (qs->cq_len >> 10) - 1;
+	cq_cfg.avg_con = 0;
+	nicvf_queue_reg_write(nic, NIC_QSET_CQ_0_7_CFG, qidx, *(u64 *)&cq_cfg);
+
+	/* Set threshold value for interrupt generation */
+	nicvf_queue_reg_write(nic, NIC_QSET_CQ_0_7_THRESH, qidx, cq->thresh);
+	nicvf_queue_reg_write(nic, NIC_QSET_CQ_0_7_CFG2,
+			      qidx, cq->intr_timer_thresh);
+}
+
+static void nicvf_snd_queue_config(struct nicvf *nic, struct queue_set *qs,
+				   int qidx, bool enable)
+{
+	struct nic_mbx mbx = {};
+	struct snd_queue *sq;
+	struct sq_cfg sq_cfg;
+
+	sq = &qs->sq[qidx];
+	sq->enable = enable;
+	if (!sq->enable) {
+		/* Disable send queue */
+		nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_CFG, qidx, 0);
+		return;
+	}
+
+	/* Reset send queue */
+	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_CFG, qidx, NICVF_SQ_RESET);
+
+	sq->cq_qs = qs->vnic_id;
+	sq->cq_idx = qidx;
+
+	/* Send a mailbox msg to PF to config SQ */
+	mbx.msg = NIC_PF_VF_MSG_SQ_CFG;
+	mbx.data.sq.qs_num = qs->vnic_id;
+	mbx.data.sq.sq_num = qidx;
+	mbx.data.sq.cfg = (sq->cq_qs << 3) | sq->cq_idx;
+	nicvf_send_msg_to_pf(nic, &mbx);
+
+	/* Set queue base address */
+	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_BASE,
+			      qidx, (uint64_t)(sq->dmem.phys_base));
+
+	/* Enable send queue  & set queue size */
+	sq_cfg.ena = 1;
+	sq_cfg.reset = 0;
+	sq_cfg.ldwb = 0;
+	sq_cfg.qsize = (qs->sq_len >> 10) - 1;
+	sq_cfg.tstmp_bgx_intf = 0;
+	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_CFG, qidx, *(u64 *)&sq_cfg);
+
+	/* Set threshold value for interrupt generation */
+	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_THRESH, qidx, sq->thresh);
+}
+
+static void nicvf_rbdr_config(struct nicvf *nic, struct queue_set *qs,
+			      int qidx, bool enable)
+{
+	int reset, timeout = 10;
+	struct rbdr *rbdr;
+	struct rbdr_cfg rbdr_cfg;
+
+	rbdr = &qs->rbdr[qidx];
+	if (!enable) {
+		/* Disable RBDR */
+		nicvf_queue_reg_write(nic, NIC_QSET_RBDR_0_1_CFG, qidx, 0);
+		return;
+	}
+
+	/* Reset RBDR */
+	nicvf_queue_reg_write(nic, NIC_QSET_RBDR_0_1_CFG,
+			      qidx, NICVF_RBDR_RESET);
+	/* Wait for reset to finish */
+	while (1) {
+		usleep_range(2000, 3000);
+		reset = nicvf_queue_reg_read(nic, NIC_QSET_RBDR_0_1_CFG, qidx);
+		if (!(reset & NICVF_RBDR_RESET))
+			break;
+		timeout--;
+		if (!timeout) {
+			netdev_err(nic->netdev,
+				   "RBDR%d didn't come out of reset\n", qidx);
+			return;
+		}
+	}
+
+	/* Set descriptor base address */
+	nicvf_queue_reg_write(nic, NIC_QSET_RBDR_0_1_BASE,
+			      qidx, (uint64_t)(rbdr->dmem.phys_base));
+
+	/* Enable RBDR  & set queue size */
+	/* Buffer size should be in multiples of 128 bytes */
+	rbdr_cfg.ena = 1;
+	rbdr_cfg.reset = 0;
+	rbdr_cfg.ldwb = 0;
+	rbdr_cfg.qsize = (qs->rbdr_len >> 13) - 1;
+	rbdr_cfg.avg_con = 0;
+	rbdr_cfg.lines = rbdr->buf_size / 128;
+	nicvf_queue_reg_write(nic, NIC_QSET_RBDR_0_1_CFG,
+			      qidx, *(u64 *)&rbdr_cfg);
+
+	/* Notify HW */
+	nicvf_queue_reg_write(nic, NIC_QSET_RBDR_0_1_DOOR,
+			      qidx, qs->rbdr_len - 1);
+
+	/* Set threshold value for interrupt generation */
+	nicvf_queue_reg_write(nic, NIC_QSET_RBDR_0_1_THRESH,
+			      qidx, rbdr->thresh - 1);
+}
+
+void nicvf_qset_config(struct nicvf *nic, bool enable)
+{
+	struct  nic_mbx mbx = {};
+	struct queue_set *qs = nic->qs;
+	struct qs_cfg *qs_cfg;
+
+	qs->enable = enable;
+
+	/* Send a mailbox msg to PF to config Qset */
+	mbx.msg = NIC_PF_VF_MSG_QS_CFG;
+	mbx.data.qs.num = qs->vnic_id;
+
+	mbx.data.qs.cfg = 0;
+	qs_cfg = (struct qs_cfg *)&mbx.data.qs.cfg;
+	if (qs->enable) {
+		qs_cfg->ena = 1;
+#ifdef __BIG_ENDIAN
+		qs_cfg->be = 1;
+#endif
+		qs_cfg->vnic = qs->vnic_id;
+	}
+	nicvf_send_msg_to_pf(nic, &mbx);
+}
+
+static void nicvf_free_resources(struct nicvf *nic)
+{
+	int qidx;
+	struct queue_set *qs = nic->qs;
+
+	if (!qs)
+		return;
+
+	/* Free receive buffer descriptor ring */
+	for (qidx = 0; qidx < qs->rbdr_cnt; qidx++)
+		nicvf_free_rbdr(nic, &qs->rbdr[qidx], qidx);
+
+	/* Free completion queue */
+	for (qidx = 0; qidx < qs->cq_cnt; qidx++)
+		nicvf_free_cmp_queue(nic, &qs->cq[qidx]);
+
+	/* Free send queue */
+	for (qidx = 0; qidx < qs->sq_cnt; qidx++)
+		nicvf_free_snd_queue(nic, &qs->sq[qidx]);
+}
+
+static int nicvf_alloc_resources(struct nicvf *nic)
+{
+	int qidx;
+	struct queue_set *qs = nic->qs;
+
+	/* Alloc receive buffer descriptor ring */
+	for (qidx = 0; qidx < qs->rbdr_cnt; qidx++) {
+		if (nicvf_init_rbdr(nic, &qs->rbdr[qidx], qs->rbdr_len,
+				    RCV_BUFFER_LEN))
+			goto alloc_fail;
+	}
+
+	/* Alloc send queue */
+	for (qidx = 0; qidx < qs->sq_cnt; qidx++) {
+		if (nicvf_init_snd_queue(nic, &qs->sq[qidx], qs->sq_len))
+			goto alloc_fail;
+	}
+
+	/* Alloc completion queue */
+	for (qidx = 0; qidx < qs->cq_cnt; qidx++) {
+		if (nicvf_init_cmp_queue(nic, &qs->cq[qidx], qs->cq_len))
+			goto alloc_fail;
+	}
+
+	return 0;
+alloc_fail:
+	nicvf_free_resources(nic);
+	return -ENOMEM;
+}
+
+int nicvf_set_qset_resources(struct nicvf *nic)
+{
+	struct queue_set *qs;
+
+	qs = kzalloc(sizeof(*qs), GFP_ATOMIC);
+	if (!qs)
+		return -ENOMEM;
+	nic->qs = qs;
+
+	/* Set count of each queue */
+	qs->rbdr_cnt = RBDR_CNT;
+	qs->rq_cnt = RCV_QUEUE_CNT;
+	qs->sq_cnt = SND_QUEUE_CNT;
+	qs->cq_cnt = CMP_QUEUE_CNT;
+
+	/* Set queue lengths */
+	qs->rbdr_len = RCV_BUF_COUNT;
+	qs->sq_len = SND_QUEUE_LEN;
+	qs->cq_len = CMP_QUEUE_LEN;
+	return 0;
+}
+
+int nicvf_config_data_transfer(struct nicvf *nic, bool enable)
+{
+	bool disable = false;
+	struct queue_set *qs = nic->qs;
+	int qidx;
+
+	if (enable) {
+		qs->vnic_id = nic->vf_id;
+		nic->qs = qs;
+
+		if (nicvf_alloc_resources(nic))
+			return -ENOMEM;
+
+		for (qidx = 0; qidx < qs->rbdr_cnt; qidx++)
+			nicvf_rbdr_config(nic, qs, qidx, enable);
+		for (qidx = 0; qidx < qs->rq_cnt; qidx++)
+			nicvf_rcv_queue_config(nic, qs, qidx, enable);
+		for (qidx = 0; qidx < qs->sq_cnt; qidx++)
+			nicvf_snd_queue_config(nic, qs, qidx, enable);
+		for (qidx = 0; qidx < qs->cq_cnt; qidx++)
+			nicvf_cmp_queue_config(nic, qs, qidx, enable);
+
+	} else {
+		qs = nic->qs;
+		if (!qs)
+			return 0;
+
+		for (qidx = 0; qidx < qs->rbdr_cnt; qidx++)
+			nicvf_rbdr_config(nic, qs, qidx, disable);
+		for (qidx = 0; qidx < qs->rq_cnt; qidx++)
+			nicvf_rcv_queue_config(nic, qs, qidx, disable);
+		for (qidx = 0; qidx < qs->sq_cnt; qidx++)
+			nicvf_snd_queue_config(nic, qs, qidx, disable);
+		for (qidx = 0; qidx < qs->cq_cnt; qidx++)
+			nicvf_cmp_queue_config(nic, qs, qidx, disable);
+
+		nicvf_free_resources(nic);
+	}
+
+	return 0;
+}
+
+/* Get a free desc from send queue
+ * @qs:   Qset from which to get a SQ descriptor
+ * @qnum: SQ number (0...7) in the Qset
+ *
+ * returns descriptor ponter & descriptor number
+ */
+static int nicvf_get_sq_desc(struct queue_set *qs, int qnum, void **desc)
+{
+	int qentry;
+	struct snd_queue *sq = &qs->sq[qnum];
+
+	if (!sq->free_cnt)
+		return 0;
+
+	qentry = sq->tail++;
+	sq->free_cnt--;
+	sq->tail &= (sq->dmem.q_len - 1);
+	*desc = GET_SQ_DESC(sq, qentry);
+	return qentry;
+}
+
+void nicvf_put_sq_desc(struct snd_queue *sq, int desc_cnt)
+{
+	while (desc_cnt--) {
+		sq->free_cnt++;
+		sq->head++;
+		sq->head &= (sq->dmem.q_len - 1);
+	}
+}
+
+void nicvf_sq_enable(struct nicvf *nic, struct snd_queue *sq, int qidx)
+{
+	uint64_t sq_cfg;
+
+	sq_cfg = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_CFG, qidx);
+	sq_cfg |= NICVF_SQ_EN;
+	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_CFG, qidx, sq_cfg);
+	/* Ring doorbell so that H/W restarts processing SQEs */
+	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_DOOR, qidx, 0);
+}
+
+void nicvf_sq_disable(struct nicvf *nic, int qidx)
+{
+	uint64_t sq_cfg;
+
+	sq_cfg = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_CFG, qidx);
+	sq_cfg &= ~NICVF_SQ_EN;
+	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_CFG, qidx, sq_cfg);
+}
+
+void nicvf_sq_free_used_descs(struct net_device *netdev, struct snd_queue *sq,
+			      int qidx)
+{
+	uint64_t head, tail;
+	struct sk_buff *skb;
+	struct nicvf *nic = netdev_priv(netdev);
+	struct sq_hdr_subdesc *hdr;
+
+	head = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_HEAD, qidx) >> 4;
+	tail = nicvf_queue_reg_read(nic, NIC_QSET_SQ_0_7_TAIL, qidx) >> 4;
+	while (sq->head != head) {
+		hdr = (struct sq_hdr_subdesc *)GET_SQ_DESC(sq, sq->head);
+		if (hdr->subdesc_type != SQ_DESC_TYPE_HEADER) {
+			nicvf_put_sq_desc(sq, 1);
+			continue;
+		}
+		skb = (struct sk_buff *)sq->skbuff[sq->head];
+		atomic64_add(1, (atomic64_t *)&netdev->stats.tx_packets);
+		atomic64_add(hdr->tot_len,
+			     (atomic64_t *)&netdev->stats.tx_bytes);
+		nicvf_free_skb(nic, skb);
+		nicvf_put_sq_desc(sq, hdr->subdesc_cnt + 1);
+	}
+}
+
+static int nicvf_sq_subdesc_required(struct nicvf *nic, struct sk_buff *skb)
+{
+	int subdesc_cnt = MIN_SND_QUEUE_DESC_FOR_PKT_XMIT;
+
+	if (skb_shinfo(skb)->nr_frags)
+		subdesc_cnt += skb_shinfo(skb)->nr_frags;
+
+#ifdef VNIC_TX_CSUM_OFFLOAD_SUPPORT
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		if (skb->protocol == htons(ETH_P_IP))
+			subdesc_cnt++;
+		if ((ip_hdr(skb)->protocol == IPPROTO_TCP) ||
+		    (ip_hdr(skb)->protocol == IPPROTO_UDP))
+			subdesc_cnt++;
+	}
+#endif
+
+	return subdesc_cnt;
+}
+
+/* Add SQ HEADER subdescriptor.
+ * First subdescriptor for every send descriptor.
+ */
+struct sq_hdr_subdesc *
+nicvf_sq_add_hdr_subdesc(struct queue_set *qs, int sq_num,
+			 int subdesc_cnt, struct sk_buff *skb)
+{
+	int qentry;
+	void *desc;
+	struct snd_queue *sq;
+	struct sq_hdr_subdesc *hdr;
+
+	sq = &qs->sq[sq_num];
+	qentry = nicvf_get_sq_desc(qs, sq_num, &desc);
+	sq->skbuff[qentry] = (uint64_t)skb;
+
+	hdr = (struct sq_hdr_subdesc *)desc;
+
+	memset(hdr, 0, SND_QUEUE_DESC_SIZE);
+	hdr->subdesc_type = SQ_DESC_TYPE_HEADER;
+	hdr->post_cqe = 1;
+	hdr->subdesc_cnt = subdesc_cnt;
+	hdr->tot_len = skb->len;
+
+#ifdef VNIC_HW_TSO_SUPPORT
+	if (!skb_shinfo(skb)->gso_size)
+		return hdr;
+
+	/* Packet to be subjected to TSO */
+	hdr->tso = 1;
+	hdr->tso_l4_offset = (int)(skb_transport_header(skb) - skb->data) +
+				tcp_hdrlen(skb);
+	hdr->tso_max_paysize = skb_shinfo(skb)->gso_size + hdr->tso_l4_offset;
+	/* TBD: These fields have to be setup properly */
+	hdr->tso_sdc_first	= 0;
+	hdr->tso_sdc_cont	= 0;
+	hdr->tso_flags_first	= 0;
+	hdr->tso_flags_last	= 0;
+#endif
+	return hdr;
+}
+
+/* SQ GATHER subdescriptor
+ * Must follow HDR descriptor
+ */
+static void nicvf_sq_add_gather_subdesc(struct nicvf *nic, struct queue_set *qs,
+					int sq_num, struct sk_buff *skb)
+{
+	int i;
+	void *desc;
+	struct sq_gather_subdesc *gather;
+
+	nicvf_get_sq_desc(qs, sq_num, &desc);
+	gather = (struct sq_gather_subdesc *)desc;
+
+	memset(gather, 0, SND_QUEUE_DESC_SIZE);
+	gather->subdesc_type = SQ_DESC_TYPE_GATHER;
+	gather->ld_type = NIC_SEND_LD_TYPE_E_LDD;
+	gather->size = skb_is_nonlinear(skb) ? skb_headlen(skb) : skb->len;
+	gather->addr = pci_map_single(nic->pdev, skb->data,
+				gather->size, PCI_DMA_TODEVICE);
+
+	if (!skb_is_nonlinear(skb))
+		return;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		const struct skb_frag_struct *frag;
+
+		frag = &skb_shinfo(skb)->frags[i];
+
+		nicvf_get_sq_desc(qs, sq_num, &desc);
+		gather = (struct sq_gather_subdesc *)desc;
+
+		memset(gather, 0, SND_QUEUE_DESC_SIZE);
+		gather->subdesc_type = SQ_DESC_TYPE_GATHER;
+		gather->ld_type = NIC_SEND_LD_TYPE_E_LDD;
+		gather->size = skb_frag_size(frag);
+		gather->addr = pci_map_single(nic->pdev, skb_frag_address(frag),
+						gather->size, PCI_DMA_TODEVICE);
+	}
+}
+
+#ifdef VNIC_TX_CSUM_OFFLOAD_SUPPORT
+static void nicvf_fill_l3_crc_subdesc(struct sq_crc_subdesc *l3,
+				      struct sk_buff *skb)
+{
+	int crc_pos;
+
+	crc_pos = skb_network_header(skb) - skb_mac_header(skb);
+	crc_pos += offsetof(struct iphdr, check);
+
+	l3->subdesc_type = SQ_DESC_TYPE_CRC;
+	l3->crc_alg = SEND_CRCALG_CRC32;
+	l3->crc_insert_pos = crc_pos;
+	l3->hdr_start = skb_network_offset(skb);
+	l3->crc_len = skb_transport_header(skb) - skb_network_header(skb);
+	l3->crc_ival = 0;
+}
+
+static void nicvf_fill_l4_crc_subdesc(struct sq_crc_subdesc *l4,
+				      struct sk_buff *skb)
+{
+	l4->subdesc_type = SQ_DESC_TYPE_CRC;
+	l4->crc_alg = SEND_CRCALG_CRC32;
+	l4->crc_insert_pos = skb->csum_start + skb->csum_offset;
+	l4->hdr_start = skb->csum_start;
+	l4->crc_len = skb->len - skb_transport_offset(skb);
+	l4->crc_ival = 0;
+}
+
+/* SQ CRC subdescriptor
+ * Must follow HDR and precede GATHER, IMM subdescriptors
+ */
+static void nicvf_sq_add_crc_subdesc(struct nicvf *nic, struct queue_set *qs,
+				     int sq_num, struct sk_buff *skb)
+{
+	int proto;
+	void *desc;
+	struct sq_crc_subdesc *crc;
+	struct snd_queue *sq;
+
+	if (skb->ip_summed != CHECKSUM_PARTIAL)
+		return;
+
+	if (skb->protocol != htons(ETH_P_IP))
+		return;
+
+	sq = &qs->sq[sq_num];
+	nicvf_get_sq_desc(qs, sq_num, &desc);
+
+	crc = (struct sq_crc_subdesc *)desc;
+
+	nicvf_fill_l3_crc_subdesc(crc, skb);
+
+	proto = ip_hdr(skb)->protocol;
+	if ((proto == IPPROTO_TCP) || (proto == IPPROTO_UDP)) {
+		nicvf_get_sq_desc(qs, sq_num, &desc);
+		crc = (struct sq_crc_subdesc *)desc;
+		nicvf_fill_l4_crc_subdesc(crc, skb);
+	}
+}
+#endif
+
+/* Append an skb to a SQ for packet transfer. */
+int nicvf_sq_append_skb(struct nicvf *nic, struct sk_buff *skb)
+{
+	int subdesc_cnt;
+	int sq_num;
+	struct queue_set *qs = nic->qs;
+	struct snd_queue *sq;
+	struct sq_hdr_subdesc *hdr_desc;
+
+	sq_num = skb_get_queue_mapping(skb);
+	sq = &qs->sq[sq_num];
+
+	subdesc_cnt = nicvf_sq_subdesc_required(nic, skb);
+
+	if (subdesc_cnt > sq->free_cnt)
+		goto append_fail;
+
+	/* Add SQ header subdesc */
+	hdr_desc = nicvf_sq_add_hdr_subdesc(qs, sq_num, subdesc_cnt - 1, skb);
+
+#ifdef VNIC_TX_CSUM_OFFLOAD_SUPPORT
+	/* Add CRC subdescriptor for IP/TCP/UDP (L3/L4) crc calculation */
+	if (skb->ip_summed == CHECKSUM_PARTIAL)
+		nicvf_sq_add_crc_subdesc(nic, qs, sq_num, skb);
+#endif
+
+	/* Add SQ gather subdesc */
+	nicvf_sq_add_gather_subdesc(nic, qs, sq_num, skb);
+
+	/* Inform HW to xmit new packet */
+	nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_DOOR,
+			      sq_num, subdesc_cnt);
+	return 1;
+
+append_fail:
+	nic_dbg(&nic->pdev->dev, "Not enough SQ descriptors to xmit pkt\n");
+	return 0;
+}
+
+static unsigned frag_num(unsigned i)
+{
+#ifdef __BIG_ENDIAN
+	return (i & ~3) + 3 - (i & 3);
+#else
+	return i;
+#endif
+}
+
+struct sk_buff *nicvf_get_rcv_skb(struct nicvf *nic, void *cq_desc)
+{
+	int frag;
+	int payload_len = 0;
+	struct sk_buff *skb = NULL;
+	struct sk_buff *skb_frag = NULL;
+	struct sk_buff *prev_frag = NULL;
+	struct cqe_rx_t *cqe_rx;
+	struct rbdr *rbdr;
+	struct rcv_queue *rq;
+	struct queue_set *qs = nic->qs;
+	uint16_t *rb_lens = NULL;
+	uint64_t *rb_ptrs = NULL;
+
+	cqe_rx = (struct cqe_rx_t *)cq_desc;
+
+	rq = &qs->rq[cqe_rx->rq_idx];
+	rbdr = &qs->rbdr[rq->start_qs_rbdr_idx];
+	rb_lens = cq_desc + (3 * sizeof(uint64_t)); /* Use offsetof */
+	rb_ptrs = cq_desc + (6 * sizeof(uint64_t));
+	nic_dbg(&nic->pdev->dev, "%s rb_cnt %d rb0_ptr %llx rb0_sz %d\n",
+		__func__, cqe_rx->rb_cnt, cqe_rx->rb0_ptr, cqe_rx->rb0_sz);
+
+	for (frag = 0; frag < cqe_rx->rb_cnt; frag++) {
+		payload_len = rb_lens[frag_num(frag)];
+		if (!frag) {
+			/* First fragment */
+			pci_unmap_single(nic->pdev, (dma_addr_t)(*rb_ptrs),
+					 rbdr->buf_size, PCI_DMA_FROMDEVICE);
+			skb = nicvf_rb_ptr_to_skb(nic, *rb_ptrs);
+			if (cqe_rx->align_pad) {
+				skb->data += cqe_rx->align_pad;
+				skb->tail += cqe_rx->align_pad;
+			}
+			skb_put(skb, payload_len);
+		} else {
+			/* Add fragments */
+			pci_unmap_single(nic->pdev, (dma_addr_t)(*rb_ptrs),
+					 rbdr->buf_size, PCI_DMA_FROMDEVICE);
+			skb_frag = nicvf_rb_ptr_to_skb(nic, *rb_ptrs);
+
+			if (!skb_shinfo(skb)->frag_list)
+				skb_shinfo(skb)->frag_list = skb_frag;
+			else
+				prev_frag->next = skb_frag;
+
+			prev_frag = skb_frag;
+			skb->len += payload_len;
+			skb->data_len += payload_len;
+			skb_frag->len = payload_len;
+		}
+		/* Next buffer pointer */
+		rb_ptrs++;
+	}
+	return skb;
+}
+
+/* Enable interrupt */
+void nicvf_enable_intr(struct nicvf *nic, int int_type, int q_idx)
+{
+	uint64_t reg_val;
+
+	reg_val = nicvf_reg_read(nic, NIC_VF_ENA_W1S);
+
+	switch (int_type) {
+	case NICVF_INTR_CQ:
+		reg_val |= ((1ULL << q_idx) << NICVF_INTR_CQ_SHIFT);
+	break;
+	case NICVF_INTR_SQ:
+		reg_val |= ((1ULL << q_idx) << NICVF_INTR_SQ_SHIFT);
+	break;
+	case NICVF_INTR_RBDR:
+		reg_val |= ((1ULL << q_idx) << NICVF_INTR_RBDR_SHIFT);
+	break;
+	case NICVF_INTR_PKT_DROP:
+		reg_val |= (1ULL << NICVF_INTR_PKT_DROP_SHIFT);
+	break;
+	case NICVF_INTR_TCP_TIMER:
+		reg_val |= (1ULL << NICVF_INTR_TCP_TIMER_SHIFT);
+	break;
+	case NICVF_INTR_MBOX:
+		reg_val |= (1ULL << NICVF_INTR_MBOX_SHIFT);
+	break;
+	case NICVF_INTR_QS_ERR:
+		reg_val |= (1ULL << NICVF_INTR_QS_ERR_SHIFT);
+	break;
+	default:
+		netdev_err(nic->netdev,
+			   "Failed to enable interrupt: unknown type\n");
+	break;
+	}
+
+	nicvf_reg_write(nic, NIC_VF_ENA_W1S, reg_val);
+}
+
+/* Disable interrupt */
+void nicvf_disable_intr(struct nicvf *nic, int int_type, int q_idx)
+{
+	uint64_t reg_val = 0;
+
+	switch (int_type) {
+	case NICVF_INTR_CQ:
+		reg_val |= ((1ULL << q_idx) << NICVF_INTR_CQ_SHIFT);
+	break;
+	case NICVF_INTR_SQ:
+		reg_val |= ((1ULL << q_idx) << NICVF_INTR_SQ_SHIFT);
+	break;
+	case NICVF_INTR_RBDR:
+		reg_val |= ((1ULL << q_idx) << NICVF_INTR_RBDR_SHIFT);
+	break;
+	case NICVF_INTR_PKT_DROP:
+		reg_val |= (1ULL << NICVF_INTR_PKT_DROP_SHIFT);
+	break;
+	case NICVF_INTR_TCP_TIMER:
+		reg_val |= (1ULL << NICVF_INTR_TCP_TIMER_SHIFT);
+	break;
+	case NICVF_INTR_MBOX:
+		reg_val |= (1ULL << NICVF_INTR_MBOX_SHIFT);
+	break;
+	case NICVF_INTR_QS_ERR:
+		reg_val |= (1ULL << NICVF_INTR_QS_ERR_SHIFT);
+	break;
+	default:
+		netdev_err(nic->netdev,
+			   "Failed to disable interrupt: unknown type\n");
+	break;
+	}
+
+	nicvf_reg_write(nic, NIC_VF_ENA_W1C, reg_val);
+}
+
+/* Clear interrupt */
+void nicvf_clear_intr(struct nicvf *nic, int int_type, int q_idx)
+{
+	uint64_t reg_val = 0;
+
+	switch (int_type) {
+	case NICVF_INTR_CQ:
+		reg_val = ((1ULL << q_idx) << NICVF_INTR_CQ_SHIFT);
+	break;
+	case NICVF_INTR_SQ:
+		reg_val = ((1ULL << q_idx) << NICVF_INTR_SQ_SHIFT);
+	break;
+	case NICVF_INTR_RBDR:
+		reg_val = ((1ULL << q_idx) << NICVF_INTR_RBDR_SHIFT);
+	break;
+	case NICVF_INTR_PKT_DROP:
+		reg_val = (1ULL << NICVF_INTR_PKT_DROP_SHIFT);
+	break;
+	case NICVF_INTR_TCP_TIMER:
+		reg_val = (1ULL << NICVF_INTR_TCP_TIMER_SHIFT);
+	break;
+	case NICVF_INTR_MBOX:
+		reg_val = (1ULL << NICVF_INTR_MBOX_SHIFT);
+	break;
+	case NICVF_INTR_QS_ERR:
+		reg_val |= (1ULL << NICVF_INTR_QS_ERR_SHIFT);
+	break;
+	default:
+		netdev_err(nic->netdev,
+			   "Failed to clear interrupt: unknown type\n");
+	break;
+	}
+
+	nicvf_reg_write(nic, NIC_VF_INT, reg_val);
+}
+
+/* Check if interrupt is enabled */
+int nicvf_is_intr_enabled(struct nicvf *nic, int int_type, int q_idx)
+{
+	uint64_t reg_val;
+	uint64_t mask = 0xff;
+
+	reg_val = nicvf_reg_read(nic, NIC_VF_ENA_W1S);
+
+	switch (int_type) {
+	case NICVF_INTR_CQ:
+		mask = ((1ULL << q_idx) << NICVF_INTR_CQ_SHIFT);
+	break;
+	case NICVF_INTR_SQ:
+		mask = ((1ULL << q_idx) << NICVF_INTR_SQ_SHIFT);
+	break;
+	case NICVF_INTR_RBDR:
+		mask = ((1ULL << q_idx) << NICVF_INTR_RBDR_SHIFT);
+	break;
+	case NICVF_INTR_PKT_DROP:
+		mask = NICVF_INTR_PKT_DROP_MASK;
+	break;
+	case NICVF_INTR_TCP_TIMER:
+		mask = NICVF_INTR_TCP_TIMER_MASK;
+	break;
+	case NICVF_INTR_MBOX:
+		mask = NICVF_INTR_MBOX_MASK;
+	break;
+	case NICVF_INTR_QS_ERR:
+		mask = NICVF_INTR_QS_ERR_MASK;
+	break;
+	default:
+		netdev_err(nic->netdev,
+			   "Failed to check interrupt enable: unknown type\n");
+	break;
+	}
+
+	return (reg_val & mask);
+}
+
+void nicvf_update_rq_stats(struct nicvf *nic, int rq_idx)
+{
+	struct rcv_queue *rq;
+
+#define GET_RQ_STATS(reg) \
+	nicvf_reg_read(nic, NIC_QSET_RQ_0_7_STAT_0_1 |\
+			    (rq_idx << NIC_Q_NUM_SHIFT) | (reg << 3))
+
+	rq = &nic->qs->rq[rq_idx];
+	rq->stats.bytes = GET_RQ_STATS(RQ_SQ_STATS_OCTS);
+	rq->stats.pkts = GET_RQ_STATS(RQ_SQ_STATS_PKTS);
+}
+
+void nicvf_update_sq_stats(struct nicvf *nic, int sq_idx)
+{
+	struct snd_queue *sq;
+
+#define GET_SQ_STATS(reg) \
+	nicvf_reg_read(nic, NIC_QSET_SQ_0_7_STAT_0_1 |\
+			    (sq_idx << NIC_Q_NUM_SHIFT) | (reg << 3))
+
+	sq = &nic->qs->sq[sq_idx];
+	sq->stats.bytes = GET_SQ_STATS(RQ_SQ_STATS_OCTS);
+	sq->stats.pkts = GET_SQ_STATS(RQ_SQ_STATS_PKTS);
+}
+
+/* Check for errors in the receive cmp.queue entry */
+int nicvf_check_cqe_rx_errs(struct nicvf *nic,
+			    struct cmp_queue *cq, void *cq_desc)
+{
+	struct cqe_rx_t *cqe_rx;
+	struct cmp_queue_stats *stats = &cq->stats;
+
+	cqe_rx = (struct cqe_rx_t *)cq_desc;
+	if (!cqe_rx->err_level && !cqe_rx->err_opcode) {
+		stats->rx.errop.good++;
+		return 0;
+	}
+
+	switch (cqe_rx->err_level) {
+	case CQ_ERRLVL_MAC:
+		stats->rx.errlvl.mac_errs++;
+	break;
+	case CQ_ERRLVL_L2:
+		stats->rx.errlvl.l2_errs++;
+	break;
+	case CQ_ERRLVL_L3:
+		stats->rx.errlvl.l3_errs++;
+	break;
+	case CQ_ERRLVL_L4:
+		stats->rx.errlvl.l4_errs++;
+	break;
+	}
+
+	switch (cqe_rx->err_opcode) {
+	case CQ_RX_ERROP_RE_PARTIAL:
+		stats->rx.errop.partial_pkts++;
+	break;
+	case CQ_RX_ERROP_RE_JABBER:
+		stats->rx.errop.jabber_errs++;
+	break;
+	case CQ_RX_ERROP_RE_FCS:
+		stats->rx.errop.fcs_errs++;
+	break;
+	case CQ_RX_ERROP_RE_TERMINATE:
+		stats->rx.errop.terminate_errs++;
+	break;
+	case CQ_RX_ERROP_RE_RX_CTL:
+		stats->rx.errop.bgx_rx_errs++;
+	break;
+	case CQ_RX_ERROP_PREL2_ERR:
+		stats->rx.errop.prel2_errs++;
+	break;
+	case CQ_RX_ERROP_L2_FRAGMENT:
+		stats->rx.errop.l2_frags++;
+	break;
+	case CQ_RX_ERROP_L2_OVERRUN:
+		stats->rx.errop.l2_overruns++;
+	break;
+	case CQ_RX_ERROP_L2_PFCS:
+		stats->rx.errop.l2_pfcs++;
+	break;
+	case CQ_RX_ERROP_L2_PUNY:
+		stats->rx.errop.l2_puny++;
+	break;
+	case CQ_RX_ERROP_L2_MAL:
+		stats->rx.errop.l2_hdr_malformed++;
+	break;
+	case CQ_RX_ERROP_L2_OVERSIZE:
+		stats->rx.errop.l2_oversize++;
+	break;
+	case CQ_RX_ERROP_L2_UNDERSIZE:
+		stats->rx.errop.l2_undersize++;
+	break;
+	case CQ_RX_ERROP_L2_LENMISM:
+		stats->rx.errop.l2_len_mismatch++;
+	break;
+	case CQ_RX_ERROP_L2_PCLP:
+		stats->rx.errop.l2_pclp++;
+	break;
+	case CQ_RX_ERROP_IP_NOT:
+		stats->rx.errop.non_ip++;
+	break;
+	case CQ_RX_ERROP_IP_CSUM_ERR:
+		stats->rx.errop.ip_csum_err++;
+	break;
+	case CQ_RX_ERROP_IP_MAL:
+		stats->rx.errop.ip_hdr_malformed++;
+	break;
+	case CQ_RX_ERROP_IP_MALD:
+		stats->rx.errop.ip_payload_malformed++;
+	break;
+	case CQ_RX_ERROP_IP_HOP:
+		stats->rx.errop.ip_hop_errs++;
+	break;
+	case CQ_RX_ERROP_L3_ICRC:
+		stats->rx.errop.l3_icrc_errs++;
+	break;
+	case CQ_RX_ERROP_L3_PCLP:
+		stats->rx.errop.l3_pclp++;
+	break;
+	case CQ_RX_ERROP_L4_MAL:
+		stats->rx.errop.l4_malformed++;
+	break;
+	case CQ_RX_ERROP_L4_CHK:
+		stats->rx.errop.l4_csum_errs++;
+	break;
+	case CQ_RX_ERROP_UDP_LEN:
+		stats->rx.errop.udp_len_err++;
+	break;
+	case CQ_RX_ERROP_L4_PORT:
+		stats->rx.errop.bad_l4_port++;
+	break;
+	case CQ_RX_ERROP_TCP_FLAG:
+		stats->rx.errop.bad_tcp_flag++;
+	break;
+	case CQ_RX_ERROP_TCP_OFFSET:
+		stats->rx.errop.tcp_offset_errs++;
+	break;
+	case CQ_RX_ERROP_L4_PCLP:
+		stats->rx.errop.l4_pclp++;
+	break;
+	case CQ_RX_ERROP_RBDR_TRUNC:
+		stats->rx.errop.pkt_truncated++;
+	break;
+	}
+
+	return 1;
+}
+
+/* Check for errors in the send cmp.queue entry */
+int nicvf_check_cqe_tx_errs(struct nicvf *nic,
+			    struct cmp_queue *cq, void *cq_desc)
+{
+	struct cqe_send_t *cqe_tx;
+	struct cmp_queue_stats *stats = &cq->stats;
+
+	cqe_tx = (struct cqe_send_t *)cq_desc;
+	switch (cqe_tx->send_status) {
+	case CQ_TX_ERROP_GOOD:
+		stats->tx.good++;
+		return 0;
+	break;
+	case CQ_TX_ERROP_DESC_FAULT:
+		stats->tx.desc_fault++;
+	break;
+	case CQ_TX_ERROP_HDR_CONS_ERR:
+		stats->tx.hdr_cons_err++;
+	break;
+	case CQ_TX_ERROP_SUBDC_ERR:
+		stats->tx.subdesc_err++;
+	break;
+	case CQ_TX_ERROP_IMM_SIZE_OFLOW:
+		stats->tx.imm_size_oflow++;
+	break;
+	case CQ_TX_ERROP_DATA_SEQUENCE_ERR:
+		stats->tx.data_seq_err++;
+	break;
+	case CQ_TX_ERROP_MEM_SEQUENCE_ERR:
+		stats->tx.mem_seq_err++;
+	break;
+	case CQ_TX_ERROP_LOCK_VIOL:
+		stats->tx.lock_viol++;
+	break;
+	case CQ_TX_ERROP_DATA_FAULT:
+		stats->tx.data_fault++;
+	break;
+	case CQ_TX_ERROP_TSTMP_CONFLICT:
+		stats->tx.tstmp_conflict++;
+	break;
+	case CQ_TX_ERROP_TSTMP_TIMEOUT:
+		stats->tx.tstmp_timeout++;
+	break;
+	case CQ_TX_ERROP_MEM_FAULT:
+		stats->tx.mem_fault++;
+	break;
+	case CQ_TX_ERROP_CK_OVERLAP:
+		stats->tx.csum_overlap++;
+	break;
+	case CQ_TX_ERROP_CK_OFLOW:
+		stats->tx.csum_overflow++;
+	break;
+	}
+
+	return 1;
+}
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
new file mode 100644
index 000000000000..f3383492d114
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
@@ -0,0 +1,355 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2014 Cavium, Inc.
+ */
+
+#ifndef NICVF_QUEUES_H
+#define NICVF_QUEUES_H
+
+#include "q_struct.h"
+
+#define MAX_QUEUE_SET			128
+#define MAX_RCV_QUEUES_PER_QS		8
+#define MAX_RCV_BUF_DESC_RINGS_PER_QS	2
+#define MAX_SND_QUEUES_PER_QS		8
+#define MAX_CMP_QUEUES_PER_QS		8
+
+/* VF's queue interrupt ranges */
+#define	NICVF_INTR_ID_CQ		0
+#define	NICVF_INTR_ID_SQ		8
+#define	NICVF_INTR_ID_RBDR		16
+#define	NICVF_INTR_ID_MISC		18
+#define	NICVF_INTR_ID_QS_ERR		19
+
+#define	for_each_cq_irq(irq)	\
+	for (irq = NICVF_INTR_ID_CQ; irq < NICVF_INTR_ID_SQ; irq++)
+#define	for_each_sq_irq(irq)	\
+	for (irq = NICVF_INTR_ID_SQ; irq < NICVF_INTR_ID_RBDR; irq++)
+#define	for_each_rbdr_irq(irq)	\
+	for (irq = NICVF_INTR_ID_RBDR; irq < NICVF_INTR_ID_MISC; irq++)
+
+#define RBDR_SIZE0		0ULL /* 8K entries */
+#define RBDR_SIZE1		1ULL /* 16K entries */
+#define RBDR_SIZE2		2ULL /* 32K entries */
+#define RBDR_SIZE3		3ULL /* 64K entries */
+#define RBDR_SIZE4		4ULL /* 126K entries */
+#define RBDR_SIZE5		5ULL /* 256K entries */
+#define RBDR_SIZE6		6ULL /* 512K entries */
+
+#define SND_QUEUE_SIZE0		0ULL /* 1K entries */
+#define SND_QUEUE_SIZE1		1ULL /* 2K entries */
+#define SND_QUEUE_SIZE2		2ULL /* 4K entries */
+#define SND_QUEUE_SIZE3		3ULL /* 8K entries */
+#define SND_QUEUE_SIZE4		4ULL /* 16K entries */
+#define SND_QUEUE_SIZE5		5ULL /* 32K entries */
+#define SND_QUEUE_SIZE6		6ULL /* 64K entries */
+
+#define CMP_QUEUE_SIZE0		0ULL /* 1K entries */
+#define CMP_QUEUE_SIZE1		1ULL /* 2K entries */
+#define CMP_QUEUE_SIZE2		2ULL /* 4K entries */
+#define CMP_QUEUE_SIZE3		3ULL /* 8K entries */
+#define CMP_QUEUE_SIZE4		4ULL /* 16K entries */
+#define CMP_QUEUE_SIZE5		5ULL /* 32K entries */
+#define CMP_QUEUE_SIZE6		6ULL /* 64K entries */
+
+/* Default queue count per QS, its lengths and threshold values */
+#define RBDR_CNT		1
+#define RCV_QUEUE_CNT		1
+#define SND_QUEUE_CNT		8
+#define CMP_QUEUE_CNT		8 /* Max of RCV and SND qcount */
+
+#define SND_QUEUE_LEN		(1ULL << (SND_QUEUE_SIZE0 + 10))
+#define SND_QUEUE_THRESH	2ULL
+
+#define CMP_QUEUE_LEN		(1ULL << (CMP_QUEUE_SIZE1 + 10))
+#define CMP_QUEUE_CQE_THRESH	10
+#define CMP_QUEUE_TIMER_THRESH	1000 /* 1 ms */
+
+#define RCV_BUF_COUNT		(1ULL << (RBDR_SIZE0 + 13))
+#define RBDR_THRESH		(RCV_BUF_COUNT / 2)
+#define RCV_BUFFER_LEN		2048 /* In multiples of 128bytes */
+#define RQ_CQ_DROP		((CMP_QUEUE_LEN - SND_QUEUE_LEN) / 256)
+
+/* Descriptor size */
+#define SND_QUEUE_DESC_SIZE	16   /* 128 bits */
+#define CMP_QUEUE_DESC_SIZE	512
+
+/* Buffer / descriptor alignments */
+#define NICVF_RCV_BUF_ALIGN		7
+#define NICVF_RCV_BUF_ALIGN_BYTES	(1ULL << NICVF_RCV_BUF_ALIGN)
+#define NICVF_CQ_BASE_ALIGN_BYTES	512  /* 9 bits */
+#define NICVF_SQ_BASE_ALIGN_BYTES	128  /* 7 bits */
+
+#define NICVF_ALIGNED_ADDR(ADDR, ALIGN_BYTES)	ALIGN(ADDR, ALIGN_BYTES)
+#define NICVF_ADDR_ALIGN_LEN(ADDR, BYTES)\
+	(NICVF_ALIGNED_ADDR(ADDR, BYTES) - BYTES)
+#define NICVF_RCV_BUF_ALIGN_LEN(X)\
+	(NICVF_ALIGNED_ADDR(X, NICVF_RCV_BUF_ALIGN_BYTES) - X)
+
+/* Queue enable/disable */
+#define NICVF_SQ_EN            (1ULL << 19)
+
+/* Queue reset */
+#define NICVF_CQ_RESET		(1ULL << 41)
+#define NICVF_SQ_RESET		(1ULL << 17)
+#define NICVF_RBDR_RESET	(1ULL << 43)
+
+enum CQ_RX_ERRLVL_E {
+	CQ_ERRLVL_MAC,
+	CQ_ERRLVL_L2,
+	CQ_ERRLVL_L3,
+	CQ_ERRLVL_L4,
+};
+
+enum CQ_RX_ERROP_E {
+	CQ_RX_ERROP_RE_NONE = 0x0,
+	CQ_RX_ERROP_RE_PARTIAL = 0x1,
+	CQ_RX_ERROP_RE_JABBER = 0x2,
+	CQ_RX_ERROP_RE_FCS = 0x7,
+	CQ_RX_ERROP_RE_TERMINATE = 0x9,
+	CQ_RX_ERROP_RE_RX_CTL = 0xb,
+	CQ_RX_ERROP_PREL2_ERR = 0x1f,
+	CQ_RX_ERROP_L2_FRAGMENT = 0x20,
+	CQ_RX_ERROP_L2_OVERRUN = 0x21,
+	CQ_RX_ERROP_L2_PFCS = 0x22,
+	CQ_RX_ERROP_L2_PUNY = 0x23,
+	CQ_RX_ERROP_L2_MAL = 0x24,
+	CQ_RX_ERROP_L2_OVERSIZE = 0x25,
+	CQ_RX_ERROP_L2_UNDERSIZE = 0x26,
+	CQ_RX_ERROP_L2_LENMISM = 0x27,
+	CQ_RX_ERROP_L2_PCLP = 0x28,
+	CQ_RX_ERROP_IP_NOT = 0x41,
+	CQ_RX_ERROP_IP_CSUM_ERR = 0x42,
+	CQ_RX_ERROP_IP_MAL = 0x43,
+	CQ_RX_ERROP_IP_MALD = 0x44,
+	CQ_RX_ERROP_IP_HOP = 0x45,
+	CQ_RX_ERROP_L3_ICRC = 0x46,
+	CQ_RX_ERROP_L3_PCLP = 0x47,
+	CQ_RX_ERROP_L4_MAL = 0x61,
+	CQ_RX_ERROP_L4_CHK = 0x62,
+	CQ_RX_ERROP_UDP_LEN = 0x63,
+	CQ_RX_ERROP_L4_PORT = 0x64,
+	CQ_RX_ERROP_TCP_FLAG = 0x65,
+	CQ_RX_ERROP_TCP_OFFSET = 0x66,
+	CQ_RX_ERROP_L4_PCLP = 0x67,
+	CQ_RX_ERROP_RBDR_TRUNC = 0x70,
+};
+
+enum CQ_TX_ERROP_E {
+	CQ_TX_ERROP_GOOD = 0x0,
+	CQ_TX_ERROP_DESC_FAULT = 0x10,
+	CQ_TX_ERROP_HDR_CONS_ERR = 0x11,
+	CQ_TX_ERROP_SUBDC_ERR = 0x12,
+	CQ_TX_ERROP_IMM_SIZE_OFLOW = 0x80,
+	CQ_TX_ERROP_DATA_SEQUENCE_ERR = 0x81,
+	CQ_TX_ERROP_MEM_SEQUENCE_ERR = 0x82,
+	CQ_TX_ERROP_LOCK_VIOL = 0x83,
+	CQ_TX_ERROP_DATA_FAULT = 0x84,
+	CQ_TX_ERROP_TSTMP_CONFLICT = 0x85,
+	CQ_TX_ERROP_TSTMP_TIMEOUT = 0x86,
+	CQ_TX_ERROP_MEM_FAULT = 0x87,
+	CQ_TX_ERROP_CK_OVERLAP = 0x88,
+	CQ_TX_ERROP_CK_OFLOW = 0x89,
+	CQ_TX_ERROP_ENUM_LAST = 0x8a,
+};
+
+struct cmp_queue_stats {
+	struct rx_stats {
+		struct {
+			u64 mac_errs;
+			u64 l2_errs;
+			u64 l3_errs;
+			u64 l4_errs;
+		} errlvl;
+		struct {
+			u64 good;
+			u64 partial_pkts;
+			u64 jabber_errs;
+			u64 fcs_errs;
+			u64 terminate_errs;
+			u64 bgx_rx_errs;
+			u64 prel2_errs;
+			u64 l2_frags;
+			u64 l2_overruns;
+			u64 l2_pfcs;
+			u64 l2_puny;
+			u64 l2_hdr_malformed;
+			u64 l2_oversize;
+			u64 l2_undersize;
+			u64 l2_len_mismatch;
+			u64 l2_pclp;
+			u64 non_ip;
+			u64 ip_csum_err;
+			u64 ip_hdr_malformed;
+			u64 ip_payload_malformed;
+			u64 ip_hop_errs;
+			u64 l3_icrc_errs;
+			u64 l3_pclp;
+			u64 l4_malformed;
+			u64 l4_csum_errs;
+			u64 udp_len_err;
+			u64 bad_l4_port;
+			u64 bad_tcp_flag;
+			u64 tcp_offset_errs;
+			u64 l4_pclp;
+			u64 pkt_truncated;
+		} errop;
+	} rx;
+	struct tx_stats {
+		u64 good;
+		u64 desc_fault;
+		u64 hdr_cons_err;
+		u64 subdesc_err;
+		u64 imm_size_oflow;
+		u64 data_seq_err;
+		u64 mem_seq_err;
+		u64 lock_viol;
+		u64 data_fault;
+		u64 tstmp_conflict;
+		u64 tstmp_timeout;
+		u64 mem_fault;
+		u64 csum_overlap;
+		u64 csum_overflow;
+	} tx;
+};
+
+enum RQ_SQ_STATS {
+	RQ_SQ_STATS_OCTS,
+	RQ_SQ_STATS_PKTS,
+};
+
+struct rx_tx_queue_stats {
+	u64	bytes;
+	u64	pkts;
+};
+
+struct q_desc_mem {
+	dma_addr_t	dma;
+	uint64_t	size;
+	uint16_t	q_len;
+	dma_addr_t	phys_base;
+	void		*base;
+	void		*unalign_base;
+};
+
+struct rbdr {
+	bool		enable;
+	uint32_t	buf_size;
+	uint32_t	thresh;      /* Threshold level for interrupt */
+	void		*desc;
+	struct q_desc_mem   dmem;
+};
+
+struct rcv_queue {
+	bool		enable;
+	struct	rbdr	*rbdr_start;
+	struct	rbdr	*rbdr_cont;
+	bool		en_tcp_reassembly;
+	uint8_t		cq_qs;  /* CQ's QS to which this RQ is assigned */
+	uint8_t		cq_idx; /* CQ index (0 to 7) in the QS */
+	uint8_t		cont_rbdr_qs;      /* Continue buffer ptrs - QS num */
+	uint8_t		cont_qs_rbdr_idx;  /* RBDR idx in the cont QS */
+	uint8_t		start_rbdr_qs;     /* First buffer ptrs - QS num */
+	uint8_t		start_qs_rbdr_idx; /* RBDR idx in the above QS */
+	uint8_t         caching;
+	struct		rx_tx_queue_stats stats;
+};
+
+struct cmp_queue {
+	bool		enable;
+	uint8_t		intr_timer_thresh;
+	uint16_t	thresh;
+	spinlock_t	cq_lock;  /* lock to serialize processing CQEs */
+	void		*desc;
+	struct q_desc_mem   dmem;
+	struct cmp_queue_stats	stats;
+};
+
+struct snd_queue {
+	bool		enable;
+	uint8_t		cq_qs;  /* CQ's QS to which this SQ is pointing */
+	uint8_t		cq_idx; /* CQ index (0 to 7) in the above QS */
+	uint16_t	thresh;
+	uint16_t	free_cnt;
+	uint64_t	head;
+	uint64_t	tail;
+	uint64_t	*skbuff;
+	void		*desc;
+	struct q_desc_mem   dmem;
+	struct rx_tx_queue_stats stats;
+};
+
+struct queue_set {
+	bool		enable;
+	bool		be_en;
+	uint8_t		vnic_id;
+	uint8_t		rq_cnt;
+	uint8_t		cq_cnt;
+	uint64_t	cq_len;
+	uint8_t		sq_cnt;
+	uint64_t	sq_len;
+	uint8_t		rbdr_cnt;
+	uint64_t	rbdr_len;
+	struct	rcv_queue	rq[MAX_RCV_QUEUES_PER_QS];
+	struct	cmp_queue	cq[MAX_CMP_QUEUES_PER_QS];
+	struct	snd_queue	sq[MAX_SND_QUEUES_PER_QS];
+	struct	rbdr		rbdr[MAX_RCV_BUF_DESC_RINGS_PER_QS];
+};
+
+#define GET_RBDR_DESC(RING, idx)\
+		(&(((struct rbdr_entry_t *)((RING)->desc))[idx]))
+#define GET_SQ_DESC(RING, idx)\
+		(&(((struct sq_hdr_subdesc *)((RING)->desc))[idx]))
+#define GET_CQ_DESC(RING, idx)\
+		(&(((union cq_desc_t *)((RING)->desc))[idx]))
+
+/* CQ status bits */
+#define	CQ_WR_FULL	(1 << 26)
+#define	CQ_WR_DISABLE	(1 << 25)
+#define	CQ_WR_FAULT	(1 << 24)
+#define	CQ_CQE_COUNT	(0xFFFF << 0)
+
+#define	CQ_ERR_MASK	(CQ_WR_FULL | CQ_WR_DISABLE | CQ_WR_FAULT)
+
+int nicvf_set_qset_resources(struct nicvf *nic);
+int nicvf_config_data_transfer(struct nicvf *nic, bool enable);
+void nicvf_qset_config(struct nicvf *nic, bool enable);
+void nicvf_cmp_queue_config(struct nicvf *nic, struct queue_set *qs,
+			    int qidx, bool enable);
+
+void nicvf_sq_enable(struct nicvf *nic, struct snd_queue *sq, int qidx);
+void nicvf_sq_disable(struct nicvf *nic, int qidx);
+void nicvf_put_sq_desc(struct snd_queue *sq, int desc_cnt);
+void nicvf_sq_free_used_descs(struct net_device *netdev,
+			      struct snd_queue *sq, int qidx);
+int nicvf_sq_append_skb(struct nicvf *nic, struct sk_buff *skb);
+
+struct sk_buff *nicvf_get_rcv_skb(struct nicvf *nic, void *cq_desc);
+void nicvf_refill_rbdr(unsigned long data);
+
+void nicvf_enable_intr(struct nicvf *nic, int int_type, int q_idx);
+void nicvf_disable_intr(struct nicvf *nic, int int_type, int q_idx);
+void nicvf_clear_intr(struct nicvf *nic, int int_type, int q_idx);
+int nicvf_is_intr_enabled(struct nicvf *nic, int int_type, int q_idx);
+
+/* Register access APIs */
+void nicvf_reg_write(struct nicvf *nic, uint64_t offset, uint64_t val);
+uint64_t nicvf_reg_read(struct nicvf *nic, uint64_t offset);
+void nicvf_qset_reg_write(struct nicvf *nic, uint64_t offset, uint64_t val);
+uint64_t nicvf_qset_reg_read(struct nicvf *nic, uint64_t offset);
+void nicvf_queue_reg_write(struct nicvf *nic, uint64_t offset,
+			   uint64_t qidx, uint64_t val);
+uint64_t nicvf_queue_reg_read(struct nicvf *nic,
+			      uint64_t offset, uint64_t qidx);
+
+/* Stats */
+void nicvf_update_rq_stats(struct nicvf *nic, int rq_idx);
+void nicvf_update_sq_stats(struct nicvf *nic, int sq_idx);
+int nicvf_check_cqe_rx_errs(struct nicvf *nic,
+			    struct cmp_queue *cq, void *cq_desc);
+int nicvf_check_cqe_tx_errs(struct nicvf *nic,
+			    struct cmp_queue *cq, void *cq_desc);
+#endif /* NICVF_QUEUES_H */
diff --git a/drivers/net/ethernet/cavium/thunder/q_struct.h b/drivers/net/ethernet/cavium/thunder/q_struct.h
new file mode 100644
index 000000000000..729276bd800a
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/q_struct.h
@@ -0,0 +1,702 @@
+/*
+ * This file contains HW queue descriptor formats, config register
+ * structures e.t.c
+ *
+ * Copyright (C) 2014 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#ifndef Q_STRUCT_H
+#define Q_STRUCT_H
+
+/* Load transaction types for reading segment bytes specified by
+ * NIC_SEND_GATHER_S[LD_TYPE].
+ */
+enum nic_send_ld_type_e {
+	NIC_SEND_LD_TYPE_E_LDD = 0x0,
+	NIC_SEND_LD_TYPE_E_LDT = 0x1,
+	NIC_SEND_LD_TYPE_E_LDWB = 0x2,
+	NIC_SEND_LD_TYPE_E_ENUM_LAST = 0x3,
+};
+
+enum ether_type_algorithm {
+	ETYPE_ALG_NONE = 0x0,
+	ETYPE_ALG_SKIP = 0x1,
+	ETYPE_ALG_ENDPARSE = 0x2,
+	ETYPE_ALG_VLAN = 0x3,
+	ETYPE_ALG_VLAN_STRIP = 0x4,
+};
+
+enum layer3_type {
+	L3TYPE_NONE = 0x00,
+	L3TYPE_GRH = 0x01,
+	L3TYPE_IPV4 = 0x04,
+	L3TYPE_IPV4_OPTIONS = 0x05,
+	L3TYPE_IPV6 = 0x06,
+	L3TYPE_IPV6_OPTIONS = 0x07,
+	L3TYPE_ET_STOP = 0x0D,
+	L3TYPE_OTHER = 0x0E,
+};
+
+enum layer4_type {
+	L4TYPE_NONE = 0x00,
+	L4TYPE_IPSEC_ESP = 0x01,
+	L4TYPE_IPFRAG = 0x02,
+	L4TYPE_IPCOMP = 0x03,
+	L4TYPE_TCP = 0x04,
+	L4TYPE_UDP = 0x05,
+	L4TYPE_SCTP = 0x06,
+	L4TYPE_GRE = 0x07,
+	L4TYPE_ROCE_BTH = 0x08,
+	L4TYPE_OTHER = 0x0E,
+};
+
+/* CPI and RSSI configuration */
+enum cpi_algorithm_type {
+	CPI_ALG_NONE = 0x0,
+	CPI_ALG_VLAN = 0x1,
+	CPI_ALG_VLAN16 = 0x2,
+	CPI_ALG_DIFF = 0x3,
+};
+
+enum rss_algorithm_type {
+	RSS_ALG_NONE = 0x00,
+	RSS_ALG_PORT = 0x01,
+	RSS_ALG_IP = 0x02,
+	RSS_ALG_TCP_IP = 0x03,
+	RSS_ALG_UDP_IP = 0x04,
+	RSS_ALG_SCTP_IP = 0x05,
+	RSS_ALG_GRE_IP = 0x06,
+	RSS_ALG_ROCE = 0x07,
+};
+
+enum rss_hash_cfg {
+	RSS_HASH_L2ETC = 0x00,
+	RSS_HASH_IP = 0x01,
+	RSS_HASH_TCP = 0x02,
+	RSS_TCP_SYN_DIS = 0x03,
+	RSS_HASH_UDP = 0x04,
+	RSS_HASH_L4ETC = 0x05,
+	RSS_HASH_ROCE = 0x06,
+	RSS_L3_BIDI = 0x07,
+	RSS_L4_BIDI = 0x08,
+};
+
+/* Completion queue entry types */
+enum cqe_type {
+	CQE_TYPE_INVALID = 0x0,
+	CQE_TYPE_RX = 0x2,
+	CQE_TYPE_RX_SPLIT = 0x3,
+	CQE_TYPE_RX_TCP = 0x4,
+	CQE_TYPE_SEND = 0x8,
+	CQE_TYPE_SEND_PTP = 0x9,
+};
+
+enum cqe_rx_tcp_status {
+	CQE_RX_STATUS_VALID_TCP_CNXT = 0x00,
+	CQE_RX_STATUS_INVALID_TCP_CNXT = 0x0F,
+};
+
+enum cqe_send_status {
+	CQE_SEND_STATUS_GOOD = 0x00,
+	CQE_SEND_STATUS_DESC_FAULT = 0x01,
+	CQE_SEND_STATUS_HDR_CONS_ERR = 0x11,
+	CQE_SEND_STATUS_SUBDESC_ERR = 0x12,
+	CQE_SEND_STATUS_IMM_SIZE_OFLOW = 0x80,
+	CQE_SEND_STATUS_CRC_SEQ_ERR = 0x81,
+	CQE_SEND_STATUS_DATA_SEQ_ERR = 0x82,
+	CQE_SEND_STATUS_MEM_SEQ_ERR = 0x83,
+	CQE_SEND_STATUS_LOCK_VIOL = 0x84,
+	CQE_SEND_STATUS_LOCK_UFLOW = 0x85,
+	CQE_SEND_STATUS_DATA_FAULT = 0x86,
+	CQE_SEND_STATUS_TSTMP_CONFLICT = 0x87,
+	CQE_SEND_STATUS_TSTMP_TIMEOUT = 0x88,
+	CQE_SEND_STATUS_MEM_FAULT = 0x89,
+	CQE_SEND_STATUS_CSUM_OVERLAP = 0x8A,
+	CQE_SEND_STATUS_CSUM_OVERFLOW = 0x8B,
+};
+
+enum cqe_rx_tcp_end_reason {
+	CQE_RX_TCP_END_FIN_FLAG_DET = 0,
+	CQE_RX_TCP_END_INVALID_FLAG = 1,
+	CQE_RX_TCP_END_TIMEOUT = 2,
+	CQE_RX_TCP_END_OUT_OF_SEQ = 3,
+	CQE_RX_TCP_END_PKT_ERR = 4,
+	CQE_RX_TCP_END_QS_DISABLED = 0x0F,
+};
+
+/* Packet protocol level error enumeration */
+enum cqe_rx_err_level {
+	CQE_RX_ERRLVL_RE = 0x0,
+	CQE_RX_ERRLVL_L2 = 0x1,
+	CQE_RX_ERRLVL_L3 = 0x2,
+	CQE_RX_ERRLVL_L4 = 0x3,
+};
+
+/* Packet protocol level error type enumeration */
+enum cqe_rx_err_opcode {
+	CQE_RX_ERR_RE_NONE = 0x0,
+	CQE_RX_ERR_RE_PARTIAL = 0x1,
+	CQE_RX_ERR_RE_JABBER = 0x2,
+	CQE_RX_ERR_RE_FCS = 0x7,
+	CQE_RX_ERR_RE_TERMINATE = 0x9,
+	CQE_RX_ERR_RE_RX_CTL = 0xb,
+	CQE_RX_ERR_PREL2_ERR = 0x1f,
+	CQE_RX_ERR_L2_FRAGMENT = 0x20,
+	CQE_RX_ERR_L2_OVERRUN = 0x21,
+	CQE_RX_ERR_L2_PFCS = 0x22,
+	CQE_RX_ERR_L2_PUNY = 0x23,
+	CQE_RX_ERR_L2_MAL = 0x24,
+	CQE_RX_ERR_L2_OVERSIZE = 0x25,
+	CQE_RX_ERR_L2_UNDERSIZE = 0x26,
+	CQE_RX_ERR_L2_LENMISM = 0x27,
+	CQE_RX_ERR_L2_PCLP = 0x28,
+	CQE_RX_ERR_IP_NOT = 0x41,
+	CQE_RX_ERR_IP_CHK = 0x42,
+	CQE_RX_ERR_IP_MAL = 0x43,
+	CQE_RX_ERR_IP_MALD = 0x44,
+	CQE_RX_ERR_IP_HOP = 0x45,
+	CQE_RX_ERR_L3_ICRC = 0x46,
+	CQE_RX_ERR_L3_PCLP = 0x47,
+	CQE_RX_ERR_L4_MAL = 0x61,
+	CQE_RX_ERR_L4_CHK = 0x62,
+	CQE_RX_ERR_UDP_LEN = 0x63,
+	CQE_RX_ERR_L4_PORT = 0x64,
+	CQE_RX_ERR_TCP_FLAG = 0x65,
+	CQE_RX_ERR_TCP_OFFSET = 0x66,
+	CQE_RX_ERR_L4_PCLP = 0x67,
+	CQE_RX_ERR_RBDR_TRUNC = 0x70,
+};
+
+struct cqe_rx_t {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t   cqe_type:4; /* W0 */
+	uint64_t   stdn_fault:1;
+	uint64_t   rsvd0:1;
+	uint64_t   rq_qs:7;
+	uint64_t   rq_idx:3;
+	uint64_t   rsvd1:12;
+	uint64_t   rss_alg:4;
+	uint64_t   rsvd2:4;
+	uint64_t   rb_cnt:4;
+	uint64_t   vlan_found:1;
+	uint64_t   vlan_stripped:1;
+	uint64_t   vlan2_found:1;
+	uint64_t   vlan2_stripped:1;
+	uint64_t   l4_type:4;
+	uint64_t   l3_type:4;
+	uint64_t   l2_present:1;
+	uint64_t   err_level:3;
+	uint64_t   err_opcode:8;
+
+	uint64_t   pkt_len:16; /* W1 */
+	uint64_t   l2_ptr:8;
+	uint64_t   l3_ptr:8;
+	uint64_t   l4_ptr:8;
+	uint64_t   cq_pkt_len:8;
+	uint64_t   align_pad:3;
+	uint64_t   rsvd3:1;
+	uint64_t   chan:12;
+
+	uint64_t   rss_tag:32; /* W2 */
+	uint64_t   vlan_tci:16;
+	uint64_t   vlan_ptr:8;
+	uint64_t   vlan2_ptr:8;
+
+	uint64_t   rb3_sz:16; /* W3 */
+	uint64_t   rb2_sz:16;
+	uint64_t   rb1_sz:16;
+	uint64_t   rb0_sz:16;
+
+	uint64_t   rb7_sz:16; /* W4 */
+	uint64_t   rb6_sz:16;
+	uint64_t   rb5_sz:16;
+	uint64_t   rb4_sz:16;
+
+	uint64_t   rb11_sz:16; /* W5 */
+	uint64_t   rb10_sz:16;
+	uint64_t   rb9_sz:16;
+	uint64_t   rb8_sz:16;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t   err_opcode:8;
+	uint64_t   err_level:3;
+	uint64_t   l2_present:1;
+	uint64_t   l3_type:4;
+	uint64_t   l4_type:4;
+	uint64_t   vlan2_stripped:1;
+	uint64_t   vlan2_found:1;
+	uint64_t   vlan_stripped:1;
+	uint64_t   vlan_found:1;
+	uint64_t   rb_cnt:4;
+	uint64_t   rsvd2:4;
+	uint64_t   rss_alg:4;
+	uint64_t   rsvd1:12;
+	uint64_t   rq_idx:3;
+	uint64_t   rq_qs:7;
+	uint64_t   rsvd0:1;
+	uint64_t   stdn_fault:1;
+	uint64_t   cqe_type:4; /* W0 */
+	uint64_t   chan:12;
+	uint64_t   rsvd3:1;
+	uint64_t   align_pad:3;
+	uint64_t   cq_pkt_len:8;
+	uint64_t   l4_ptr:8;
+	uint64_t   l3_ptr:8;
+	uint64_t   l2_ptr:8;
+	uint64_t   pkt_len:16; /* W1 */
+	uint64_t   vlan2_ptr:8;
+	uint64_t   vlan_ptr:8;
+	uint64_t   vlan_tci:16;
+	uint64_t   rss_tag:32; /* W2 */
+	uint64_t   rb0_sz:16;
+	uint64_t   rb1_sz:16;
+	uint64_t   rb2_sz:16;
+	uint64_t   rb3_sz:16; /* W3 */
+	uint64_t   rb4_sz:16;
+	uint64_t   rb5_sz:16;
+	uint64_t   rb6_sz:16;
+	uint64_t   rb7_sz:16; /* W4 */
+	uint64_t   rb8_sz:16;
+	uint64_t   rb9_sz:16;
+	uint64_t   rb10_sz:16;
+	uint64_t   rb11_sz:16; /* W5 */
+#endif
+	uint64_t   rb0_ptr:64;
+	uint64_t   rb1_ptr:64;
+	uint64_t   rb2_ptr:64;
+	uint64_t   rb3_ptr:64;
+	uint64_t   rb4_ptr:64;
+	uint64_t   rb5_ptr:64;
+	uint64_t   rb6_ptr:64;
+	uint64_t   rb7_ptr:64;
+	uint64_t   rb8_ptr:64;
+	uint64_t   rb9_ptr:64;
+	uint64_t   rb10_ptr:64;
+	uint64_t   rb11_ptr:64;
+};
+
+struct cqe_rx_tcp_err_t {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t   cqe_type:4; /* W0 */
+	uint64_t   rsvd0:60;
+
+	uint64_t   rsvd1:4; /* W1 */
+	uint64_t   partial_first:1;
+	uint64_t   rsvd2:27;
+	uint64_t   rbdr_bytes:8;
+	uint64_t   rsvd3:24;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t   rsvd0:60;
+	uint64_t   cqe_type:4;
+
+	uint64_t   rsvd3:24;
+	uint64_t   rbdr_bytes:8;
+	uint64_t   rsvd2:27;
+	uint64_t   partial_first:1;
+	uint64_t   rsvd1:4;
+#endif
+};
+
+struct cqe_rx_tcp_t {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t   cqe_type:4; /* W0 */
+	uint64_t   rsvd0:52;
+	uint64_t   cq_tcp_status:8;
+
+	uint64_t   rsvd1:32; /* W1 */
+	uint64_t   tcp_cntx_bytes:8;
+	uint64_t   rsvd2:8;
+	uint64_t   tcp_err_bytes:16;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t   cq_tcp_status:8;
+	uint64_t   rsvd0:52;
+	uint64_t   cqe_type:4; /* W0 */
+
+	uint64_t   tcp_err_bytes:16;
+	uint64_t   rsvd2:8;
+	uint64_t   tcp_cntx_bytes:8;
+	uint64_t   rsvd1:32; /* W1 */
+#endif
+};
+
+struct cqe_send_t {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t   cqe_type:4; /* W0 */
+	uint64_t   rsvd0:4;
+	uint64_t   sqe_ptr:16;
+	uint64_t   rsvd1:4;
+	uint64_t   rsvd2:10;
+	uint64_t   sq_qs:7;
+	uint64_t   sq_idx:3;
+	uint64_t   rsvd3:8;
+	uint64_t   send_status:8;
+
+	uint64_t   ptp_timestamp:64; /* W1 */
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t   send_status:8;
+	uint64_t   rsvd3:8;
+	uint64_t   sq_idx:3;
+	uint64_t   sq_qs:7;
+	uint64_t   rsvd2:10;
+	uint64_t   rsvd1:4;
+	uint64_t   sqe_ptr:16;
+	uint64_t   rsvd0:4;
+	uint64_t   cqe_type:4; /* W0 */
+
+	uint64_t   ptp_timestamp:64; /* W1 */
+#endif
+};
+
+union cq_desc_t {
+	uint64_t u[64];
+	struct cqe_send_t snd_hdr;
+	struct cqe_rx_t rx_hdr;
+	struct cqe_rx_tcp_t rx_tcp_hdr;
+	struct cqe_rx_tcp_err_t rx_tcp_err_hdr;
+};
+
+struct rbdr_entry_t {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t   rsvd0:15;
+	uint64_t   buf_addr:42;
+	uint64_t   cache_align:7;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t   cache_align:7;
+	uint64_t   buf_addr:42;
+	uint64_t   rsvd0:15;
+#endif
+};
+
+/* TCP reassembly context */
+struct rbe_tcp_cnxt_t {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t   tcp_pkt_cnt:12;
+	uint64_t   rsvd1:4;
+	uint64_t   align_hdr_bytes:4;
+	uint64_t   align_ptr_bytes:4;
+	uint64_t   ptr_bytes:16;
+	uint64_t   rsvd2:24;
+	uint64_t   cqe_type:4;
+	uint64_t   rsvd0:54;
+	uint64_t   tcp_end_reason:2;
+	uint64_t   tcp_status:4;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t   tcp_status:4;
+	uint64_t   tcp_end_reason:2;
+	uint64_t   rsvd0:54;
+	uint64_t   cqe_type:4;
+	uint64_t   rsvd2:24;
+	uint64_t   ptr_bytes:16;
+	uint64_t   align_ptr_bytes:4;
+	uint64_t   align_hdr_bytes:4;
+	uint64_t   rsvd1:4;
+	uint64_t   tcp_pkt_cnt:12;
+#endif
+};
+
+/* Always Big endian */
+struct rx_hdr_t {
+	uint64_t   opaque:32;
+	uint64_t   rss_flow:8;
+	uint64_t   skip_length:6;
+	uint64_t   disable_rss:1;
+	uint64_t   disable_tcp_reassembly:1;
+	uint64_t   nodrop:1;
+	uint64_t   dest_alg:2;
+	uint64_t   rsvd0:2;
+	uint64_t   dest_rq:11;
+};
+
+enum send_l4_csum_type {
+	SEND_L4_CSUM_DISABLE = 0x00,
+	SEND_L4_CSUM_UDP = 0x01,
+	SEND_L4_CSUM_TCP = 0x02,
+	SEND_L4_CSUM_SCTP = 0x03,
+};
+
+enum send_crc_alg {
+	SEND_CRCALG_CRC32 = 0x00,
+	SEND_CRCALG_CRC32C = 0x01,
+	SEND_CRCALG_ICRC = 0x02,
+};
+
+enum send_load_type {
+	SEND_LD_TYPE_LDD = 0x00,
+	SEND_LD_TYPE_LDT = 0x01,
+	SEND_LD_TYPE_LDWB = 0x02,
+};
+
+enum send_mem_alg_type {
+	SEND_MEMALG_SET = 0x00,
+	SEND_MEMALG_ADD = 0x08,
+	SEND_MEMALG_SUB = 0x09,
+	SEND_MEMALG_ADDLEN = 0x0A,
+	SEND_MEMALG_SUBLEN = 0x0B,
+};
+
+enum send_mem_dsz_type {
+	SEND_MEMDSZ_B64 = 0x00,
+	SEND_MEMDSZ_B32 = 0x01,
+	SEND_MEMDSZ_B8 = 0x03,
+};
+
+enum sq_subdesc_type {
+	SQ_DESC_TYPE_INVALID = 0x00,
+	SQ_DESC_TYPE_HEADER = 0x01,
+	SQ_DESC_TYPE_CRC = 0x02,
+	SQ_DESC_TYPE_IMMEDIATE = 0x03,
+	SQ_DESC_TYPE_GATHER = 0x04,
+	SQ_DESC_TYPE_MEMORY = 0x05,
+};
+
+struct sq_crc_subdesc {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t    rsvd1:32;
+	uint64_t    crc_ival:32;
+	uint64_t    subdesc_type:4;
+	uint64_t    crc_alg:2;
+	uint64_t    rsvd0:10;
+	uint64_t    crc_insert_pos:16;
+	uint64_t    hdr_start:16;
+	uint64_t    crc_len:16;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t    crc_len:16;
+	uint64_t    hdr_start:16;
+	uint64_t    crc_insert_pos:16;
+	uint64_t    rsvd0:10;
+	uint64_t    crc_alg:2;
+	uint64_t    subdesc_type:4;
+	uint64_t    crc_ival:32;
+	uint64_t    rsvd1:32;
+#endif
+};
+
+struct sq_gather_subdesc {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t    subdesc_type:4; /* W0 */
+	uint64_t    ld_type:2;
+	uint64_t    rsvd0:42;
+	uint64_t    size:16;
+
+	uint64_t    rsvd1:15; /* W1 */
+	uint64_t    addr:49;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t    size:16;
+	uint64_t    rsvd0:42;
+	uint64_t    ld_type:2;
+	uint64_t    subdesc_type:4; /* W0 */
+
+	uint64_t    addr:49;
+	uint64_t    rsvd1:15; /* W1 */
+#endif
+};
+
+/* SQ immediate subdescriptor */
+struct sq_imm_subdesc {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t    subdesc_type:4; /* W0 */
+	uint64_t    rsvd0:46;
+	uint64_t    len:14;
+
+	uint64_t    data:64; /* W1 */
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t    len:14;
+	uint64_t    rsvd0:46;
+	uint64_t    subdesc_type:4; /* W0 */
+
+	uint64_t    data:64; /* W1 */
+#endif
+};
+
+struct sq_mem_subdesc {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t    subdesc_type:4; /* W0 */
+	uint64_t    mem_alg:4;
+	uint64_t    mem_dsz:2;
+	uint64_t    wmem:1;
+	uint64_t    rsvd0:21;
+	uint64_t    offset:32;
+
+	uint64_t    rsvd1:15; /* W1 */
+	uint64_t    addr:49;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t    offset:32;
+	uint64_t    rsvd0:21;
+	uint64_t    wmem:1;
+	uint64_t    mem_dsz:2;
+	uint64_t    mem_alg:4;
+	uint64_t    subdesc_type:4; /* W0 */
+
+	uint64_t    addr:49;
+	uint64_t    rsvd1:15; /* W1 */
+#endif
+};
+
+struct sq_hdr_subdesc {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t    subdesc_type:4;
+	uint64_t    tso:1;
+	uint64_t    post_cqe:1; /* Post CQE on no error also */
+	uint64_t    dont_send:1;
+	uint64_t    tstmp:1;
+	uint64_t    subdesc_cnt:8;
+	uint64_t    csum_l4:2;
+	uint64_t    csum_l3:1;
+	uint64_t    rsvd0:5;
+	uint64_t    l4_offset:8;
+	uint64_t    l3_offset:8;
+	uint64_t    rsvd1:4;
+	uint64_t    tot_len:20; /* W0 */
+
+	uint64_t    tso_sdc_cont:8;
+	uint64_t    tso_sdc_first:8;
+	uint64_t    tso_l4_offset:8;
+	uint64_t    tso_flags_last:12;
+	uint64_t    tso_flags_first:12;
+	uint64_t    rsvd2:2;
+	uint64_t    tso_max_paysize:14; /* W1 */
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t    tot_len:20;
+	uint64_t    rsvd1:4;
+	uint64_t    l3_offset:8;
+	uint64_t    l4_offset:8;
+	uint64_t    rsvd0:5;
+	uint64_t    csum_l3:1;
+	uint64_t    csum_l4:2;
+	uint64_t    subdesc_cnt:8;
+	uint64_t    tstmp:1;
+	uint64_t    dont_send:1;
+	uint64_t    post_cqe:1; /* Post CQE on no error also */
+	uint64_t    tso:1;
+	uint64_t    subdesc_type:4; /* W0 */
+
+	uint64_t    tso_max_paysize:14;
+	uint64_t    rsvd2:2;
+	uint64_t    tso_flags_first:12;
+	uint64_t    tso_flags_last:12;
+	uint64_t    tso_l4_offset:8;
+	uint64_t    tso_sdc_first:8;
+	uint64_t    tso_sdc_cont:8; /* W1 */
+#endif
+};
+
+/* Queue config register formats */
+struct rq_cfg {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t reserved_2_63:62;
+	uint64_t ena:1;
+	uint64_t tcp_ena:1;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t tcp_ena:1;
+	uint64_t ena:1;
+	uint64_t reserved_2_63:62;
+#endif
+};
+
+struct cq_cfg {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t reserved_43_63:21;
+	uint64_t ena:1;
+	uint64_t reset:1;
+	uint64_t caching:1;
+	uint64_t reserved_35_39:5;
+	uint64_t qsize:3;
+	uint64_t reserved_25_31:7;
+	uint64_t avg_con:9;
+	uint64_t reserved_0_15:16;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t reserved_0_15:16;
+	uint64_t avg_con:9;
+	uint64_t reserved_25_31:7;
+	uint64_t qsize:3;
+	uint64_t reserved_35_39:5;
+	uint64_t caching:1;
+	uint64_t reset:1;
+	uint64_t ena:1;
+	uint64_t reserved_43_63:21;
+#endif
+};
+
+struct sq_cfg {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t reserved_20_63:44;
+	uint64_t ena:1;
+	uint64_t reserved_18_18:1;
+	uint64_t reset:1;
+	uint64_t ldwb:1;
+	uint64_t reserved_11_15:5;
+	uint64_t qsize:3;
+	uint64_t reserved_3_7:5;
+	uint64_t tstmp_bgx_intf:3;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t tstmp_bgx_intf:3;
+	uint64_t reserved_3_7:5;
+	uint64_t qsize:3;
+	uint64_t reserved_11_15:5;
+	uint64_t ldwb:1;
+	uint64_t reset:1;
+	uint64_t reserved_18_18:1;
+	uint64_t ena:1;
+	uint64_t reserved_20_63:44;
+#endif
+};
+
+struct rbdr_cfg {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t reserved_45_63:19;
+	uint64_t ena:1;
+	uint64_t reset:1;
+	uint64_t ldwb:1;
+	uint64_t reserved_36_41:6;
+	uint64_t qsize:4;
+	uint64_t reserved_25_31:7;
+	uint64_t avg_con:9;
+	uint64_t reserved_12_15:4;
+	uint64_t lines:12;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t lines:12;
+	uint64_t reserved_12_15:4;
+	uint64_t avg_con:9;
+	uint64_t reserved_25_31:7;
+	uint64_t qsize:4;
+	uint64_t reserved_36_41:6;
+	uint64_t ldwb:1;
+	uint64_t reset:1;
+	uint64_t ena: 1;
+	uint64_t reserved_45_63:19;
+#endif
+};
+
+struct qs_cfg {
+#if defined(__BIG_ENDIAN_BITFIELD)
+	uint64_t reserved_32_63:32;
+	uint64_t ena:1;
+	uint64_t reserved_27_30:4;
+	uint64_t sq_ins_ena:1;
+	uint64_t sq_ins_pos:6;
+	uint64_t lock_ena:1;
+	uint64_t lock_viol_cqe_ena:1;
+	uint64_t send_tstmp_ena:1;
+	uint64_t be:1;
+	uint64_t reserved_7_15:9;
+	uint64_t vnic:7;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	uint64_t vnic:7;
+	uint64_t reserved_7_15:9;
+	uint64_t be:1;
+	uint64_t send_tstmp_ena:1;
+	uint64_t lock_viol_cqe_ena:1;
+	uint64_t lock_ena:1;
+	uint64_t sq_ins_pos:6;
+	uint64_t sq_ins_ena:1;
+	uint64_t reserved_27_30:4;
+	uint64_t ena:1;
+	uint64_t reserved_32_63:32;
+#endif
+};
+
+#endif /* Q_STRUCT_H */
diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
new file mode 100644
index 000000000000..b8886f9e0f46
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
@@ -0,0 +1,386 @@
+/*
+ * Copyright (C) 2014 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/pci.h>
+
+#include "nic_reg.h"
+#include "nic.h"
+#include "thunder_bgx.h"
+
+#define DRV_NAME	"thunder-BGX"
+#define DRV_VERSION	"1.0"
+
+struct lmac {
+	int	dmac;
+	bool	link_up;
+} lmac;
+
+struct bgx {
+	uint8_t			bgx_id;
+	struct	lmac		lmac[MAX_LMAC_PER_BGX];
+	int			lmac_count;
+	uint64_t		reg_base;
+	struct	pci_dev		*pdev;
+	 /* MSI-X */
+	bool			msix_enabled;
+	uint16_t		num_vec;
+	struct	msix_entry	msix_entries[BGX_MSIX_VECTORS];
+	char			irq_name[BGX_MSIX_VECTORS][20];
+	uint8_t			irq_allocated[BGX_MSIX_VECTORS];
+} bgx;
+
+struct bgx *bgx_vnic[MAX_BGX_THUNDER];
+
+/* Supported devices */
+static const struct pci_device_id bgx_id_table[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVICE_ID_THUNDER_BGX) },
+	{ 0, }  /* end of table */
+};
+
+MODULE_AUTHOR("Cavium Inc");
+MODULE_DESCRIPTION("Cavium Thunder BGX/MAC Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
+MODULE_DEVICE_TABLE(pci, bgx_id_table);
+
+/* Register read/write APIs */
+static uint64_t bgx_reg_read(struct bgx *bgx, uint8_t lmac, uint64_t offset)
+{
+	uint64_t addr = bgx->reg_base + (lmac << 20) + offset;
+
+	return readq_relaxed((void *)addr);
+}
+
+static void bgx_reg_write(struct bgx *bgx, uint8_t lmac,
+			  uint64_t offset, uint64_t val)
+{
+	uint64_t addr = bgx->reg_base + (lmac << 20) + offset;
+
+	writeq_relaxed(val, (void *)addr);
+}
+
+/* Return number of BGX present in HW */
+void bgx_get_count(int node, int *bgx_count)
+{
+	int i;
+	struct bgx *bgx;
+
+	*bgx_count = 0;
+	for (i = 0; i < MAX_BGX_PER_CN88XX; i++) {
+		bgx = bgx_vnic[(node * MAX_BGX_PER_CN88XX) + i];
+		if (bgx)
+			*bgx_count |= (1 << i);
+	}
+}
+EXPORT_SYMBOL(bgx_get_count);
+
+/* Return number of LMAC configured for this BGX */
+int bgx_get_lmac_count(int node, int bgx_idx)
+{
+	struct bgx *bgx;
+
+	bgx = bgx_vnic[(node * MAX_BGX_PER_CN88XX) + bgx_idx];
+	if (bgx)
+		return bgx->lmac_count;
+
+	return 0;
+}
+EXPORT_SYMBOL(bgx_get_lmac_count);
+
+/* Link Interrupts APIs */
+static void bgx_enable_link_intr(struct bgx *bgx, uint8_t lmac)
+{
+	uint64_t val;
+
+	val = bgx_reg_read(bgx, lmac, BGX_SPUX_INT_ENA_W1S);
+	val |= (LMAC_INTR_LINK_UP | LMAC_INTR_LINK_DOWN);
+	bgx_reg_write(bgx, lmac, BGX_SPUX_INT_ENA_W1S, val);
+}
+
+static irqreturn_t bgx_lmac_intr_handler (int irq, void *bgx_irq)
+{
+	struct bgx *bgx = (struct bgx *)bgx_irq;
+	u64 result;
+	uint8_t lmac;
+
+	for (lmac = 0; lmac < bgx->lmac_count; lmac++) {
+		result = bgx_reg_read(bgx, lmac, BGX_SPUX_INT);
+		if (result & LMAC_INTR_LINK_UP) {
+			bgx_reg_write(bgx, lmac, BGX_SPUX_INT,
+				      LMAC_INTR_LINK_UP);
+			dev_info(&bgx->pdev->dev, "lmac %d link is Up\n", lmac);
+			bgx->lmac[lmac].link_up = true;
+		}
+
+		if (result & LMAC_INTR_LINK_DOWN) {
+			bgx_reg_write(bgx, lmac, BGX_SPUX_INT,
+				      LMAC_INTR_LINK_DOWN);
+			dev_info(&bgx->pdev->dev,
+				 "lmac %d link is Down\n", lmac);
+			bgx->lmac[lmac].link_up = false;
+		}
+	}
+	return IRQ_HANDLED;
+}
+
+static int bgx_enable_msix(struct bgx *bgx)
+{
+	int vec, ret;
+
+	bgx->num_vec = BGX_MSIX_VECTORS;
+	for (vec = 0; vec < bgx->num_vec; vec++)
+		bgx->msix_entries[vec].entry = vec;
+
+	ret = pci_enable_msix(bgx->pdev, bgx->msix_entries, bgx->num_vec);
+	if (ret) {
+		dev_err(&bgx->pdev->dev ,
+			"Request for #%d msix vectors failed\n", bgx->num_vec);
+		return 0;
+	}
+	bgx->msix_enabled = 1;
+	return 1;
+}
+
+static void bgx_disable_msix(struct bgx *bgx)
+{
+	if (bgx->msix_enabled) {
+		pci_disable_msix(bgx->pdev);
+		bgx->msix_enabled = 0;
+		bgx->num_vec = 0;
+	}
+}
+
+static int bgx_register_interrupts(struct bgx *bgx, uint8_t lmac)
+{
+	int irq, ret = 0;
+
+	/* Register only link interrupts now */
+	irq = SPUX_INT + (lmac * BGX_LMAC_VEC_OFFSET);
+	sprintf(bgx->irq_name[irq], "LMAC%d", lmac);
+	ret = request_irq(bgx->msix_entries[irq].vector,
+			  bgx_lmac_intr_handler, 0, bgx->irq_name[irq], bgx);
+	if (ret)
+		goto fail;
+	else
+		bgx->irq_allocated[irq] = 1;
+
+	/* Enable link interrupt */
+	bgx_enable_link_intr(bgx, lmac);
+	return 0;
+
+fail:
+	dev_err(&bgx->pdev->dev, "Request irq failed\n");
+	for (irq = 0; irq < bgx->num_vec; irq++) {
+		if (bgx->irq_allocated[irq])
+			free_irq(bgx->msix_entries[irq].vector, bgx);
+		bgx->irq_allocated[irq] = 0;
+	}
+	return 1;
+}
+
+static void bgx_unregister_interrupts(struct bgx *bgx)
+{
+	int irq;
+	/* Free registered interrupts */
+	for (irq = 0; irq < bgx->num_vec; irq++) {
+		if (bgx->irq_allocated[irq])
+			free_irq(bgx->msix_entries[irq].vector, bgx);
+		bgx->irq_allocated[irq] = 0;
+	}
+	/* Disable MSI-X */
+	bgx_disable_msix(bgx);
+}
+
+static void bgx_flush_dmac_addrs(struct bgx *bgx, uint64_t lmac)
+{
+	uint64_t dmac = 0x00;
+	uint64_t offset, addr;
+
+	while (bgx->lmac[lmac].dmac > 0) {
+		offset = ((bgx->lmac[lmac].dmac - 1) * sizeof(dmac)) +
+			(lmac * MAX_DMAC_PER_LMAC * sizeof(dmac));
+		addr = bgx->reg_base + BGX_CMR_RX_DMACX_CAM + offset;
+		writeq_relaxed(dmac, (void *)addr);
+		bgx->lmac[lmac].dmac--;
+	}
+}
+
+void bgx_add_dmac_addr(uint64_t dmac, int node, int bgx_idx, int lmac)
+{
+	uint64_t offset, addr;
+	struct bgx *bgx;
+
+	bgx_idx += node * MAX_BGX_PER_CN88XX;
+	bgx = bgx_vnic[bgx_idx];
+
+	if (!bgx) {
+		pr_err("BGX%d not yet initialized, ignoring DMAC addition\n",
+		       bgx_idx);
+		return;
+	}
+
+	dmac = dmac | (1ULL << 48) | ((uint64_t)lmac << 49); /* Enable DMAC */
+	if (bgx->lmac[lmac].dmac == MAX_DMAC_PER_LMAC) {
+		pr_err("Max DMAC filters for LMAC%d reached, ignoring DMAC addition\n",
+		       lmac);
+		return;
+	}
+
+	if (bgx->lmac[lmac].dmac == MAX_DMAC_PER_LMAC_TNS_BYPASS_MODE)
+		bgx->lmac[lmac].dmac = 1;
+
+	offset = (bgx->lmac[lmac].dmac * sizeof(dmac)) +
+		(lmac * MAX_DMAC_PER_LMAC * sizeof(dmac));
+	addr = bgx->reg_base + BGX_CMR_RX_DMACX_CAM + offset;
+	writeq_relaxed(dmac, (void *)addr);
+	bgx->lmac[lmac].dmac++;
+
+	bgx_reg_write(bgx, lmac, BGX_CMRX_RX_DMAC_CTL,
+		      (CAM_ACCEPT << 3) | (MCAST_MODE_CAM_FILTER << 1)
+		      | (BCAST_ACCEPT << 0));
+}
+EXPORT_SYMBOL(bgx_add_dmac_addr);
+
+void bgx_lmac_enable(struct bgx *bgx, int8_t lmac)
+{
+	uint64_t dmac_bcast = (1ULL << 48) - 1;
+
+	bgx_reg_write(bgx, lmac, BGX_CMRX_CFG,
+		      (1 << 15) | (1 << 14) | (1 << 13));
+
+	/* Register interrupts */
+	bgx_register_interrupts(bgx, lmac);
+
+	/* Add broadcast MAC into all LMAC's DMAC filters */
+	for (lmac = 0; lmac < bgx->lmac_count; lmac++)
+		bgx_add_dmac_addr(dmac_bcast, 0, bgx->bgx_id, lmac);
+}
+
+void bgx_lmac_disable(struct bgx *bgx, uint8_t lmac)
+{
+	bgx_reg_write(bgx, lmac, BGX_CMRX_CFG, 0x00);
+	bgx_flush_dmac_addrs(bgx, lmac);
+	bgx_unregister_interrupts(bgx);
+}
+
+static void bgx_init_hw(struct bgx *bgx)
+{
+	int lmac;
+	uint64_t enable = 0;
+
+	/* Enable all LMACs */
+	/* Enable LMAC, Pkt Rx enable, Pkt Tx enable */
+	enable = (1 << 15) | (1 << 14) | (1 << 13);
+	for (lmac = 0; lmac < MAX_LMAC_PER_BGX; lmac++) {
+		bgx_reg_write(bgx, lmac, BGX_CMRX_CFG, enable);
+		bgx->lmac_count++;
+	}
+}
+
+static int bgx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	struct device *dev = &pdev->dev;
+	struct bgx *bgx;
+	int    err;
+	uint8_t lmac = 0;
+
+	bgx = kzalloc(sizeof(*bgx), GFP_KERNEL);
+	bgx->pdev = pdev;
+
+	pci_set_drvdata(pdev, bgx);
+
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(dev, "Failed to enable PCI device\n");
+		goto exit;
+	}
+
+	err = pci_request_regions(pdev, DRV_NAME);
+	if (err) {
+		dev_err(dev, "PCI request regions failed 0x%x\n", err);
+		goto err_disable_device;
+	}
+
+	/* MAP configuration registers */
+	bgx->reg_base = (uint64_t)pci_ioremap_bar(pdev, PCI_CFG_REG_BAR_NUM);
+	if (!bgx->reg_base) {
+		dev_err(dev, "BGX: Cannot map CSR memory space, aborting\n");
+		err = -ENOMEM;
+		goto err_release_regions;
+	}
+	bgx->bgx_id = (pci_resource_start(pdev, PCI_CFG_REG_BAR_NUM) >> 24) & 1;
+	bgx->bgx_id += NODE_ID(pci_resource_start(pdev, PCI_CFG_REG_BAR_NUM))
+							* MAX_BGX_PER_CN88XX;
+	bgx_vnic[bgx->bgx_id] = bgx;
+
+	/* Initialize BGX hardware */
+	bgx_init_hw(bgx);
+	/* Enable MSI-X */
+	if (!bgx_enable_msix(bgx))
+		return 1;
+	/* Enable all LMACs */
+	for (lmac = 0; lmac < bgx->lmac_count; lmac++)
+		bgx_lmac_enable(bgx, lmac);
+	goto exit;
+
+	if (bgx->reg_base)
+		iounmap((void *)bgx->reg_base);
+err_release_regions:
+	pci_release_regions(pdev);
+err_disable_device:
+	pci_disable_device(pdev);
+exit:
+	return err;
+}
+
+static void bgx_remove(struct pci_dev *pdev)
+{
+	struct bgx *bgx = pci_get_drvdata(pdev);
+	uint8_t lmac;
+
+	if (!bgx)
+		return;
+	/* Disable all LMACs */
+	for (lmac = 0; lmac < 4; lmac++)
+		bgx_lmac_disable(bgx, lmac);
+
+	pci_set_drvdata(pdev, NULL);
+
+	if (bgx->reg_base)
+		iounmap((void *)bgx->reg_base);
+
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+	kfree(bgx);
+}
+
+static struct pci_driver bgx_driver = {
+	.name = DRV_NAME,
+	.id_table = bgx_id_table,
+	.probe = bgx_probe,
+	.remove = bgx_remove,
+};
+
+static int __init bgx_init_module(void)
+{
+	pr_info("%s, ver %s\n", DRV_NAME, DRV_VERSION);
+
+	return pci_register_driver(&bgx_driver);
+}
+
+static void __exit bgx_cleanup_module(void)
+{
+	pci_unregister_driver(&bgx_driver);
+}
+
+module_init(bgx_init_module);
+module_exit(bgx_cleanup_module);
+
diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.h b/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
new file mode 100644
index 000000000000..de6503148d25
--- /dev/null
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
@@ -0,0 +1,78 @@
+/*
+ * Copyright (C) 2014 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#ifndef THUNDER_BGX_H
+#define THUNDER_BGX_H
+
+#define    MAX_BGX_THUNDER			8 /* Max 4 nodes, 2 per node */
+#define    MAX_BGX_PER_CN88XX			2
+#define    MAX_LMAC_PER_BGX			4
+#define    MAX_BGX_CHANS_PER_LMAC		16
+#define    MAX_DMAC_PER_LMAC			8
+
+#define    MAX_DMAC_PER_LMAC_TNS_BYPASS_MODE	2
+
+#define    MAX_LMAC	(MAX_BGX_PER_CN88XX * MAX_LMAC_PER_BGX)
+
+#define    NODE_ID_MASK				0x300000000000
+#define    NODE_ID(x)				((x & NODE_ID_MASK) >> 44)
+
+/* Registers */
+#define BGX_CMRX_CFG				0x00
+#define BGX_CMRX_RX_ID_MAP			0x60
+#define BGX_CMRX_RX_DMAC_CTL			0x0E8
+#define BGX_CMR_RX_DMACX_CAM			0x200
+#define BGX_CMR_RX_LMACS			0x468
+#define BGX_CMR_TX_LMACS			0x1000
+
+#define BGX_SPUX_STATUS1			0x10008
+#define BGX_SPUX_STATUS2			0x10020
+#define BGX_SPUX_INT				0x10220	/* +(0..3) << 20 */
+#define BGX_SPUX_INT_W1S			0x10228
+#define BGX_SPUX_INT_ENA_W1C			0x10230
+#define BGX_SPUX_INT_ENA_W1S			0x10238
+
+#define BGX_MSIX_VEC_0_29_ADDR			0x400000 /* +(0..29) << 4 */
+#define BGX_MSIX_VEC_0_29_CTL			0x400008
+#define BGX_MSIX_PBA_0				0x4F0000
+
+/* MSI-X interrupts */
+#define BGX_MSIX_VECTORS	30
+#define BGX_LMAC_VEC_OFFSET	7
+#define BGX_MSIX_VEC_SHIFT	4
+
+#define CMRX_INT		0
+#define SPUX_INT		1
+#define SMUX_RX_INT		2
+#define SMUX_TX_INT		3
+#define GMPX_PCS_INT		4
+#define GMPX_GMI_RX_INT		5
+#define GMPX_GMI_TX_INT		6
+#define CMR_MEM_INT		28
+#define SPU_MEM_INT		29
+
+#define LMAC_INTR_LINK_UP	(1 << 0)
+#define LMAC_INTR_LINK_DOWN	(1 << 1)
+
+/*  RX_DMAC_CTL configuration*/
+enum MCAST_MODE {
+		MCAST_MODE_REJECT,
+		MCAST_MODE_ACCEPT,
+		MCAST_MODE_CAM_FILTER,
+		RSVD
+};
+
+#define BCAST_ACCEPT	1
+#define CAM_ACCEPT	1
+
+void bgx_add_dmac_addr(uint64_t dmac, int node, int bgx_idx, int lmac);
+void bgx_get_count(int node, int *bgx_count);
+int bgx_get_lmac_count(int node, int bgx);
+
+#endif /* THUNDER_BGX_H */
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 1fa99a301817..80bd3336691e 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2324,6 +2324,8 @@
 #define PCI_DEVICE_ID_ALTIMA_AC9100	0x03ea
 #define PCI_DEVICE_ID_ALTIMA_AC1003	0x03eb
 
+#define PCI_VENDOR_ID_CAVIUM		0x177d
+
 #define PCI_VENDOR_ID_BELKIN		0x1799
 #define PCI_DEVICE_ID_BELKIN_F5D7010V7	0x701f
 
-- 
2.1.3

^ permalink raw reply related

* Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
From: David Miller @ 2014-11-10  5:20 UTC (permalink / raw)
  To: viro; +Cc: herbert, netdev, linux-kernel, bcrl, mst
In-Reply-To: <20141109211908.GF7996@ZenIV.linux.org.uk>

From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Sun, 9 Nov 2014 21:19:08 +0000

> 1) does sparc64 access_ok() need to differ for 32bit and 64bit tasks?

sparc64 will just fault no matter what kind of task it is.

It is impossible for a user task to generate a reference to
a kernel virtual address, as kernel and user accesses each
go via a separate address space identifier.

^ permalink raw reply

* RE: [PATCH net-next v2 2/3] r8152: cleartheflagofSCHEDULE_TASKLETintasklet
From: Hayes Wang @ 2014-11-10  5:23 UTC (permalink / raw)
  To: Francois Romieu
  Cc: David Miller, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	nic_swsd, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20141108221149.GA26030-lmTtMILVy1jWQcoT9B9Ug5SCg42XY1Uw0E9HWUfgJXw@public.gmane.org>

 Francois Romieu [mailto:romieu-W8zweXLXuWQS+FvcfC7Uqw@public.gmane.org] 
> Sent: Sunday, November 09, 2014 6:12 AM
[...]
> The performance explanation leaves me a bit unconvinced. Without any
> figure one could simply go for the always locked clear_bit because of:
> 1. the "I'm racy" message that the open-coded test + set sends
> 2. the extra work needed to avoid 1 (comment, explain, ...).

Thanks. I would modify this patch with clear_bit only.

> The extra time could thus be used to see what happens when napi is
> shoehorned in this tasklet machinery. I'd naively expect it to be
> relevant for efficiency.

I thought about NAPI, but I gave up. The reasons are
1. I don't sure if it would run when autosuspending.
2. There is no hw interrupt for USB device. And I have
   no idea about how to check if the USB transfer is
   completed by polling.
3. I have to control the rx packets numbers in poll().
   However, I couldn't control the packets number for
   each bulk-in transfer. I have to do extra works to
   deal with the rx flow.
4. I don't find much different between tasklet and NAPI.

Best Regards,
Hayes
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [patch net-next v2 01/10] net: rename netdev_phys_port_id to more generic name
From: David Miller @ 2014-11-10  5:23 UTC (permalink / raw)
  To: jhs
  Cc: jiri, netdev, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, sfeldma, f.fainelli,
	roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <54603270.509@mojatatu.com>

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Sun, 09 Nov 2014 22:35:12 -0500

> wouldnt this just break an existing ABI? You may need to introduce a
> new attribute.

He isn't breaking anything Jamal, he's just changing the internal
macro name we use for the attribute's maximum length.

Please read his patches carefully instead of jumping to conclusions.

^ permalink raw reply

* Re: Face some error after applying commit 7dfa4b414d4(net/mlx4_en: Code cleanups in tx path)
From: Wei Yang @ 2014-11-10  5:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Wei Yang, Eric Dumazet, Amir Vadai, David Miller, netdev
In-Reply-To: <1415587574.13896.131.camel@edumazet-glaptop2.roam.corp.google.com>

On Sun, Nov 09, 2014 at 06:46:14PM -0800, Eric Dumazet wrote:
>On Mon, 2014-11-10 at 09:59 +0800, Wei Yang wrote:
>> On Fri, Nov 07, 2014 at 07:38:15PM -0800, Eric Dumazet wrote:
>> >On Fri, Nov 7, 2014 at 6:57 PM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>> >> Eric and Amir
>> >>
>> >> I am testing the VF on PowerNV platform with 3.18-rc2.
>> >> After applying this patch I face some errors.
>> >>
>> >> First is the compiling error.
>> >>
>> >>     drivers/net/ethernet/mellanox/mlx4//en_tx.c: In function ‘mlx4_en_xmit’:
>> >>     drivers/net/ethernet/mellanox/mlx4//en_tx.c:802:8: error: ‘shinfo’ undeclared (first use in this function)
>> >>             shinfo->tx_flags & SKBTX_HW_TSTAMP)) {
>> >>             ^
>> >>     include/linux/compiler.h:160:42: note: in definition of macro ‘unlikely’
>> >>      # define unlikely(x) __builtin_expect(!!(x), 0)
>> >>                                               ^
>> >>     drivers/net/ethernet/mellanox/mlx4//en_tx.c:802:8: note: each undeclared identifier is reported only once for each function it appears in
>> >>             shinfo->tx_flags & SKBTX_HW_TSTAMP)) {
>> >>             ^
>> >>     include/linux/compiler.h:160:42: note: in definition of macro ‘unlikely’
>> >>      # define unlikely(x) __builtin_expect(!!(x), 0)
>> >>                                               ^
>> >>     make[1]: *** [drivers/net/ethernet/mellanox/mlx4//en_tx.o] Error 1
>> >>     make: *** [_module_drivers/net/ethernet/mellanox/mlx4/] Error 2
>> >>
>> >
>> >
>> >This compilation error seems strange.
>> >
>> >Are you sure your tree is pristine, not corrupted in any way ?
>> 
>> I believe I did the revert one by one with git revert.
>> 
>> >
>> >
>> >> I tried to fix this with following change:
>> >>
>> >>     [root@tian-lp1 3.18]# git diff
>> >>     diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/m
>> >>     index eaf23eb..d2f06a7 100644
>> >>     --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> >>     +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> >>     @@ -799,8 +799,8 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_dev
>> >>              * set flag for further reference
>> >>              */
>> >>             if (unlikely(ring->hwtstamp_tx_type == HWTSTAMP_TX_ON &&
>> >>     -                    shinfo->tx_flags & SKBTX_HW_TSTAMP)) {
>> >>     -               shinfo->tx_flags |= SKBTX_IN_PROGRESS;
>> >>     +                    skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) {
>> >>     +               skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
>> >>                     tx_info->ts_requested = 1;
>> >>             }
>> >>
>> >> But seems to face another error.
>> >>
>> >
>> >I suspect your tree is not the official tree, I do not see how you got
>> >this compilation error.
>> 
>> 
>> I checked the upstream git tree again, and find this commit:
>> 
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7dfa4b414d4eec8da56e44fb2b4aea3e549b092f
>> 
>> 
>> And I want to say the shinfo local variable is introduced in commit:
>> 
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b9d8839a44092cb4268ef2813c34d5dbf3363603
>> 
>> And in my log tree, also checked the upstream, this one is applyed after the
>> first one. And the compiling error will disappear untill I apply this one.
>> 
>> So this compiling issue can't reproduced at your side? You have reset --hard
>> to the "Code cleanup" one, and can't see the error? That is strange.
>> 
>
>Okay, your message was not clear : I thought you had a compilation error
>on current tree.
>
>The true story of these patches is that Mellanox split an initial big
>chunk [1] I gave into multiple patches.
>
>Maybe they missed that one patch did not actually compile.
>
>[1] https://patchwork.ozlabs.org/patch/394256/
>
>Now, it is done, there is nothing we can do.
>
>I'll let Mellanox comment, but it looks like your hardware does not like
>something.
>
>Have you tried to disable Blue Frame ?
>

Yep, looks the PF works fine. But the current FW I can't just enable the PF.

How to disable Blue Frame? I am not clear about this.


-- 
Richard Yang
Help you, Help me

^ permalink raw reply

* [PATCH v2 net-next] PPC: bpf_jit_comp: add SKF_AD_HATYPE instruction
From: Denis Kirjanov @ 2014-11-10  5:59 UTC (permalink / raw)
  To: netdev
  Cc: linuxppc-dev, Denis Kirjanov, Alexei Starovoitov, Daniel Borkmann,
	Philippe Bergheaud

Add BPF extension SKF_AD_HATYPE to ppc JIT to check
the hw type of the interface

Before:
[   57.723666] test_bpf: #20 LD_HATYPE
[   57.723675] BPF filter opcode 0020 (@0) unsupported
[   57.724168] 48 48 PASS

After:
[  103.053184] test_bpf: #20 LD_HATYPE 7 6 PASS

CC: Alexei Starovoitov<alexei.starovoitov@gmail.com>
CC: Daniel Borkmann<dborkman@redhat.com>
CC: Philippe Bergheaud<felix@linux.vnet.ibm.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>

v2: address Alexei's comments
---
 arch/powerpc/net/bpf_jit_comp.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index d110e28..d3fa80d 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -361,6 +361,11 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 							    protocol));
 			break;
 		case BPF_ANC | SKF_AD_IFINDEX:
+		case BPF_ANC | SKF_AD_HATYPE:
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+						ifindex) != 4);
+			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+						type) != 2);
 			PPC_LD_OFFS(r_scratch1, r_skb, offsetof(struct sk_buff,
 								dev));
 			PPC_CMPDI(r_scratch1, 0);
@@ -368,14 +373,18 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 				PPC_BCC(COND_EQ, addrs[ctx->pc_ret0]);
 			} else {
 				/* Exit, returning 0; first pass hits here. */
-				PPC_BCC_SHORT(COND_NE, (ctx->idx*4)+12);
+				PPC_BCC_SHORT(COND_NE, ctx->idx * 4 + 12);
 				PPC_LI(r_ret, 0);
 				PPC_JMP(exit_addr);
 			}
-			BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
-						  ifindex) != 4);
-			PPC_LWZ_OFFS(r_A, r_scratch1,
+			if (code == (BPF_ANC | SKF_AD_IFINDEX)) {
+				PPC_LWZ_OFFS(r_A, r_scratch1,
 				     offsetof(struct net_device, ifindex));
+			} else {
+				PPC_LHZ_OFFS(r_A, r_scratch1,
+				     offsetof(struct net_device, type));
+			}
+
 			break;
 		case BPF_ANC | SKF_AD_MARK:
 			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
From: Al Viro @ 2014-11-10  6:58 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, netdev, linux-kernel, bcrl, mst
In-Reply-To: <20141110.002020.1062493586889118565.davem@davemloft.net>

On Mon, Nov 10, 2014 at 12:20:20AM -0500, David Miller wrote:
> From: Al Viro <viro@ZenIV.linux.org.uk>
> Date: Sun, 9 Nov 2014 21:19:08 +0000
> 
> > 1) does sparc64 access_ok() need to differ for 32bit and 64bit tasks?
> 
> sparc64 will just fault no matter what kind of task it is.
> 
> It is impossible for a user task to generate a reference to
> a kernel virtual address, as kernel and user accesses each
> go via a separate address space identifier.

Sure, but why do we have access_ok() there at all?  I.e. why not just have
it constant 1?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox