Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 5/6] cnic: Further unify kcq handling code.
From: David Miller @ 2010-06-26  4:07 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1277427522-17306-5-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 24 Jun 2010 17:58:41 -0700

> This eliminates some of the duplicate code for the various devices
> that require the same basic kcq handling.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 4/6] cnic: Restructure kcq processing.
From: David Miller @ 2010-06-26  4:07 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1277427522-17306-4-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 24 Jun 2010 17:58:40 -0700

> By doing more work in the common function cnic_get_kcqes(), and
> making full use of the kcq_info structure.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 3/6] cnic: Unify kcq allocation for all devices.
From: David Miller @ 2010-06-26  4:07 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1277427522-17306-3-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 24 Jun 2010 17:58:39 -0700

> By creating a common data stucture kcq_info for all devices, the kcq
> (kernel completion queue) for all devices can be allocated by common
> code.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 2/6] cnic: Unify IRQ code for all hardware types.
From: David Miller @ 2010-06-26  4:07 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1277427522-17306-2-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 24 Jun 2010 17:58:38 -0700

> By creating a common cnic_doirq().
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 1/6] cnic: Fine-tune CID memory space calculation.
From: David Miller @ 2010-06-26  4:06 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1277427522-17306-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Thu, 24 Jun 2010 17:58:37 -0700

> The current code makes assumptions about the CID (context ID) memory
> space and starting CID that may not be always correct when firmware
> changes.  In particular, BNX2_ISCSI_START_CID may not always be fixed.
> We now calculate cp->max_cid_space and cp->iscsi_start_cid dynamically
> instead of using fixed constants.  The unused cp->max_iscsi_conn is also
> eliminated.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 0/10] enic: updates to version 1.4.1.1
From: David Miller @ 2010-06-26  4:06 UTC (permalink / raw)
  To: vkolluri; +Cc: netdev, scofeldm, roprabhu
In-Reply-To: <20100624204723.22595.42990.stgit@savbu-pc100.cisco.com>

From: Vasanthy Kolluri <vkolluri@cisco.com>
Date: Thu, 24 Jun 2010 13:49:10 -0700

> The following patch series implements enic driver updates:
> 
> 01/10 - Feature Add: Replace LRO with GRO
> 02/10 - Bug Fix: Change hardware ingress vlan rewrite mode
> 03/10 - Use a lighter reset operation for enic devices
> 04/10 - Clean up: Add wrapper routines for firmware devcmd calls
> 05/10 - Use (netdev|dev|pr)_<level> macro helpers for logging
> 06/10 - Add new firmware devcmds
> 07/10 - Use receive queue buffer blocks of 32/64 entries
> 08/10 - Feature Add: Add loopback capability to enic devices
> 09/10 - Bug Fix: Handle surprise hardware removals
> 10/10 - Clean ups

All applied, without the "slab.h" include removals so it still
builds.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 10/10] enic: Clean ups
From: David Miller @ 2010-06-26  3:49 UTC (permalink / raw)
  To: vkolluri; +Cc: netdev, scofeldm, roprabhu
In-Reply-To: <20100624205213.22595.97942.stgit@savbu-pc100.cisco.com>

From: Vasanthy Kolluri <vkolluri@cisco.com>
Date: Thu, 24 Jun 2010 13:52:26 -0700

> diff --git a/drivers/net/enic/vnic_wq.c b/drivers/net/enic/vnic_wq.c
> index 3ab7fa5..5c84bb8 100644
> --- a/drivers/net/enic/vnic_wq.c
> +++ b/drivers/net/enic/vnic_wq.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright 2008 Cisco Systems, Inc.  All rights reserved.
> + * Copyright 2008-2010 Cisco Systems, Inc.  All rights reserved.
>   * Copyright 2007 Nuova Systems, Inc.  All rights reserved.
>   *
>   * This program is free software; you may redistribute it and/or modify
> @@ -22,7 +22,6 @@
>  #include <linux/types.h>
>  #include <linux/pci.h>
>  #include <linux/delay.h>
> -#include <linux/slab.h>
>  
>  #include "vnic_dev.h"
>  #include "vnic_wq.h"

I have to remove this otherwise the build breaks:

drivers/net/enic/vnic_wq.c: In function 'vnic_wq_alloc_bufs':
drivers/net/enic/vnic_wq.c:39:3: error: implicit declaration of function 'kzalloc'
drivers/net/enic/vnic_wq.c:39:15: warning: assignment makes pointer from integer without a cast
drivers/net/enic/vnic_wq.c: In function 'vnic_wq_free':
drivers/net/enic/vnic_wq.c:79:3: error: implicit declaration of function 'kfree'
  CC [M]  drivers/net/sfc/mcdi_mac.o
make[3]: *** [drivers/net/enic/vnic_wq.o] Error 1
make[3]: *** Waiting for unfinished jobs....
  CC [M]  drivers/net/sfc/selftest.o
  CC [M]  drivers/net/sfc/ethtool.o
  CC [M]  drivers/net/sfc/qt202x_phy.o
drivers/net/enic/vnic_rq.c: In function 'vnic_rq_alloc_bufs':
drivers/net/enic/vnic_rq.c:39:3: error: implicit declaration of function 'kzalloc'
drivers/net/enic/vnic_rq.c:39:15: warning: assignment makes pointer from integer without a cast
drivers/net/enic/vnic_rq.c: In function 'vnic_rq_free':
drivers/net/enic/vnic_rq.c:79:3: error: implicit declaration of function 'kfree'

Any time you remove any header includes, it is very wise to attempt a wide
assortment of build tests using all sorts of different configurations.

^ permalink raw reply

* [PATCH] net: Add batman-adv meshing protocol
From: Sven Eckelmann @ 2010-06-26  0:14 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Sven Eckelmann
In-Reply-To: <201006260214.06662.sven.eckelmann@gmx.de>

B.A.T.M.A.N. (better approach to mobile ad-hoc networking) is a routing
protocol for multi-hop ad-hoc mesh networks. The networks may be wired or
wireless. See http://www.open-mesh.org/ for more information and user space
tools.

Signed-off-by: Sven Eckelmann <sven.eckelmann@gmx.de>
---
 .../ABI/testing/sysfs-class-net-batman-adv         |   14 +
 Documentation/ABI/testing/sysfs-class-net-mesh     |   33 +
 MAINTAINERS                                        |    8 +
 net/Kconfig                                        |    1 +
 net/Makefile                                       |    1 +
 net/batman-adv/CHANGELOG                           |   63 +
 net/batman-adv/Kconfig                             |   26 +
 net/batman-adv/Makefile                            |   22 +
 net/batman-adv/README                              |  240 ++++
 net/batman-adv/aggregation.c                       |  268 ++++
 net/batman-adv/aggregation.h                       |   43 +
 net/batman-adv/bat_debugfs.c                       |  150 +++
 net/batman-adv/bat_debugfs.h                       |   33 +
 net/batman-adv/bat_sysfs.c                         |  433 +++++++
 net/batman-adv/bat_sysfs.h                         |   42 +
 net/batman-adv/bitarray.c                          |  204 +++
 net/batman-adv/bitarray.h                          |   46 +
 net/batman-adv/hard-interface.c                    |  548 +++++++++
 net/batman-adv/hard-interface.h                    |   45 +
 net/batman-adv/hash.c                              |  306 +++++
 net/batman-adv/hash.h                              |  100 ++
 net/batman-adv/icmp_socket.c                       |  337 +++++
 net/batman-adv/icmp_socket.h                       |   34 +
 net/batman-adv/main.c                              |  299 +++++
 net/batman-adv/main.h                              |  169 +++
 net/batman-adv/originator.c                        |  499 ++++++++
 net/batman-adv/originator.h                        |   36 +
 net/batman-adv/packet.h                            |  120 ++
 net/batman-adv/ring_buffer.c                       |   52 +
 net/batman-adv/ring_buffer.h                       |   28 +
 net/batman-adv/routing.c                           | 1291 ++++++++++++++++++++
 net/batman-adv/routing.h                           |   46 +
 net/batman-adv/send.c                              |  576 +++++++++
 net/batman-adv/send.h                              |   42 +
 net/batman-adv/soft-interface.c                    |  361 ++++++
 net/batman-adv/soft-interface.h                    |   33 +
 net/batman-adv/translation-table.c                 |  498 ++++++++
 net/batman-adv/translation-table.h                 |   45 +
 net/batman-adv/types.h                             |  175 +++
 net/batman-adv/vis.c                               |  818 +++++++++++++
 net/batman-adv/vis.h                               |   60 +
 41 files changed, 8145 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-class-net-batman-adv
 create mode 100644 Documentation/ABI/testing/sysfs-class-net-mesh
 create mode 100644 net/batman-adv/CHANGELOG
 create mode 100644 net/batman-adv/Kconfig
 create mode 100644 net/batman-adv/Makefile
 create mode 100644 net/batman-adv/README
 create mode 100644 net/batman-adv/aggregation.c
 create mode 100644 net/batman-adv/aggregation.h
 create mode 100644 net/batman-adv/bat_debugfs.c
 create mode 100644 net/batman-adv/bat_debugfs.h
 create mode 100644 net/batman-adv/bat_sysfs.c
 create mode 100644 net/batman-adv/bat_sysfs.h
 create mode 100644 net/batman-adv/bitarray.c
 create mode 100644 net/batman-adv/bitarray.h
 create mode 100644 net/batman-adv/hard-interface.c
 create mode 100644 net/batman-adv/hard-interface.h
 create mode 100644 net/batman-adv/hash.c
 create mode 100644 net/batman-adv/hash.h
 create mode 100644 net/batman-adv/icmp_socket.c
 create mode 100644 net/batman-adv/icmp_socket.h
 create mode 100644 net/batman-adv/main.c
 create mode 100644 net/batman-adv/main.h
 create mode 100644 net/batman-adv/originator.c
 create mode 100644 net/batman-adv/originator.h
 create mode 100644 net/batman-adv/packet.h
 create mode 100644 net/batman-adv/ring_buffer.c
 create mode 100644 net/batman-adv/ring_buffer.h
 create mode 100644 net/batman-adv/routing.c
 create mode 100644 net/batman-adv/routing.h
 create mode 100644 net/batman-adv/send.c
 create mode 100644 net/batman-adv/send.h
 create mode 100644 net/batman-adv/soft-interface.c
 create mode 100644 net/batman-adv/soft-interface.h
 create mode 100644 net/batman-adv/translation-table.c
 create mode 100644 net/batman-adv/translation-table.h
 create mode 100644 net/batman-adv/types.h
 create mode 100644 net/batman-adv/vis.c
 create mode 100644 net/batman-adv/vis.h

diff --git a/Documentation/ABI/testing/sysfs-class-net-batman-adv b/Documentation/ABI/testing/sysfs-class-net-batman-adv
new file mode 100644
index 0000000..38dd762
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-net-batman-adv
@@ -0,0 +1,14 @@
+
+What:           /sys/class/net/<iface>/batman-adv/mesh_iface
+Date:           May 2010
+Contact:        Marek Lindner <lindner_marek@yahoo.de>
+Description:
+                The /sys/class/net/<iface>/batman-adv/mesh_iface file
+                displays the batman mesh interface this <iface>
+                currently is associated with.
+
+What:           /sys/class/net/<iface>/batman-adv/iface_status
+Date:           May 2010
+Contact:        Marek Lindner <lindner_marek@yahoo.de>
+Description:
+                Indicates the status of <iface> as it is seen by batman.
diff --git a/Documentation/ABI/testing/sysfs-class-net-mesh b/Documentation/ABI/testing/sysfs-class-net-mesh
new file mode 100644
index 0000000..5aa1912
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-net-mesh
@@ -0,0 +1,33 @@
+
+What:           /sys/class/net/<mesh_iface>/mesh/aggregated_ogms
+Date:           May 2010
+Contact:        Marek Lindner <lindner_marek@yahoo.de>
+Description:
+                Indicates whether the batman protocol messages of the
+                mesh <mesh_iface> shall be aggregated or not.
+
+What:           /sys/class/net/<mesh_iface>/mesh/bonding
+Date:           June 2010
+Contact:        Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
+Description:
+                Indicates whether the data traffic going through the
+                mesh will be sent using multiple interfaces at the
+                same time (if available).
+
+What:           /sys/class/net/<mesh_iface>/mesh/orig_interval
+Date:           May 2010
+Contact:        Marek Lindner <lindner_marek@yahoo.de>
+Description:
+                Defines the interval in milliseconds in which batman
+                sends its protocol messages.
+
+What:           /sys/class/net/<mesh_iface>/mesh/vis_mode
+Date:           May 2010
+Contact:        Marek Lindner <lindner_marek@yahoo.de>
+Description:
+                Each batman node only maintains information about its
+                own local neighborhood, therefore generating graphs
+                showing the topology of the entire mesh is not easily
+                feasible without having a central instance to collect
+                the local topologies from all nodes. This file allows
+                to activate the collecting (server) mode.
diff --git a/MAINTAINERS b/MAINTAINERS
index e993716..2a0e678 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1246,6 +1246,14 @@ S:	Maintained
 F:	drivers/video/backlight/
 F:	include/linux/backlight.h
 
+BATMAN ADVANCED
+M:	Marek Lindner <lindner_marek@yahoo.de>
+M:	Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
+L:	b.a.t.m.a.n@lists.open-mesh.net
+W:	http://www.open-mesh.org/
+S:	Maintained
+F:	net/batman-adv/
+
 BAYCOM/HDLCDRV DRIVERS FOR AX.25
 M:	Thomas Sailer <t.sailer@alumni.ethz.ch>
 L:	linux-hams@vger.kernel.org
diff --git a/net/Kconfig b/net/Kconfig
index 0d68b40..b4244a3 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -203,6 +203,7 @@ source "net/phonet/Kconfig"
 source "net/ieee802154/Kconfig"
 source "net/sched/Kconfig"
 source "net/dcb/Kconfig"
+source "net/batman-adv/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index cb7bdc1..7081ec9 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -67,3 +67,4 @@ ifeq ($(CONFIG_NET),y)
 obj-$(CONFIG_SYSCTL)		+= sysctl_net.o
 endif
 obj-$(CONFIG_WIMAX)		+= wimax/
+obj-$(CONFIG_BATMAN_ADV)	+= batman-adv/
diff --git a/net/batman-adv/CHANGELOG b/net/batman-adv/CHANGELOG
new file mode 100644
index 0000000..86450b4
--- /dev/null
+++ b/net/batman-adv/CHANGELOG
@@ -0,0 +1,63 @@
+batman-adv 2010.0.0:
+
+* support latest kernels (2.6.21 - 2.6.35)
+* further code refactoring and cleaning for coding style
+* move from procfs based configuration to sysfs
+* reorganized sequence number handling
+* limit queue lengths for batman and broadcast packets
+* many bugs (endless loop and rogue packets on shutdown, wrong tcpdump output,
+  missing frees in error situations, sleeps in atomic contexts) squashed
+
+ -- Fri, 18 Jun 2010 21:34:26 +0200
+
+batman-adv 0.2.1:
+
+* support latest kernels (2.6.20 - 2.6.33)
+* receive packets directly using skbs, remove old sockets and threads
+* fix various regressions in the vis server
+* don't disable interrupts while sending
+* replace internal logging mechanism with standard kernel logging
+* move vis formats into userland, one general format remains in the kernel
+* allow MAC address to be set, correctly initialize them
+* code refactoring and cleaning for coding style
+* many bugs (null pointers, locking, hash iterators) squashed
+
+ -- Sun, 21 Mar 2010 20:46:47 +0100
+
+batman-adv 0.2:
+
+* support latest kernels (2.6.20 - 2.6.31)
+* temporary routing loops / TTL code bug / ghost entries in originator table fixed
+* internal packet queue for packet aggregation & transmission retry (ARQ)
+  for payload broadcasts added
+* interface detection converted to event based handling to avoid timers
+* major linux coding style adjustments applied
+* all kernel version compatibility functions has been moved to compat.h
+* use random ethernet address generator from the kernel
+* /sys/module/batman_adv/version to export kernel module version
+* vis: secondary interface export for dot draw format + JSON output format added
+* many bugs (alignment issues, race conditions, deadlocks, etc) squashed
+
+ -- Sat, 07 Nov 2009 15:44:31 +0100
+
+batman-adv 0.1:
+
+* support latest kernels (2.6.20 - 2.6.28)
+* LOTS of cleanup: locking, stack usage, memory leaks
+* Change Ethertype from 0x0842 to 0x4305
+  unregistered at IEEE, if you want to sponsor an official Ethertype ($2500)
+  please contact us
+
+ -- Sun, 28 Dec 2008 00:44:31 +0100
+
+batman-adv 0.1-beta:
+
+* layer 2 meshing based on BATMAN TQ algorithm in kernelland
+* operates on any ethernet like interface
+* supports IPv4, IPv6, DHCP, etc
+* is controlled via /proc/net/batman-adv/
+* bridging via brctl is supported
+* interface watchdog (interfaces can be (de)activated dynamically)
+* offers integrated vis server which meshes/syncs with other vis servers in range
+
+ -- Mon, 05 May 2008 14:10:04 +0200
diff --git a/net/batman-adv/Kconfig b/net/batman-adv/Kconfig
new file mode 100644
index 0000000..8553f35
--- /dev/null
+++ b/net/batman-adv/Kconfig
@@ -0,0 +1,26 @@
+#
+# B.A.T.M.A.N meshing protocol
+#
+
+config BATMAN_ADV
+	tristate "B.A.T.M.A.N. Advanced Meshing Protocol"
+	depends on NET
+        default n
+	---help---
+
+        B.A.T.M.A.N. (better approach to mobile ad-hoc networking) is
+        a routing protocol for multi-hop ad-hoc mesh networks. The
+        networks may be wired or wireless. See
+        http://www.open-mesh.org/ for more information and user space
+        tools.
+
+config BATMAN_ADV_DEBUG
+	bool "B.A.T.M.A.N. debugging"
+	depends on BATMAN_ADV != n
+	---help---
+
+	  This is an option for use by developers; most people should
+	  say N here. This enables compilation of support for
+	  outputting debugging information to the kernel log. The
+	  output is controlled via the module parameter debug.
+
diff --git a/net/batman-adv/Makefile b/net/batman-adv/Makefile
new file mode 100644
index 0000000..e9817b5
--- /dev/null
+++ b/net/batman-adv/Makefile
@@ -0,0 +1,22 @@
+#
+# Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+#
+# Marek Lindner, Simon Wunderlich
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of version 2 of the GNU General Public
+# License as published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301, USA
+#
+
+obj-$(CONFIG_BATMAN_ADV) += batman-adv.o
+batman-adv-objs := main.o bat_debugfs.o bat_sysfs.o send.o routing.o soft-interface.o icmp_socket.o translation-table.o bitarray.o hash.o ring_buffer.o vis.o hard-interface.o aggregation.o originator.o
diff --git a/net/batman-adv/README b/net/batman-adv/README
new file mode 100644
index 0000000..7192b7f
--- /dev/null
+++ b/net/batman-adv/README
@@ -0,0 +1,240 @@
+[state: 12-06-2010]
+
+BATMAN-ADV
+----------
+
+Batman  advanced  is  a new approach to wireless networking which
+does no longer operate on the IP basis. Unlike the batman daemon,
+which  exchanges  information  using UDP packets and sets routing
+tables, batman-advanced operates on ISO/OSI Layer 2 only and uses
+and  routes  (or  better: bridges) Ethernet Frames. It emulates a
+virtual network switch of all nodes participating.  Therefore all
+nodes  appear  to be link local, thus all higher operating proto-
+cols won't be affected by any changes within the network. You can
+run almost any protocol above batman advanced, prominent examples
+are: IPv4, IPv6, DHCP, IPX.
+
+Batman advanced was implemented as a Linux kernel driver  to  re-
+duce the overhead to a minimum. It does not depend on any (other)
+network driver, and can be used on wifi as well as ethernet  lan,
+vpn,  etc ... (anything with ethernet-style layer 2).
+
+CONFIGURATION
+-------------
+
+Load the batman-adv module into your kernel:
+
+# insmod batman-adv.ko
+
+The  module  is now waiting for activation. You must add some in-
+terfaces on which batman can operate. After  loading  the  module
+batman  advanced  will scan your systems interfaces to search for
+compatible interfaces. Once found, it will create  subfolders  in
+the /sys directories of each supported interface, e.g.
+
+# ls /sys/class/net/eth0/batman_adv/
+# iface_status  mesh_iface
+
+If an interface does not have the "batman_adv" subfolder it prob-
+ably is not supported. Not supported  interfaces  are:  loopback,
+non-ethernet and batman's own interfaces.
+
+Note:  After the module was loaded it will continuously watch for
+new interfaces to verify the compatibility. There is no  need  to
+reload the module if you plug your USB wifi adapter into your ma-
+chine after batman advanced was initially loaded.
+
+To activate a  given  interface  simply  write  "bat0"  into  its
+"mesh_iface" file inside the batman_adv subfolder:
+
+# echo bat0 > /sys/class/net/eth0/batman_adv/mesh_iface
+
+Repeat  this step for all interfaces you wish to add.  Now batman
+starts using/broadcasting on this/these interface(s).
+
+By reading the "iface_status" file you can check its status:
+
+# cat /sys/class/net/eth0/batman_adv/iface_status
+# active
+
+To deactivate an interface you have  to  write  "none"  into  its
+"mesh_iface" file:
+
+# echo none > /sys/class/net/eth0/batman_adv/mesh_iface
+
+
+All  mesh  wide  settings  can be found in batman's own interface
+folder:
+
+#  ls  /sys/class/net/bat0/mesh/
+#  aggregate_ogm   originators        transtable_global  vis_mode
+#  orig_interval   transtable_local   vis_data
+
+
+Some of the files contain all sort of status information  regard-
+ing  the  mesh  network.  For  example, you can view the table of
+originators (mesh participants) with:
+
+# cat /sys/class/net/bat0/mesh/originators
+
+Other files allow to change batman's behaviour to better fit your
+requirements.  For instance, you can check the current originator
+interval (value in milliseconds which determines how often batman
+sends its broadcast packets):
+
+# cat /sys/class/net/bat0/mesh/orig_interval
+# status: 1000
+
+and also change its value:
+
+# echo 3000 > /sys/class/net/bat0/mesh/orig_interval
+
+In very mobile scenarios, you might want to adjust the originator
+interval to a lower value. This will make the mesh  more  respon-
+sive to topology changes, but will also increase the overhead.
+
+
+USAGE
+-----
+
+To  make use of your newly created mesh, batman advanced provides
+a new interface "bat0" which you should use from this  point  on.
+All  interfaces  added  to  batman  advanced are not relevant any
+longer because batman handles them for you. Basically, one "hands
+over" the data by using the batman interface and batman will make
+sure it reaches its destination.
+
+The "bat0" interface can be used like any  other  regular  inter-
+face.  It needs an IP address which can be either statically con-
+figured or dynamically (by using DHCP or similar services):
+
+# NodeA: ifconfig bat0 192.168.0.1
+# NodeB: ifconfig bat0 192.168.0.2
+# NodeB: ping 192.168.0.1
+
+Note:  In  order to avoid problems remove all IP addresses previ-
+ously assigned to interfaces now used by batman advanced, e.g.
+
+# ifconfig eth0 0.0.0.0
+
+
+VISUALIZATION
+-------------
+
+If you want topology visualization, at least one mesh  node  must
+be configured as VIS-server:
+
+# echo "server" > /sys/class/net/bat0/mesh/vis_mode
+
+Each  node  is  either configured as "server" or as "client" (de-
+fault: "client").  Clients send their topology data to the server
+next to them, and server synchronize with other servers. If there
+is no server configured (default) within the  mesh,  no  topology
+information   will  be  transmitted.  With  these  "synchronizing
+servers", there can be 1 or more vis servers sharing the same (or
+at least very similar) data.
+
+When  configured  as  server,  you can get a topology snapshot of
+your mesh:
+
+# cat /sys/class/net/bat0/mesh/vis_data
+
+This raw output is intended to be easily parsable and convertable
+with  other tools. Have a look at the batctl README if you want a
+vis output in dot or json format for instance and how those  out-
+puts could then be visualised in an image.
+
+The raw format consists of comma separated values per entry where
+each entry is giving information about a  certain  source  inter-
+face.  Each  entry can/has to have the following values:
+-> "mac" - mac address of an originator's source interface
+           (each line begins with it)
+-> "TQ mac  value"  -  src mac's link quality towards mac address
+                       of a neighbor originator's interface which
+                       is being used for routing
+-> "HNA mac" - HNA announced by source mac
+-> "PRIMARY" - this  is a primary interface
+-> "SEC mac" - secondary mac address of source
+               (requires preceding PRIMARY)
+
+The TQ value has a range from 4 to 255 with 255 being  the  best.
+The HNA entries are showing which hosts are connected to the mesh
+via bat0 or being bridged into the mesh network.  The PRIMARY/SEC
+values are only applied on primary interfaces
+
+
+LOGGING/DEBUGGING
+-----------------
+
+All error messages, warnings and information messages are sent to
+the kernel log. Depending on your operating  system  distribution
+this  can  be read in one of a number of ways. Try using the com-
+mands: dmesg, logread, or looking in the files  /var/log/kern.log
+or  /var/log/syslog.  All  batman-adv  messages are prefixed with
+"batman-adv:" So to see just these messages try
+
+# dmesg | grep batman-adv
+
+When investigating problems with your mesh network  it  is  some-
+times  necessary  to see more detail debug messages. This must be
+enabled when compiling the batman-adv module. When building  bat-
+man-adv  as  part of kernel, use "make menuconfig" and enable the
+option "B.A.T.M.A.N. debugging".
+
+The additional debug output is by default disabled. It can be en-
+abled  either  at kernel modules load time or during run time. To
+enable debug output at module load time, add the module parameter
+debug=<value>.  <value> can take one of four values.
+
+0 - All  debug  output  disabled
+1 - Enable messages related to routing / flooding / broadcasting
+2 - Enable route or hna added / changed / deleted
+3 - Enable all messages
+
+e.g.
+
+# modprobe batman-adv debug=2
+
+will load the module and enable debug messages for when routes or
+HNAs change.
+
+The debug output can also be changed at runtime  using  the  file
+/sys/module/batman-adv/parameters/debug. e.g.
+
+# echo 2 > /sys/module/batman-adv/parameters/debug
+
+enables debug messages for when routes or HNAs
+
+The  debug  output  is sent to the kernel logs. So try dmesg, lo-
+gread, etc to see the debug messages.
+
+
+BATCTL
+------
+
+As batman advanced operates on layer 2 all hosts participating in
+the  virtual switch are completely transparent for all  protocols
+above layer 2. Therefore the common diagnosis tools do  not  work
+as  expected.  To  overcome these problems batctl was created. At
+the  moment the  batctl contains ping,  traceroute,  tcpdump  and
+interfaces to the kernel module settings.
+
+For more information, please see the manpage (man batctl).
+
+batctl is available on http://www.open-mesh.org/
+
+
+CONTACT
+-------
+
+Please send us comments, experiences, questions, anything :)
+
+IRC:            #batman   on   irc.freenode.org
+Mailing-list:   b.a.t.m.a.n@open-mesh.net (optional  subscription
+          at https://lists.open-mesh.org/mm/listinfo/b.a.t.m.a.n)
+
+You can also contact the Authors:
+
+Marek  Lindner  <lindner_marek@yahoo.de>
+Simon  Wunderlich  <siwu@hrz.tu-chemnitz.de>
+
diff --git a/net/batman-adv/aggregation.c b/net/batman-adv/aggregation.c
new file mode 100644
index 0000000..3a5c349
--- /dev/null
+++ b/net/batman-adv/aggregation.c
@@ -0,0 +1,268 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "aggregation.h"
+#include "send.h"
+#include "routing.h"
+
+/* calculate the size of the hna information for a given packet */
+static int hna_len(struct batman_packet *batman_packet)
+{
+	return batman_packet->num_hna * ETH_ALEN;
+}
+
+/* return true if new_packet can be aggregated with forw_packet */
+static bool can_aggregate_with(struct batman_packet *new_batman_packet,
+			       int packet_len,
+			       unsigned long send_time,
+			       bool directlink,
+			       struct batman_if *if_incoming,
+			       struct forw_packet *forw_packet)
+{
+	struct batman_packet *batman_packet =
+		(struct batman_packet *)forw_packet->packet_buff;
+	int aggregated_bytes = forw_packet->packet_len + packet_len;
+
+	/**
+	 * we can aggregate the current packet to this aggregated packet
+	 * if:
+	 *
+	 * - the send time is within our MAX_AGGREGATION_MS time
+	 * - the resulting packet wont be bigger than
+	 *   MAX_AGGREGATION_BYTES
+	 */
+
+	if (time_before(send_time, forw_packet->send_time) &&
+	    time_after_eq(send_time + msecs_to_jiffies(MAX_AGGREGATION_MS),
+					forw_packet->send_time) &&
+	    (aggregated_bytes <= MAX_AGGREGATION_BYTES)) {
+
+		/**
+		 * check aggregation compatibility
+		 * -> direct link packets are broadcasted on
+		 *    their interface only
+		 * -> aggregate packet if the current packet is
+		 *    a "global" packet as well as the base
+		 *    packet
+		 */
+
+		/* packets without direct link flag and high TTL
+		 * are flooded through the net  */
+		if ((!directlink) &&
+		    (!(batman_packet->flags & DIRECTLINK)) &&
+		    (batman_packet->ttl != 1) &&
+
+		    /* own packets originating non-primary
+		     * interfaces leave only that interface */
+		    ((!forw_packet->own) ||
+		     (forw_packet->if_incoming->if_num == 0)))
+			return true;
+
+		/* if the incoming packet is sent via this one
+		 * interface only - we still can aggregate */
+		if ((directlink) &&
+		    (new_batman_packet->ttl == 1) &&
+		    (forw_packet->if_incoming == if_incoming) &&
+
+		    /* packets from direct neighbors or
+		     * own secondary interface packets
+		     * (= secondary interface packets in general) */
+		    (batman_packet->flags & DIRECTLINK ||
+		     (forw_packet->own &&
+		      forw_packet->if_incoming->if_num != 0)))
+			return true;
+	}
+
+	return false;
+}
+
+#define atomic_dec_not_zero(v)          atomic_add_unless((v), -1, 0)
+/* create a new aggregated packet and add this packet to it */
+static void new_aggregated_packet(unsigned char *packet_buff,
+			   int packet_len,
+			   unsigned long send_time,
+			   bool direct_link,
+			   struct batman_if *if_incoming,
+			   int own_packet)
+{
+	struct forw_packet *forw_packet_aggr;
+	unsigned long flags;
+
+	/* own packet should always be scheduled */
+	if (!own_packet) {
+		if (!atomic_dec_not_zero(&batman_queue_left)) {
+			bat_dbg(DBG_BATMAN, "batman packet queue full\n");
+			return;
+		}
+	}
+
+	forw_packet_aggr = kmalloc(sizeof(struct forw_packet), GFP_ATOMIC);
+	if (!forw_packet_aggr) {
+		if (!own_packet)
+			atomic_inc(&batman_queue_left);
+		return;
+	}
+
+	forw_packet_aggr->packet_buff = kmalloc(MAX_AGGREGATION_BYTES,
+						GFP_ATOMIC);
+	if (!forw_packet_aggr->packet_buff) {
+		if (!own_packet)
+			atomic_inc(&batman_queue_left);
+		kfree(forw_packet_aggr);
+		return;
+	}
+
+	INIT_HLIST_NODE(&forw_packet_aggr->list);
+
+	forw_packet_aggr->packet_len = packet_len;
+	memcpy(forw_packet_aggr->packet_buff,
+	       packet_buff,
+	       forw_packet_aggr->packet_len);
+
+	forw_packet_aggr->skb = NULL;
+	forw_packet_aggr->own = own_packet;
+	forw_packet_aggr->if_incoming = if_incoming;
+	forw_packet_aggr->num_packets = 0;
+	forw_packet_aggr->direct_link_flags = 0;
+	forw_packet_aggr->send_time = send_time;
+
+	/* save packet direct link flag status */
+	if (direct_link)
+		forw_packet_aggr->direct_link_flags |= 1;
+
+	/* add new packet to packet list */
+	spin_lock_irqsave(&forw_bat_list_lock, flags);
+	hlist_add_head(&forw_packet_aggr->list, &forw_bat_list);
+	spin_unlock_irqrestore(&forw_bat_list_lock, flags);
+
+	/* start timer for this packet */
+	INIT_DELAYED_WORK(&forw_packet_aggr->delayed_work,
+			  send_outstanding_bat_packet);
+	queue_delayed_work(bat_event_workqueue,
+			   &forw_packet_aggr->delayed_work,
+			   send_time - jiffies);
+}
+
+/* aggregate a new packet into the existing aggregation */
+static void aggregate(struct forw_packet *forw_packet_aggr,
+		      unsigned char *packet_buff,
+		      int packet_len,
+		      bool direct_link)
+{
+	memcpy((forw_packet_aggr->packet_buff + forw_packet_aggr->packet_len),
+	       packet_buff, packet_len);
+	forw_packet_aggr->packet_len += packet_len;
+	forw_packet_aggr->num_packets++;
+
+	/* save packet direct link flag status */
+	if (direct_link)
+		forw_packet_aggr->direct_link_flags |=
+			(1 << forw_packet_aggr->num_packets);
+}
+
+void add_bat_packet_to_list(struct bat_priv *bat_priv,
+			    unsigned char *packet_buff, int packet_len,
+			    struct batman_if *if_incoming, char own_packet,
+			    unsigned long send_time)
+{
+	/**
+	 * _aggr -> pointer to the packet we want to aggregate with
+	 * _pos -> pointer to the position in the queue
+	 */
+	struct forw_packet *forw_packet_aggr = NULL, *forw_packet_pos = NULL;
+	struct hlist_node *tmp_node;
+	struct batman_packet *batman_packet =
+		(struct batman_packet *)packet_buff;
+	bool direct_link = batman_packet->flags & DIRECTLINK ? 1 : 0;
+	unsigned long flags;
+
+	/* find position for the packet in the forward queue */
+	spin_lock_irqsave(&forw_bat_list_lock, flags);
+	/* own packets are not to be aggregated */
+	if ((atomic_read(&bat_priv->aggregation_enabled)) && (!own_packet)) {
+		hlist_for_each_entry(forw_packet_pos, tmp_node, &forw_bat_list,
+				     list) {
+			if (can_aggregate_with(batman_packet,
+					       packet_len,
+					       send_time,
+					       direct_link,
+					       if_incoming,
+					       forw_packet_pos)) {
+				forw_packet_aggr = forw_packet_pos;
+				break;
+			}
+		}
+	}
+
+	/* nothing to aggregate with - either aggregation disabled or no
+	 * suitable aggregation packet found */
+	if (forw_packet_aggr == NULL) {
+		/* the following section can run without the lock */
+		spin_unlock_irqrestore(&forw_bat_list_lock, flags);
+
+		/**
+		 * if we could not aggregate this packet with one of the others
+		 * we hold it back for a while, so that it might be aggregated
+		 * later on
+		 */
+		if ((!own_packet) &&
+		    (atomic_read(&bat_priv->aggregation_enabled)))
+			send_time += msecs_to_jiffies(MAX_AGGREGATION_MS);
+
+		new_aggregated_packet(packet_buff, packet_len,
+				      send_time, direct_link,
+				      if_incoming, own_packet);
+	} else {
+		aggregate(forw_packet_aggr,
+			  packet_buff, packet_len,
+			  direct_link);
+		spin_unlock_irqrestore(&forw_bat_list_lock, flags);
+	}
+}
+
+/* unpack the aggregated packets and process them one by one */
+void receive_aggr_bat_packet(struct ethhdr *ethhdr, unsigned char *packet_buff,
+			     int packet_len, struct batman_if *if_incoming)
+{
+	struct batman_packet *batman_packet;
+	int buff_pos = 0;
+	unsigned char *hna_buff;
+
+	batman_packet = (struct batman_packet *)packet_buff;
+
+	while (aggregated_packet(buff_pos, packet_len,
+				 batman_packet->num_hna)) {
+
+		/* network to host order for our 32bit seqno, and the
+		   orig_interval. */
+		batman_packet->seqno = ntohl(batman_packet->seqno);
+
+		hna_buff = packet_buff + buff_pos + BAT_PACKET_LEN;
+		receive_bat_packet(ethhdr, batman_packet,
+				   hna_buff, hna_len(batman_packet),
+				   if_incoming);
+
+		buff_pos += BAT_PACKET_LEN + hna_len(batman_packet);
+		batman_packet = (struct batman_packet *)
+			(packet_buff + buff_pos);
+	}
+}
diff --git a/net/batman-adv/aggregation.h b/net/batman-adv/aggregation.h
new file mode 100644
index 0000000..71a91b3
--- /dev/null
+++ b/net/batman-adv/aggregation.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_AGGREGATION_H_
+#define _NET_BATMAN_ADV_AGGREGATION_H_
+
+#include "main.h"
+
+/* is there another aggregated packet here? */
+static inline int aggregated_packet(int buff_pos, int packet_len, int num_hna)
+{
+	int next_buff_pos = buff_pos + BAT_PACKET_LEN + (num_hna * ETH_ALEN);
+
+	return (next_buff_pos <= packet_len) &&
+		(next_buff_pos <= MAX_AGGREGATION_BYTES);
+}
+
+void add_bat_packet_to_list(struct bat_priv *bat_priv,
+			    unsigned char *packet_buff, int packet_len,
+			    struct batman_if *if_incoming, char own_packet,
+			    unsigned long send_time);
+void receive_aggr_bat_packet(struct ethhdr *ethhdr, unsigned char *packet_buff,
+			     int packet_len, struct batman_if *if_incoming);
+
+#endif /* _NET_BATMAN_ADV_AGGREGATION_H_ */
diff --git a/net/batman-adv/bat_debugfs.c b/net/batman-adv/bat_debugfs.c
new file mode 100644
index 0000000..fafca9f
--- /dev/null
+++ b/net/batman-adv/bat_debugfs.c
@@ -0,0 +1,150 @@
+/*
+ * Copyright (C) 2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include <linux/debugfs.h>
+
+#include "main.h"
+#include "bat_debugfs.h"
+#include "translation-table.h"
+#include "originator.h"
+#include "hard-interface.h"
+#include "vis.h"
+#include "icmp_socket.h"
+
+static struct dentry *bat_debugfs;
+
+static int originators_open(struct inode *inode, struct file *file)
+{
+	struct net_device *net_dev = (struct net_device *)inode->i_private;
+	return single_open(file, orig_seq_print_text, net_dev);
+}
+
+static int transtable_global_open(struct inode *inode, struct file *file)
+{
+	struct net_device *net_dev = (struct net_device *)inode->i_private;
+	return single_open(file, hna_global_seq_print_text, net_dev);
+}
+
+static int transtable_local_open(struct inode *inode, struct file *file)
+{
+	struct net_device *net_dev = (struct net_device *)inode->i_private;
+	return single_open(file, hna_local_seq_print_text, net_dev);
+}
+
+static int vis_data_open(struct inode *inode, struct file *file)
+{
+	struct net_device *net_dev = (struct net_device *)inode->i_private;
+	return single_open(file, vis_seq_print_text, net_dev);
+}
+
+struct bat_debuginfo {
+	struct attribute attr;
+	const struct file_operations fops;
+};
+
+#define BAT_DEBUGINFO(_name, _mode, _open)	\
+struct bat_debuginfo bat_debuginfo_##_name = {	\
+	.attr = { .name = __stringify(_name),	\
+		  .mode = _mode, },		\
+	.fops = { .owner = THIS_MODULE,		\
+		  .open = _open,		\
+		  .read	= seq_read,		\
+		  .llseek = seq_lseek,		\
+		  .release = single_release,	\
+		}				\
+};
+
+static BAT_DEBUGINFO(originators, S_IRUGO, originators_open);
+static BAT_DEBUGINFO(transtable_global, S_IRUGO, transtable_global_open);
+static BAT_DEBUGINFO(transtable_local, S_IRUGO, transtable_local_open);
+static BAT_DEBUGINFO(vis_data, S_IRUGO, vis_data_open);
+
+static struct bat_debuginfo *mesh_debuginfos[] = {
+	&bat_debuginfo_originators,
+	&bat_debuginfo_transtable_global,
+	&bat_debuginfo_transtable_local,
+	&bat_debuginfo_vis_data,
+	NULL,
+};
+
+void debugfs_init(void)
+{
+	bat_debugfs = debugfs_create_dir(DEBUGFS_BAT_SUBDIR, NULL);
+	if (bat_debugfs == ERR_PTR(-ENODEV))
+		bat_debugfs = NULL;
+}
+
+void debugfs_destroy(void)
+{
+	if (bat_debugfs) {
+		debugfs_remove_recursive(bat_debugfs);
+		bat_debugfs = NULL;
+	}
+}
+
+int debugfs_add_meshif(struct net_device *dev)
+{
+	struct bat_priv *bat_priv = netdev_priv(dev);
+	struct bat_debuginfo **bat_debug;
+	struct dentry *file;
+
+	if (!bat_debugfs)
+		goto out;
+
+	bat_priv->debug_dir = debugfs_create_dir(dev->name, bat_debugfs);
+	if (!bat_priv->debug_dir)
+		goto out;
+
+	bat_socket_setup(bat_priv);
+
+	for (bat_debug = mesh_debuginfos; *bat_debug; ++bat_debug) {
+		file = debugfs_create_file(((*bat_debug)->attr).name,
+					  S_IFREG | ((*bat_debug)->attr).mode,
+					  bat_priv->debug_dir,
+					  dev, &(*bat_debug)->fops);
+		if (!file) {
+			printk(KERN_ERR "batman-adv:Can't add debugfs file: "
+			       "%s/%s\n", dev->name, ((*bat_debug)->attr).name);
+			goto rem_attr;
+		}
+	}
+
+	return 0;
+rem_attr:
+	debugfs_remove_recursive(bat_priv->debug_dir);
+	bat_priv->debug_dir = NULL;
+out:
+#ifdef CONFIG_DEBUG_FS
+	return -ENOMEM;
+#else
+	return 0;
+#endif /* CONFIG_DEBUG_FS */
+}
+
+void debugfs_del_meshif(struct net_device *dev)
+{
+	struct bat_priv *bat_priv = netdev_priv(dev);
+
+	if (bat_debugfs) {
+		debugfs_remove_recursive(bat_priv->debug_dir);
+		bat_priv->debug_dir = NULL;
+	}
+}
diff --git a/net/batman-adv/bat_debugfs.h b/net/batman-adv/bat_debugfs.h
new file mode 100644
index 0000000..72df532
--- /dev/null
+++ b/net/batman-adv/bat_debugfs.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+
+#ifndef _NET_BATMAN_ADV_DEBUGFS_H_
+#define _NET_BATMAN_ADV_DEBUGFS_H_
+
+#define DEBUGFS_BAT_SUBDIR "batman_adv"
+
+void debugfs_init(void);
+void debugfs_destroy(void);
+int debugfs_add_meshif(struct net_device *dev);
+void debugfs_del_meshif(struct net_device *dev);
+
+#endif /* _NET_BATMAN_ADV_DEBUGFS_H_ */
diff --git a/net/batman-adv/bat_sysfs.c b/net/batman-adv/bat_sysfs.c
new file mode 100644
index 0000000..4e9c71d
--- /dev/null
+++ b/net/batman-adv/bat_sysfs.c
@@ -0,0 +1,433 @@
+/*
+ * Copyright (C) 2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "bat_sysfs.h"
+#include "translation-table.h"
+#include "originator.h"
+#include "hard-interface.h"
+#include "vis.h"
+
+#define to_dev(obj)     container_of(obj, struct device, kobj)
+
+#define BAT_ATTR(_name, _mode, _show, _store)	\
+struct bat_attribute bat_attr_##_name = {	\
+	.attr = {.name = __stringify(_name),	\
+		 .mode = _mode },		\
+	.show   = _show,			\
+	.store  = _store,			\
+};
+
+static ssize_t show_aggr_ogms(struct kobject *kobj, struct attribute *attr,
+			     char *buff)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct bat_priv *bat_priv = netdev_priv(to_net_dev(dev));
+	int aggr_status = atomic_read(&bat_priv->aggregation_enabled);
+
+	return sprintf(buff, "%s\n",
+		       aggr_status == 0 ? "disabled" : "enabled");
+}
+
+static ssize_t store_aggr_ogms(struct kobject *kobj, struct attribute *attr,
+			      char *buff, size_t count)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct net_device *net_dev = to_net_dev(dev);
+	struct bat_priv *bat_priv = netdev_priv(net_dev);
+	int aggr_tmp = -1;
+
+	if (((count == 2) && (buff[0] == '1')) ||
+	    (strncmp(buff, "enable", 6) == 0))
+		aggr_tmp = 1;
+
+	if (((count == 2) && (buff[0] == '0')) ||
+	    (strncmp(buff, "disable", 7) == 0))
+		aggr_tmp = 0;
+
+	if (aggr_tmp < 0) {
+		if (buff[count - 1] == '\n')
+			buff[count - 1] = '\0';
+
+		printk(KERN_INFO "batman-adv:Invalid parameter for 'aggregate OGM' setting on mesh %s received: %s\n",
+		       net_dev->name, buff);
+		return -EINVAL;
+	}
+
+	if (atomic_read(&bat_priv->aggregation_enabled) == aggr_tmp)
+		return count;
+
+	printk(KERN_INFO "batman-adv:Changing aggregation from: %s to: %s on mesh: %s\n",
+	       atomic_read(&bat_priv->aggregation_enabled) == 1 ?
+	       "enabled" : "disabled", aggr_tmp == 1 ? "enabled" : "disabled",
+	       net_dev->name);
+
+	atomic_set(&bat_priv->aggregation_enabled, (unsigned)aggr_tmp);
+	return count;
+}
+
+static ssize_t show_bond(struct kobject *kobj, struct attribute *attr,
+			     char *buff)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct bat_priv *bat_priv = netdev_priv(to_net_dev(dev));
+	int bond_status = atomic_read(&bat_priv->bonding_enabled);
+
+	return sprintf(buff, "%s\n",
+		       bond_status == 0 ? "disabled" : "enabled");
+}
+
+static ssize_t store_bond(struct kobject *kobj, struct attribute *attr,
+			  char *buff, size_t count)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct net_device *net_dev = to_net_dev(dev);
+	struct bat_priv *bat_priv = netdev_priv(net_dev);
+	int bonding_enabled_tmp = -1;
+
+	if (((count == 2) && (buff[0] == '1')) ||
+	    (strncmp(buff, "enable", 6) == 0))
+		bonding_enabled_tmp = 1;
+
+	if (((count == 2) && (buff[0] == '0')) ||
+	    (strncmp(buff, "disable", 7) == 0))
+		bonding_enabled_tmp = 0;
+
+	if (bonding_enabled_tmp < 0) {
+		if (buff[count - 1] == '\n')
+			buff[count - 1] = '\0';
+
+		printk(KERN_ERR "batman-adv:Invalid parameter for 'bonding' setting on mesh %s received: %s\n",
+		       net_dev->name, buff);
+		return -EINVAL;
+	}
+
+	if (atomic_read(&bat_priv->bonding_enabled) == bonding_enabled_tmp)
+		return count;
+
+	printk(KERN_INFO "batman-adv:Changing bonding from: %s to: %s on mesh: %s\n",
+	       atomic_read(&bat_priv->bonding_enabled) == 1 ?
+	       "enabled" : "disabled",
+	       bonding_enabled_tmp == 1 ? "enabled" : "disabled",
+	       net_dev->name);
+
+	atomic_set(&bat_priv->bonding_enabled, (unsigned)bonding_enabled_tmp);
+	return count;
+}
+
+static ssize_t show_vis_mode(struct kobject *kobj, struct attribute *attr,
+			     char *buff)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct bat_priv *bat_priv = netdev_priv(to_net_dev(dev));
+	int vis_mode = atomic_read(&bat_priv->vis_mode);
+
+	return sprintf(buff, "%s\n",
+		       vis_mode == VIS_TYPE_CLIENT_UPDATE ?
+							"client" : "server");
+}
+
+static ssize_t store_vis_mode(struct kobject *kobj, struct attribute *attr,
+			      char *buff, size_t count)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct net_device *net_dev = to_net_dev(dev);
+	struct bat_priv *bat_priv = netdev_priv(net_dev);
+	unsigned long val;
+	int ret, vis_mode_tmp = -1;
+
+	ret = strict_strtoul(buff, 10, &val);
+
+	if (((count == 2) && (!ret) && (val == VIS_TYPE_CLIENT_UPDATE)) ||
+	    (strncmp(buff, "client", 6) == 0) ||
+	    (strncmp(buff, "off", 3) == 0))
+		vis_mode_tmp = VIS_TYPE_CLIENT_UPDATE;
+
+	if (((count == 2) && (!ret) && (val == VIS_TYPE_SERVER_SYNC)) ||
+	    (strncmp(buff, "server", 6) == 0))
+		vis_mode_tmp = VIS_TYPE_SERVER_SYNC;
+
+	if (vis_mode_tmp < 0) {
+		if (buff[count - 1] == '\n')
+			buff[count - 1] = '\0';
+
+		printk(KERN_INFO "batman-adv:Invalid parameter for 'vis mode' setting on mesh %s received: %s\n",
+		       net_dev->name, buff);
+		return -EINVAL;
+	}
+
+	if (atomic_read(&bat_priv->vis_mode) == vis_mode_tmp)
+		return count;
+
+	printk(KERN_INFO "batman-adv:Changing vis mode from: %s to: %s on mesh: %s\n",
+	       atomic_read(&bat_priv->vis_mode) == VIS_TYPE_CLIENT_UPDATE ?
+	       "client" : "server", vis_mode_tmp == VIS_TYPE_CLIENT_UPDATE ?
+	       "client" : "server", net_dev->name);
+
+	atomic_set(&bat_priv->vis_mode, (unsigned)vis_mode_tmp);
+	return count;
+}
+
+static ssize_t show_orig_interval(struct kobject *kobj, struct attribute *attr,
+				 char *buff)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct bat_priv *bat_priv = netdev_priv(to_net_dev(dev));
+
+	return sprintf(buff, "%i\n",
+		       atomic_read(&bat_priv->orig_interval));
+}
+
+static ssize_t store_orig_interval(struct kobject *kobj, struct attribute *attr,
+				  char *buff, size_t count)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct net_device *net_dev = to_net_dev(dev);
+	struct bat_priv *bat_priv = netdev_priv(net_dev);
+	unsigned long orig_interval_tmp;
+	int ret;
+
+	ret = strict_strtoul(buff, 10, &orig_interval_tmp);
+	if (ret) {
+		printk(KERN_INFO "batman-adv:Invalid parameter for 'orig_interval' setting on mesh %s received: %s\n",
+		       net_dev->name, buff);
+		return -EINVAL;
+	}
+
+	if (orig_interval_tmp < JITTER * 2) {
+		printk(KERN_INFO "batman-adv:New originator interval too small: %li (min: %i)\n",
+		       orig_interval_tmp, JITTER * 2);
+		return -EINVAL;
+	}
+
+	if (atomic_read(&bat_priv->orig_interval) == orig_interval_tmp)
+		return count;
+
+	printk(KERN_INFO "batman-adv:Changing originator interval from: %i to: %li on mesh: %s\n",
+	       atomic_read(&bat_priv->orig_interval),
+	       orig_interval_tmp, net_dev->name);
+
+	atomic_set(&bat_priv->orig_interval, orig_interval_tmp);
+	return count;
+}
+
+static BAT_ATTR(aggregated_ogms, S_IRUGO | S_IWUSR,
+		show_aggr_ogms, store_aggr_ogms);
+static BAT_ATTR(bonding, S_IRUGO | S_IWUSR, show_bond, store_bond);
+static BAT_ATTR(vis_mode, S_IRUGO | S_IWUSR, show_vis_mode, store_vis_mode);
+static BAT_ATTR(orig_interval, S_IRUGO | S_IWUSR,
+		show_orig_interval, store_orig_interval);
+
+static struct bat_attribute *mesh_attrs[] = {
+	&bat_attr_aggregated_ogms,
+	&bat_attr_bonding,
+	&bat_attr_vis_mode,
+	&bat_attr_orig_interval,
+	NULL,
+};
+
+int sysfs_add_meshif(struct net_device *dev)
+{
+	struct kobject *batif_kobject = &dev->dev.kobj;
+	struct bat_priv *bat_priv = netdev_priv(dev);
+	struct bat_attribute **bat_attr;
+	int err;
+
+	/* FIXME: should be done in the general mesh setup
+		  routine as soon as we have it */
+	atomic_set(&bat_priv->aggregation_enabled, 1);
+	atomic_set(&bat_priv->bonding_enabled, 0);
+	atomic_set(&bat_priv->vis_mode, VIS_TYPE_CLIENT_UPDATE);
+	atomic_set(&bat_priv->orig_interval, 1000);
+	bat_priv->primary_if = NULL;
+	bat_priv->num_ifaces = 0;
+
+	bat_priv->mesh_obj = kobject_create_and_add(SYSFS_IF_MESH_SUBDIR,
+						    batif_kobject);
+	if (!bat_priv->mesh_obj) {
+		printk(KERN_ERR "batman-adv:Can't add sysfs directory: %s/%s\n",
+		       dev->name, SYSFS_IF_MESH_SUBDIR);
+		goto out;
+	}
+
+	for (bat_attr = mesh_attrs; *bat_attr; ++bat_attr) {
+		err = sysfs_create_file(bat_priv->mesh_obj,
+					&((*bat_attr)->attr));
+		if (err) {
+			printk(KERN_ERR "batman-adv:Can't add sysfs file: %s/%s/%s\n",
+			       dev->name, SYSFS_IF_MESH_SUBDIR,
+			       ((*bat_attr)->attr).name);
+			goto rem_attr;
+		}
+	}
+
+	return 0;
+
+rem_attr:
+	for (bat_attr = mesh_attrs; *bat_attr; ++bat_attr)
+		sysfs_remove_file(bat_priv->mesh_obj, &((*bat_attr)->attr));
+
+	kobject_put(bat_priv->mesh_obj);
+	bat_priv->mesh_obj = NULL;
+out:
+	return -ENOMEM;
+}
+
+void sysfs_del_meshif(struct net_device *dev)
+{
+	struct bat_priv *bat_priv = netdev_priv(dev);
+	struct bat_attribute **bat_attr;
+
+	for (bat_attr = mesh_attrs; *bat_attr; ++bat_attr)
+		sysfs_remove_file(bat_priv->mesh_obj, &((*bat_attr)->attr));
+
+	kobject_put(bat_priv->mesh_obj);
+	bat_priv->mesh_obj = NULL;
+}
+
+static ssize_t show_mesh_iface(struct kobject *kobj, struct attribute *attr,
+			       char *buff)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct net_device *net_dev = to_net_dev(dev);
+	struct batman_if *batman_if = get_batman_if_by_netdev(net_dev);
+
+	if (!batman_if)
+		return 0;
+
+	return sprintf(buff, "%s\n",
+		       batman_if->if_status == IF_NOT_IN_USE ?
+							"none" : "bat0");
+}
+
+static ssize_t store_mesh_iface(struct kobject *kobj, struct attribute *attr,
+				char *buff, size_t count)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct net_device *net_dev = to_net_dev(dev);
+	struct batman_if *batman_if = get_batman_if_by_netdev(net_dev);
+	int status_tmp = -1;
+
+	if (!batman_if)
+		return count;
+
+	if (strncmp(buff, "none", 4) == 0)
+		status_tmp = IF_NOT_IN_USE;
+
+	if (strncmp(buff, "bat0", 4) == 0)
+		status_tmp = IF_I_WANT_YOU;
+
+	if (status_tmp < 0) {
+		if (buff[count - 1] == '\n')
+			buff[count - 1] = '\0';
+
+		printk(KERN_ERR "batman-adv:Invalid parameter for 'mesh_iface' setting received: %s\n",
+		       buff);
+		return -EINVAL;
+	}
+
+	if ((batman_if->if_status == status_tmp) ||
+	    ((status_tmp == IF_I_WANT_YOU) &&
+	     (batman_if->if_status != IF_NOT_IN_USE)))
+		return count;
+
+	if (status_tmp == IF_I_WANT_YOU)
+		status_tmp = hardif_enable_interface(batman_if);
+	else
+		hardif_disable_interface(batman_if);
+
+	return (status_tmp < 0 ? status_tmp : count);
+}
+
+static ssize_t show_iface_status(struct kobject *kobj, struct attribute *attr,
+				 char *buff)
+{
+	struct device *dev = to_dev(kobj->parent);
+	struct net_device *net_dev = to_net_dev(dev);
+	struct batman_if *batman_if = get_batman_if_by_netdev(net_dev);
+
+	if (!batman_if)
+		return 0;
+
+	switch (batman_if->if_status) {
+	case IF_TO_BE_REMOVED:
+		return sprintf(buff, "disabling\n");
+	case IF_INACTIVE:
+		return sprintf(buff, "inactive\n");
+	case IF_ACTIVE:
+		return sprintf(buff, "active\n");
+	case IF_TO_BE_ACTIVATED:
+		return sprintf(buff, "enabling\n");
+	case IF_NOT_IN_USE:
+	default:
+		return sprintf(buff, "not in use\n");
+	}
+}
+
+static BAT_ATTR(mesh_iface, S_IRUGO | S_IWUSR,
+		show_mesh_iface, store_mesh_iface);
+static BAT_ATTR(iface_status, S_IRUGO, show_iface_status, NULL);
+
+static struct bat_attribute *batman_attrs[] = {
+	&bat_attr_mesh_iface,
+	&bat_attr_iface_status,
+	NULL,
+};
+
+int sysfs_add_hardif(struct kobject **hardif_obj, struct net_device *dev)
+{
+	struct kobject *hardif_kobject = &dev->dev.kobj;
+	struct bat_attribute **bat_attr;
+	int err;
+
+	*hardif_obj = kobject_create_and_add(SYSFS_IF_BAT_SUBDIR,
+						    hardif_kobject);
+
+	if (!*hardif_obj) {
+		printk(KERN_ERR "batman-adv:Can't add sysfs directory: %s/%s\n",
+		       dev->name, SYSFS_IF_BAT_SUBDIR);
+		goto out;
+	}
+
+	for (bat_attr = batman_attrs; *bat_attr; ++bat_attr) {
+		err = sysfs_create_file(*hardif_obj, &((*bat_attr)->attr));
+		if (err) {
+			printk(KERN_ERR "batman-adv:Can't add sysfs file: %s/%s/%s\n",
+			       dev->name, SYSFS_IF_BAT_SUBDIR,
+			       ((*bat_attr)->attr).name);
+			goto rem_attr;
+		}
+	}
+
+	return 0;
+
+rem_attr:
+	for (bat_attr = batman_attrs; *bat_attr; ++bat_attr)
+		sysfs_remove_file(*hardif_obj, &((*bat_attr)->attr));
+out:
+	return -ENOMEM;
+}
+
+void sysfs_del_hardif(struct kobject **hardif_obj)
+{
+	kobject_put(*hardif_obj);
+	*hardif_obj = NULL;
+}
diff --git a/net/batman-adv/bat_sysfs.h b/net/batman-adv/bat_sysfs.h
new file mode 100644
index 0000000..7f186c0
--- /dev/null
+++ b/net/batman-adv/bat_sysfs.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (C) 2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+
+#ifndef _NET_BATMAN_ADV_SYSFS_H_
+#define _NET_BATMAN_ADV_SYSFS_H_
+
+#define SYSFS_IF_MESH_SUBDIR "mesh"
+#define SYSFS_IF_BAT_SUBDIR "batman_adv"
+
+struct bat_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct kobject *kobj, struct attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct kobject *kobj, struct attribute *attr,
+			 char *buf, size_t count);
+};
+
+int sysfs_add_meshif(struct net_device *dev);
+void sysfs_del_meshif(struct net_device *dev);
+int sysfs_add_hardif(struct kobject **hardif_obj, struct net_device *dev);
+void sysfs_del_hardif(struct kobject **hardif_obj);
+
+#endif /* _NET_BATMAN_ADV_SYSFS_H_ */
diff --git a/net/batman-adv/bitarray.c b/net/batman-adv/bitarray.c
new file mode 100644
index 0000000..c10fe03
--- /dev/null
+++ b/net/batman-adv/bitarray.c
@@ -0,0 +1,204 @@
+/*
+ * Copyright (C) 2006-2010 B.A.T.M.A.N. contributors:
+ *
+ * Simon Wunderlich, Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "bitarray.h"
+
+/* returns true if the corresponding bit in the given seq_bits indicates true
+ * and curr_seqno is within range of last_seqno */
+uint8_t get_bit_status(TYPE_OF_WORD *seq_bits, uint32_t last_seqno,
+		       uint32_t curr_seqno)
+{
+	int32_t diff, word_offset, word_num;
+
+	diff = last_seqno - curr_seqno;
+	if (diff < 0 || diff >= TQ_LOCAL_WINDOW_SIZE) {
+		return 0;
+	} else {
+		/* which word */
+		word_num = (last_seqno - curr_seqno) / WORD_BIT_SIZE;
+		/* which position in the selected word */
+		word_offset = (last_seqno - curr_seqno) % WORD_BIT_SIZE;
+
+		if (seq_bits[word_num] & 1 << word_offset)
+			return 1;
+		else
+			return 0;
+	}
+}
+
+/* turn corresponding bit on, so we can remember that we got the packet */
+void bit_mark(TYPE_OF_WORD *seq_bits, int32_t n)
+{
+	int32_t word_offset, word_num;
+
+	/* if too old, just drop it */
+	if (n < 0 || n >= TQ_LOCAL_WINDOW_SIZE)
+		return;
+
+	/* which word */
+	word_num = n / WORD_BIT_SIZE;
+	/* which position in the selected word */
+	word_offset = n % WORD_BIT_SIZE;
+
+	seq_bits[word_num] |= 1 << word_offset;	/* turn the position on */
+}
+
+/* shift the packet array by n places. */
+static void bit_shift(TYPE_OF_WORD *seq_bits, int32_t n)
+{
+	int32_t word_offset, word_num;
+	int32_t i;
+
+	if (n <= 0 || n >= TQ_LOCAL_WINDOW_SIZE)
+		return;
+
+	word_offset = n % WORD_BIT_SIZE;/* shift how much inside each word */
+	word_num = n / WORD_BIT_SIZE;	/* shift over how much (full) words */
+
+	for (i = NUM_WORDS - 1; i > word_num; i--) {
+		/* going from old to new, so we don't overwrite the data we copy
+		 * from.
+		 *
+		 * left is high, right is low: FEDC BA98 7654 3210
+		 *					  ^^ ^^
+		 *			       vvvv
+		 * ^^^^ = from, vvvvv =to, we'd have word_num==1 and
+		 * word_offset==WORD_BIT_SIZE/2 ????? in this example.
+		 * (=24 bits)
+		 *
+		 * our desired output would be: 9876 5432 1000 0000
+		 * */
+
+		seq_bits[i] =
+			(seq_bits[i - word_num] << word_offset) +
+			/* take the lower port from the left half, shift it left
+			 * to its final position */
+			(seq_bits[i - word_num - 1] >>
+			 (WORD_BIT_SIZE-word_offset));
+		/* and the upper part of the right half and shift it left to
+		 * it's position */
+		/* for our example that would be: word[0] = 9800 + 0076 =
+		 * 9876 */
+	}
+	/* now for our last word, i==word_num, we only have the it's "left"
+	 * half. that's the 1000 word in our example.*/
+
+	seq_bits[i] = (seq_bits[i - word_num] << word_offset);
+
+	/* pad the rest with 0, if there is anything */
+	i--;
+
+	for (; i >= 0; i--)
+		seq_bits[i] = 0;
+}
+
+static void bit_reset_window(TYPE_OF_WORD *seq_bits)
+{
+	int i;
+	for (i = 0; i < NUM_WORDS; i++)
+		seq_bits[i] = 0;
+}
+
+
+/* receive and process one packet within the sequence number window.
+ *
+ * returns:
+ *  1 if the window was moved (either new or very old)
+ *  0 if the window was not moved/shifted.
+ */
+char bit_get_packet(TYPE_OF_WORD *seq_bits, int32_t seq_num_diff,
+		    int8_t set_mark)
+{
+	/* sequence number is slightly older. We already got a sequence number
+	 * higher than this one, so we just mark it. */
+
+	if ((seq_num_diff <= 0) && (seq_num_diff > -TQ_LOCAL_WINDOW_SIZE)) {
+		if (set_mark)
+			bit_mark(seq_bits, -seq_num_diff);
+		return 0;
+	}
+
+	/* sequence number is slightly newer, so we shift the window and
+	 * set the mark if required */
+
+	if ((seq_num_diff > 0) && (seq_num_diff < TQ_LOCAL_WINDOW_SIZE)) {
+		bit_shift(seq_bits, seq_num_diff);
+
+		if (set_mark)
+			bit_mark(seq_bits, 0);
+		return 1;
+	}
+
+	/* sequence number is much newer, probably missed a lot of packets */
+
+	if ((seq_num_diff >= TQ_LOCAL_WINDOW_SIZE)
+		|| (seq_num_diff < EXPECTED_SEQNO_RANGE)) {
+		bat_dbg(DBG_BATMAN,
+			"We missed a lot of packets (%i) !\n",
+			seq_num_diff - 1);
+		bit_reset_window(seq_bits);
+		if (set_mark)
+			bit_mark(seq_bits, 0);
+		return 1;
+	}
+
+	/* received a much older packet. The other host either restarted
+	 * or the old packet got delayed somewhere in the network. The
+	 * packet should be dropped without calling this function if the
+	 * seqno window is protected. */
+
+	if ((seq_num_diff <= -TQ_LOCAL_WINDOW_SIZE)
+		|| (seq_num_diff >= EXPECTED_SEQNO_RANGE)) {
+
+		bat_dbg(DBG_BATMAN,
+			"Other host probably restarted!\n");
+
+		bit_reset_window(seq_bits);
+		if (set_mark)
+			bit_mark(seq_bits, 0);
+
+		return 1;
+	}
+
+	/* never reached */
+	return 0;
+}
+
+/* count the hamming weight, how many good packets did we receive? just count
+ * the 1's. The inner loop uses the Kernighan algorithm, see
+ * http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
+ */
+int bit_packet_count(TYPE_OF_WORD *seq_bits)
+{
+	int i, hamming = 0;
+	TYPE_OF_WORD word;
+
+	for (i = 0; i < NUM_WORDS; i++) {
+		word = seq_bits[i];
+
+		while (word) {
+			word &= word-1;
+			hamming++;
+		}
+	}
+	return hamming;
+}
diff --git a/net/batman-adv/bitarray.h b/net/batman-adv/bitarray.h
new file mode 100644
index 0000000..01897d6
--- /dev/null
+++ b/net/batman-adv/bitarray.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2006-2010 B.A.T.M.A.N. contributors:
+ *
+ * Simon Wunderlich, Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_BITARRAY_H_
+#define _NET_BATMAN_ADV_BITARRAY_H_
+
+/* you should choose something big, if you don't want to waste cpu */
+#define TYPE_OF_WORD unsigned long
+#define WORD_BIT_SIZE (sizeof(TYPE_OF_WORD) * 8)
+
+/* returns true if the corresponding bit in the given seq_bits indicates true
+ * and curr_seqno is within range of last_seqno */
+uint8_t get_bit_status(TYPE_OF_WORD *seq_bits, uint32_t last_seqno,
+					   uint32_t curr_seqno);
+
+/* turn corresponding bit on, so we can remember that we got the packet */
+void bit_mark(TYPE_OF_WORD *seq_bits, int32_t n);
+
+
+/* receive and process one packet, returns 1 if received seq_num is considered
+ * new, 0 if old  */
+char bit_get_packet(TYPE_OF_WORD *seq_bits, int32_t seq_num_diff,
+					int8_t set_mark);
+
+/* count the hamming weight, how many good packets did we receive? */
+int  bit_packet_count(TYPE_OF_WORD *seq_bits);
+
+#endif /* _NET_BATMAN_ADV_BITARRAY_H_ */
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
new file mode 100644
index 0000000..ba7ddfc
--- /dev/null
+++ b/net/batman-adv/hard-interface.c
@@ -0,0 +1,548 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "hard-interface.h"
+#include "soft-interface.h"
+#include "send.h"
+#include "translation-table.h"
+#include "routing.h"
+#include "bat_sysfs.h"
+#include "originator.h"
+#include "hash.h"
+
+#include <linux/if_arp.h>
+#include <linux/netfilter_bridge.h>
+
+#define MIN(x, y) ((x) < (y) ? (x) : (y))
+
+struct batman_if *get_batman_if_by_netdev(struct net_device *net_dev)
+{
+	struct batman_if *batman_if;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(batman_if, &if_list, list) {
+		if (batman_if->net_dev == net_dev)
+			goto out;
+	}
+
+	batman_if = NULL;
+
+out:
+	rcu_read_unlock();
+	return batman_if;
+}
+
+static int is_valid_iface(struct net_device *net_dev)
+{
+	if (net_dev->flags & IFF_LOOPBACK)
+		return 0;
+
+	if (net_dev->type != ARPHRD_ETHER)
+		return 0;
+
+	if (net_dev->addr_len != ETH_ALEN)
+		return 0;
+
+	/* no batman over batman */
+#ifdef HAVE_NET_DEVICE_OPS
+	if (net_dev->netdev_ops->ndo_start_xmit == interface_tx)
+		return 0;
+#else
+	if (net_dev->hard_start_xmit == interface_tx)
+		return 0;
+#endif
+
+	/* Device is being bridged */
+	/* if (net_dev->priv_flags & IFF_BRIDGE_PORT)
+		return 0; */
+
+	return 1;
+}
+
+static struct batman_if *get_active_batman_if(void)
+{
+	struct batman_if *batman_if;
+
+	/* TODO: should check interfaces belonging to bat_priv */
+	rcu_read_lock();
+	list_for_each_entry_rcu(batman_if, &if_list, list) {
+		if (batman_if->if_status == IF_ACTIVE)
+			goto out;
+	}
+
+	batman_if = NULL;
+
+out:
+	rcu_read_unlock();
+	return batman_if;
+}
+
+static void set_primary_if(struct bat_priv *bat_priv,
+			   struct batman_if *batman_if)
+{
+	struct batman_packet *batman_packet;
+
+	bat_priv->primary_if = batman_if;
+
+	if (!bat_priv->primary_if)
+		return;
+
+	set_main_if_addr(batman_if->net_dev->dev_addr);
+
+	batman_packet = (struct batman_packet *)(batman_if->packet_buff);
+	batman_packet->flags = PRIMARIES_FIRST_HOP;
+	batman_packet->ttl = TTL;
+
+	/***
+	 * hacky trick to make sure that we send the HNA information via
+	 * our new primary interface
+	 */
+	atomic_set(&hna_local_changed, 1);
+}
+
+static bool hardif_is_iface_up(struct batman_if *batman_if)
+{
+	if (batman_if->net_dev->flags & IFF_UP)
+		return true;
+
+	return false;
+}
+
+static void update_mac_addresses(struct batman_if *batman_if)
+{
+	addr_to_string(batman_if->addr_str, batman_if->net_dev->dev_addr);
+
+	memcpy(((struct batman_packet *)(batman_if->packet_buff))->orig,
+	       batman_if->net_dev->dev_addr, ETH_ALEN);
+	memcpy(((struct batman_packet *)(batman_if->packet_buff))->prev_sender,
+	       batman_if->net_dev->dev_addr, ETH_ALEN);
+}
+
+static void check_known_mac_addr(uint8_t *addr)
+{
+	struct batman_if *batman_if;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(batman_if, &if_list, list) {
+		if ((batman_if->if_status != IF_ACTIVE) &&
+		    (batman_if->if_status != IF_TO_BE_ACTIVATED))
+			continue;
+
+		if (!compare_orig(batman_if->net_dev->dev_addr, addr))
+			continue;
+
+		printk(KERN_WARNING "batman-adv:"
+		    "The newly added mac address (%pM) already exists on: %s\n",
+		    addr, batman_if->dev);
+		printk(KERN_WARNING "batman-adv:"
+		    "It is strongly recommended to keep mac addresses unique"
+		    "to avoid problems!\n");
+	}
+	rcu_read_unlock();
+}
+
+int hardif_min_mtu(void)
+{
+	struct batman_if *batman_if;
+	/* allow big frames if all devices are capable to do so
+	 * (have MTU > 1500 + BAT_HEADER_LEN) */
+	int min_mtu = ETH_DATA_LEN;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(batman_if, &if_list, list) {
+		if ((batman_if->if_status == IF_ACTIVE) ||
+		    (batman_if->if_status == IF_TO_BE_ACTIVATED))
+			min_mtu = MIN(batman_if->net_dev->mtu - BAT_HEADER_LEN,
+				      min_mtu);
+	}
+	rcu_read_unlock();
+
+	return min_mtu;
+}
+
+/* adjusts the MTU if a new interface with a smaller MTU appeared. */
+void update_min_mtu(void)
+{
+	int min_mtu;
+
+	min_mtu = hardif_min_mtu();
+	if (soft_device->mtu != min_mtu)
+		soft_device->mtu = min_mtu;
+}
+
+static void hardif_activate_interface(struct bat_priv *bat_priv,
+				      struct batman_if *batman_if)
+{
+	if (batman_if->if_status != IF_INACTIVE)
+		return;
+
+	dev_hold(batman_if->net_dev);
+
+	update_mac_addresses(batman_if);
+	batman_if->if_status = IF_TO_BE_ACTIVATED;
+
+	/**
+	 * the first active interface becomes our primary interface or
+	 * the next active interface after the old primay interface was removed
+	 */
+	if (!bat_priv->primary_if)
+		set_primary_if(bat_priv, batman_if);
+
+	printk(KERN_INFO "batman-adv:Interface activated: %s\n",
+	       batman_if->dev);
+
+	if (atomic_read(&module_state) == MODULE_INACTIVE)
+		activate_module();
+
+	update_min_mtu();
+	return;
+}
+
+static void hardif_deactivate_interface(struct batman_if *batman_if)
+{
+	if ((batman_if->if_status != IF_ACTIVE) &&
+	   (batman_if->if_status != IF_TO_BE_ACTIVATED))
+		return;
+
+	dev_put(batman_if->net_dev);
+
+	batman_if->if_status = IF_INACTIVE;
+
+	printk(KERN_INFO "batman-adv:Interface deactivated: %s\n",
+	       batman_if->dev);
+
+	update_min_mtu();
+}
+
+int hardif_enable_interface(struct batman_if *batman_if)
+{
+	/* FIXME: each batman_if will be attached to a softif */
+	struct bat_priv *bat_priv = netdev_priv(soft_device);
+	struct batman_packet *batman_packet;
+
+	if (batman_if->if_status != IF_NOT_IN_USE)
+		goto out;
+
+	batman_if->packet_len = BAT_PACKET_LEN;
+	batman_if->packet_buff = kmalloc(batman_if->packet_len, GFP_ATOMIC);
+
+	if (!batman_if->packet_buff) {
+		printk(KERN_ERR "batman-adv:"
+		       "Can't add interface packet (%s): out of memory\n",
+		       batman_if->dev);
+		goto err;
+	}
+
+	batman_packet = (struct batman_packet *)(batman_if->packet_buff);
+	batman_packet->packet_type = BAT_PACKET;
+	batman_packet->version = COMPAT_VERSION;
+	batman_packet->flags = 0;
+	batman_packet->ttl = 2;
+	batman_packet->tq = TQ_MAX_VALUE;
+	batman_packet->num_hna = 0;
+
+	batman_if->if_num = bat_priv->num_ifaces;
+	bat_priv->num_ifaces++;
+	batman_if->if_status = IF_INACTIVE;
+	orig_hash_add_if(batman_if, bat_priv->num_ifaces);
+
+	atomic_set(&batman_if->seqno, 1);
+	printk(KERN_INFO "batman-adv:Adding interface: %s\n", batman_if->dev);
+
+	if (hardif_is_iface_up(batman_if))
+		hardif_activate_interface(bat_priv, batman_if);
+	else
+		printk(KERN_ERR "batman-adv:"
+		       "Not using interface %s "
+		       "(retrying later): interface not active\n",
+		       batman_if->dev);
+
+	/* begin scheduling originator messages on that interface */
+	schedule_own_packet(batman_if);
+
+out:
+	return 0;
+
+err:
+	return -ENOMEM;
+}
+
+void hardif_disable_interface(struct batman_if *batman_if)
+{
+	/* FIXME: each batman_if will be attached to a softif */
+	struct bat_priv *bat_priv = netdev_priv(soft_device);
+
+	if (batman_if->if_status == IF_ACTIVE)
+		hardif_deactivate_interface(batman_if);
+
+	if (batman_if->if_status != IF_INACTIVE)
+		return;
+
+	printk(KERN_INFO "batman-adv:Removing interface: %s\n", batman_if->dev);
+	bat_priv->num_ifaces--;
+	orig_hash_del_if(batman_if, bat_priv->num_ifaces);
+
+	if (batman_if == bat_priv->primary_if)
+		set_primary_if(bat_priv, get_active_batman_if());
+
+	kfree(batman_if->packet_buff);
+	batman_if->packet_buff = NULL;
+	batman_if->if_status = IF_NOT_IN_USE;
+
+	if ((atomic_read(&module_state) == MODULE_ACTIVE) &&
+	    (bat_priv->num_ifaces == 0))
+		deactivate_module();
+}
+
+static struct batman_if *hardif_add_interface(struct net_device *net_dev)
+{
+	struct batman_if *batman_if;
+	int ret;
+
+	ret = is_valid_iface(net_dev);
+	if (ret != 1)
+		goto out;
+
+	batman_if = kmalloc(sizeof(struct batman_if), GFP_ATOMIC);
+	if (!batman_if) {
+		printk(KERN_ERR "batman-adv:"
+		       "Can't add interface (%s): out of memory\n",
+		       net_dev->name);
+		goto out;
+	}
+
+	batman_if->dev = kstrdup(net_dev->name, GFP_ATOMIC);
+	if (!batman_if->dev)
+		goto free_if;
+
+	ret = sysfs_add_hardif(&batman_if->hardif_obj, net_dev);
+	if (ret)
+		goto free_dev;
+
+	batman_if->if_num = -1;
+	batman_if->net_dev = net_dev;
+	batman_if->if_status = IF_NOT_IN_USE;
+	INIT_LIST_HEAD(&batman_if->list);
+
+	check_known_mac_addr(batman_if->net_dev->dev_addr);
+	list_add_tail_rcu(&batman_if->list, &if_list);
+	return batman_if;
+
+free_dev:
+	kfree(batman_if->dev);
+free_if:
+	kfree(batman_if);
+out:
+	return NULL;
+}
+
+static void hardif_free_interface(struct rcu_head *rcu)
+{
+	struct batman_if *batman_if = container_of(rcu, struct batman_if, rcu);
+
+	/* delete all references to this batman_if */
+	purge_orig(NULL);
+	purge_outstanding_packets(batman_if);
+
+	kfree(batman_if->dev);
+	kfree(batman_if);
+}
+
+static void hardif_remove_interface(struct batman_if *batman_if)
+{
+	/* first deactivate interface */
+	if (batman_if->if_status != IF_NOT_IN_USE)
+		hardif_disable_interface(batman_if);
+
+	if (batman_if->if_status != IF_NOT_IN_USE)
+		return;
+
+	batman_if->if_status = IF_TO_BE_REMOVED;
+	list_del_rcu(&batman_if->list);
+	sysfs_del_hardif(&batman_if->hardif_obj);
+	call_rcu(&batman_if->rcu, hardif_free_interface);
+}
+
+void hardif_remove_interfaces(void)
+{
+	struct batman_if *batman_if, *batman_if_tmp;
+
+	list_for_each_entry_safe(batman_if, batman_if_tmp, &if_list, list)
+		hardif_remove_interface(batman_if);
+}
+
+static int hard_if_event(struct notifier_block *this,
+			 unsigned long event, void *ptr)
+{
+	struct net_device *net_dev = (struct net_device *)ptr;
+	struct batman_if *batman_if = get_batman_if_by_netdev(net_dev);
+	/* FIXME: each batman_if will be attached to a softif */
+	struct bat_priv *bat_priv = netdev_priv(soft_device);
+
+	if (!batman_if)
+		batman_if = hardif_add_interface(net_dev);
+
+	if (!batman_if)
+		goto out;
+
+	switch (event) {
+	case NETDEV_REGISTER:
+		break;
+	case NETDEV_UP:
+		hardif_activate_interface(bat_priv, batman_if);
+		break;
+	case NETDEV_GOING_DOWN:
+	case NETDEV_DOWN:
+		hardif_deactivate_interface(batman_if);
+		break;
+	case NETDEV_UNREGISTER:
+		hardif_remove_interface(batman_if);
+		break;
+	case NETDEV_CHANGENAME:
+		break;
+	case NETDEV_CHANGEADDR:
+		check_known_mac_addr(batman_if->net_dev->dev_addr);
+		update_mac_addresses(batman_if);
+		if (batman_if == bat_priv->primary_if)
+			set_primary_if(bat_priv, batman_if);
+		break;
+	default:
+		break;
+	};
+
+out:
+	return NOTIFY_DONE;
+}
+
+static int batman_skb_recv_finish(struct sk_buff *skb)
+{
+	return NF_ACCEPT;
+}
+
+/* receive a packet with the batman ethertype coming on a hard
+ * interface */
+int batman_skb_recv(struct sk_buff *skb, struct net_device *dev,
+	struct packet_type *ptype, struct net_device *orig_dev)
+{
+	struct batman_packet *batman_packet;
+	struct batman_if *batman_if;
+	struct net_device_stats *stats;
+	int ret;
+
+	skb = skb_share_check(skb, GFP_ATOMIC);
+
+	/* skb was released by skb_share_check() */
+	if (!skb)
+		goto err_out;
+
+	if (atomic_read(&module_state) != MODULE_ACTIVE)
+		goto err_free;
+
+	/* if netfilter/ebtables wants to block incoming batman
+	 * packets then give them a chance to do so here */
+	ret = NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, dev, NULL,
+		      batman_skb_recv_finish);
+	if (ret != 1)
+		goto err_out;
+
+	/* packet should hold at least type and version */
+	if (unlikely(skb_headlen(skb) < 2))
+		goto err_free;
+
+	/* expect a valid ethernet header here. */
+	if (unlikely(skb->mac_len != sizeof(struct ethhdr)
+				|| !skb_mac_header(skb)))
+		goto err_free;
+
+	batman_if = get_batman_if_by_netdev(skb->dev);
+	if (!batman_if)
+		goto err_free;
+
+	/* discard frames on not active interfaces */
+	if (batman_if->if_status != IF_ACTIVE)
+		goto err_free;
+
+	stats = (struct net_device_stats *)dev_get_stats(skb->dev);
+	if (stats) {
+		stats->rx_packets++;
+		stats->rx_bytes += skb->len;
+	}
+
+	batman_packet = (struct batman_packet *)skb->data;
+
+	if (batman_packet->version != COMPAT_VERSION) {
+		bat_dbg(DBG_BATMAN,
+			"Drop packet: incompatible batman version (%i)\n",
+			batman_packet->version);
+		goto err_free;
+	}
+
+	/* all receive handlers return whether they received or reused
+	 * the supplied skb. if not, we have to free the skb. */
+
+	switch (batman_packet->packet_type) {
+		/* batman originator packet */
+	case BAT_PACKET:
+		ret = recv_bat_packet(skb, batman_if);
+		break;
+
+		/* batman icmp packet */
+	case BAT_ICMP:
+		ret = recv_icmp_packet(skb);
+		break;
+
+		/* unicast packet */
+	case BAT_UNICAST:
+		ret = recv_unicast_packet(skb, batman_if);
+		break;
+
+		/* broadcast packet */
+	case BAT_BCAST:
+		ret = recv_bcast_packet(skb);
+		break;
+
+		/* vis packet */
+	case BAT_VIS:
+		ret = recv_vis_packet(skb);
+		break;
+	default:
+		ret = NET_RX_DROP;
+	}
+
+	if (ret == NET_RX_DROP)
+		kfree_skb(skb);
+
+	/* return NET_RX_SUCCESS in any case as we
+	 * most probably dropped the packet for
+	 * routing-logical reasons. */
+
+	return NET_RX_SUCCESS;
+
+err_free:
+	kfree_skb(skb);
+err_out:
+	return NET_RX_DROP;
+}
+
+struct notifier_block hard_if_notifier = {
+	.notifier_call = hard_if_event,
+};
diff --git a/net/batman-adv/hard-interface.h b/net/batman-adv/hard-interface.h
new file mode 100644
index 0000000..d5640b0
--- /dev/null
+++ b/net/batman-adv/hard-interface.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_HARD_INTERFACE_H_
+#define _NET_BATMAN_ADV_HARD_INTERFACE_H_
+
+#define IF_NOT_IN_USE 0
+#define IF_TO_BE_REMOVED 1
+#define IF_INACTIVE 2
+#define IF_ACTIVE 3
+#define IF_TO_BE_ACTIVATED 4
+#define IF_I_WANT_YOU 5
+
+extern struct notifier_block hard_if_notifier;
+
+struct batman_if *get_batman_if_by_netdev(struct net_device *net_dev);
+int hardif_enable_interface(struct batman_if *batman_if);
+void hardif_disable_interface(struct batman_if *batman_if);
+void hardif_remove_interfaces(void);
+int batman_skb_recv(struct sk_buff *skb,
+				struct net_device *dev,
+				struct packet_type *ptype,
+				struct net_device *orig_dev);
+int hardif_min_mtu(void);
+void update_min_mtu(void);
+
+#endif /* _NET_BATMAN_ADV_HARD_INTERFACE_H_ */
diff --git a/net/batman-adv/hash.c b/net/batman-adv/hash.c
new file mode 100644
index 0000000..1286f8f
--- /dev/null
+++ b/net/batman-adv/hash.c
@@ -0,0 +1,306 @@
+/*
+ * Copyright (C) 2006-2010 B.A.T.M.A.N. contributors:
+ *
+ * Simon Wunderlich, Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "hash.h"
+
+/* clears the hash */
+static void hash_init(struct hashtable_t *hash)
+{
+	int i;
+
+	hash->elements = 0;
+
+	for (i = 0 ; i < hash->size; i++)
+		hash->table[i] = NULL;
+}
+
+/* remove the hash structure. if hashdata_free_cb != NULL, this function will be
+ * called to remove the elements inside of the hash.  if you don't remove the
+ * elements, memory might be leaked. */
+void hash_delete(struct hashtable_t *hash, hashdata_free_cb free_cb)
+{
+	struct element_t *bucket, *last_bucket;
+	int i;
+
+	for (i = 0; i < hash->size; i++) {
+		bucket = hash->table[i];
+
+		while (bucket != NULL) {
+			if (free_cb != NULL)
+				free_cb(bucket->data);
+
+			last_bucket = bucket;
+			bucket = bucket->next;
+			kfree(last_bucket);
+		}
+	}
+
+	hash_destroy(hash);
+}
+
+/* free only the hashtable and the hash itself. */
+void hash_destroy(struct hashtable_t *hash)
+{
+	kfree(hash->table);
+	kfree(hash);
+}
+
+/* iterate though the hash. First element is selected if an iterator
+ * initialized with HASHIT() is supplied as iter. Use the returned
+ * (or supplied) iterator to access the elements until hash_iterate returns
+ * NULL. */
+
+struct hash_it_t *hash_iterate(struct hashtable_t *hash,
+			       struct hash_it_t *iter)
+{
+	if (!hash)
+		return NULL;
+	if (!iter)
+		return NULL;
+
+	/* sanity checks first (if our bucket got deleted in the last
+	 * iteration): */
+	if (iter->bucket != NULL) {
+		if (iter->first_bucket != NULL) {
+			/* we're on the first element and it got removed after
+			 * the last iteration. */
+			if ((*iter->first_bucket) != iter->bucket) {
+				/* there are still other elements in the list */
+				if ((*iter->first_bucket) != NULL) {
+					iter->prev_bucket = NULL;
+					iter->bucket = (*iter->first_bucket);
+					iter->first_bucket =
+						&hash->table[iter->index];
+					return iter;
+				} else {
+					iter->bucket = NULL;
+				}
+			}
+		} else if (iter->prev_bucket != NULL) {
+			/*
+			* we're not on the first element, and the bucket got
+			* removed after the last iteration.  the last bucket's
+			* next pointer is not pointing to our actual bucket
+			* anymore.  select the next.
+			*/
+			if (iter->prev_bucket->next != iter->bucket)
+				iter->bucket = iter->prev_bucket;
+		}
+	}
+
+	/* now as we are sane, select the next one if there is some */
+	if (iter->bucket != NULL) {
+		if (iter->bucket->next != NULL) {
+			iter->prev_bucket = iter->bucket;
+			iter->bucket = iter->bucket->next;
+			iter->first_bucket = NULL;
+			return iter;
+		}
+	}
+
+	/* if not returned yet, we've reached the last one on the index and have
+	 * to search forward */
+	iter->index++;
+	/* go through the entries of the hash table */
+	while (iter->index < hash->size) {
+		if ((hash->table[iter->index]) != NULL) {
+			iter->prev_bucket = NULL;
+			iter->bucket = hash->table[iter->index];
+			iter->first_bucket = &hash->table[iter->index];
+			return iter;
+		} else {
+			iter->index++;
+		}
+	}
+
+	/* nothing to iterate over anymore */
+	return NULL;
+}
+
+/* allocates and clears the hash */
+struct hashtable_t *hash_new(int size, hashdata_compare_cb compare,
+			     hashdata_choose_cb choose)
+{
+	struct hashtable_t *hash;
+
+	hash = kmalloc(sizeof(struct hashtable_t) , GFP_ATOMIC);
+
+	if (hash == NULL)
+		return NULL;
+
+	hash->size = size;
+	hash->table = kmalloc(sizeof(struct element_t *) * size, GFP_ATOMIC);
+
+	if (hash->table == NULL) {
+		kfree(hash);
+		return NULL;
+	}
+
+	hash_init(hash);
+
+	hash->compare = compare;
+	hash->choose = choose;
+
+	return hash;
+}
+
+/* adds data to the hashtable. returns 0 on success, -1 on error */
+int hash_add(struct hashtable_t *hash, void *data)
+{
+	int index;
+	struct element_t *bucket, *prev_bucket = NULL;
+
+	if (!hash)
+		return -1;
+
+	index = hash->choose(data, hash->size);
+	bucket = hash->table[index];
+
+	while (bucket != NULL) {
+		if (hash->compare(bucket->data, data))
+			return -1;
+
+		prev_bucket = bucket;
+		bucket = bucket->next;
+	}
+
+	/* found the tail of the list, add new element */
+	bucket = kmalloc(sizeof(struct element_t), GFP_ATOMIC);
+
+	if (bucket == NULL)
+		return -1;
+
+	bucket->data = data;
+	bucket->next = NULL;
+
+	/* and link it */
+	if (prev_bucket == NULL)
+		hash->table[index] = bucket;
+	else
+		prev_bucket->next = bucket;
+
+	hash->elements++;
+	return 0;
+}
+
+/* finds data, based on the key in keydata. returns the found data on success,
+ * or NULL on error */
+void *hash_find(struct hashtable_t *hash, void *keydata)
+{
+	int index;
+	struct element_t *bucket;
+
+	if (!hash)
+		return NULL;
+
+	index = hash->choose(keydata , hash->size);
+	bucket = hash->table[index];
+
+	while (bucket != NULL) {
+		if (hash->compare(bucket->data, keydata))
+			return bucket->data;
+
+		bucket = bucket->next;
+	}
+
+	return NULL;
+}
+
+/* remove bucket (this might be used in hash_iterate() if you already found the
+ * bucket you want to delete and don't need the overhead to find it again with
+ * hash_remove(). But usually, you don't want to use this function, as it
+ * fiddles with hash-internals. */
+void *hash_remove_bucket(struct hashtable_t *hash, struct hash_it_t *hash_it_t)
+{
+	void *data_save;
+
+	data_save = hash_it_t->bucket->data;
+
+	if (hash_it_t->prev_bucket != NULL)
+		hash_it_t->prev_bucket->next = hash_it_t->bucket->next;
+	else if (hash_it_t->first_bucket != NULL)
+		(*hash_it_t->first_bucket) = hash_it_t->bucket->next;
+
+	kfree(hash_it_t->bucket);
+	hash->elements--;
+
+	return data_save;
+}
+
+/* removes data from hash, if found. returns pointer do data on success, so you
+ * can remove the used structure yourself, or NULL on error .  data could be the
+ * structure you use with just the key filled, we just need the key for
+ * comparing. */
+void *hash_remove(struct hashtable_t *hash, void *data)
+{
+	struct hash_it_t hash_it_t;
+
+	hash_it_t.index = hash->choose(data, hash->size);
+	hash_it_t.bucket = hash->table[hash_it_t.index];
+	hash_it_t.prev_bucket = NULL;
+
+	while (hash_it_t.bucket != NULL) {
+		if (hash->compare(hash_it_t.bucket->data, data)) {
+			hash_it_t.first_bucket =
+				(hash_it_t.bucket ==
+				 hash->table[hash_it_t.index] ?
+				 &hash->table[hash_it_t.index] : NULL);
+			return hash_remove_bucket(hash, &hash_it_t);
+		}
+
+		hash_it_t.prev_bucket = hash_it_t.bucket;
+		hash_it_t.bucket = hash_it_t.bucket->next;
+	}
+
+	return NULL;
+}
+
+/* resize the hash, returns the pointer to the new hash or NULL on
+ * error. removes the old hash on success. */
+struct hashtable_t *hash_resize(struct hashtable_t *hash, int size)
+{
+	struct hashtable_t *new_hash;
+	struct element_t *bucket;
+	int i;
+
+	/* initialize a new hash with the new size */
+	new_hash = hash_new(size, hash->compare, hash->choose);
+
+	if (new_hash == NULL)
+		return NULL;
+
+	/* copy the elements */
+	for (i = 0; i < hash->size; i++) {
+		bucket = hash->table[i];
+
+		while (bucket != NULL) {
+			hash_add(new_hash, bucket->data);
+			bucket = bucket->next;
+		}
+	}
+
+	/* remove hash and eventual overflow buckets but not the content
+	 * itself. */
+	hash_delete(hash, NULL);
+
+	return new_hash;
+}
diff --git a/net/batman-adv/hash.h b/net/batman-adv/hash.h
new file mode 100644
index 0000000..c483e11
--- /dev/null
+++ b/net/batman-adv/hash.h
@@ -0,0 +1,100 @@
+/*
+ * Copyright (C) 2006-2010 B.A.T.M.A.N. contributors:
+ *
+ * Simon Wunderlich, Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_HASH_H_
+#define _NET_BATMAN_ADV_HASH_H_
+
+#define HASHIT(name) struct hash_it_t name = { \
+		.index = -1, .bucket = NULL, \
+		.prev_bucket = NULL, \
+		.first_bucket = NULL }
+
+
+typedef int (*hashdata_compare_cb)(void *, void *);
+typedef int (*hashdata_choose_cb)(void *, int);
+typedef void (*hashdata_free_cb)(void *);
+
+struct element_t {
+	void *data;		/* pointer to the data */
+	struct element_t *next;	/* overflow bucket pointer */
+};
+
+struct hash_it_t {
+	int index;
+	struct element_t *bucket;
+	struct element_t *prev_bucket;
+	struct element_t **first_bucket;
+};
+
+struct hashtable_t {
+	struct element_t **table;   /* the hashtable itself, with the buckets */
+	int elements;		    /* number of elements registered */
+	int size;		    /* size of hashtable */
+	hashdata_compare_cb compare;/* callback to a compare function.  should
+				     * compare 2 element datas for their keys,
+				     * return 0 if same and not 0 if not
+				     * same */
+	hashdata_choose_cb choose;  /* the hashfunction, should return an index
+				     * based on the key in the data of the first
+				     * argument and the size the second */
+};
+
+/* allocates and clears the hash */
+struct hashtable_t *hash_new(int size, hashdata_compare_cb compare,
+			     hashdata_choose_cb choose);
+
+/* remove bucket (this might be used in hash_iterate() if you already found the
+ * bucket you want to delete and don't need the overhead to find it again with
+ * hash_remove().  But usually, you don't want to use this function, as it
+ * fiddles with hash-internals. */
+void *hash_remove_bucket(struct hashtable_t *hash, struct hash_it_t *hash_it_t);
+
+/* remove the hash structure. if hashdata_free_cb != NULL, this function will be
+ * called to remove the elements inside of the hash.  if you don't remove the
+ * elements, memory might be leaked. */
+void hash_delete(struct hashtable_t *hash, hashdata_free_cb free_cb);
+
+/* free only the hashtable and the hash itself. */
+void hash_destroy(struct hashtable_t *hash);
+
+/* adds data to the hashtable. returns 0 on success, -1 on error */
+int hash_add(struct hashtable_t *hash, void *data);
+
+/* removes data from hash, if found. returns pointer do data on success, so you
+ * can remove the used structure yourself, or NULL on error .  data could be the
+ * structure you use with just the key filled, we just need the key for
+ * comparing. */
+void *hash_remove(struct hashtable_t *hash, void *data);
+
+/* finds data, based on the key in keydata. returns the found data on success,
+ * or NULL on error */
+void *hash_find(struct hashtable_t *hash, void *keydata);
+
+/* resize the hash, returns the pointer to the new hash or NULL on
+ * error. removes the old hash on success */
+struct hashtable_t *hash_resize(struct hashtable_t *hash, int size);
+
+/* iterate though the hash. first element is selected with iter_in NULL.  use
+ * the returned iterator to access the elements until hash_it_t returns NULL. */
+struct hash_it_t *hash_iterate(struct hashtable_t *hash,
+			       struct hash_it_t *iter_in);
+
+#endif /* _NET_BATMAN_ADV_HASH_H_ */
diff --git a/net/batman-adv/icmp_socket.c b/net/batman-adv/icmp_socket.c
new file mode 100644
index 0000000..08a5f7b
--- /dev/null
+++ b/net/batman-adv/icmp_socket.c
@@ -0,0 +1,337 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/slab.h>
+#include "main.h"
+#include "icmp_socket.h"
+#include "send.h"
+#include "types.h"
+#include "hash.h"
+#include "hard-interface.h"
+
+
+static struct socket_client *socket_client_hash[256];
+
+static void bat_socket_add_packet(struct socket_client *socket_client,
+				  struct icmp_packet_rr *icmp_packet,
+				  size_t icmp_len);
+
+void bat_socket_init(void)
+{
+	memset(socket_client_hash, 0, sizeof(socket_client_hash));
+}
+
+static int bat_socket_open(struct inode *inode, struct file *file)
+{
+	unsigned int i;
+	struct socket_client *socket_client;
+
+	socket_client = kmalloc(sizeof(struct socket_client), GFP_KERNEL);
+
+	if (!socket_client)
+		return -ENOMEM;
+
+	for (i = 0; i < ARRAY_SIZE(socket_client_hash); i++) {
+		if (!socket_client_hash[i]) {
+			socket_client_hash[i] = socket_client;
+			break;
+		}
+	}
+
+	if (i == ARRAY_SIZE(socket_client_hash)) {
+		printk(KERN_ERR "batman-adv:"
+		       "Error - can't add another packet client: "
+		       "maximum number of clients reached\n");
+		kfree(socket_client);
+		return -EXFULL;
+	}
+
+	INIT_LIST_HEAD(&socket_client->queue_list);
+	socket_client->queue_len = 0;
+	socket_client->index = i;
+	spin_lock_init(&socket_client->lock);
+	init_waitqueue_head(&socket_client->queue_wait);
+
+	file->private_data = socket_client;
+
+	inc_module_count();
+	return 0;
+}
+
+static int bat_socket_release(struct inode *inode, struct file *file)
+{
+	struct socket_client *socket_client =
+		(struct socket_client *)file->private_data;
+	struct socket_packet *socket_packet;
+	struct list_head *list_pos, *list_pos_tmp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&socket_client->lock, flags);
+
+	/* for all packets in the queue ... */
+	list_for_each_safe(list_pos, list_pos_tmp, &socket_client->queue_list) {
+		socket_packet = list_entry(list_pos,
+					   struct socket_packet, list);
+
+		list_del(list_pos);
+		kfree(socket_packet);
+	}
+
+	socket_client_hash[socket_client->index] = NULL;
+	spin_unlock_irqrestore(&socket_client->lock, flags);
+
+	kfree(socket_client);
+	dec_module_count();
+
+	return 0;
+}
+
+static ssize_t bat_socket_read(struct file *file, char __user *buf,
+			       size_t count, loff_t *ppos)
+{
+	struct socket_client *socket_client =
+		(struct socket_client *)file->private_data;
+	struct socket_packet *socket_packet;
+	size_t packet_len;
+	int error;
+	unsigned long flags;
+
+	if ((file->f_flags & O_NONBLOCK) && (socket_client->queue_len == 0))
+		return -EAGAIN;
+
+	if ((!buf) || (count < sizeof(struct icmp_packet)))
+		return -EINVAL;
+
+	if (!access_ok(VERIFY_WRITE, buf, count))
+		return -EFAULT;
+
+	error = wait_event_interruptible(socket_client->queue_wait,
+					 socket_client->queue_len);
+
+	if (error)
+		return error;
+
+	spin_lock_irqsave(&socket_client->lock, flags);
+
+	socket_packet = list_first_entry(&socket_client->queue_list,
+					 struct socket_packet, list);
+	list_del(&socket_packet->list);
+	socket_client->queue_len--;
+
+	spin_unlock_irqrestore(&socket_client->lock, flags);
+
+	error = __copy_to_user(buf, &socket_packet->icmp_packet,
+			       socket_packet->icmp_len);
+
+	packet_len = socket_packet->icmp_len;
+	kfree(socket_packet);
+
+	if (error)
+		return -EFAULT;
+
+	return packet_len;
+}
+
+static ssize_t bat_socket_write(struct file *file, const char __user *buff,
+				size_t len, loff_t *off)
+{
+	struct socket_client *socket_client =
+		(struct socket_client *)file->private_data;
+	struct icmp_packet_rr icmp_packet;
+	struct orig_node *orig_node;
+	struct batman_if *batman_if;
+	size_t packet_len = sizeof(struct icmp_packet);
+	uint8_t dstaddr[ETH_ALEN];
+	unsigned long flags;
+
+	if (len < sizeof(struct icmp_packet)) {
+		bat_dbg(DBG_BATMAN, "batman-adv:"
+			"Error - can't send packet from char device: "
+			"invalid packet size\n");
+		return -EINVAL;
+	}
+
+	if (len >= sizeof(struct icmp_packet_rr))
+		packet_len = sizeof(struct icmp_packet_rr);
+
+	if (!access_ok(VERIFY_READ, buff, packet_len))
+		return -EFAULT;
+
+	if (__copy_from_user(&icmp_packet, buff, packet_len))
+		return -EFAULT;
+
+	if (icmp_packet.packet_type != BAT_ICMP) {
+		bat_dbg(DBG_BATMAN, "batman-adv:"
+			"Error - can't send packet from char device: "
+			"got bogus packet type (expected: BAT_ICMP)\n");
+		return -EINVAL;
+	}
+
+	if (icmp_packet.msg_type != ECHO_REQUEST) {
+		bat_dbg(DBG_BATMAN, "batman-adv:"
+			"Error - can't send packet from char device: "
+			"got bogus message type (expected: ECHO_REQUEST)\n");
+		return -EINVAL;
+	}
+
+	icmp_packet.uid = socket_client->index;
+
+	if (icmp_packet.version != COMPAT_VERSION) {
+		icmp_packet.msg_type = PARAMETER_PROBLEM;
+		icmp_packet.ttl = COMPAT_VERSION;
+		bat_socket_add_packet(socket_client, &icmp_packet, packet_len);
+		goto out;
+	}
+
+	if (atomic_read(&module_state) != MODULE_ACTIVE)
+		goto dst_unreach;
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	orig_node = ((struct orig_node *)hash_find(orig_hash, icmp_packet.dst));
+
+	if (!orig_node)
+		goto unlock;
+
+	if (!orig_node->router)
+		goto unlock;
+
+	batman_if = orig_node->router->if_incoming;
+	memcpy(dstaddr, orig_node->router->addr, ETH_ALEN);
+
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+	if (!batman_if)
+		goto dst_unreach;
+
+	if (batman_if->if_status != IF_ACTIVE)
+		goto dst_unreach;
+
+	memcpy(icmp_packet.orig, batman_if->net_dev->dev_addr, ETH_ALEN);
+
+	if (packet_len == sizeof(struct icmp_packet_rr))
+		memcpy(icmp_packet.rr, batman_if->net_dev->dev_addr, ETH_ALEN);
+
+	send_raw_packet((unsigned char *)&icmp_packet,
+			packet_len, batman_if, dstaddr);
+
+	goto out;
+
+unlock:
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+dst_unreach:
+	icmp_packet.msg_type = DESTINATION_UNREACHABLE;
+	bat_socket_add_packet(socket_client, &icmp_packet, packet_len);
+out:
+	return len;
+}
+
+static unsigned int bat_socket_poll(struct file *file, poll_table *wait)
+{
+	struct socket_client *socket_client =
+		(struct socket_client *)file->private_data;
+
+	poll_wait(file, &socket_client->queue_wait, wait);
+
+	if (socket_client->queue_len > 0)
+		return POLLIN | POLLRDNORM;
+
+	return 0;
+}
+
+static const struct file_operations fops = {
+	.owner = THIS_MODULE,
+	.open = bat_socket_open,
+	.release = bat_socket_release,
+	.read = bat_socket_read,
+	.write = bat_socket_write,
+	.poll = bat_socket_poll,
+};
+
+int bat_socket_setup(struct bat_priv *bat_priv)
+{
+	struct dentry *d;
+
+	if (!bat_priv->debug_dir)
+		goto err;
+
+	d = debugfs_create_file(ICMP_SOCKET, S_IFREG | S_IWUSR | S_IRUSR,
+				bat_priv->debug_dir, NULL, &fops);
+	if (d)
+		goto err;
+
+	return 0;
+
+err:
+	return 1;
+}
+
+static void bat_socket_add_packet(struct socket_client *socket_client,
+				  struct icmp_packet_rr *icmp_packet,
+				  size_t icmp_len)
+{
+	struct socket_packet *socket_packet;
+	unsigned long flags;
+
+	socket_packet = kmalloc(sizeof(struct socket_packet), GFP_ATOMIC);
+
+	if (!socket_packet)
+		return;
+
+	INIT_LIST_HEAD(&socket_packet->list);
+	memcpy(&socket_packet->icmp_packet, icmp_packet, icmp_len);
+	socket_packet->icmp_len = icmp_len;
+
+	spin_lock_irqsave(&socket_client->lock, flags);
+
+	/* while waiting for the lock the socket_client could have been
+	 * deleted */
+	if (!socket_client_hash[icmp_packet->uid]) {
+		spin_unlock_irqrestore(&socket_client->lock, flags);
+		kfree(socket_packet);
+		return;
+	}
+
+	list_add_tail(&socket_packet->list, &socket_client->queue_list);
+	socket_client->queue_len++;
+
+	if (socket_client->queue_len > 100) {
+		socket_packet = list_first_entry(&socket_client->queue_list,
+						 struct socket_packet, list);
+
+		list_del(&socket_packet->list);
+		kfree(socket_packet);
+		socket_client->queue_len--;
+	}
+
+	spin_unlock_irqrestore(&socket_client->lock, flags);
+
+	wake_up(&socket_client->queue_wait);
+}
+
+void bat_socket_receive_packet(struct icmp_packet_rr *icmp_packet,
+			       size_t icmp_len)
+{
+	struct socket_client *hash = socket_client_hash[icmp_packet->uid];
+
+	if (hash)
+		bat_socket_add_packet(hash, icmp_packet, icmp_len);
+}
diff --git a/net/batman-adv/icmp_socket.h b/net/batman-adv/icmp_socket.h
new file mode 100644
index 0000000..bf9b348
--- /dev/null
+++ b/net/batman-adv/icmp_socket.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_ICMP_SOCKET_H_
+#define _NET_BATMAN_ADV_ICMP_SOCKET_H_
+
+#include "types.h"
+
+#define ICMP_SOCKET "socket"
+
+void bat_socket_init(void);
+int bat_socket_setup(struct bat_priv *bat_priv);
+void bat_socket_receive_packet(struct icmp_packet_rr *icmp_packet,
+			       size_t icmp_len);
+
+#endif /* _NET_BATMAN_ADV_ICMP_SOCKET_H_ */
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
new file mode 100644
index 0000000..d7e2960
--- /dev/null
+++ b/net/batman-adv/main.c
@@ -0,0 +1,299 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "bat_sysfs.h"
+#include "bat_debugfs.h"
+#include "routing.h"
+#include "send.h"
+#include "originator.h"
+#include "soft-interface.h"
+#include "icmp_socket.h"
+#include "translation-table.h"
+#include "hard-interface.h"
+#include "types.h"
+#include "vis.h"
+#include "hash.h"
+
+struct list_head if_list;
+struct hlist_head forw_bat_list;
+struct hlist_head forw_bcast_list;
+struct hashtable_t *orig_hash;
+
+DEFINE_SPINLOCK(orig_hash_lock);
+DEFINE_SPINLOCK(forw_bat_list_lock);
+DEFINE_SPINLOCK(forw_bcast_list_lock);
+
+atomic_t bcast_queue_left;
+atomic_t batman_queue_left;
+
+int16_t num_hna;
+
+struct net_device *soft_device;
+
+unsigned char broadcast_addr[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
+atomic_t module_state;
+
+static struct packet_type batman_adv_packet_type __read_mostly = {
+	.type = __constant_htons(ETH_P_BATMAN),
+	.func = batman_skb_recv,
+};
+
+struct workqueue_struct *bat_event_workqueue;
+
+#ifdef CONFIG_BATMAN_ADV_DEBUG
+int debug;
+
+module_param(debug, int, 0644);
+
+int bat_debug_type(int type)
+{
+	return debug & type;
+}
+#endif
+
+int init_module(void)
+{
+	int retval;
+
+	INIT_LIST_HEAD(&if_list);
+	INIT_HLIST_HEAD(&forw_bat_list);
+	INIT_HLIST_HEAD(&forw_bcast_list);
+
+	atomic_set(&module_state, MODULE_INACTIVE);
+
+	atomic_set(&bcast_queue_left, BCAST_QUEUE_LEN);
+	atomic_set(&batman_queue_left, BATMAN_QUEUE_LEN);
+
+	/* the name should not be longer than 10 chars - see
+	 * http://lwn.net/Articles/23634/ */
+	bat_event_workqueue = create_singlethread_workqueue("bat_events");
+
+	if (!bat_event_workqueue)
+		return -ENOMEM;
+
+	bat_socket_init();
+	debugfs_init();
+
+	/* initialize layer 2 interface */
+	soft_device = alloc_netdev(sizeof(struct bat_priv) , "bat%d",
+				   interface_setup);
+
+	if (!soft_device) {
+		printk(KERN_ERR "batman-adv:"
+		       "Unable to allocate the batman interface\n");
+		goto end;
+	}
+
+	retval = register_netdev(soft_device);
+
+	if (retval < 0) {
+		printk(KERN_ERR "batman-adv:"
+		       "Unable to register the batman interface: %i\n", retval);
+		goto free_soft_device;
+	}
+
+	retval = sysfs_add_meshif(soft_device);
+
+	if (retval < 0)
+		goto unreg_soft_device;
+
+	retval = debugfs_add_meshif(soft_device);
+
+	if (retval < 0)
+		goto unreg_sysfs;
+
+	register_netdevice_notifier(&hard_if_notifier);
+	dev_add_pack(&batman_adv_packet_type);
+
+	printk(KERN_INFO "batman-adv:"
+	       "B.A.T.M.A.N. advanced %s%s (compatibility version %i) loaded\n",
+	       SOURCE_VERSION, REVISION_VERSION_STR, COMPAT_VERSION);
+
+	return 0;
+
+unreg_sysfs:
+	sysfs_del_meshif(soft_device);
+unreg_soft_device:
+	unregister_netdev(soft_device);
+	soft_device = NULL;
+	return -ENOMEM;
+
+free_soft_device:
+	free_netdev(soft_device);
+	soft_device = NULL;
+end:
+	return -ENOMEM;
+}
+
+void cleanup_module(void)
+{
+	deactivate_module();
+
+	debugfs_destroy();
+	unregister_netdevice_notifier(&hard_if_notifier);
+	hardif_remove_interfaces();
+
+	if (soft_device) {
+		debugfs_del_meshif(soft_device);
+		sysfs_del_meshif(soft_device);
+		unregister_netdev(soft_device);
+		soft_device = NULL;
+	}
+
+	dev_remove_pack(&batman_adv_packet_type);
+
+	destroy_workqueue(bat_event_workqueue);
+	bat_event_workqueue = NULL;
+}
+
+/* activates the module, starts timer ... */
+void activate_module(void)
+{
+	if (originator_init() < 1)
+		goto err;
+
+	if (hna_local_init() < 1)
+		goto err;
+
+	if (hna_global_init() < 1)
+		goto err;
+
+	hna_local_add(soft_device->dev_addr);
+
+	if (vis_init() < 1)
+		goto err;
+
+	update_min_mtu();
+	atomic_set(&module_state, MODULE_ACTIVE);
+	goto end;
+
+err:
+	printk(KERN_ERR "batman-adv:"
+	       "Unable to allocate memory for mesh information structures: "
+	       "out of mem ?\n");
+	deactivate_module();
+end:
+	return;
+}
+
+/* shuts down the whole module.*/
+void deactivate_module(void)
+{
+	atomic_set(&module_state, MODULE_DEACTIVATING);
+
+	purge_outstanding_packets(NULL);
+	flush_workqueue(bat_event_workqueue);
+
+	vis_quit();
+
+	/* TODO: unregister BATMAN pack */
+
+	originator_free();
+
+	hna_local_free();
+	hna_global_free();
+
+	synchronize_net();
+
+	synchronize_rcu();
+	atomic_set(&module_state, MODULE_INACTIVE);
+}
+
+void inc_module_count(void)
+{
+	try_module_get(THIS_MODULE);
+}
+
+void dec_module_count(void)
+{
+	module_put(THIS_MODULE);
+}
+
+int addr_to_string(char *buff, uint8_t *addr)
+{
+	return sprintf(buff, "%pM", addr);
+}
+
+/* returns 1 if they are the same originator */
+
+int compare_orig(void *data1, void *data2)
+{
+	return (memcmp(data1, data2, ETH_ALEN) == 0 ? 1 : 0);
+}
+
+/* hashfunction to choose an entry in a hash table of given size */
+/* hash algorithm from http://en.wikipedia.org/wiki/Hash_table */
+int choose_orig(void *data, int32_t size)
+{
+	unsigned char *key = data;
+	uint32_t hash = 0;
+	size_t i;
+
+	for (i = 0; i < 6; i++) {
+		hash += key[i];
+		hash += (hash << 10);
+		hash ^= (hash >> 6);
+	}
+
+	hash += (hash << 3);
+	hash ^= (hash >> 11);
+	hash += (hash << 15);
+
+	return hash % size;
+}
+
+int is_my_mac(uint8_t *addr)
+{
+	struct batman_if *batman_if;
+	rcu_read_lock();
+	list_for_each_entry_rcu(batman_if, &if_list, list) {
+		if ((batman_if->net_dev) &&
+		    (compare_orig(batman_if->net_dev->dev_addr, addr))) {
+			rcu_read_unlock();
+			return 1;
+		}
+	}
+	rcu_read_unlock();
+	return 0;
+
+}
+
+int is_bcast(uint8_t *addr)
+{
+	return (addr[0] == (uint8_t)0xff) && (addr[1] == (uint8_t)0xff);
+}
+
+int is_mcast(uint8_t *addr)
+{
+	return *addr & 0x01;
+}
+
+MODULE_LICENSE("GPL");
+
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
+MODULE_SUPPORTED_DEVICE(DRIVER_DEVICE);
+#ifdef REVISION_VERSION
+MODULE_VERSION(SOURCE_VERSION "-" REVISION_VERSION);
+#else
+MODULE_VERSION(SOURCE_VERSION);
+#endif
diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h
new file mode 100644
index 0000000..e308673
--- /dev/null
+++ b/net/batman-adv/main.h
@@ -0,0 +1,169 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_MAIN_H_
+#define _NET_BATMAN_ADV_MAIN_H_
+
+/* Kernel Programming */
+#define LINUX
+
+#define DRIVER_AUTHOR "Marek Lindner <lindner_marek@yahoo.de>, " \
+		      "Simon Wunderlich <siwu@hrz.tu-chemnitz.de>"
+#define DRIVER_DESC   "B.A.T.M.A.N. advanced"
+#define DRIVER_DEVICE "batman-adv"
+
+#define SOURCE_VERSION "maint"
+
+
+/* B.A.T.M.A.N. parameters */
+
+#define TQ_MAX_VALUE 255
+#define JITTER 20
+#define TTL 50			  /* Time To Live of broadcast messages */
+
+#define PURGE_TIMEOUT 200	/* purge originators after time in seconds if no
+				   * valid packet comes in -> TODO: check
+				   * influence on TQ_LOCAL_WINDOW_SIZE */
+#define LOCAL_HNA_TIMEOUT 3600 /* in seconds */
+
+#define TQ_LOCAL_WINDOW_SIZE 64	  /* sliding packet range of received originator
+				   * messages in squence numbers (should be a
+				   * multiple of our word size) */
+#define TQ_GLOBAL_WINDOW_SIZE 5
+#define TQ_LOCAL_BIDRECT_SEND_MINIMUM 1
+#define TQ_LOCAL_BIDRECT_RECV_MINIMUM 1
+#define TQ_TOTAL_BIDRECT_LIMIT 1
+
+#define TQ_HOP_PENALTY 10
+
+#define NUM_WORDS (TQ_LOCAL_WINDOW_SIZE / WORD_BIT_SIZE)
+
+#define PACKBUFF_SIZE 2000
+#define LOG_BUF_LEN 8192	  /* has to be a power of 2 */
+#define ETH_STR_LEN 20
+
+#define VIS_INTERVAL 5000	/* 5 seconds */
+
+/* how much worse secondary interfaces may be to
+ * to be considered as bonding candidates */
+
+#define BONDING_TQ_THRESHOLD	50
+
+#define MAX_AGGREGATION_BYTES 512 /* should not be bigger than 512 bytes or
+				   * change the size of
+				   * forw_packet->direct_link_flags */
+#define MAX_AGGREGATION_MS 100
+
+#define RESET_PROTECTION_MS 30000
+#define EXPECTED_SEQNO_RANGE	65536
+/* don't reset again within 30 seconds */
+
+#define MODULE_INACTIVE 0
+#define MODULE_ACTIVE 1
+#define MODULE_DEACTIVATING 2
+
+#define BCAST_QUEUE_LEN		256
+#define BATMAN_QUEUE_LEN	256
+
+/*
+ * Debug Messages
+ */
+
+#define DBG_BATMAN 1	/* all messages related to routing / flooding /
+			 * broadcasting / etc */
+#define DBG_ROUTES 2	/* route or hna added / changed / deleted */
+
+#ifdef CONFIG_BATMAN_ADV_DEBUG
+extern int debug;
+
+extern int bat_debug_type(int type);
+#define bat_dbg(type, fmt, arg...) do {					\
+		if (bat_debug_type(type))				\
+			printk(KERN_DEBUG "batman-adv:" fmt, ## arg);	\
+	}								\
+	while (0)
+#else /* !CONFIG_BATMAN_ADV_DEBUG */
+#define bat_dbg(type, fmt, arg...) do {		\
+	}					\
+	while (0)
+#endif
+
+/*
+ *  Vis
+ */
+
+/* #define VIS_SUBCLUSTERS_DISABLED */
+
+/*
+ * Kernel headers
+ */
+
+#include <linux/mutex.h>	/* mutex */
+#include <linux/module.h>	/* needed by all modules */
+#include <linux/netdevice.h>	/* netdevice */
+#include <linux/if_ether.h>	/* ethernet header */
+#include <linux/poll.h>		/* poll_table */
+#include <linux/kthread.h>	/* kernel threads */
+#include <linux/pkt_sched.h>	/* schedule types */
+#include <linux/workqueue.h>	/* workqueue */
+#include <linux/slab.h>
+#include <net/sock.h>		/* struct sock */
+#include <linux/jiffies.h>
+#include <linux/seq_file.h>
+#include "types.h"
+
+#ifndef REVISION_VERSION
+#define REVISION_VERSION_STR ""
+#else
+#define REVISION_VERSION_STR " "REVISION_VERSION
+#endif
+
+extern struct list_head if_list;
+extern struct hlist_head forw_bat_list;
+extern struct hlist_head forw_bcast_list;
+extern struct hashtable_t *orig_hash;
+
+extern spinlock_t orig_hash_lock;
+extern spinlock_t forw_bat_list_lock;
+extern spinlock_t forw_bcast_list_lock;
+
+extern atomic_t bcast_queue_left;
+extern atomic_t batman_queue_left;
+extern int16_t num_hna;
+
+extern struct net_device *soft_device;
+
+extern unsigned char broadcast_addr[];
+extern atomic_t module_state;
+extern struct workqueue_struct *bat_event_workqueue;
+
+void activate_module(void);
+void deactivate_module(void);
+void inc_module_count(void);
+void dec_module_count(void);
+int addr_to_string(char *buff, uint8_t *addr);
+int compare_orig(void *data1, void *data2);
+int choose_orig(void *data, int32_t size);
+int is_my_mac(uint8_t *addr);
+int is_bcast(uint8_t *addr);
+int is_mcast(uint8_t *addr);
+
+#endif /* _NET_BATMAN_ADV_MAIN_H_ */
diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
new file mode 100644
index 0000000..26bbf2d
--- /dev/null
+++ b/net/batman-adv/originator.c
@@ -0,0 +1,499 @@
+/*
+ * Copyright (C) 2009-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+/* increase the reference counter for this originator */
+
+#include "main.h"
+#include "originator.h"
+#include "hash.h"
+#include "translation-table.h"
+#include "routing.h"
+#include "hard-interface.h"
+
+static DECLARE_DELAYED_WORK(purge_orig_wq, purge_orig);
+
+static void start_purge_timer(void)
+{
+	queue_delayed_work(bat_event_workqueue, &purge_orig_wq, 1 * HZ);
+}
+
+int originator_init(void)
+{
+	unsigned long flags;
+	if (orig_hash)
+		return 1;
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	orig_hash = hash_new(128, compare_orig, choose_orig);
+
+	if (!orig_hash)
+		goto err;
+
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+	start_purge_timer();
+	return 1;
+
+err:
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+	return 0;
+}
+
+struct neigh_node *
+create_neighbor(struct orig_node *orig_node, struct orig_node *orig_neigh_node,
+		uint8_t *neigh, struct batman_if *if_incoming)
+{
+	struct neigh_node *neigh_node;
+
+	bat_dbg(DBG_BATMAN, "Creating new last-hop neighbor of originator\n");
+
+	neigh_node = kzalloc(sizeof(struct neigh_node), GFP_ATOMIC);
+	if (!neigh_node)
+		return NULL;
+
+	INIT_LIST_HEAD(&neigh_node->list);
+
+	memcpy(neigh_node->addr, neigh, ETH_ALEN);
+	neigh_node->orig_node = orig_neigh_node;
+	neigh_node->if_incoming = if_incoming;
+
+	list_add_tail(&neigh_node->list, &orig_node->neigh_list);
+	return neigh_node;
+}
+
+static void free_orig_node(void *data)
+{
+	struct list_head *list_pos, *list_pos_tmp;
+	struct neigh_node *neigh_node;
+	struct orig_node *orig_node = (struct orig_node *)data;
+
+	/* for all neighbors towards this originator ... */
+	list_for_each_safe(list_pos, list_pos_tmp, &orig_node->neigh_list) {
+		neigh_node = list_entry(list_pos, struct neigh_node, list);
+
+		list_del(list_pos);
+		kfree(neigh_node);
+	}
+
+	hna_global_del_orig(orig_node, "originator timed out");
+
+	kfree(orig_node->bcast_own);
+	kfree(orig_node->bcast_own_sum);
+	kfree(orig_node);
+}
+
+void originator_free(void)
+{
+	unsigned long flags;
+
+	if (!orig_hash)
+		return;
+
+	cancel_delayed_work_sync(&purge_orig_wq);
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	hash_delete(orig_hash, free_orig_node);
+	orig_hash = NULL;
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+}
+
+/* this function finds or creates an originator entry for the given
+ * address if it does not exits */
+struct orig_node *get_orig_node(uint8_t *addr)
+{
+	/* FIXME: each batman_if will be attached to a softif */
+	struct bat_priv *bat_priv = netdev_priv(soft_device);
+	struct orig_node *orig_node;
+	struct hashtable_t *swaphash;
+	int size;
+
+	orig_node = ((struct orig_node *)hash_find(orig_hash, addr));
+
+	if (orig_node != NULL)
+		return orig_node;
+
+	bat_dbg(DBG_BATMAN, "Creating new originator: %pM\n", addr);
+
+	orig_node = kzalloc(sizeof(struct orig_node), GFP_ATOMIC);
+	if (!orig_node)
+		return NULL;
+
+	INIT_LIST_HEAD(&orig_node->neigh_list);
+
+	memcpy(orig_node->orig, addr, ETH_ALEN);
+	orig_node->router = NULL;
+	orig_node->hna_buff = NULL;
+	orig_node->bcast_seqno_reset = jiffies - 1
+					- msecs_to_jiffies(RESET_PROTECTION_MS);
+	orig_node->batman_seqno_reset = jiffies - 1
+					- msecs_to_jiffies(RESET_PROTECTION_MS);
+
+	size = bat_priv->num_ifaces * sizeof(TYPE_OF_WORD) * NUM_WORDS;
+
+	orig_node->bcast_own = kzalloc(size, GFP_ATOMIC);
+	if (!orig_node->bcast_own)
+		goto free_orig_node;
+
+	size = bat_priv->num_ifaces * sizeof(uint8_t);
+	orig_node->bcast_own_sum = kzalloc(size, GFP_ATOMIC);
+	if (!orig_node->bcast_own_sum)
+		goto free_bcast_own;
+
+	if (hash_add(orig_hash, orig_node) < 0)
+		goto free_bcast_own_sum;
+
+	if (orig_hash->elements * 4 > orig_hash->size) {
+		swaphash = hash_resize(orig_hash, orig_hash->size * 2);
+
+		if (swaphash == NULL)
+			printk(KERN_ERR
+			       "batman-adv:Couldn't resize orig hash table\n");
+		else
+			orig_hash = swaphash;
+	}
+
+	return orig_node;
+free_bcast_own_sum:
+	kfree(orig_node->bcast_own_sum);
+free_bcast_own:
+	kfree(orig_node->bcast_own);
+free_orig_node:
+	kfree(orig_node);
+	return NULL;
+}
+
+static bool purge_orig_neighbors(struct orig_node *orig_node,
+				 struct neigh_node **best_neigh_node)
+{
+	struct list_head *list_pos, *list_pos_tmp;
+	struct neigh_node *neigh_node;
+	bool neigh_purged = false;
+
+	*best_neigh_node = NULL;
+
+	/* for all neighbors towards this originator ... */
+	list_for_each_safe(list_pos, list_pos_tmp, &orig_node->neigh_list) {
+		neigh_node = list_entry(list_pos, struct neigh_node, list);
+
+		if ((time_after(jiffies,
+			neigh_node->last_valid + PURGE_TIMEOUT * HZ)) ||
+		    (neigh_node->if_incoming->if_status ==
+						IF_TO_BE_REMOVED)) {
+
+			if (neigh_node->if_incoming->if_status ==
+							IF_TO_BE_REMOVED)
+				bat_dbg(DBG_BATMAN,
+					"neighbor purge: originator %pM, "
+					"neighbor: %pM, iface: %s\n",
+					orig_node->orig, neigh_node->addr,
+					neigh_node->if_incoming->dev);
+			else
+				bat_dbg(DBG_BATMAN,
+					"neighbor timeout: originator %pM, "
+					"neighbor: %pM, last_valid: %lu\n",
+					orig_node->orig, neigh_node->addr,
+					(neigh_node->last_valid / HZ));
+
+			neigh_purged = true;
+			list_del(list_pos);
+			kfree(neigh_node);
+		} else {
+			if ((*best_neigh_node == NULL) ||
+			    (neigh_node->tq_avg > (*best_neigh_node)->tq_avg))
+				*best_neigh_node = neigh_node;
+		}
+	}
+	return neigh_purged;
+}
+
+static bool purge_orig_node(struct orig_node *orig_node)
+{
+	/* FIXME: each batman_if will be attached to a softif */
+	struct bat_priv *bat_priv = netdev_priv(soft_device);
+	struct neigh_node *best_neigh_node;
+
+	if (time_after(jiffies,
+		orig_node->last_valid + 2 * PURGE_TIMEOUT * HZ)) {
+
+		bat_dbg(DBG_BATMAN,
+			"Originator timeout: originator %pM, last_valid %lu\n",
+			orig_node->orig, (orig_node->last_valid / HZ));
+		return true;
+	} else {
+		if (purge_orig_neighbors(orig_node, &best_neigh_node)) {
+			update_routes(orig_node, best_neigh_node,
+				      orig_node->hna_buff,
+				      orig_node->hna_buff_len);
+			/* update bonding candidates, we could have lost
+			 * some candidates. */
+			update_bonding_candidates(bat_priv, orig_node);
+		}
+	}
+
+	return false;
+}
+
+void purge_orig(struct work_struct *work)
+{
+	HASHIT(hashit);
+	struct orig_node *orig_node;
+	unsigned long flags;
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+
+	/* for all origins... */
+	while (hash_iterate(orig_hash, &hashit)) {
+		orig_node = hashit.bucket->data;
+		if (purge_orig_node(orig_node)) {
+			hash_remove_bucket(orig_hash, &hashit);
+			free_orig_node(orig_node);
+		}
+	}
+
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+	/* if work == NULL we were not called by the timer
+	 * and thus do not need to re-arm the timer */
+	if (work)
+		start_purge_timer();
+}
+
+int orig_seq_print_text(struct seq_file *seq, void *offset)
+{
+	HASHIT(hashit);
+	struct net_device *net_dev = (struct net_device *)seq->private;
+	struct bat_priv *bat_priv = netdev_priv(net_dev);
+	struct orig_node *orig_node;
+	struct neigh_node *neigh_node;
+	int batman_count = 0;
+	unsigned long flags;
+	char orig_str[ETH_STR_LEN], router_str[ETH_STR_LEN];
+
+	if ((!bat_priv->primary_if) ||
+	    (bat_priv->primary_if->if_status != IF_ACTIVE)) {
+		if (!bat_priv->primary_if)
+			return seq_printf(seq, "BATMAN mesh %s disabled - "
+				     "please specify interfaces to enable it\n",
+				     net_dev->name);
+
+		return seq_printf(seq, "BATMAN mesh %s "
+				  "disabled - primary interface not active\n",
+				  net_dev->name);
+	}
+
+	rcu_read_lock();
+	seq_printf(seq, "  %-14s (%s/%i) %17s [%10s]: %20s "
+		   "... [B.A.T.M.A.N. adv %s%s, MainIF/MAC: %s/%s (%s)]\n",
+		   "Originator", "#", TQ_MAX_VALUE, "Nexthop", "outgoingIF",
+		   "Potential nexthops", SOURCE_VERSION, REVISION_VERSION_STR,
+		   bat_priv->primary_if->dev, bat_priv->primary_if->addr_str,
+		   net_dev->name);
+	rcu_read_unlock();
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+
+	while (hash_iterate(orig_hash, &hashit)) {
+
+		orig_node = hashit.bucket->data;
+
+		if (!orig_node->router)
+			continue;
+
+		if (orig_node->router->tq_avg == 0)
+			continue;
+
+		addr_to_string(orig_str, orig_node->orig);
+		addr_to_string(router_str, orig_node->router->addr);
+
+		seq_printf(seq, "%-17s  (%3i) %17s [%10s]:",
+			   orig_str, orig_node->router->tq_avg, router_str,
+			   orig_node->router->if_incoming->dev);
+
+		list_for_each_entry(neigh_node, &orig_node->neigh_list, list) {
+			addr_to_string(orig_str, neigh_node->addr);
+			seq_printf(seq, " %17s (%3i)", orig_str,
+					   neigh_node->tq_avg);
+		}
+
+		seq_printf(seq, "\n");
+		batman_count++;
+	}
+
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+	if ((batman_count == 0))
+		seq_printf(seq, "No batman nodes in range ...\n");
+
+	return 0;
+}
+
+static int orig_node_add_if(struct orig_node *orig_node, int max_if_num)
+{
+	void *data_ptr;
+
+	data_ptr = kmalloc(max_if_num * sizeof(TYPE_OF_WORD) * NUM_WORDS,
+			   GFP_ATOMIC);
+	if (!data_ptr) {
+		printk(KERN_ERR
+		       "batman-adv:Can't resize orig: out of memory\n");
+		return -1;
+	}
+
+	memcpy(data_ptr, orig_node->bcast_own,
+	       (max_if_num - 1) * sizeof(TYPE_OF_WORD) * NUM_WORDS);
+	kfree(orig_node->bcast_own);
+	orig_node->bcast_own = data_ptr;
+
+	data_ptr = kmalloc(max_if_num * sizeof(uint8_t), GFP_ATOMIC);
+	if (!data_ptr) {
+		printk(KERN_ERR
+		       "batman-adv:Can't resize orig: out of memory\n");
+		return -1;
+	}
+
+	memcpy(data_ptr, orig_node->bcast_own_sum,
+	       (max_if_num - 1) * sizeof(uint8_t));
+	kfree(orig_node->bcast_own_sum);
+	orig_node->bcast_own_sum = data_ptr;
+
+	return 0;
+}
+
+int orig_hash_add_if(struct batman_if *batman_if, int max_if_num)
+{
+	struct orig_node *orig_node;
+	HASHIT(hashit);
+
+	/* resize all orig nodes because orig_node->bcast_own(_sum) depend on
+	 * if_num */
+	spin_lock(&orig_hash_lock);
+
+	while (hash_iterate(orig_hash, &hashit)) {
+		orig_node = hashit.bucket->data;
+
+		if (orig_node_add_if(orig_node, max_if_num) == -1)
+			goto err;
+	}
+
+	spin_unlock(&orig_hash_lock);
+	return 0;
+
+err:
+	spin_unlock(&orig_hash_lock);
+	return -ENOMEM;
+}
+
+static int orig_node_del_if(struct orig_node *orig_node,
+		     int max_if_num, int del_if_num)
+{
+	void *data_ptr = NULL;
+	int chunk_size;
+
+	/* last interface was removed */
+	if (max_if_num == 0)
+		goto free_bcast_own;
+
+	chunk_size = sizeof(TYPE_OF_WORD) * NUM_WORDS;
+	data_ptr = kmalloc(max_if_num * chunk_size, GFP_ATOMIC);
+	if (!data_ptr) {
+		printk(KERN_ERR
+		       "batman-adv:Can't resize orig: out of memory\n");
+		return -1;
+	}
+
+	/* copy first part */
+	memcpy(data_ptr, orig_node->bcast_own, del_if_num * chunk_size);
+
+	/* copy second part */
+	memcpy(data_ptr,
+	       orig_node->bcast_own + ((del_if_num + 1) * chunk_size),
+	       (max_if_num - del_if_num) * chunk_size);
+
+free_bcast_own:
+	kfree(orig_node->bcast_own);
+	orig_node->bcast_own = data_ptr;
+
+	if (max_if_num == 0)
+		goto free_own_sum;
+
+	data_ptr = kmalloc(max_if_num * sizeof(uint8_t), GFP_ATOMIC);
+	if (!data_ptr) {
+		printk(KERN_ERR
+		       "batman-adv:Can't resize orig: out of memory\n");
+		return -1;
+	}
+
+	memcpy(data_ptr, orig_node->bcast_own_sum,
+	       del_if_num * sizeof(uint8_t));
+
+	memcpy(data_ptr,
+	       orig_node->bcast_own_sum + ((del_if_num + 1) * sizeof(uint8_t)),
+	       (max_if_num - del_if_num) * sizeof(uint8_t));
+
+free_own_sum:
+	kfree(orig_node->bcast_own_sum);
+	orig_node->bcast_own_sum = data_ptr;
+
+	return 0;
+}
+
+int orig_hash_del_if(struct batman_if *batman_if, int max_if_num)
+{
+	struct batman_if *batman_if_tmp;
+	struct orig_node *orig_node;
+	HASHIT(hashit);
+	int ret;
+
+	/* resize all orig nodes because orig_node->bcast_own(_sum) depend on
+	 * if_num */
+	spin_lock(&orig_hash_lock);
+
+	while (hash_iterate(orig_hash, &hashit)) {
+		orig_node = hashit.bucket->data;
+
+		ret = orig_node_del_if(orig_node, max_if_num,
+				       batman_if->if_num);
+
+		if (ret == -1)
+			goto err;
+	}
+
+	/* renumber remaining batman interfaces _inside_ of orig_hash_lock */
+	rcu_read_lock();
+	list_for_each_entry_rcu(batman_if_tmp, &if_list, list) {
+		if (batman_if_tmp->if_status == IF_NOT_IN_USE)
+			continue;
+
+		if (batman_if == batman_if_tmp)
+			continue;
+
+		if (batman_if_tmp->if_num > batman_if->if_num)
+			batman_if_tmp->if_num--;
+	}
+	rcu_read_unlock();
+
+	batman_if->if_num = -1;
+	spin_unlock(&orig_hash_lock);
+	return 0;
+
+err:
+	spin_unlock(&orig_hash_lock);
+	return -ENOMEM;
+}
diff --git a/net/batman-adv/originator.h b/net/batman-adv/originator.h
new file mode 100644
index 0000000..e88411d
--- /dev/null
+++ b/net/batman-adv/originator.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_ORIGINATOR_H_
+#define _NET_BATMAN_ADV_ORIGINATOR_H_
+
+int originator_init(void);
+void originator_free(void);
+void purge_orig(struct work_struct *work);
+struct orig_node *get_orig_node(uint8_t *addr);
+struct neigh_node *
+create_neighbor(struct orig_node *orig_node, struct orig_node *orig_neigh_node,
+		uint8_t *neigh, struct batman_if *if_incoming);
+int orig_seq_print_text(struct seq_file *seq, void *offset);
+int orig_hash_add_if(struct batman_if *batman_if, int max_if_num);
+int orig_hash_del_if(struct batman_if *batman_if, int max_if_num);
+
+#endif /* _NET_BATMAN_ADV_ORIGINATOR_H_ */
diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
new file mode 100644
index 0000000..abb5e46
--- /dev/null
+++ b/net/batman-adv/packet.h
@@ -0,0 +1,120 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_PACKET_H_
+#define _NET_BATMAN_ADV_PACKET_H_
+
+#define ETH_P_BATMAN  0x4305	/* unofficial/not registered Ethertype */
+
+#define BAT_PACKET    0x01
+#define BAT_ICMP      0x02
+#define BAT_UNICAST   0x03
+#define BAT_BCAST     0x04
+#define BAT_VIS       0x05
+
+/* this file is included by batctl which needs these defines */
+#define COMPAT_VERSION 11
+#define DIRECTLINK 0x40
+#define VIS_SERVER 0x20
+#define PRIMARIES_FIRST_HOP 0x10
+
+/* ICMP message types */
+#define ECHO_REPLY 0
+#define DESTINATION_UNREACHABLE 3
+#define ECHO_REQUEST 8
+#define TTL_EXCEEDED 11
+#define PARAMETER_PROBLEM 12
+
+/* vis defines */
+#define VIS_TYPE_SERVER_SYNC		0
+#define VIS_TYPE_CLIENT_UPDATE		1
+
+struct batman_packet {
+	uint8_t  packet_type;
+	uint8_t  version;  /* batman version field */
+	uint8_t  flags;    /* 0x40: DIRECTLINK flag, 0x20 VIS_SERVER flag... */
+	uint8_t  tq;
+	uint32_t seqno;
+	uint8_t  orig[6];
+	uint8_t  prev_sender[6];
+	uint8_t  ttl;
+	uint8_t  num_hna;
+} __attribute__((packed));
+
+#define BAT_PACKET_LEN sizeof(struct batman_packet)
+
+struct icmp_packet {
+	uint8_t  packet_type;
+	uint8_t  version;  /* batman version field */
+	uint8_t  msg_type; /* see ICMP message types above */
+	uint8_t  ttl;
+	uint8_t  dst[6];
+	uint8_t  orig[6];
+	uint16_t seqno;
+	uint8_t  uid;
+} __attribute__((packed));
+
+#define BAT_RR_LEN 16
+
+/* icmp_packet_rr must start with all fields from imcp_packet
+   as this is assumed by code that handles ICMP packets */
+struct icmp_packet_rr {
+	uint8_t  packet_type;
+	uint8_t  version;  /* batman version field */
+	uint8_t  msg_type; /* see ICMP message types above */
+	uint8_t  ttl;
+	uint8_t  dst[6];
+	uint8_t  orig[6];
+	uint16_t seqno;
+	uint8_t  uid;
+	uint8_t  rr_cur;
+	uint8_t  rr[BAT_RR_LEN][ETH_ALEN];
+} __attribute__((packed));
+
+struct unicast_packet {
+	uint8_t  packet_type;
+	uint8_t  version;  /* batman version field */
+	uint8_t  dest[6];
+	uint8_t  ttl;
+} __attribute__((packed));
+
+struct bcast_packet {
+	uint8_t  packet_type;
+	uint8_t  version;  /* batman version field */
+	uint8_t  orig[6];
+	uint8_t  ttl;
+	uint32_t seqno;
+} __attribute__((packed));
+
+struct vis_packet {
+	uint8_t  packet_type;
+	uint8_t  version;        /* batman version field */
+	uint8_t  vis_type;	 /* which type of vis-participant sent this? */
+	uint8_t  entries;	 /* number of entries behind this struct */
+	uint32_t seqno;		 /* sequence number */
+	uint8_t  ttl;		 /* TTL */
+	uint8_t  vis_orig[6];	 /* originator that informs about its
+				  * neighbors */
+	uint8_t  target_orig[6]; /* who should receive this packet */
+	uint8_t  sender_orig[6]; /* who sent or rebroadcasted this packet */
+} __attribute__((packed));
+
+#endif /* _NET_BATMAN_ADV_PACKET_H_ */
diff --git a/net/batman-adv/ring_buffer.c b/net/batman-adv/ring_buffer.c
new file mode 100644
index 0000000..defd37c
--- /dev/null
+++ b/net/batman-adv/ring_buffer.c
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "ring_buffer.h"
+
+void ring_buffer_set(uint8_t lq_recv[], uint8_t *lq_index, uint8_t value)
+{
+	lq_recv[*lq_index] = value;
+	*lq_index = (*lq_index + 1) % TQ_GLOBAL_WINDOW_SIZE;
+}
+
+uint8_t ring_buffer_avg(uint8_t lq_recv[])
+{
+	uint8_t *ptr;
+	uint16_t count = 0, i = 0, sum = 0;
+
+	ptr = lq_recv;
+
+	while (i < TQ_GLOBAL_WINDOW_SIZE) {
+		if (*ptr != 0) {
+			count++;
+			sum += *ptr;
+		}
+
+		i++;
+		ptr++;
+	}
+
+	if (count == 0)
+		return 0;
+
+	return (uint8_t)(sum / count);
+}
diff --git a/net/batman-adv/ring_buffer.h b/net/batman-adv/ring_buffer.h
new file mode 100644
index 0000000..6b0cb9a
--- /dev/null
+++ b/net/batman-adv/ring_buffer.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_RING_BUFFER_H_
+#define _NET_BATMAN_ADV_RING_BUFFER_H_
+
+void ring_buffer_set(uint8_t lq_recv[], uint8_t *lq_index, uint8_t value);
+uint8_t ring_buffer_avg(uint8_t lq_recv[]);
+
+#endif /* _NET_BATMAN_ADV_RING_BUFFER_H_ */
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
new file mode 100644
index 0000000..127cfbf
--- /dev/null
+++ b/net/batman-adv/routing.c
@@ -0,0 +1,1291 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "routing.h"
+#include "send.h"
+#include "hash.h"
+#include "soft-interface.h"
+#include "hard-interface.h"
+#include "icmp_socket.h"
+#include "translation-table.h"
+#include "originator.h"
+#include "types.h"
+#include "ring_buffer.h"
+#include "vis.h"
+#include "aggregation.h"
+
+static DECLARE_WAIT_QUEUE_HEAD(thread_wait);
+
+void slide_own_bcast_window(struct batman_if *batman_if)
+{
+	HASHIT(hashit);
+	struct orig_node *orig_node;
+	TYPE_OF_WORD *word;
+	unsigned long flags;
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+
+	while (hash_iterate(orig_hash, &hashit)) {
+		orig_node = hashit.bucket->data;
+		word = &(orig_node->bcast_own[batman_if->if_num * NUM_WORDS]);
+
+		bit_get_packet(word, 1, 0);
+		orig_node->bcast_own_sum[batman_if->if_num] =
+			bit_packet_count(word);
+	}
+
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+}
+
+static void update_HNA(struct orig_node *orig_node,
+		       unsigned char *hna_buff, int hna_buff_len)
+{
+	if ((hna_buff_len != orig_node->hna_buff_len) ||
+	    ((hna_buff_len > 0) &&
+	     (orig_node->hna_buff_len > 0) &&
+	     (memcmp(orig_node->hna_buff, hna_buff, hna_buff_len) != 0))) {
+
+		if (orig_node->hna_buff_len > 0)
+			hna_global_del_orig(orig_node,
+					    "originator changed hna");
+
+		if ((hna_buff_len > 0) && (hna_buff != NULL))
+			hna_global_add_orig(orig_node, hna_buff, hna_buff_len);
+	}
+}
+
+static void update_route(struct orig_node *orig_node,
+			 struct neigh_node *neigh_node,
+			 unsigned char *hna_buff, int hna_buff_len)
+{
+	/* route deleted */
+	if ((orig_node->router != NULL) && (neigh_node == NULL)) {
+
+		bat_dbg(DBG_ROUTES, "Deleting route towards: %pM\n",
+			orig_node->orig);
+		hna_global_del_orig(orig_node, "originator timed out");
+
+		/* route added */
+	} else if ((orig_node->router == NULL) && (neigh_node != NULL)) {
+
+		bat_dbg(DBG_ROUTES,
+			"Adding route towards: %pM (via %pM)\n",
+			orig_node->orig, neigh_node->addr);
+		hna_global_add_orig(orig_node, hna_buff, hna_buff_len);
+
+		/* route changed */
+	} else {
+		bat_dbg(DBG_ROUTES,
+			"Changing route towards: %pM "
+			"(now via %pM - was via %pM)\n",
+			orig_node->orig, neigh_node->addr,
+			orig_node->router->addr);
+	}
+
+	orig_node->router = neigh_node;
+}
+
+
+void update_routes(struct orig_node *orig_node,
+			  struct neigh_node *neigh_node,
+			  unsigned char *hna_buff, int hna_buff_len)
+{
+
+	if (orig_node == NULL)
+		return;
+
+	if (orig_node->router != neigh_node)
+		update_route(orig_node, neigh_node, hna_buff, hna_buff_len);
+	/* may be just HNA changed */
+	else
+		update_HNA(orig_node, hna_buff, hna_buff_len);
+}
+
+static int is_bidirectional_neigh(struct orig_node *orig_node,
+				struct orig_node *orig_neigh_node,
+				struct batman_packet *batman_packet,
+				struct batman_if *if_incoming)
+{
+	struct neigh_node *neigh_node = NULL, *tmp_neigh_node = NULL;
+	unsigned char total_count;
+
+	if (orig_node == orig_neigh_node) {
+		list_for_each_entry(tmp_neigh_node,
+				    &orig_node->neigh_list,
+				    list) {
+
+			if (compare_orig(tmp_neigh_node->addr,
+					 orig_neigh_node->orig) &&
+			    (tmp_neigh_node->if_incoming == if_incoming))
+				neigh_node = tmp_neigh_node;
+		}
+
+		if (!neigh_node)
+			neigh_node = create_neighbor(orig_node,
+						     orig_neigh_node,
+						     orig_neigh_node->orig,
+						     if_incoming);
+		/* create_neighbor failed, return 0 */
+		if (!neigh_node)
+			return 0;
+
+		neigh_node->last_valid = jiffies;
+	} else {
+		/* find packet count of corresponding one hop neighbor */
+		list_for_each_entry(tmp_neigh_node,
+				    &orig_neigh_node->neigh_list, list) {
+
+			if (compare_orig(tmp_neigh_node->addr,
+					 orig_neigh_node->orig) &&
+			    (tmp_neigh_node->if_incoming == if_incoming))
+				neigh_node = tmp_neigh_node;
+		}
+
+		if (!neigh_node)
+			neigh_node = create_neighbor(orig_neigh_node,
+						     orig_neigh_node,
+						     orig_neigh_node->orig,
+						     if_incoming);
+		/* create_neighbor failed, return 0 */
+		if (!neigh_node)
+			return 0;
+	}
+
+	orig_node->last_valid = jiffies;
+
+	/* pay attention to not get a value bigger than 100 % */
+	total_count = (orig_neigh_node->bcast_own_sum[if_incoming->if_num] >
+		       neigh_node->real_packet_count ?
+		       neigh_node->real_packet_count :
+		       orig_neigh_node->bcast_own_sum[if_incoming->if_num]);
+
+	/* if we have too few packets (too less data) we set tq_own to zero */
+	/* if we receive too few packets it is not considered bidirectional */
+	if ((total_count < TQ_LOCAL_BIDRECT_SEND_MINIMUM) ||
+	    (neigh_node->real_packet_count < TQ_LOCAL_BIDRECT_RECV_MINIMUM))
+		orig_neigh_node->tq_own = 0;
+	else
+		/* neigh_node->real_packet_count is never zero as we
+		 * only purge old information when getting new
+		 * information */
+		orig_neigh_node->tq_own = (TQ_MAX_VALUE * total_count) /
+			neigh_node->real_packet_count;
+
+	/*
+	 * 1 - ((1-x) ** 3), normalized to TQ_MAX_VALUE this does
+	 * affect the nearly-symmetric links only a little, but
+	 * punishes asymmetric links more.  This will give a value
+	 * between 0 and TQ_MAX_VALUE
+	 */
+	orig_neigh_node->tq_asym_penalty =
+		TQ_MAX_VALUE -
+		(TQ_MAX_VALUE *
+		 (TQ_LOCAL_WINDOW_SIZE - neigh_node->real_packet_count) *
+		 (TQ_LOCAL_WINDOW_SIZE - neigh_node->real_packet_count) *
+		 (TQ_LOCAL_WINDOW_SIZE - neigh_node->real_packet_count)) /
+		(TQ_LOCAL_WINDOW_SIZE *
+		 TQ_LOCAL_WINDOW_SIZE *
+		 TQ_LOCAL_WINDOW_SIZE);
+
+	batman_packet->tq = ((batman_packet->tq *
+			      orig_neigh_node->tq_own *
+			      orig_neigh_node->tq_asym_penalty) /
+			     (TQ_MAX_VALUE * TQ_MAX_VALUE));
+
+	bat_dbg(DBG_BATMAN,
+		"bidirectional: "
+		"orig = %-15pM neigh = %-15pM => own_bcast = %2i, "
+		"real recv = %2i, local tq: %3i, asym_penalty: %3i, "
+		"total tq: %3i\n",
+		orig_node->orig, orig_neigh_node->orig, total_count,
+		neigh_node->real_packet_count, orig_neigh_node->tq_own,
+		orig_neigh_node->tq_asym_penalty, batman_packet->tq);
+
+	/* if link has the minimum required transmission quality
+	 * consider it bidirectional */
+	if (batman_packet->tq >= TQ_TOTAL_BIDRECT_LIMIT)
+		return 1;
+
+	return 0;
+}
+
+static void update_orig(struct orig_node *orig_node, struct ethhdr *ethhdr,
+			struct batman_packet *batman_packet,
+			struct batman_if *if_incoming,
+			unsigned char *hna_buff, int hna_buff_len,
+			char is_duplicate)
+{
+	struct neigh_node *neigh_node = NULL, *tmp_neigh_node = NULL;
+	int tmp_hna_buff_len;
+
+	bat_dbg(DBG_BATMAN, "update_originator(): "
+		"Searching and updating originator entry of received packet\n");
+
+	list_for_each_entry(tmp_neigh_node, &orig_node->neigh_list, list) {
+		if (compare_orig(tmp_neigh_node->addr, ethhdr->h_source) &&
+		    (tmp_neigh_node->if_incoming == if_incoming)) {
+			neigh_node = tmp_neigh_node;
+			continue;
+		}
+
+		if (is_duplicate)
+			continue;
+
+		ring_buffer_set(tmp_neigh_node->tq_recv,
+				&tmp_neigh_node->tq_index, 0);
+		tmp_neigh_node->tq_avg =
+			ring_buffer_avg(tmp_neigh_node->tq_recv);
+	}
+
+	if (!neigh_node) {
+		struct orig_node *orig_tmp;
+
+		orig_tmp = get_orig_node(ethhdr->h_source);
+		if (!orig_tmp)
+			return;
+
+		neigh_node = create_neighbor(orig_node,
+					     orig_tmp,
+					     ethhdr->h_source, if_incoming);
+		if (!neigh_node)
+			return;
+	} else
+		bat_dbg(DBG_BATMAN,
+			"Updating existing last-hop neighbor of originator\n");
+
+	orig_node->flags = batman_packet->flags;
+	neigh_node->last_valid = jiffies;
+
+	ring_buffer_set(neigh_node->tq_recv,
+			&neigh_node->tq_index,
+			batman_packet->tq);
+	neigh_node->tq_avg = ring_buffer_avg(neigh_node->tq_recv);
+
+	if (!is_duplicate) {
+		orig_node->last_ttl = batman_packet->ttl;
+		neigh_node->last_ttl = batman_packet->ttl;
+	}
+
+	tmp_hna_buff_len = (hna_buff_len > batman_packet->num_hna * ETH_ALEN ?
+			    batman_packet->num_hna * ETH_ALEN : hna_buff_len);
+
+	/* if this neighbor already is our next hop there is nothing
+	 * to change */
+	if (orig_node->router == neigh_node)
+		goto update_hna;
+
+	/* if this neighbor does not offer a better TQ we won't consider it */
+	if ((orig_node->router) &&
+	    (orig_node->router->tq_avg > neigh_node->tq_avg))
+		goto update_hna;
+
+	/* if the TQ is the same and the link not more symetric we
+	 * won't consider it either */
+	if ((orig_node->router) &&
+	     ((neigh_node->tq_avg == orig_node->router->tq_avg) &&
+	     (orig_node->router->orig_node->bcast_own_sum[if_incoming->if_num]
+	      >= neigh_node->orig_node->bcast_own_sum[if_incoming->if_num])))
+		goto update_hna;
+
+	update_routes(orig_node, neigh_node, hna_buff, tmp_hna_buff_len);
+	return;
+
+update_hna:
+	update_routes(orig_node, orig_node->router, hna_buff, tmp_hna_buff_len);
+}
+
+/* checks whether the host restarted and is in the protection time.
+ * returns:
+ *  0 if the packet is to be accepted
+ *  1 if the packet is to be ignored.
+ */
+static int window_protected(int32_t seq_num_diff,
+				unsigned long *last_reset)
+{
+	if ((seq_num_diff <= -TQ_LOCAL_WINDOW_SIZE)
+		|| (seq_num_diff >= EXPECTED_SEQNO_RANGE)) {
+		if (time_after(jiffies, *last_reset +
+			msecs_to_jiffies(RESET_PROTECTION_MS))) {
+
+			*last_reset = jiffies;
+			bat_dbg(DBG_BATMAN,
+				"old packet received, start protection\n");
+
+			return 0;
+		} else
+			return 1;
+	}
+	return 0;
+}
+
+/* processes a batman packet for all interfaces, adjusts the sequence number and
+ * finds out whether it is a duplicate.
+ * returns:
+ *   1 the packet is a duplicate
+ *   0 the packet has not yet been received
+ *  -1 the packet is old and has been received while the seqno window
+ *     was protected. Caller should drop it.
+ */
+static char count_real_packets(struct ethhdr *ethhdr,
+			       struct batman_packet *batman_packet,
+			       struct batman_if *if_incoming)
+{
+	struct orig_node *orig_node;
+	struct neigh_node *tmp_neigh_node;
+	char is_duplicate = 0;
+	int32_t seq_diff;
+	int need_update = 0;
+	int set_mark;
+
+	orig_node = get_orig_node(batman_packet->orig);
+	if (orig_node == NULL)
+		return 0;
+
+	seq_diff = batman_packet->seqno - orig_node->last_real_seqno;
+
+	/* signalize caller that the packet is to be dropped. */
+	if (window_protected(seq_diff, &orig_node->batman_seqno_reset))
+		return -1;
+
+	list_for_each_entry(tmp_neigh_node, &orig_node->neigh_list, list) {
+
+		is_duplicate |= get_bit_status(tmp_neigh_node->real_bits,
+					       orig_node->last_real_seqno,
+					       batman_packet->seqno);
+
+		if (compare_orig(tmp_neigh_node->addr, ethhdr->h_source) &&
+		    (tmp_neigh_node->if_incoming == if_incoming))
+			set_mark = 1;
+		else
+			set_mark = 0;
+
+		/* if the window moved, set the update flag. */
+		need_update |= bit_get_packet(tmp_neigh_node->real_bits,
+						seq_diff, set_mark);
+
+		tmp_neigh_node->real_packet_count =
+			bit_packet_count(tmp_neigh_node->real_bits);
+	}
+
+	if (need_update) {
+		bat_dbg(DBG_BATMAN, "updating last_seqno: old %d, new %d\n",
+			orig_node->last_real_seqno, batman_packet->seqno);
+		orig_node->last_real_seqno = batman_packet->seqno;
+	}
+
+	return is_duplicate;
+}
+
+/* copy primary address for bonding */
+static void mark_bonding_address(struct bat_priv *bat_priv,
+				 struct orig_node *orig_node,
+				 struct orig_node *orig_neigh_node,
+				 struct batman_packet *batman_packet)
+
+{
+	if (batman_packet->flags & PRIMARIES_FIRST_HOP)
+		memcpy(orig_neigh_node->primary_addr,
+		       orig_node->orig, ETH_ALEN);
+
+	return;
+}
+
+/* mark possible bond.candidates in the neighbor list */
+void update_bonding_candidates(struct bat_priv *bat_priv,
+			       struct orig_node *orig_node)
+{
+	int candidates;
+	int interference_candidate;
+	int best_tq;
+	struct neigh_node *tmp_neigh_node, *tmp_neigh_node2;
+	struct neigh_node *first_candidate, *last_candidate;
+
+	/* update the candidates for this originator */
+	if (!orig_node->router) {
+		orig_node->bond.candidates = 0;
+		return;
+	}
+
+	best_tq = orig_node->router->tq_avg;
+
+	/* update bond.candidates */
+
+	candidates = 0;
+
+	/* mark other nodes which also received "PRIMARIES FIRST HOP" packets
+	 * as "bonding partner" */
+
+	/* first, zero the list */
+	list_for_each_entry(tmp_neigh_node, &orig_node->neigh_list, list) {
+		tmp_neigh_node->next_bond_candidate = NULL;
+	}
+
+	first_candidate = NULL;
+	last_candidate = NULL;
+	list_for_each_entry(tmp_neigh_node, &orig_node->neigh_list, list) {
+
+		/* only consider if it has the same primary address ...  */
+		if (memcmp(orig_node->orig,
+				tmp_neigh_node->orig_node->primary_addr,
+				ETH_ALEN) != 0)
+			continue;
+
+		/* ... and is good enough to be considered */
+		if (tmp_neigh_node->tq_avg < best_tq - BONDING_TQ_THRESHOLD)
+			continue;
+
+		/* check if we have another candidate with the same
+		 * mac address or interface. If we do, we won't
+		 * select this candidate because of possible interference. */
+
+		interference_candidate = 0;
+		list_for_each_entry(tmp_neigh_node2,
+				&orig_node->neigh_list, list) {
+
+			if (tmp_neigh_node2 == tmp_neigh_node)
+				continue;
+
+			/* we only care if the other candidate is even
+			 * considered as candidate. */
+			if (tmp_neigh_node2->next_bond_candidate == NULL)
+				continue;
+
+
+			if ((tmp_neigh_node->if_incoming ==
+				tmp_neigh_node2->if_incoming)
+				|| (memcmp(tmp_neigh_node->addr,
+				tmp_neigh_node2->addr, ETH_ALEN) == 0)) {
+
+				interference_candidate = 1;
+				break;
+			}
+		}
+		/* don't care further if it is an interference candidate */
+		if (interference_candidate)
+			continue;
+
+		if (first_candidate == NULL) {
+			first_candidate = tmp_neigh_node;
+			tmp_neigh_node->next_bond_candidate = first_candidate;
+		} else
+			tmp_neigh_node->next_bond_candidate = last_candidate;
+
+		last_candidate = tmp_neigh_node;
+
+		candidates++;
+	}
+
+	if (candidates > 0) {
+		first_candidate->next_bond_candidate = last_candidate;
+		orig_node->bond.selected = first_candidate;
+	}
+
+	orig_node->bond.candidates = candidates;
+}
+
+void receive_bat_packet(struct ethhdr *ethhdr,
+				struct batman_packet *batman_packet,
+				unsigned char *hna_buff, int hna_buff_len,
+				struct batman_if *if_incoming)
+{
+	/* FIXME: each orig_node->batman_if will be attached to a softif */
+	struct bat_priv *bat_priv = netdev_priv(soft_device);
+	struct batman_if *batman_if;
+	struct orig_node *orig_neigh_node, *orig_node;
+	char has_directlink_flag;
+	char is_my_addr = 0, is_my_orig = 0, is_my_oldorig = 0;
+	char is_broadcast = 0, is_bidirectional, is_single_hop_neigh;
+	char is_duplicate;
+	uint32_t if_incoming_seqno;
+
+	/* Silently drop when the batman packet is actually not a
+	 * correct packet.
+	 *
+	 * This might happen if a packet is padded (e.g. Ethernet has a
+	 * minimum frame length of 64 byte) and the aggregation interprets
+	 * it as an additional length.
+	 *
+	 * TODO: A more sane solution would be to have a bit in the
+	 * batman_packet to detect whether the packet is the last
+	 * packet in an aggregation.  Here we expect that the padding
+	 * is always zero (or not 0x01)
+	 */
+	if (batman_packet->packet_type != BAT_PACKET)
+		return;
+
+	/* could be changed by schedule_own_packet() */
+	if_incoming_seqno = atomic_read(&if_incoming->seqno);
+
+	has_directlink_flag = (batman_packet->flags & DIRECTLINK ? 1 : 0);
+
+	is_single_hop_neigh = (compare_orig(ethhdr->h_source,
+					    batman_packet->orig) ? 1 : 0);
+
+	bat_dbg(DBG_BATMAN, "Received BATMAN packet via NB: %pM, IF: %s [%s] "
+		"(from OG: %pM, via prev OG: %pM, seqno %d, tq %d, "
+		"TTL %d, V %d, IDF %d)\n",
+		ethhdr->h_source, if_incoming->dev, if_incoming->addr_str,
+		batman_packet->orig, batman_packet->prev_sender,
+		batman_packet->seqno, batman_packet->tq, batman_packet->ttl,
+		batman_packet->version, has_directlink_flag);
+
+	list_for_each_entry_rcu(batman_if, &if_list, list) {
+		if (batman_if->if_status != IF_ACTIVE)
+			continue;
+
+		if (compare_orig(ethhdr->h_source,
+				 batman_if->net_dev->dev_addr))
+			is_my_addr = 1;
+
+		if (compare_orig(batman_packet->orig,
+				 batman_if->net_dev->dev_addr))
+			is_my_orig = 1;
+
+		if (compare_orig(batman_packet->prev_sender,
+				 batman_if->net_dev->dev_addr))
+			is_my_oldorig = 1;
+
+		if (compare_orig(ethhdr->h_source, broadcast_addr))
+			is_broadcast = 1;
+	}
+
+	if (batman_packet->version != COMPAT_VERSION) {
+		bat_dbg(DBG_BATMAN,
+			"Drop packet: incompatible batman version (%i)\n",
+			batman_packet->version);
+		return;
+	}
+
+	if (is_my_addr) {
+		bat_dbg(DBG_BATMAN,
+			"Drop packet: received my own broadcast (sender: %pM"
+			")\n",
+			ethhdr->h_source);
+		return;
+	}
+
+	if (is_broadcast) {
+		bat_dbg(DBG_BATMAN, "Drop packet: "
+		"ignoring all packets with broadcast source addr (sender: %pM"
+		")\n", ethhdr->h_source);
+		return;
+	}
+
+	if (is_my_orig) {
+		TYPE_OF_WORD *word;
+		int offset;
+
+		orig_neigh_node = get_orig_node(ethhdr->h_source);
+
+		if (!orig_neigh_node)
+			return;
+
+		/* neighbor has to indicate direct link and it has to
+		 * come via the corresponding interface */
+		/* if received seqno equals last send seqno save new
+		 * seqno for bidirectional check */
+		if (has_directlink_flag &&
+		    compare_orig(if_incoming->net_dev->dev_addr,
+				 batman_packet->orig) &&
+		    (batman_packet->seqno - if_incoming_seqno + 2 == 0)) {
+			offset = if_incoming->if_num * NUM_WORDS;
+			word = &(orig_neigh_node->bcast_own[offset]);
+			bit_mark(word, 0);
+			orig_neigh_node->bcast_own_sum[if_incoming->if_num] =
+				bit_packet_count(word);
+		}
+
+		bat_dbg(DBG_BATMAN, "Drop packet: "
+			"originator packet from myself (via neighbor)\n");
+		return;
+	}
+
+	if (is_my_oldorig) {
+		bat_dbg(DBG_BATMAN,
+			"Drop packet: ignoring all rebroadcast echos (sender: "
+			"%pM)\n", ethhdr->h_source);
+		return;
+	}
+
+	orig_node = get_orig_node(batman_packet->orig);
+	if (orig_node == NULL)
+		return;
+
+	is_duplicate = count_real_packets(ethhdr, batman_packet, if_incoming);
+
+	if (is_duplicate == -1) {
+		bat_dbg(DBG_BATMAN,
+			"Drop packet: packet within seqno protection time "
+			"(sender: %pM)\n", ethhdr->h_source);
+		return;
+	}
+
+	if (batman_packet->tq == 0) {
+		bat_dbg(DBG_BATMAN,
+			"Drop packet: originator packet with tq equal 0\n");
+		return;
+	}
+
+	/* avoid temporary routing loops */
+	if ((orig_node->router) &&
+	    (orig_node->router->orig_node->router) &&
+	    (compare_orig(orig_node->router->addr,
+			  batman_packet->prev_sender)) &&
+	    !(compare_orig(batman_packet->orig, batman_packet->prev_sender)) &&
+	    (compare_orig(orig_node->router->addr,
+			  orig_node->router->orig_node->router->addr))) {
+		bat_dbg(DBG_BATMAN,
+			"Drop packet: ignoring all rebroadcast packets that "
+			"may make me loop (sender: %pM)\n", ethhdr->h_source);
+		return;
+	}
+
+	/* if sender is a direct neighbor the sender mac equals
+	 * originator mac */
+	orig_neigh_node = (is_single_hop_neigh ?
+			   orig_node : get_orig_node(ethhdr->h_source));
+	if (orig_neigh_node == NULL)
+		return;
+
+	/* drop packet if sender is not a direct neighbor and if we
+	 * don't route towards it */
+	if (!is_single_hop_neigh &&
+	    (orig_neigh_node->router == NULL)) {
+		bat_dbg(DBG_BATMAN, "Drop packet: OGM via unknown neighbor!\n");
+		return;
+	}
+
+	is_bidirectional = is_bidirectional_neigh(orig_node, orig_neigh_node,
+						batman_packet, if_incoming);
+
+	/* update ranking if it is not a duplicate or has the same
+	 * seqno and similar ttl as the non-duplicate */
+	if (is_bidirectional &&
+	    (!is_duplicate ||
+	     ((orig_node->last_real_seqno == batman_packet->seqno) &&
+	      (orig_node->last_ttl - 3 <= batman_packet->ttl))))
+		update_orig(orig_node, ethhdr, batman_packet,
+			    if_incoming, hna_buff, hna_buff_len, is_duplicate);
+
+	mark_bonding_address(bat_priv, orig_node,
+			     orig_neigh_node, batman_packet);
+	update_bonding_candidates(bat_priv, orig_node);
+
+	/* is single hop (direct) neighbor */
+	if (is_single_hop_neigh) {
+
+		/* mark direct link on incoming interface */
+		schedule_forward_packet(orig_node, ethhdr, batman_packet,
+					1, hna_buff_len, if_incoming);
+
+		bat_dbg(DBG_BATMAN, "Forwarding packet: "
+			"rebroadcast neighbor packet with direct link flag\n");
+		return;
+	}
+
+	/* multihop originator */
+	if (!is_bidirectional) {
+		bat_dbg(DBG_BATMAN,
+			"Drop packet: not received via bidirectional link\n");
+		return;
+	}
+
+	if (is_duplicate) {
+		bat_dbg(DBG_BATMAN, "Drop packet: duplicate packet received\n");
+		return;
+	}
+
+	bat_dbg(DBG_BATMAN,
+		"Forwarding packet: rebroadcast originator packet\n");
+	schedule_forward_packet(orig_node, ethhdr, batman_packet,
+				0, hna_buff_len, if_incoming);
+}
+
+int recv_bat_packet(struct sk_buff *skb,
+				struct batman_if *batman_if)
+{
+	struct ethhdr *ethhdr;
+	unsigned long flags;
+	struct sk_buff *skb_old;
+
+	/* drop packet if it has not necessary minimum size */
+	if (skb_headlen(skb) < sizeof(struct batman_packet))
+		return NET_RX_DROP;
+
+	ethhdr = (struct ethhdr *)skb_mac_header(skb);
+
+	/* packet with broadcast indication but unicast recipient */
+	if (!is_bcast(ethhdr->h_dest))
+		return NET_RX_DROP;
+
+	/* packet with broadcast sender address */
+	if (is_bcast(ethhdr->h_source))
+		return NET_RX_DROP;
+
+	/* TODO: we use headlen instead of "length", because
+	 * only this data is paged in. */
+
+	/* create a copy of the skb, if needed, to modify it. */
+	if (!skb_clone_writable(skb, skb_headlen(skb))) {
+		skb_old = skb;
+		skb = skb_copy(skb, GFP_ATOMIC);
+		if (!skb)
+			return NET_RX_DROP;
+		ethhdr = (struct ethhdr *)skb_mac_header(skb);
+		kfree_skb(skb_old);
+	}
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	receive_aggr_bat_packet(ethhdr,
+				skb->data,
+				skb_headlen(skb),
+				batman_if);
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+	kfree_skb(skb);
+	return NET_RX_SUCCESS;
+}
+
+static int recv_my_icmp_packet(struct sk_buff *skb, size_t icmp_len)
+{
+	struct orig_node *orig_node;
+	struct icmp_packet_rr *icmp_packet;
+	struct ethhdr *ethhdr;
+	struct sk_buff *skb_old;
+	struct batman_if *batman_if;
+	int ret;
+	unsigned long flags;
+	uint8_t dstaddr[ETH_ALEN];
+
+	icmp_packet = (struct icmp_packet_rr *)skb->data;
+	ethhdr = (struct ethhdr *)skb_mac_header(skb);
+
+	/* add data to device queue */
+	if (icmp_packet->msg_type != ECHO_REQUEST) {
+		bat_socket_receive_packet(icmp_packet, icmp_len);
+		return NET_RX_DROP;
+	}
+
+	/* answer echo request (ping) */
+	/* get routing information */
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	orig_node = ((struct orig_node *)hash_find(orig_hash,
+						   icmp_packet->orig));
+	ret = NET_RX_DROP;
+
+	if ((orig_node != NULL) &&
+	    (orig_node->router != NULL)) {
+
+		/* don't lock while sending the packets ... we therefore
+		 * copy the required data before sending */
+		batman_if = orig_node->router->if_incoming;
+		memcpy(dstaddr, orig_node->router->addr, ETH_ALEN);
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+		/* create a copy of the skb, if needed, to modify it. */
+		skb_old = NULL;
+		if (!skb_clone_writable(skb, icmp_len)) {
+			skb_old = skb;
+			skb = skb_copy(skb, GFP_ATOMIC);
+			if (!skb)
+				return NET_RX_DROP;
+			icmp_packet = (struct icmp_packet_rr *)skb->data;
+			ethhdr = (struct ethhdr *)skb_mac_header(skb);
+			kfree_skb(skb_old);
+		}
+
+		memcpy(icmp_packet->dst, icmp_packet->orig, ETH_ALEN);
+		memcpy(icmp_packet->orig, ethhdr->h_dest, ETH_ALEN);
+		icmp_packet->msg_type = ECHO_REPLY;
+		icmp_packet->ttl = TTL;
+
+		send_skb_packet(skb, batman_if, dstaddr);
+		ret = NET_RX_SUCCESS;
+
+	} else
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+	return ret;
+}
+
+static int recv_icmp_ttl_exceeded(struct sk_buff *skb, size_t icmp_len)
+{
+	struct orig_node *orig_node;
+	struct icmp_packet *icmp_packet;
+	struct ethhdr *ethhdr;
+	struct sk_buff *skb_old;
+	struct batman_if *batman_if;
+	int ret;
+	unsigned long flags;
+	uint8_t dstaddr[ETH_ALEN];
+
+	icmp_packet = (struct icmp_packet *)skb->data;
+	ethhdr = (struct ethhdr *)skb_mac_header(skb);
+
+	/* send TTL exceeded if packet is an echo request (traceroute) */
+	if (icmp_packet->msg_type != ECHO_REQUEST) {
+		printk(KERN_WARNING "batman-adv:"
+		       "Warning - can't forward icmp packet from %pM to %pM: "
+		       "ttl exceeded\n",
+		       icmp_packet->orig, icmp_packet->dst);
+		return NET_RX_DROP;
+	}
+
+	/* get routing information */
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	orig_node = ((struct orig_node *)
+		     hash_find(orig_hash, icmp_packet->orig));
+	ret = NET_RX_DROP;
+
+	if ((orig_node != NULL) &&
+	    (orig_node->router != NULL)) {
+
+		/* don't lock while sending the packets ... we therefore
+		 * copy the required data before sending */
+		batman_if = orig_node->router->if_incoming;
+		memcpy(dstaddr, orig_node->router->addr, ETH_ALEN);
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+		/* create a copy of the skb, if needed, to modify it. */
+		if (!skb_clone_writable(skb, icmp_len)) {
+			skb_old = skb;
+			skb = skb_copy(skb, GFP_ATOMIC);
+			if (!skb)
+				return NET_RX_DROP;
+			icmp_packet = (struct icmp_packet *) skb->data;
+			ethhdr = (struct ethhdr *)skb_mac_header(skb);
+			kfree_skb(skb_old);
+		}
+
+		memcpy(icmp_packet->dst, icmp_packet->orig, ETH_ALEN);
+		memcpy(icmp_packet->orig, ethhdr->h_dest, ETH_ALEN);
+		icmp_packet->msg_type = TTL_EXCEEDED;
+		icmp_packet->ttl = TTL;
+
+		send_skb_packet(skb, batman_if, dstaddr);
+		ret = NET_RX_SUCCESS;
+
+	} else
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+	return ret;
+}
+
+
+int recv_icmp_packet(struct sk_buff *skb)
+{
+	struct icmp_packet_rr *icmp_packet;
+	struct ethhdr *ethhdr;
+	struct orig_node *orig_node;
+	struct sk_buff *skb_old;
+	struct batman_if *batman_if;
+	int hdr_size = sizeof(struct icmp_packet);
+	int ret;
+	unsigned long flags;
+	uint8_t dstaddr[ETH_ALEN];
+
+	/**
+	 * we truncate all incoming icmp packets if they don't match our size
+	 */
+	if (skb_headlen(skb) >= sizeof(struct icmp_packet_rr))
+		hdr_size = sizeof(struct icmp_packet_rr);
+
+	/* drop packet if it has not necessary minimum size */
+	if (skb_headlen(skb) < hdr_size)
+		return NET_RX_DROP;
+
+	ethhdr = (struct ethhdr *)skb_mac_header(skb);
+
+	/* packet with unicast indication but broadcast recipient */
+	if (is_bcast(ethhdr->h_dest))
+		return NET_RX_DROP;
+
+	/* packet with broadcast sender address */
+	if (is_bcast(ethhdr->h_source))
+		return NET_RX_DROP;
+
+	/* not for me */
+	if (!is_my_mac(ethhdr->h_dest))
+		return NET_RX_DROP;
+
+	icmp_packet = (struct icmp_packet_rr *)skb->data;
+
+	/* add record route information if not full */
+	if ((hdr_size == sizeof(struct icmp_packet_rr)) &&
+	    (icmp_packet->rr_cur < BAT_RR_LEN)) {
+		memcpy(&(icmp_packet->rr[icmp_packet->rr_cur]),
+			ethhdr->h_dest, ETH_ALEN);
+		icmp_packet->rr_cur++;
+	}
+
+	/* packet for me */
+	if (is_my_mac(icmp_packet->dst))
+		return recv_my_icmp_packet(skb, hdr_size);
+
+	/* TTL exceeded */
+	if (icmp_packet->ttl < 2)
+		return recv_icmp_ttl_exceeded(skb, hdr_size);
+
+	ret = NET_RX_DROP;
+
+	/* get routing information */
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	orig_node = ((struct orig_node *)
+		     hash_find(orig_hash, icmp_packet->dst));
+
+	if ((orig_node != NULL) &&
+	    (orig_node->router != NULL)) {
+
+		/* don't lock while sending the packets ... we therefore
+		 * copy the required data before sending */
+		batman_if = orig_node->router->if_incoming;
+		memcpy(dstaddr, orig_node->router->addr, ETH_ALEN);
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+		/* create a copy of the skb, if needed, to modify it. */
+		if (!skb_clone_writable(skb, hdr_size)) {
+			skb_old = skb;
+			skb = skb_copy(skb, GFP_ATOMIC);
+			if (!skb)
+				return NET_RX_DROP;
+			icmp_packet = (struct icmp_packet_rr *)skb->data;
+			ethhdr = (struct ethhdr *)skb_mac_header(skb);
+			kfree_skb(skb_old);
+		}
+
+		/* decrement ttl */
+		icmp_packet->ttl--;
+
+		/* route it */
+		send_skb_packet(skb, batman_if, dstaddr);
+		ret = NET_RX_SUCCESS;
+
+	} else
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+	return ret;
+}
+
+/* find a suitable router for this originator, and use
+ * bonding if possible. */
+struct neigh_node *find_router(struct orig_node *orig_node,
+		struct batman_if *recv_if)
+{
+	/* FIXME: each orig_node->batman_if will be attached to a softif */
+	struct bat_priv *bat_priv = netdev_priv(soft_device);
+	struct orig_node *primary_orig_node;
+	struct orig_node *router_orig;
+	struct neigh_node *router, *first_candidate, *best_router;
+	static uint8_t zero_mac[ETH_ALEN] = {0, 0, 0, 0, 0, 0};
+	int bonding_enabled;
+
+	if (!orig_node)
+		return NULL;
+
+	if (!orig_node->router)
+		return NULL;
+
+	/* without bonding, the first node should
+	 * always choose the default router. */
+
+	bonding_enabled = atomic_read(&bat_priv->bonding_enabled);
+	if (!bonding_enabled && (recv_if == NULL))
+			return orig_node->router;
+
+	router_orig = orig_node->router->orig_node;
+
+	/* if we have something in the primary_addr, we can search
+	 * for a potential bonding candidate. */
+	if (memcmp(router_orig->primary_addr, zero_mac, ETH_ALEN) == 0)
+		return orig_node->router;
+
+	/* find the orig_node which has the primary interface. might
+	 * even be the same as our router_orig in many cases */
+
+	if (memcmp(router_orig->primary_addr,
+				router_orig->orig, ETH_ALEN) == 0) {
+		primary_orig_node = router_orig;
+	} else {
+		primary_orig_node = hash_find(orig_hash,
+						router_orig->primary_addr);
+		if (!primary_orig_node)
+			return orig_node->router;
+	}
+
+	/* with less than 2 candidates, we can't do any
+	 * bonding and prefer the original router. */
+
+	if (primary_orig_node->bond.candidates < 2)
+		return orig_node->router;
+
+
+	/* all nodes between should choose a candidate which
+	 * is is not on the interface where the packet came
+	 * in. */
+	first_candidate = primary_orig_node->bond.selected;
+	router = first_candidate;
+
+	if (bonding_enabled) {
+		/* in the bonding case, send the packets in a round
+		 * robin fashion over the remaining interfaces. */
+		do {
+			/* recv_if == NULL on the first node. */
+			if (router->if_incoming != recv_if)
+				break;
+
+			router = router->next_bond_candidate;
+		} while (router != first_candidate);
+
+		primary_orig_node->bond.selected = router->next_bond_candidate;
+
+	} else {
+		/* if bonding is disabled, use the best of the
+		 * remaining candidates which are not using
+		 * this interface. */
+		best_router = first_candidate;
+
+		do {
+			/* recv_if == NULL on the first node. */
+			if ((router->if_incoming != recv_if) &&
+				(router->tq_avg > best_router->tq_avg))
+					best_router = router;
+
+			router = router->next_bond_candidate;
+		} while (router != first_candidate);
+
+		router = best_router;
+	}
+
+	return router;
+}
+
+int recv_unicast_packet(struct sk_buff *skb, struct batman_if *recv_if)
+{
+	struct unicast_packet *unicast_packet;
+	struct orig_node *orig_node;
+	struct neigh_node *router;
+	struct ethhdr *ethhdr;
+	struct batman_if *batman_if;
+	struct sk_buff *skb_old;
+	uint8_t dstaddr[ETH_ALEN];
+	int hdr_size = sizeof(struct unicast_packet);
+	unsigned long flags;
+
+	/* drop packet if it has not necessary minimum size */
+	if (skb_headlen(skb) < hdr_size)
+		return NET_RX_DROP;
+
+	ethhdr = (struct ethhdr *) skb_mac_header(skb);
+
+	/* packet with unicast indication but broadcast recipient */
+	if (is_bcast(ethhdr->h_dest))
+		return NET_RX_DROP;
+
+	/* packet with broadcast sender address */
+	if (is_bcast(ethhdr->h_source))
+		return NET_RX_DROP;
+
+	/* not for me */
+	if (!is_my_mac(ethhdr->h_dest))
+		return NET_RX_DROP;
+
+	unicast_packet = (struct unicast_packet *) skb->data;
+
+	/* packet for me */
+	if (is_my_mac(unicast_packet->dest)) {
+		interface_rx(skb, hdr_size);
+		return NET_RX_SUCCESS;
+	}
+
+	/* TTL exceeded */
+	if (unicast_packet->ttl < 2) {
+		printk(KERN_WARNING "batman-adv:Warning - "
+		       "can't forward unicast packet from %pM to %pM: "
+		       "ttl exceeded\n",
+		       ethhdr->h_source, unicast_packet->dest);
+		return NET_RX_DROP;
+	}
+
+	/* get routing information */
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	orig_node = ((struct orig_node *)
+		     hash_find(orig_hash, unicast_packet->dest));
+
+	router = find_router(orig_node, recv_if);
+
+	if (!router) {
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+		return NET_RX_DROP;
+	}
+
+	/* don't lock while sending the packets ... we therefore
+	 * copy the required data before sending */
+
+	batman_if = router->if_incoming;
+	memcpy(dstaddr, router->addr, ETH_ALEN);
+
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+	/* create a copy of the skb, if needed, to modify it. */
+	if (!skb_clone_writable(skb, sizeof(struct unicast_packet))) {
+		skb_old = skb;
+		skb = skb_copy(skb, GFP_ATOMIC);
+		if (!skb)
+			return NET_RX_DROP;
+		unicast_packet = (struct unicast_packet *) skb->data;
+		ethhdr = (struct ethhdr *)skb_mac_header(skb);
+		kfree_skb(skb_old);
+	}
+
+	/* decrement ttl */
+	unicast_packet->ttl--;
+
+	/* route it */
+	send_skb_packet(skb, batman_if, dstaddr);
+
+	return NET_RX_SUCCESS;
+}
+
+int recv_bcast_packet(struct sk_buff *skb)
+{
+	struct orig_node *orig_node;
+	struct bcast_packet *bcast_packet;
+	struct ethhdr *ethhdr;
+	int hdr_size = sizeof(struct bcast_packet);
+	int32_t seq_diff;
+	unsigned long flags;
+
+	/* drop packet if it has not necessary minimum size */
+	if (skb_headlen(skb) < hdr_size)
+		return NET_RX_DROP;
+
+	ethhdr = (struct ethhdr *)skb_mac_header(skb);
+
+	/* packet with broadcast indication but unicast recipient */
+	if (!is_bcast(ethhdr->h_dest))
+		return NET_RX_DROP;
+
+	/* packet with broadcast sender address */
+	if (is_bcast(ethhdr->h_source))
+		return NET_RX_DROP;
+
+	/* ignore broadcasts sent by myself */
+	if (is_my_mac(ethhdr->h_source))
+		return NET_RX_DROP;
+
+	bcast_packet = (struct bcast_packet *)skb->data;
+
+	/* ignore broadcasts originated by myself */
+	if (is_my_mac(bcast_packet->orig))
+		return NET_RX_DROP;
+
+	if (bcast_packet->ttl < 2)
+		return NET_RX_DROP;
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	orig_node = ((struct orig_node *)
+		     hash_find(orig_hash, bcast_packet->orig));
+
+	if (orig_node == NULL) {
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+		return NET_RX_DROP;
+	}
+
+	/* check whether the packet is a duplicate */
+	if (get_bit_status(orig_node->bcast_bits,
+			   orig_node->last_bcast_seqno,
+			   ntohl(bcast_packet->seqno))) {
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+		return NET_RX_DROP;
+	}
+
+	seq_diff = ntohl(bcast_packet->seqno) - orig_node->last_bcast_seqno;
+
+	/* check whether the packet is old and the host just restarted. */
+	if (window_protected(seq_diff, &orig_node->bcast_seqno_reset)) {
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+		return NET_RX_DROP;
+	}
+
+	/* mark broadcast in flood history, update window position
+	 * if required. */
+	if (bit_get_packet(orig_node->bcast_bits, seq_diff, 1))
+		orig_node->last_bcast_seqno = ntohl(bcast_packet->seqno);
+
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+	/* rebroadcast packet */
+	add_bcast_packet_to_list(skb);
+
+	/* broadcast for me */
+	interface_rx(skb, hdr_size);
+
+	return NET_RX_SUCCESS;
+}
+
+int recv_vis_packet(struct sk_buff *skb)
+{
+	struct vis_packet *vis_packet;
+	struct ethhdr *ethhdr;
+	struct bat_priv *bat_priv;
+	int hdr_size = sizeof(struct vis_packet);
+
+	if (skb_headlen(skb) < hdr_size)
+		return NET_RX_DROP;
+
+	vis_packet = (struct vis_packet *) skb->data;
+	ethhdr = (struct ethhdr *)skb_mac_header(skb);
+
+	/* not for me */
+	if (!is_my_mac(ethhdr->h_dest))
+		return NET_RX_DROP;
+
+	/* ignore own packets */
+	if (is_my_mac(vis_packet->vis_orig))
+		return NET_RX_DROP;
+
+	if (is_my_mac(vis_packet->sender_orig))
+		return NET_RX_DROP;
+
+	/* FIXME: each batman_if will be attached to a softif */
+	bat_priv = netdev_priv(soft_device);
+
+	switch (vis_packet->vis_type) {
+	case VIS_TYPE_SERVER_SYNC:
+		/* TODO: handle fragmented skbs properly */
+		receive_server_sync_packet(bat_priv, vis_packet,
+					   skb_headlen(skb));
+		break;
+
+	case VIS_TYPE_CLIENT_UPDATE:
+		/* TODO: handle fragmented skbs properly */
+		receive_client_update_packet(bat_priv, vis_packet,
+					     skb_headlen(skb));
+		break;
+
+	default:	/* ignore unknown packet */
+		break;
+	}
+
+	/* We take a copy of the data in the packet, so we should
+	   always free the skbuf. */
+	return NET_RX_DROP;
+}
diff --git a/net/batman-adv/routing.h b/net/batman-adv/routing.h
new file mode 100644
index 0000000..3eac64e
--- /dev/null
+++ b/net/batman-adv/routing.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_ROUTING_H_
+#define _NET_BATMAN_ADV_ROUTING_H_
+
+#include "types.h"
+
+void slide_own_bcast_window(struct batman_if *batman_if);
+void receive_bat_packet(struct ethhdr *ethhdr,
+				struct batman_packet *batman_packet,
+				unsigned char *hna_buff, int hna_buff_len,
+				struct batman_if *if_incoming);
+void update_routes(struct orig_node *orig_node,
+				struct neigh_node *neigh_node,
+				unsigned char *hna_buff, int hna_buff_len);
+int recv_icmp_packet(struct sk_buff *skb);
+int recv_unicast_packet(struct sk_buff *skb, struct batman_if *recv_if);
+int recv_bcast_packet(struct sk_buff *skb);
+int recv_vis_packet(struct sk_buff *skb);
+int recv_bat_packet(struct sk_buff *skb,
+				struct batman_if *batman_if);
+struct neigh_node *find_router(struct orig_node *orig_node,
+		struct batman_if *recv_if);
+void update_bonding_candidates(struct bat_priv *bat_priv,
+			       struct orig_node *orig_node);
+
+#endif /* _NET_BATMAN_ADV_ROUTING_H_ */
diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c
new file mode 100644
index 0000000..d4b92ef
--- /dev/null
+++ b/net/batman-adv/send.c
@@ -0,0 +1,576 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "send.h"
+#include "routing.h"
+#include "translation-table.h"
+#include "soft-interface.h"
+#include "hard-interface.h"
+#include "types.h"
+#include "vis.h"
+#include "aggregation.h"
+
+#include <linux/netfilter_bridge.h>
+
+static void send_outstanding_bcast_packet(struct work_struct *work);
+
+/* apply hop penalty for a normal link */
+static uint8_t hop_penalty(const uint8_t tq)
+{
+	return (tq * (TQ_MAX_VALUE - TQ_HOP_PENALTY)) / (TQ_MAX_VALUE);
+}
+
+/* when do we schedule our own packet to be sent */
+static unsigned long own_send_time(struct bat_priv *bat_priv)
+{
+	return jiffies + msecs_to_jiffies(
+		   atomic_read(&bat_priv->orig_interval) -
+		   JITTER + (random32() % 2*JITTER));
+}
+
+/* when do we schedule a forwarded packet to be sent */
+static unsigned long forward_send_time(struct bat_priv *bat_priv)
+{
+	return jiffies + msecs_to_jiffies(random32() % (JITTER/2));
+}
+
+/* send out an already prepared packet to the given address via the
+ * specified batman interface */
+int send_skb_packet(struct sk_buff *skb,
+				struct batman_if *batman_if,
+				uint8_t *dst_addr)
+{
+	struct ethhdr *ethhdr;
+
+	if (batman_if->if_status != IF_ACTIVE)
+		goto send_skb_err;
+
+	if (unlikely(!batman_if->net_dev))
+		goto send_skb_err;
+
+	if (!(batman_if->net_dev->flags & IFF_UP)) {
+		printk(KERN_WARNING
+		       "batman-adv:Interface %s "
+		       "is not up - can't send packet via that interface!\n",
+		       batman_if->dev);
+		goto send_skb_err;
+	}
+
+	/* push to the ethernet header. */
+	if (my_skb_push(skb, sizeof(struct ethhdr)) < 0)
+		goto send_skb_err;
+
+	skb_reset_mac_header(skb);
+
+	ethhdr = (struct ethhdr *) skb_mac_header(skb);
+	memcpy(ethhdr->h_source, batman_if->net_dev->dev_addr, ETH_ALEN);
+	memcpy(ethhdr->h_dest, dst_addr, ETH_ALEN);
+	ethhdr->h_proto = __constant_htons(ETH_P_BATMAN);
+
+	skb_set_network_header(skb, ETH_HLEN);
+	skb->priority = TC_PRIO_CONTROL;
+	skb->protocol = __constant_htons(ETH_P_BATMAN);
+
+	skb->dev = batman_if->net_dev;
+
+	/* dev_queue_xmit() returns a negative result on error.	 However on
+	 * congestion and traffic shaping, it drops and returns NET_XMIT_DROP
+	 * (which is > 0). This will not be treated as an error.
+	 * Also, if netfilter/ebtables wants to block outgoing batman
+	 * packets then giving them a chance to do so here */
+
+	return NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
+		       dev_queue_xmit);
+send_skb_err:
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+}
+
+/* sends a raw packet. */
+void send_raw_packet(unsigned char *pack_buff, int pack_buff_len,
+		     struct batman_if *batman_if, uint8_t *dst_addr)
+{
+	struct sk_buff *skb;
+	char *data;
+
+	skb = dev_alloc_skb(pack_buff_len + sizeof(struct ethhdr));
+	if (!skb)
+		return;
+	data = skb_put(skb, pack_buff_len + sizeof(struct ethhdr));
+	memcpy(data + sizeof(struct ethhdr), pack_buff, pack_buff_len);
+	/* pull back to the batman "network header" */
+	skb_pull(skb, sizeof(struct ethhdr));
+	send_skb_packet(skb, batman_if, dst_addr);
+}
+
+/* Send a packet to a given interface */
+static void send_packet_to_if(struct forw_packet *forw_packet,
+			      struct batman_if *batman_if)
+{
+	char *fwd_str;
+	uint8_t packet_num;
+	int16_t buff_pos;
+	struct batman_packet *batman_packet;
+
+	if (batman_if->if_status != IF_ACTIVE)
+		return;
+
+	packet_num = 0;
+	buff_pos = 0;
+	batman_packet = (struct batman_packet *)
+		(forw_packet->packet_buff);
+
+	/* adjust all flags and log packets */
+	while (aggregated_packet(buff_pos,
+				 forw_packet->packet_len,
+				 batman_packet->num_hna)) {
+
+		/* we might have aggregated direct link packets with an
+		 * ordinary base packet */
+		if ((forw_packet->direct_link_flags & (1 << packet_num)) &&
+		    (forw_packet->if_incoming == batman_if))
+			batman_packet->flags |= DIRECTLINK;
+		else
+			batman_packet->flags &= ~DIRECTLINK;
+
+		fwd_str = (packet_num > 0 ? "Forwarding" : (forw_packet->own ?
+							    "Sending own" :
+							    "Forwarding"));
+		bat_dbg(DBG_BATMAN,
+			"%s %spacket (originator %pM, seqno %d, TQ %d, TTL %d,"
+			" IDF %s) on interface %s [%s]\n",
+			fwd_str, (packet_num > 0 ? "aggregated " : ""),
+			batman_packet->orig, ntohl(batman_packet->seqno),
+			batman_packet->tq, batman_packet->ttl,
+			(batman_packet->flags & DIRECTLINK ?
+			 "on" : "off"),
+			batman_if->dev, batman_if->addr_str);
+
+		buff_pos += sizeof(struct batman_packet) +
+			(batman_packet->num_hna * ETH_ALEN);
+		packet_num++;
+		batman_packet = (struct batman_packet *)
+			(forw_packet->packet_buff + buff_pos);
+	}
+
+	send_raw_packet(forw_packet->packet_buff,
+			forw_packet->packet_len,
+			batman_if, broadcast_addr);
+}
+
+/* send a batman packet */
+static void send_packet(struct forw_packet *forw_packet)
+{
+	struct batman_if *batman_if;
+	struct batman_packet *batman_packet =
+		(struct batman_packet *)(forw_packet->packet_buff);
+	unsigned char directlink = (batman_packet->flags & DIRECTLINK ? 1 : 0);
+
+	if (!forw_packet->if_incoming) {
+		printk(KERN_ERR "batman-adv: Error - can't forward packet: "
+		       "incoming iface not specified\n");
+		return;
+	}
+
+	if (forw_packet->if_incoming->if_status != IF_ACTIVE)
+		return;
+
+	/* multihomed peer assumed */
+	/* non-primary OGMs are only broadcasted on their interface */
+	if ((directlink && (batman_packet->ttl == 1)) ||
+	    (forw_packet->own && (forw_packet->if_incoming->if_num > 0))) {
+
+		/* FIXME: what about aggregated packets ? */
+		bat_dbg(DBG_BATMAN,
+			"%s packet (originator %pM, seqno %d, TTL %d) "
+			"on interface %s [%s]\n",
+			(forw_packet->own ? "Sending own" : "Forwarding"),
+			batman_packet->orig, ntohl(batman_packet->seqno),
+			batman_packet->ttl, forw_packet->if_incoming->dev,
+			forw_packet->if_incoming->addr_str);
+
+		send_raw_packet(forw_packet->packet_buff,
+				forw_packet->packet_len,
+				forw_packet->if_incoming,
+				broadcast_addr);
+		return;
+	}
+
+	/* broadcast on every interface */
+	rcu_read_lock();
+	list_for_each_entry_rcu(batman_if, &if_list, list)
+		send_packet_to_if(forw_packet, batman_if);
+	rcu_read_unlock();
+}
+
+static void rebuild_batman_packet(struct batman_if *batman_if)
+{
+	int new_len;
+	unsigned char *new_buff;
+	struct batman_packet *batman_packet;
+
+	new_len = sizeof(struct batman_packet) + (num_hna * ETH_ALEN);
+	new_buff = kmalloc(new_len, GFP_ATOMIC);
+
+	/* keep old buffer if kmalloc should fail */
+	if (new_buff) {
+		memcpy(new_buff, batman_if->packet_buff,
+		       sizeof(struct batman_packet));
+		batman_packet = (struct batman_packet *)new_buff;
+
+		batman_packet->num_hna = hna_local_fill_buffer(
+			new_buff + sizeof(struct batman_packet),
+			new_len - sizeof(struct batman_packet));
+
+		kfree(batman_if->packet_buff);
+		batman_if->packet_buff = new_buff;
+		batman_if->packet_len = new_len;
+	}
+}
+
+void schedule_own_packet(struct batman_if *batman_if)
+{
+	/* FIXME: each batman_if will be attached to a softif */
+	struct bat_priv *bat_priv = netdev_priv(soft_device);
+	unsigned long send_time;
+	struct batman_packet *batman_packet;
+	int vis_server;
+
+	if ((batman_if->if_status == IF_NOT_IN_USE) ||
+	    (batman_if->if_status == IF_TO_BE_REMOVED))
+		return;
+
+	vis_server = atomic_read(&bat_priv->vis_mode);
+
+	/**
+	 * the interface gets activated here to avoid race conditions between
+	 * the moment of activating the interface in
+	 * hardif_activate_interface() where the originator mac is set and
+	 * outdated packets (especially uninitialized mac addresses) in the
+	 * packet queue
+	 */
+	if (batman_if->if_status == IF_TO_BE_ACTIVATED)
+		batman_if->if_status = IF_ACTIVE;
+
+	/* if local hna has changed and interface is a primary interface */
+	if ((atomic_read(&hna_local_changed)) &&
+	    (batman_if == bat_priv->primary_if))
+		rebuild_batman_packet(batman_if);
+
+	/**
+	 * NOTE: packet_buff might just have been re-allocated in
+	 * rebuild_batman_packet()
+	 */
+	batman_packet = (struct batman_packet *)batman_if->packet_buff;
+
+	/* change sequence number to network order */
+	batman_packet->seqno =
+		htonl((uint32_t)atomic_read(&batman_if->seqno));
+
+	if (vis_server == VIS_TYPE_SERVER_SYNC)
+		batman_packet->flags |= VIS_SERVER;
+	else
+		batman_packet->flags &= ~VIS_SERVER;
+
+	atomic_inc(&batman_if->seqno);
+
+	slide_own_bcast_window(batman_if);
+	send_time = own_send_time(bat_priv);
+	add_bat_packet_to_list(bat_priv,
+			       batman_if->packet_buff,
+			       batman_if->packet_len,
+			       batman_if, 1, send_time);
+}
+
+void schedule_forward_packet(struct orig_node *orig_node,
+			     struct ethhdr *ethhdr,
+			     struct batman_packet *batman_packet,
+			     uint8_t directlink, int hna_buff_len,
+			     struct batman_if *if_incoming)
+{
+	/* FIXME: each batman_if will be attached to a softif */
+	struct bat_priv *bat_priv = netdev_priv(soft_device);
+	unsigned char in_tq, in_ttl, tq_avg = 0;
+	unsigned long send_time;
+
+	if (batman_packet->ttl <= 1) {
+		bat_dbg(DBG_BATMAN, "ttl exceeded\n");
+		return;
+	}
+
+	in_tq = batman_packet->tq;
+	in_ttl = batman_packet->ttl;
+
+	batman_packet->ttl--;
+	memcpy(batman_packet->prev_sender, ethhdr->h_source, ETH_ALEN);
+
+	/* rebroadcast tq of our best ranking neighbor to ensure the rebroadcast
+	 * of our best tq value */
+	if ((orig_node->router) && (orig_node->router->tq_avg != 0)) {
+
+		/* rebroadcast ogm of best ranking neighbor as is */
+		if (!compare_orig(orig_node->router->addr, ethhdr->h_source)) {
+			batman_packet->tq = orig_node->router->tq_avg;
+
+			if (orig_node->router->last_ttl)
+				batman_packet->ttl = orig_node->router->last_ttl
+							- 1;
+		}
+
+		tq_avg = orig_node->router->tq_avg;
+	}
+
+	/* apply hop penalty */
+	batman_packet->tq = hop_penalty(batman_packet->tq);
+
+	bat_dbg(DBG_BATMAN, "Forwarding packet: tq_orig: %i, tq_avg: %i, "
+		"tq_forw: %i, ttl_orig: %i, ttl_forw: %i\n",
+		in_tq, tq_avg, batman_packet->tq, in_ttl - 1,
+		batman_packet->ttl);
+
+	batman_packet->seqno = htonl(batman_packet->seqno);
+
+	/* switch of primaries first hop flag when forwarding */
+	batman_packet->flags &= ~PRIMARIES_FIRST_HOP;
+	if (directlink)
+		batman_packet->flags |= DIRECTLINK;
+	else
+		batman_packet->flags &= ~DIRECTLINK;
+
+	send_time = forward_send_time(bat_priv);
+	add_bat_packet_to_list(bat_priv,
+			       (unsigned char *)batman_packet,
+			       sizeof(struct batman_packet) + hna_buff_len,
+			       if_incoming, 0, send_time);
+}
+
+static void forw_packet_free(struct forw_packet *forw_packet)
+{
+	if (forw_packet->skb)
+		kfree_skb(forw_packet->skb);
+	kfree(forw_packet->packet_buff);
+	kfree(forw_packet);
+}
+
+static void _add_bcast_packet_to_list(struct forw_packet *forw_packet,
+				      unsigned long send_time)
+{
+	unsigned long flags;
+	INIT_HLIST_NODE(&forw_packet->list);
+
+	/* add new packet to packet list */
+	spin_lock_irqsave(&forw_bcast_list_lock, flags);
+	hlist_add_head(&forw_packet->list, &forw_bcast_list);
+	spin_unlock_irqrestore(&forw_bcast_list_lock, flags);
+
+	/* start timer for this packet */
+	INIT_DELAYED_WORK(&forw_packet->delayed_work,
+			  send_outstanding_bcast_packet);
+	queue_delayed_work(bat_event_workqueue, &forw_packet->delayed_work,
+			   send_time);
+}
+
+#define atomic_dec_not_zero(v)          atomic_add_unless((v), -1, 0)
+/* add a broadcast packet to the queue and setup timers. broadcast packets
+ * are sent multiple times to increase probability for beeing received.
+ *
+ * This function returns NETDEV_TX_OK on success and NETDEV_TX_BUSY on
+ * errors.
+ *
+ * The skb is not consumed, so the caller should make sure that the
+ * skb is freed. */
+int add_bcast_packet_to_list(struct sk_buff *skb)
+{
+	struct forw_packet *forw_packet;
+	struct bcast_packet *bcast_packet;
+
+	if (!atomic_dec_not_zero(&bcast_queue_left)) {
+		bat_dbg(DBG_BATMAN, "bcast packet queue full\n");
+		goto out;
+	}
+
+	forw_packet = kmalloc(sizeof(struct forw_packet), GFP_ATOMIC);
+
+	if (!forw_packet)
+		goto out_and_inc;
+
+	skb = skb_copy(skb, GFP_ATOMIC);
+	if (!skb)
+		goto packet_free;
+
+	/* as we have a copy now, it is safe to decrease the TTL */
+	bcast_packet = (struct bcast_packet *)skb->data;
+	bcast_packet->ttl--;
+
+	skb_reset_mac_header(skb);
+
+	forw_packet->skb = skb;
+	forw_packet->packet_buff = NULL;
+
+	/* how often did we send the bcast packet ? */
+	forw_packet->num_packets = 0;
+
+	_add_bcast_packet_to_list(forw_packet, 1);
+	return NETDEV_TX_OK;
+
+packet_free:
+	kfree(forw_packet);
+out_and_inc:
+	atomic_inc(&bcast_queue_left);
+out:
+	return NETDEV_TX_BUSY;
+}
+
+static void send_outstanding_bcast_packet(struct work_struct *work)
+{
+	struct batman_if *batman_if;
+	struct delayed_work *delayed_work =
+		container_of(work, struct delayed_work, work);
+	struct forw_packet *forw_packet =
+		container_of(delayed_work, struct forw_packet, delayed_work);
+	unsigned long flags;
+	struct sk_buff *skb1;
+
+	spin_lock_irqsave(&forw_bcast_list_lock, flags);
+	hlist_del(&forw_packet->list);
+	spin_unlock_irqrestore(&forw_bcast_list_lock, flags);
+
+	if (atomic_read(&module_state) == MODULE_DEACTIVATING)
+		goto out;
+
+	/* rebroadcast packet */
+	rcu_read_lock();
+	list_for_each_entry_rcu(batman_if, &if_list, list) {
+		/* send a copy of the saved skb */
+		skb1 = skb_copy(forw_packet->skb, GFP_ATOMIC);
+		if (skb1)
+			send_skb_packet(skb1,
+				batman_if, broadcast_addr);
+	}
+	rcu_read_unlock();
+
+	forw_packet->num_packets++;
+
+	/* if we still have some more bcasts to send */
+	if (forw_packet->num_packets < 3) {
+		_add_bcast_packet_to_list(forw_packet, ((5 * HZ) / 1000));
+		return;
+	}
+
+out:
+	forw_packet_free(forw_packet);
+	atomic_inc(&bcast_queue_left);
+}
+
+void send_outstanding_bat_packet(struct work_struct *work)
+{
+	struct delayed_work *delayed_work =
+		container_of(work, struct delayed_work, work);
+	struct forw_packet *forw_packet =
+		container_of(delayed_work, struct forw_packet, delayed_work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&forw_bat_list_lock, flags);
+	hlist_del(&forw_packet->list);
+	spin_unlock_irqrestore(&forw_bat_list_lock, flags);
+
+	if (atomic_read(&module_state) == MODULE_DEACTIVATING)
+		goto out;
+
+	send_packet(forw_packet);
+
+	/**
+	 * we have to have at least one packet in the queue
+	 * to determine the queues wake up time unless we are
+	 * shutting down
+	 */
+	if (forw_packet->own)
+		schedule_own_packet(forw_packet->if_incoming);
+
+out:
+	/* don't count own packet */
+	if (!forw_packet->own)
+		atomic_inc(&batman_queue_left);
+
+	forw_packet_free(forw_packet);
+}
+
+void purge_outstanding_packets(struct batman_if *batman_if)
+{
+	struct forw_packet *forw_packet;
+	struct hlist_node *tmp_node, *safe_tmp_node;
+	unsigned long flags;
+
+	if (batman_if)
+		bat_dbg(DBG_BATMAN, "purge_outstanding_packets(): %s\n",
+			batman_if->dev);
+	else
+		bat_dbg(DBG_BATMAN, "purge_outstanding_packets()\n");
+
+	/* free bcast list */
+	spin_lock_irqsave(&forw_bcast_list_lock, flags);
+	hlist_for_each_entry_safe(forw_packet, tmp_node, safe_tmp_node,
+				  &forw_bcast_list, list) {
+
+		/**
+		 * if purge_outstanding_packets() was called with an argmument
+		 * we delete only packets belonging to the given interface
+		 */
+		if ((batman_if) &&
+		    (forw_packet->if_incoming != batman_if))
+			continue;
+
+		spin_unlock_irqrestore(&forw_bcast_list_lock, flags);
+
+		/**
+		 * send_outstanding_bcast_packet() will lock the list to
+		 * delete the item from the list
+		 */
+		cancel_delayed_work_sync(&forw_packet->delayed_work);
+		spin_lock_irqsave(&forw_bcast_list_lock, flags);
+	}
+	spin_unlock_irqrestore(&forw_bcast_list_lock, flags);
+
+	/* free batman packet list */
+	spin_lock_irqsave(&forw_bat_list_lock, flags);
+	hlist_for_each_entry_safe(forw_packet, tmp_node, safe_tmp_node,
+				  &forw_bat_list, list) {
+
+		/**
+		 * if purge_outstanding_packets() was called with an argmument
+		 * we delete only packets belonging to the given interface
+		 */
+		if ((batman_if) &&
+		    (forw_packet->if_incoming != batman_if))
+			continue;
+
+		spin_unlock_irqrestore(&forw_bat_list_lock, flags);
+
+		/**
+		 * send_outstanding_bat_packet() will lock the list to
+		 * delete the item from the list
+		 */
+		cancel_delayed_work_sync(&forw_packet->delayed_work);
+		spin_lock_irqsave(&forw_bat_list_lock, flags);
+	}
+	spin_unlock_irqrestore(&forw_bat_list_lock, flags);
+}
diff --git a/net/batman-adv/send.h b/net/batman-adv/send.h
new file mode 100644
index 0000000..b64c627
--- /dev/null
+++ b/net/batman-adv/send.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_SEND_H_
+#define _NET_BATMAN_ADV_SEND_H_
+
+#include "types.h"
+
+int send_skb_packet(struct sk_buff *skb,
+				struct batman_if *batman_if,
+				uint8_t *dst_addr);
+void send_raw_packet(unsigned char *pack_buff, int pack_buff_len,
+		     struct batman_if *batman_if, uint8_t *dst_addr);
+void schedule_own_packet(struct batman_if *batman_if);
+void schedule_forward_packet(struct orig_node *orig_node,
+			     struct ethhdr *ethhdr,
+			     struct batman_packet *batman_packet,
+			     uint8_t directlink, int hna_buff_len,
+			     struct batman_if *if_outgoing);
+int  add_bcast_packet_to_list(struct sk_buff *skb);
+void send_outstanding_bat_packet(struct work_struct *work);
+void purge_outstanding_packets(struct batman_if *batman_if);
+
+#endif /* _NET_BATMAN_ADV_SEND_H_ */
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
new file mode 100644
index 0000000..2ea97de
--- /dev/null
+++ b/net/batman-adv/soft-interface.c
@@ -0,0 +1,361 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "soft-interface.h"
+#include "hard-interface.h"
+#include "routing.h"
+#include "send.h"
+#include "translation-table.h"
+#include "types.h"
+#include "hash.h"
+#include <linux/slab.h>
+#include <linux/ethtool.h>
+#include <linux/etherdevice.h>
+
+static uint32_t bcast_seqno = 1; /* give own bcast messages seq numbers to avoid
+				  * broadcast storms */
+static int32_t skb_packets;
+static int32_t skb_bad_packets;
+
+unsigned char main_if_addr[ETH_ALEN];
+static int bat_get_settings(struct net_device *dev, struct ethtool_cmd *cmd);
+static void bat_get_drvinfo(struct net_device *dev,
+			    struct ethtool_drvinfo *info);
+static u32 bat_get_msglevel(struct net_device *dev);
+static void bat_set_msglevel(struct net_device *dev, u32 value);
+static u32 bat_get_link(struct net_device *dev);
+static u32 bat_get_rx_csum(struct net_device *dev);
+static int bat_set_rx_csum(struct net_device *dev, u32 data);
+
+static const struct ethtool_ops bat_ethtool_ops = {
+	.get_settings = bat_get_settings,
+	.get_drvinfo = bat_get_drvinfo,
+	.get_msglevel = bat_get_msglevel,
+	.set_msglevel = bat_set_msglevel,
+	.get_link = bat_get_link,
+	.get_rx_csum = bat_get_rx_csum,
+	.set_rx_csum = bat_set_rx_csum
+};
+
+void set_main_if_addr(uint8_t *addr)
+{
+	memcpy(main_if_addr, addr, ETH_ALEN);
+}
+
+int my_skb_push(struct sk_buff *skb, unsigned int len)
+{
+	int result = 0;
+
+	skb_packets++;
+	if (skb_headroom(skb) < len) {
+		skb_bad_packets++;
+		result = pskb_expand_head(skb, len, 0, GFP_ATOMIC);
+
+		if (result < 0)
+			return result;
+	}
+
+	skb_push(skb, len);
+	return 0;
+}
+
+static int interface_open(struct net_device *dev)
+{
+	netif_start_queue(dev);
+	return 0;
+}
+
+static int interface_release(struct net_device *dev)
+{
+	netif_stop_queue(dev);
+	return 0;
+}
+
+static struct net_device_stats *interface_stats(struct net_device *dev)
+{
+	struct bat_priv *priv = netdev_priv(dev);
+	return &priv->stats;
+}
+
+static int interface_set_mac_addr(struct net_device *dev, void *p)
+{
+	struct sockaddr *addr = p;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	/* only modify hna-table if it has been initialised before */
+	if (atomic_read(&module_state) == MODULE_ACTIVE) {
+		hna_local_remove(dev->dev_addr, "mac address changed");
+		hna_local_add(addr->sa_data);
+	}
+
+	memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
+
+	return 0;
+}
+
+static int interface_change_mtu(struct net_device *dev, int new_mtu)
+{
+	/* check ranges */
+	if ((new_mtu < 68) || (new_mtu > hardif_min_mtu()))
+		return -EINVAL;
+
+	dev->mtu = new_mtu;
+
+	return 0;
+}
+
+int interface_tx(struct sk_buff *skb, struct net_device *dev)
+{
+	struct unicast_packet *unicast_packet;
+	struct bcast_packet *bcast_packet;
+	struct orig_node *orig_node;
+	struct neigh_node *router;
+	struct ethhdr *ethhdr = (struct ethhdr *)skb->data;
+	struct bat_priv *priv = netdev_priv(dev);
+	struct batman_if *batman_if;
+	struct bat_priv *bat_priv;
+	uint8_t dstaddr[6];
+	int data_len = skb->len;
+	unsigned long flags;
+
+	if (atomic_read(&module_state) != MODULE_ACTIVE)
+		goto dropped;
+
+	/* FIXME: each batman_if will be attached to a softif */
+	bat_priv = netdev_priv(soft_device);
+
+	dev->trans_start = jiffies;
+	/* TODO: check this for locks */
+	hna_local_add(ethhdr->h_source);
+
+	/* ethernet packet should be broadcasted */
+	if (is_bcast(ethhdr->h_dest) || is_mcast(ethhdr->h_dest)) {
+
+		if (my_skb_push(skb, sizeof(struct bcast_packet)) < 0)
+			goto dropped;
+
+		bcast_packet = (struct bcast_packet *)skb->data;
+		bcast_packet->version = COMPAT_VERSION;
+		bcast_packet->ttl = TTL;
+
+		/* batman packet type: broadcast */
+		bcast_packet->packet_type = BAT_BCAST;
+
+		/* hw address of first interface is the orig mac because only
+		 * this mac is known throughout the mesh */
+		memcpy(bcast_packet->orig, main_if_addr, ETH_ALEN);
+
+		/* set broadcast sequence number */
+		bcast_packet->seqno = htonl(bcast_seqno);
+
+		/* broadcast packet. on success, increase seqno. */
+		if (add_bcast_packet_to_list(skb) == NETDEV_TX_OK)
+			bcast_seqno++;
+
+		/* a copy is stored in the bcast list, therefore removing
+		 * the original skb. */
+		kfree_skb(skb);
+
+	/* unicast packet */
+	} else {
+		spin_lock_irqsave(&orig_hash_lock, flags);
+		/* get routing information */
+		orig_node = ((struct orig_node *)hash_find(orig_hash,
+							   ethhdr->h_dest));
+
+		/* check for hna host */
+		if (!orig_node)
+			orig_node = transtable_search(ethhdr->h_dest);
+
+		router = find_router(orig_node, NULL);
+
+		if (!router)
+			goto unlock;
+
+		/* don't lock while sending the packets ... we therefore
+		 * copy the required data before sending */
+
+		batman_if = router->if_incoming;
+		memcpy(dstaddr, router->addr, ETH_ALEN);
+
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+		if (batman_if->if_status != IF_ACTIVE)
+			goto dropped;
+
+		if (my_skb_push(skb, sizeof(struct unicast_packet)) < 0)
+			goto dropped;
+
+		unicast_packet = (struct unicast_packet *)skb->data;
+
+		unicast_packet->version = COMPAT_VERSION;
+		/* batman packet type: unicast */
+		unicast_packet->packet_type = BAT_UNICAST;
+		/* set unicast ttl */
+		unicast_packet->ttl = TTL;
+		/* copy the destination for faster routing */
+		memcpy(unicast_packet->dest, orig_node->orig, ETH_ALEN);
+
+		send_skb_packet(skb, batman_if, dstaddr);
+	}
+
+	priv->stats.tx_packets++;
+	priv->stats.tx_bytes += data_len;
+	goto end;
+
+unlock:
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+dropped:
+	priv->stats.tx_dropped++;
+	kfree_skb(skb);
+end:
+	return NETDEV_TX_OK;
+}
+
+void interface_rx(struct sk_buff *skb, int hdr_size)
+{
+	struct net_device *dev = soft_device;
+	struct bat_priv *priv = netdev_priv(dev);
+
+	/* check if enough space is available for pulling, and pull */
+	if (!pskb_may_pull(skb, hdr_size)) {
+		kfree_skb(skb);
+		return;
+	}
+	skb_pull_rcsum(skb, hdr_size);
+/*	skb_set_mac_header(skb, -sizeof(struct ethhdr));*/
+
+	skb->dev = dev;
+	skb->protocol = eth_type_trans(skb, dev);
+
+	/* should not be neccesary anymore as we use skb_pull_rcsum()
+	 * TODO: please verify this and remove this TODO
+	 * -- Dec 21st 2009, Simon Wunderlich */
+
+/*	skb->ip_summed = CHECKSUM_UNNECESSARY;*/
+
+	/* TODO: set skb->pkt_type to PACKET_BROADCAST, PACKET_MULTICAST,
+	 * PACKET_OTHERHOST or PACKET_HOST */
+
+	priv->stats.rx_packets++;
+	priv->stats.rx_bytes += skb->len;
+
+	dev->last_rx = jiffies;
+
+	netif_rx(skb);
+}
+
+#ifdef HAVE_NET_DEVICE_OPS
+static const struct net_device_ops bat_netdev_ops = {
+	.ndo_open = interface_open,
+	.ndo_stop = interface_release,
+	.ndo_get_stats = interface_stats,
+	.ndo_set_mac_address = interface_set_mac_addr,
+	.ndo_change_mtu = interface_change_mtu,
+	.ndo_start_xmit = interface_tx,
+	.ndo_validate_addr = eth_validate_addr
+};
+#endif
+
+void interface_setup(struct net_device *dev)
+{
+	struct bat_priv *priv = netdev_priv(dev);
+	char dev_addr[ETH_ALEN];
+
+	ether_setup(dev);
+
+#ifdef HAVE_NET_DEVICE_OPS
+	dev->netdev_ops = &bat_netdev_ops;
+#else
+	dev->open = interface_open;
+	dev->stop = interface_release;
+	dev->get_stats = interface_stats;
+	dev->set_mac_address = interface_set_mac_addr;
+	dev->change_mtu = interface_change_mtu;
+	dev->hard_start_xmit = interface_tx;
+#endif
+	dev->destructor = free_netdev;
+
+	dev->mtu = hardif_min_mtu();
+	dev->hard_header_len = BAT_HEADER_LEN; /* reserve more space in the
+						* skbuff for our header */
+
+	/* generate random address */
+	random_ether_addr(dev_addr);
+	memcpy(dev->dev_addr, dev_addr, ETH_ALEN);
+
+	SET_ETHTOOL_OPS(dev, &bat_ethtool_ops);
+
+	memset(priv, 0, sizeof(struct bat_priv));
+}
+
+/* ethtool */
+static int bat_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+	cmd->supported = 0;
+	cmd->advertising = 0;
+	cmd->speed = SPEED_10;
+	cmd->duplex = DUPLEX_FULL;
+	cmd->port = PORT_TP;
+	cmd->phy_address = 0;
+	cmd->transceiver = XCVR_INTERNAL;
+	cmd->autoneg = AUTONEG_DISABLE;
+	cmd->maxtxpkt = 0;
+	cmd->maxrxpkt = 0;
+
+	return 0;
+}
+
+static void bat_get_drvinfo(struct net_device *dev,
+			    struct ethtool_drvinfo *info)
+{
+	strcpy(info->driver, "B.A.T.M.A.N. advanced");
+	strcpy(info->version, SOURCE_VERSION);
+	strcpy(info->fw_version, "N/A");
+	strcpy(info->bus_info, "batman");
+}
+
+static u32 bat_get_msglevel(struct net_device *dev)
+{
+	return -EOPNOTSUPP;
+}
+
+static void bat_set_msglevel(struct net_device *dev, u32 value)
+{
+}
+
+static u32 bat_get_link(struct net_device *dev)
+{
+	return 1;
+}
+
+static u32 bat_get_rx_csum(struct net_device *dev)
+{
+	return 0;
+}
+
+static int bat_set_rx_csum(struct net_device *dev, u32 data)
+{
+	return -EOPNOTSUPP;
+}
diff --git a/net/batman-adv/soft-interface.h b/net/batman-adv/soft-interface.h
new file mode 100644
index 0000000..6364854
--- /dev/null
+++ b/net/batman-adv/soft-interface.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_SOFT_INTERFACE_H_
+#define _NET_BATMAN_ADV_SOFT_INTERFACE_H_
+
+void set_main_if_addr(uint8_t *addr);
+void interface_setup(struct net_device *dev);
+int interface_tx(struct sk_buff *skb, struct net_device *dev);
+void interface_rx(struct sk_buff *skb, int hdr_size);
+int my_skb_push(struct sk_buff *skb, unsigned int len);
+
+extern unsigned char main_if_addr[];
+
+#endif /* _NET_BATMAN_ADV_SOFT_INTERFACE_H_ */
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
new file mode 100644
index 0000000..9e3c845
--- /dev/null
+++ b/net/batman-adv/translation-table.c
@@ -0,0 +1,498 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "translation-table.h"
+#include "soft-interface.h"
+#include "types.h"
+#include "hash.h"
+
+struct hashtable_t *hna_local_hash;
+static struct hashtable_t *hna_global_hash;
+atomic_t hna_local_changed;
+
+DEFINE_SPINLOCK(hna_local_hash_lock);
+static DEFINE_SPINLOCK(hna_global_hash_lock);
+
+static void hna_local_purge(struct work_struct *work);
+static DECLARE_DELAYED_WORK(hna_local_purge_wq, hna_local_purge);
+static void _hna_global_del_orig(struct hna_global_entry *hna_global_entry,
+				 char *message);
+
+static void hna_local_start_timer(void)
+{
+	queue_delayed_work(bat_event_workqueue, &hna_local_purge_wq, 10 * HZ);
+}
+
+int hna_local_init(void)
+{
+	if (hna_local_hash)
+		return 1;
+
+	hna_local_hash = hash_new(128, compare_orig, choose_orig);
+
+	if (!hna_local_hash)
+		return 0;
+
+	atomic_set(&hna_local_changed, 0);
+	hna_local_start_timer();
+
+	return 1;
+}
+
+void hna_local_add(uint8_t *addr)
+{
+	struct hna_local_entry *hna_local_entry;
+	struct hna_global_entry *hna_global_entry;
+	struct hashtable_t *swaphash;
+	unsigned long flags;
+
+	spin_lock_irqsave(&hna_local_hash_lock, flags);
+	hna_local_entry =
+		((struct hna_local_entry *)hash_find(hna_local_hash, addr));
+	spin_unlock_irqrestore(&hna_local_hash_lock, flags);
+
+	if (hna_local_entry != NULL) {
+		hna_local_entry->last_seen = jiffies;
+		return;
+	}
+
+	/* only announce as many hosts as possible in the batman-packet and
+	   space in batman_packet->num_hna That also should give a limit to
+	   MAC-flooding. */
+	if ((num_hna + 1 > (ETH_DATA_LEN - BAT_PACKET_LEN) / ETH_ALEN) ||
+	    (num_hna + 1 > 255)) {
+		bat_dbg(DBG_ROUTES,
+			"Can't add new local hna entry (%pM): "
+			"number of local hna entries exceeds packet size\n",
+			addr);
+		return;
+	}
+
+	bat_dbg(DBG_ROUTES, "Creating new local hna entry: %pM\n",
+		addr);
+
+	hna_local_entry = kmalloc(sizeof(struct hna_local_entry), GFP_ATOMIC);
+	if (!hna_local_entry)
+		return;
+
+	memcpy(hna_local_entry->addr, addr, ETH_ALEN);
+	hna_local_entry->last_seen = jiffies;
+
+	/* the batman interface mac address should never be purged */
+	if (compare_orig(addr, soft_device->dev_addr))
+		hna_local_entry->never_purge = 1;
+	else
+		hna_local_entry->never_purge = 0;
+
+	spin_lock_irqsave(&hna_local_hash_lock, flags);
+
+	hash_add(hna_local_hash, hna_local_entry);
+	num_hna++;
+	atomic_set(&hna_local_changed, 1);
+
+	if (hna_local_hash->elements * 4 > hna_local_hash->size) {
+		swaphash = hash_resize(hna_local_hash,
+				       hna_local_hash->size * 2);
+
+		if (swaphash == NULL)
+			printk(KERN_ERR "batman-adv:"
+			       "Couldn't resize local hna hash table\n");
+		else
+			hna_local_hash = swaphash;
+	}
+
+	spin_unlock_irqrestore(&hna_local_hash_lock, flags);
+
+	/* remove address from global hash if present */
+	spin_lock_irqsave(&hna_global_hash_lock, flags);
+
+	hna_global_entry =
+		((struct hna_global_entry *)hash_find(hna_global_hash, addr));
+
+	if (hna_global_entry != NULL)
+		_hna_global_del_orig(hna_global_entry, "local hna received");
+
+	spin_unlock_irqrestore(&hna_global_hash_lock, flags);
+}
+
+int hna_local_fill_buffer(unsigned char *buff, int buff_len)
+{
+	struct hna_local_entry *hna_local_entry;
+	HASHIT(hashit);
+	int i = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&hna_local_hash_lock, flags);
+
+	while (hash_iterate(hna_local_hash, &hashit)) {
+
+		if (buff_len < (i + 1) * ETH_ALEN)
+			break;
+
+		hna_local_entry = hashit.bucket->data;
+		memcpy(buff + (i * ETH_ALEN), hna_local_entry->addr, ETH_ALEN);
+
+		i++;
+	}
+
+	/* if we did not get all new local hnas see you next time  ;-) */
+	if (i == num_hna)
+		atomic_set(&hna_local_changed, 0);
+
+	spin_unlock_irqrestore(&hna_local_hash_lock, flags);
+
+	return i;
+}
+
+int hna_local_seq_print_text(struct seq_file *seq, void *offset)
+{
+	struct net_device *net_dev = (struct net_device *)seq->private;
+	struct bat_priv *bat_priv = netdev_priv(net_dev);
+	struct hna_local_entry *hna_local_entry;
+	HASHIT(hashit);
+	HASHIT(hashit_count);
+	unsigned long flags;
+	size_t buf_size, pos;
+	char *buff;
+
+	if (!bat_priv->primary_if) {
+		return seq_printf(seq, "BATMAN mesh %s disabled - "
+			       "please specify interfaces to enable it\n",
+			       net_dev->name);
+	}
+
+	seq_printf(seq, "Locally retrieved addresses (from %s) "
+		   "announced via HNA:\n",
+		   net_dev->name);
+
+	spin_lock_irqsave(&hna_local_hash_lock, flags);
+
+	buf_size = 1;
+	/* Estimate length for: " * xx:xx:xx:xx:xx:xx\n" */
+	while (hash_iterate(hna_local_hash, &hashit_count))
+		buf_size += 21;
+
+	buff = kmalloc(buf_size, GFP_ATOMIC);
+	if (!buff) {
+		spin_unlock_irqrestore(&hna_local_hash_lock, flags);
+		return -ENOMEM;
+	}
+	buff[0] = '\0';
+	pos = 0;
+
+	while (hash_iterate(hna_local_hash, &hashit)) {
+		hna_local_entry = hashit.bucket->data;
+
+		pos += snprintf(buff + pos, 22, " * %pM\n",
+				hna_local_entry->addr);
+	}
+
+	spin_unlock_irqrestore(&hna_local_hash_lock, flags);
+
+	seq_printf(seq, "%s", buff);
+	kfree(buff);
+	return 0;
+}
+
+static void _hna_local_del(void *data)
+{
+	kfree(data);
+	num_hna--;
+	atomic_set(&hna_local_changed, 1);
+}
+
+static void hna_local_del(struct hna_local_entry *hna_local_entry,
+			  char *message)
+{
+	bat_dbg(DBG_ROUTES, "Deleting local hna entry (%pM): %s\n",
+		hna_local_entry->addr, message);
+
+	hash_remove(hna_local_hash, hna_local_entry->addr);
+	_hna_local_del(hna_local_entry);
+}
+
+void hna_local_remove(uint8_t *addr, char *message)
+{
+	struct hna_local_entry *hna_local_entry;
+	unsigned long flags;
+
+	spin_lock_irqsave(&hna_local_hash_lock, flags);
+
+	hna_local_entry = (struct hna_local_entry *)
+		hash_find(hna_local_hash, addr);
+	if (hna_local_entry)
+		hna_local_del(hna_local_entry, message);
+
+	spin_unlock_irqrestore(&hna_local_hash_lock, flags);
+}
+
+static void hna_local_purge(struct work_struct *work)
+{
+	struct hna_local_entry *hna_local_entry;
+	HASHIT(hashit);
+	unsigned long flags;
+	unsigned long timeout;
+
+	spin_lock_irqsave(&hna_local_hash_lock, flags);
+
+	while (hash_iterate(hna_local_hash, &hashit)) {
+		hna_local_entry = hashit.bucket->data;
+
+		timeout = hna_local_entry->last_seen + LOCAL_HNA_TIMEOUT * HZ;
+		if ((!hna_local_entry->never_purge) &&
+		    time_after(jiffies, timeout))
+			hna_local_del(hna_local_entry, "address timed out");
+	}
+
+	spin_unlock_irqrestore(&hna_local_hash_lock, flags);
+	hna_local_start_timer();
+}
+
+void hna_local_free(void)
+{
+	if (!hna_local_hash)
+		return;
+
+	cancel_delayed_work_sync(&hna_local_purge_wq);
+	hash_delete(hna_local_hash, _hna_local_del);
+	hna_local_hash = NULL;
+}
+
+int hna_global_init(void)
+{
+	if (hna_global_hash)
+		return 1;
+
+	hna_global_hash = hash_new(128, compare_orig, choose_orig);
+
+	if (!hna_global_hash)
+		return 0;
+
+	return 1;
+}
+
+void hna_global_add_orig(struct orig_node *orig_node,
+			 unsigned char *hna_buff, int hna_buff_len)
+{
+	struct hna_global_entry *hna_global_entry;
+	struct hna_local_entry *hna_local_entry;
+	struct hashtable_t *swaphash;
+	int hna_buff_count = 0;
+	unsigned long flags;
+	unsigned char *hna_ptr;
+
+	while ((hna_buff_count + 1) * ETH_ALEN <= hna_buff_len) {
+		spin_lock_irqsave(&hna_global_hash_lock, flags);
+
+		hna_ptr = hna_buff + (hna_buff_count * ETH_ALEN);
+		hna_global_entry = (struct hna_global_entry *)
+			hash_find(hna_global_hash, hna_ptr);
+
+		if (hna_global_entry == NULL) {
+			spin_unlock_irqrestore(&hna_global_hash_lock, flags);
+
+			hna_global_entry =
+				kmalloc(sizeof(struct hna_global_entry),
+					GFP_ATOMIC);
+
+			if (!hna_global_entry)
+				break;
+
+			memcpy(hna_global_entry->addr, hna_ptr, ETH_ALEN);
+
+			bat_dbg(DBG_ROUTES,
+				"Creating new global hna entry: "
+				"%pM (via %pM)\n",
+				hna_global_entry->addr, orig_node->orig);
+
+			spin_lock_irqsave(&hna_global_hash_lock, flags);
+			hash_add(hna_global_hash, hna_global_entry);
+
+		}
+
+		hna_global_entry->orig_node = orig_node;
+		spin_unlock_irqrestore(&hna_global_hash_lock, flags);
+
+		/* remove address from local hash if present */
+		spin_lock_irqsave(&hna_local_hash_lock, flags);
+
+		hna_ptr = hna_buff + (hna_buff_count * ETH_ALEN);
+		hna_local_entry = (struct hna_local_entry *)
+			hash_find(hna_local_hash, hna_ptr);
+
+		if (hna_local_entry != NULL)
+			hna_local_del(hna_local_entry, "global hna received");
+
+		spin_unlock_irqrestore(&hna_local_hash_lock, flags);
+
+		hna_buff_count++;
+	}
+
+	/* initialize, and overwrite if malloc succeeds */
+	orig_node->hna_buff = NULL;
+	orig_node->hna_buff_len = 0;
+
+	if (hna_buff_len > 0) {
+		orig_node->hna_buff = kmalloc(hna_buff_len, GFP_ATOMIC);
+		if (orig_node->hna_buff) {
+			memcpy(orig_node->hna_buff, hna_buff, hna_buff_len);
+			orig_node->hna_buff_len = hna_buff_len;
+		}
+	}
+
+	spin_lock_irqsave(&hna_global_hash_lock, flags);
+
+	if (hna_global_hash->elements * 4 > hna_global_hash->size) {
+		swaphash = hash_resize(hna_global_hash,
+				       hna_global_hash->size * 2);
+
+		if (swaphash == NULL)
+			printk(KERN_ERR "batman-adv:"
+			       "Couldn't resize global hna hash table\n");
+		else
+			hna_global_hash = swaphash;
+	}
+
+	spin_unlock_irqrestore(&hna_global_hash_lock, flags);
+}
+
+int hna_global_seq_print_text(struct seq_file *seq, void *offset)
+{
+	struct net_device *net_dev = (struct net_device *)seq->private;
+	struct bat_priv *bat_priv = netdev_priv(net_dev);
+	struct hna_global_entry *hna_global_entry;
+	HASHIT(hashit);
+	HASHIT(hashit_count);
+	unsigned long flags;
+	size_t buf_size, pos;
+	char *buff;
+
+	if (!bat_priv->primary_if) {
+		return seq_printf(seq, "BATMAN mesh %s disabled - "
+				  "please specify interfaces to enable it\n",
+				  net_dev->name);
+	}
+
+	seq_printf(seq, "Globally announced HNAs received via the mesh %s\n",
+		   net_dev->name);
+
+	spin_lock_irqsave(&hna_global_hash_lock, flags);
+
+	buf_size = 1;
+	/* Estimate length for: " * xx:xx:xx:xx:xx:xx via xx:xx:xx:xx:xx:xx\n"*/
+	while (hash_iterate(hna_global_hash, &hashit_count))
+		buf_size += 43;
+
+	buff = kmalloc(buf_size, GFP_ATOMIC);
+	if (!buff) {
+		spin_unlock_irqrestore(&hna_global_hash_lock, flags);
+		return -ENOMEM;
+	}
+	buff[0] = '\0';
+	pos = 0;
+
+	while (hash_iterate(hna_global_hash, &hashit)) {
+		hna_global_entry = hashit.bucket->data;
+
+		pos += snprintf(buff + pos, 44,
+				" * %pM via %pM\n", hna_global_entry->addr,
+				hna_global_entry->orig_node->orig);
+	}
+
+	spin_unlock_irqrestore(&hna_global_hash_lock, flags);
+
+	seq_printf(seq, "%s", buff);
+	kfree(buff);
+	return 0;
+}
+
+static void _hna_global_del_orig(struct hna_global_entry *hna_global_entry,
+				 char *message)
+{
+	bat_dbg(DBG_ROUTES, "Deleting global hna entry %pM (via %pM): %s\n",
+		hna_global_entry->addr, hna_global_entry->orig_node->orig,
+		message);
+
+	hash_remove(hna_global_hash, hna_global_entry->addr);
+	kfree(hna_global_entry);
+}
+
+void hna_global_del_orig(struct orig_node *orig_node, char *message)
+{
+	struct hna_global_entry *hna_global_entry;
+	int hna_buff_count = 0;
+	unsigned long flags;
+	unsigned char *hna_ptr;
+
+	if (orig_node->hna_buff_len == 0)
+		return;
+
+	spin_lock_irqsave(&hna_global_hash_lock, flags);
+
+	while ((hna_buff_count + 1) * ETH_ALEN <= orig_node->hna_buff_len) {
+		hna_ptr = orig_node->hna_buff + (hna_buff_count * ETH_ALEN);
+		hna_global_entry = (struct hna_global_entry *)
+			hash_find(hna_global_hash, hna_ptr);
+
+		if ((hna_global_entry != NULL) &&
+		    (hna_global_entry->orig_node == orig_node))
+			_hna_global_del_orig(hna_global_entry, message);
+
+		hna_buff_count++;
+	}
+
+	spin_unlock_irqrestore(&hna_global_hash_lock, flags);
+
+	orig_node->hna_buff_len = 0;
+	kfree(orig_node->hna_buff);
+	orig_node->hna_buff = NULL;
+}
+
+static void hna_global_del(void *data)
+{
+	kfree(data);
+}
+
+void hna_global_free(void)
+{
+	if (!hna_global_hash)
+		return;
+
+	hash_delete(hna_global_hash, hna_global_del);
+	hna_global_hash = NULL;
+}
+
+struct orig_node *transtable_search(uint8_t *addr)
+{
+	struct hna_global_entry *hna_global_entry;
+	unsigned long flags;
+
+	spin_lock_irqsave(&hna_global_hash_lock, flags);
+	hna_global_entry = (struct hna_global_entry *)
+		hash_find(hna_global_hash, addr);
+	spin_unlock_irqrestore(&hna_global_hash_lock, flags);
+
+	if (hna_global_entry == NULL)
+		return NULL;
+
+	return hna_global_entry->orig_node;
+}
diff --git a/net/batman-adv/translation-table.h b/net/batman-adv/translation-table.h
new file mode 100644
index 0000000..fa93e37
--- /dev/null
+++ b/net/batman-adv/translation-table.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_TRANSLATION_TABLE_H_
+#define _NET_BATMAN_ADV_TRANSLATION_TABLE_H_
+
+#include "types.h"
+
+int hna_local_init(void);
+void hna_local_add(uint8_t *addr);
+void hna_local_remove(uint8_t *addr, char *message);
+int hna_local_fill_buffer(unsigned char *buff, int buff_len);
+int hna_local_seq_print_text(struct seq_file *seq, void *offset);
+void hna_local_free(void);
+int hna_global_init(void);
+void hna_global_add_orig(struct orig_node *orig_node, unsigned char *hna_buff,
+			 int hna_buff_len);
+int hna_global_seq_print_text(struct seq_file *seq, void *offset);
+void hna_global_del_orig(struct orig_node *orig_node, char *message);
+void hna_global_free(void);
+struct orig_node *transtable_search(uint8_t *addr);
+
+extern spinlock_t hna_local_hash_lock;
+extern struct hashtable_t *hna_local_hash;
+extern atomic_t hna_local_changed;
+
+#endif /* _NET_BATMAN_ADV_TRANSLATION_TABLE_H_ */
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
new file mode 100644
index 0000000..82602d0
--- /dev/null
+++ b/net/batman-adv/types.h
@@ -0,0 +1,175 @@
+/*
+ * Copyright (C) 2007-2010 B.A.T.M.A.N. contributors:
+ *
+ * Marek Lindner, Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+
+
+
+
+#ifndef _NET_BATMAN_ADV_TYPES_H_
+#define _NET_BATMAN_ADV_TYPES_H_
+
+#include "packet.h"
+#include "bitarray.h"
+
+#define BAT_HEADER_LEN (sizeof(struct ethhdr) + \
+	((sizeof(struct unicast_packet) > sizeof(struct bcast_packet) ? \
+	 sizeof(struct unicast_packet) : \
+	 sizeof(struct bcast_packet))))
+
+
+struct batman_if {
+	struct list_head list;
+	int16_t if_num;
+	char *dev;
+	char if_status;
+	char addr_str[ETH_STR_LEN];
+	struct net_device *net_dev;
+	atomic_t seqno;
+	unsigned char *packet_buff;
+	int packet_len;
+	struct kobject *hardif_obj;
+	struct rcu_head rcu;
+
+};
+
+/**
+  *	orig_node - structure for orig_list maintaining nodes of mesh
+  *	@primary_addr: hosts primary interface address
+  *	@last_valid: when last packet from this node was received
+  *	@bcast_seqno_reset: time when the broadcast seqno window was reset
+  *	@batman_seqno_reset: time when the batman seqno window was reset
+  *	@flags: for now only VIS_SERVER flag
+  *	@last_real_seqno: last and best known squence number
+  *	@last_ttl: ttl of last received packet
+  *	@last_bcast_seqno: last broadcast sequence number received by this host
+  *
+  *	@candidates: how many candidates are available
+  *	@selected: next bonding candidate
+ */
+struct orig_node {
+	uint8_t orig[ETH_ALEN];
+	uint8_t primary_addr[ETH_ALEN];
+	struct neigh_node *router;
+	TYPE_OF_WORD *bcast_own;
+	uint8_t *bcast_own_sum;
+	uint8_t tq_own;
+	int tq_asym_penalty;
+	unsigned long last_valid;
+	unsigned long bcast_seqno_reset;
+	unsigned long batman_seqno_reset;
+	uint8_t  flags;
+	unsigned char *hna_buff;
+	int16_t hna_buff_len;
+	uint32_t last_real_seqno;
+	uint8_t last_ttl;
+	TYPE_OF_WORD bcast_bits[NUM_WORDS];
+	uint32_t last_bcast_seqno;
+	struct list_head neigh_list;
+	struct {
+		uint8_t candidates;
+		struct neigh_node *selected;
+	} bond;
+};
+
+/**
+  *	neigh_node
+  *	@last_valid: when last packet via this neighbor was received
+ */
+struct neigh_node {
+	struct list_head list;
+	uint8_t addr[ETH_ALEN];
+	uint8_t real_packet_count;
+	uint8_t tq_recv[TQ_GLOBAL_WINDOW_SIZE];
+	uint8_t tq_index;
+	uint8_t tq_avg;
+	uint8_t last_ttl;
+	struct neigh_node *next_bond_candidate;
+	unsigned long last_valid;
+	TYPE_OF_WORD real_bits[NUM_WORDS];
+	struct orig_node *orig_node;
+	struct batman_if *if_incoming;
+};
+
+struct bat_priv {
+	struct net_device_stats stats;
+	atomic_t aggregation_enabled;
+	atomic_t bonding_enabled;
+	atomic_t vis_mode;
+	atomic_t orig_interval;
+	char num_ifaces;
+	struct batman_if *primary_if;
+	struct kobject *mesh_obj;
+	struct dentry *debug_dir;
+};
+
+struct socket_client {
+	struct list_head queue_list;
+	unsigned int queue_len;
+	unsigned char index;
+	spinlock_t lock;
+	wait_queue_head_t queue_wait;
+};
+
+struct socket_packet {
+	struct list_head list;
+	size_t icmp_len;
+	struct icmp_packet_rr icmp_packet;
+};
+
+struct hna_local_entry {
+	uint8_t addr[ETH_ALEN];
+	unsigned long last_seen;
+	char never_purge;
+};
+
+struct hna_global_entry {
+	uint8_t addr[ETH_ALEN];
+	struct orig_node *orig_node;
+};
+
+/**
+  *	forw_packet - structure for forw_list maintaining packets to be
+  *	              send/forwarded
+ */
+struct forw_packet {
+	struct hlist_node list;
+	unsigned long send_time;
+	uint8_t own;
+	struct sk_buff *skb;
+	unsigned char *packet_buff;
+	uint16_t packet_len;
+	uint32_t direct_link_flags;
+	uint8_t num_packets;
+	struct delayed_work delayed_work;
+	struct batman_if *if_incoming;
+};
+
+/* While scanning for vis-entries of a particular vis-originator
+ * this list collects its interfaces to create a subgraph/cluster
+ * out of them later
+ */
+struct if_list_entry {
+	uint8_t addr[ETH_ALEN];
+	bool primary;
+	struct hlist_node list;
+};
+
+#endif /* _NET_BATMAN_ADV_TYPES_H_ */
diff --git a/net/batman-adv/vis.c b/net/batman-adv/vis.c
new file mode 100644
index 0000000..ddee0f2
--- /dev/null
+++ b/net/batman-adv/vis.c
@@ -0,0 +1,818 @@
+/*
+ * Copyright (C) 2008-2010 B.A.T.M.A.N. contributors:
+ *
+ * Simon Wunderlich
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#include "main.h"
+#include "send.h"
+#include "translation-table.h"
+#include "vis.h"
+#include "soft-interface.h"
+#include "hard-interface.h"
+#include "hash.h"
+
+/* Returns the smallest signed integer in two's complement with the sizeof x */
+#define smallest_signed_int(x) (1u << (7u + 8u * (sizeof(x) - 1u)))
+
+/* Checks if a sequence number x is a predecessor/successor of y.
+   they handle overflows/underflows and can correctly check for a
+   predecessor/successor unless the variable sequence number has grown by
+   more then 2**(bitwidth(x)-1)-1.
+   This means that for a uint8_t with the maximum value 255, it would think:
+    * when adding nothing - it is neither a predecessor nor a successor
+    * before adding more than 127 to the starting value - it is a predecessor,
+    * when adding 128 - it is neither a predecessor nor a successor,
+    * after adding more than 127 to the starting value - it is a successor */
+#define seq_before(x, y) ({typeof(x) _dummy = (x - y); \
+			_dummy > smallest_signed_int(_dummy); })
+#define seq_after(x, y) seq_before(y, x)
+
+static struct hashtable_t *vis_hash;
+static DEFINE_SPINLOCK(vis_hash_lock);
+static DEFINE_SPINLOCK(recv_list_lock);
+static struct vis_info *my_vis_info;
+static struct list_head send_list;	/* always locked with vis_hash_lock */
+
+static void start_vis_timer(void);
+
+/* free the info */
+static void free_info(struct kref *ref)
+{
+	struct vis_info *info = container_of(ref, struct vis_info, refcount);
+	struct recvlist_node *entry, *tmp;
+	unsigned long flags;
+
+	list_del_init(&info->send_list);
+	spin_lock_irqsave(&recv_list_lock, flags);
+	list_for_each_entry_safe(entry, tmp, &info->recv_list, list) {
+		list_del(&entry->list);
+		kfree(entry);
+	}
+	spin_unlock_irqrestore(&recv_list_lock, flags);
+	kfree(info);
+}
+
+/* Compare two vis packets, used by the hashing algorithm */
+static int vis_info_cmp(void *data1, void *data2)
+{
+	struct vis_info *d1, *d2;
+	d1 = data1;
+	d2 = data2;
+	return compare_orig(d1->packet.vis_orig, d2->packet.vis_orig);
+}
+
+/* hash function to choose an entry in a hash table of given size */
+/* hash algorithm from http://en.wikipedia.org/wiki/Hash_table */
+static int vis_info_choose(void *data, int size)
+{
+	struct vis_info *vis_info = data;
+	unsigned char *key;
+	uint32_t hash = 0;
+	size_t i;
+
+	key = vis_info->packet.vis_orig;
+	for (i = 0; i < ETH_ALEN; i++) {
+		hash += key[i];
+		hash += (hash << 10);
+		hash ^= (hash >> 6);
+	}
+
+	hash += (hash << 3);
+	hash ^= (hash >> 11);
+	hash += (hash << 15);
+
+	return hash % size;
+}
+
+/* insert interface to the list of interfaces of one originator, if it
+ * does not already exist in the list */
+static void vis_data_insert_interface(const uint8_t *interface,
+				      struct hlist_head *if_list,
+				      bool primary)
+{
+	struct if_list_entry *entry;
+	struct hlist_node *pos;
+
+	hlist_for_each_entry(entry, pos, if_list, list) {
+		if (compare_orig(entry->addr, (void *)interface))
+			return;
+	}
+
+	/* its a new address, add it to the list */
+	entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
+	if (!entry)
+		return;
+	memcpy(entry->addr, interface, ETH_ALEN);
+	entry->primary = primary;
+	hlist_add_head(&entry->list, if_list);
+}
+
+static ssize_t vis_data_read_prim_sec(char *buff, struct hlist_head *if_list)
+{
+	struct if_list_entry *entry;
+	struct hlist_node *pos;
+	char tmp_addr_str[ETH_STR_LEN];
+	size_t len = 0;
+
+	hlist_for_each_entry(entry, pos, if_list, list) {
+		if (entry->primary)
+			len += sprintf(buff + len, "PRIMARY, ");
+		else {
+			addr_to_string(tmp_addr_str, entry->addr);
+			len += sprintf(buff + len,  "SEC %s, ", tmp_addr_str);
+		}
+	}
+
+	return len;
+}
+
+static size_t vis_data_count_prim_sec(struct hlist_head *if_list)
+{
+	struct if_list_entry *entry;
+	struct hlist_node *pos;
+	size_t count = 0;
+
+	hlist_for_each_entry(entry, pos, if_list, list) {
+		if (entry->primary)
+			count += 9;
+		else
+			count += 23;
+	}
+
+	return count;
+}
+
+/* read an entry  */
+static ssize_t vis_data_read_entry(char *buff, struct vis_info_entry *entry,
+				   uint8_t *src, bool primary)
+{
+	char to[18];
+
+	/* maximal length: max(4+17+2, 3+17+1+3+2) == 26 */
+	addr_to_string(to, entry->dest);
+	if (primary && entry->quality == 0)
+		return sprintf(buff, "HNA %s, ", to);
+	else if (compare_orig(entry->src, src))
+		return sprintf(buff, "TQ %s %d, ", to, entry->quality);
+
+	return 0;
+}
+
+int vis_seq_print_text(struct seq_file *seq, void *offset)
+{
+	HASHIT(hashit);
+	HASHIT(hashit_count);
+	struct vis_info *info;
+	struct vis_info_entry *entries;
+	struct net_device *net_dev = (struct net_device *)seq->private;
+	struct bat_priv *bat_priv = netdev_priv(net_dev);
+	HLIST_HEAD(vis_if_list);
+	struct if_list_entry *entry;
+	struct hlist_node *pos, *n;
+	int i;
+	char tmp_addr_str[ETH_STR_LEN];
+	unsigned long flags;
+	int vis_server = atomic_read(&bat_priv->vis_mode);
+	size_t buff_pos, buf_size;
+	char *buff;
+
+	if ((!bat_priv->primary_if) ||
+	    (vis_server == VIS_TYPE_CLIENT_UPDATE))
+		return 0;
+
+	buf_size = 1;
+	/* Estimate length */
+	spin_lock_irqsave(&vis_hash_lock, flags);
+	while (hash_iterate(vis_hash, &hashit_count)) {
+		info = hashit_count.bucket->data;
+		entries = (struct vis_info_entry *)
+			((char *)info + sizeof(struct vis_info));
+
+		for (i = 0; i < info->packet.entries; i++) {
+			if (entries[i].quality == 0)
+				continue;
+			vis_data_insert_interface(entries[i].src, &vis_if_list,
+				compare_orig(entries[i].src,
+						info->packet.vis_orig));
+		}
+
+		hlist_for_each_entry(entry, pos, &vis_if_list, list) {
+			buf_size += 18 + 26 * info->packet.entries;
+
+			/* add primary/secondary records */
+			if (compare_orig(entry->addr, info->packet.vis_orig))
+				buf_size +=
+					vis_data_count_prim_sec(&vis_if_list);
+
+			buf_size += 1;
+		}
+
+		hlist_for_each_entry_safe(entry, pos, n, &vis_if_list, list) {
+			hlist_del(&entry->list);
+			kfree(entry);
+		}
+	}
+
+	buff = kmalloc(buf_size, GFP_ATOMIC);
+	if (!buff) {
+		spin_unlock_irqrestore(&vis_hash_lock, flags);
+		return -ENOMEM;
+	}
+	buff[0] = '\0';
+	buff_pos = 0;
+
+	while (hash_iterate(vis_hash, &hashit)) {
+		info = hashit.bucket->data;
+		entries = (struct vis_info_entry *)
+			((char *)info + sizeof(struct vis_info));
+
+		for (i = 0; i < info->packet.entries; i++) {
+			if (entries[i].quality == 0)
+				continue;
+			vis_data_insert_interface(entries[i].src, &vis_if_list,
+				compare_orig(entries[i].src,
+						info->packet.vis_orig));
+		}
+
+		hlist_for_each_entry(entry, pos, &vis_if_list, list) {
+			addr_to_string(tmp_addr_str, entry->addr);
+			buff_pos += sprintf(buff + buff_pos, "%s,",
+					    tmp_addr_str);
+
+			for (i = 0; i < info->packet.entries; i++)
+				buff_pos += vis_data_read_entry(buff + buff_pos,
+								&entries[i],
+								entry->addr,
+								entry->primary);
+
+			/* add primary/secondary records */
+			if (compare_orig(entry->addr, info->packet.vis_orig))
+				buff_pos +=
+					vis_data_read_prim_sec(buff + buff_pos,
+							       &vis_if_list);
+
+			buff_pos += sprintf(buff + buff_pos, "\n");
+		}
+
+		hlist_for_each_entry_safe(entry, pos, n, &vis_if_list, list) {
+			hlist_del(&entry->list);
+			kfree(entry);
+		}
+	}
+
+	spin_unlock_irqrestore(&vis_hash_lock, flags);
+
+	seq_printf(seq, "%s", buff);
+	kfree(buff);
+
+	return 0;
+}
+
+/* add the info packet to the send list, if it was not
+ * already linked in. */
+static void send_list_add(struct vis_info *info)
+{
+	if (list_empty(&info->send_list)) {
+		kref_get(&info->refcount);
+		list_add_tail(&info->send_list, &send_list);
+	}
+}
+
+/* delete the info packet from the send list, if it was
+ * linked in. */
+static void send_list_del(struct vis_info *info)
+{
+	if (!list_empty(&info->send_list)) {
+		list_del_init(&info->send_list);
+		kref_put(&info->refcount, free_info);
+	}
+}
+
+/* tries to add one entry to the receive list. */
+static void recv_list_add(struct list_head *recv_list, char *mac)
+{
+	struct recvlist_node *entry;
+	unsigned long flags;
+
+	entry = kmalloc(sizeof(struct recvlist_node), GFP_ATOMIC);
+	if (!entry)
+		return;
+
+	memcpy(entry->mac, mac, ETH_ALEN);
+	spin_lock_irqsave(&recv_list_lock, flags);
+	list_add_tail(&entry->list, recv_list);
+	spin_unlock_irqrestore(&recv_list_lock, flags);
+}
+
+/* returns 1 if this mac is in the recv_list */
+static int recv_list_is_in(struct list_head *recv_list, char *mac)
+{
+	struct recvlist_node *entry;
+	unsigned long flags;
+
+	spin_lock_irqsave(&recv_list_lock, flags);
+	list_for_each_entry(entry, recv_list, list) {
+		if (memcmp(entry->mac, mac, ETH_ALEN) == 0) {
+			spin_unlock_irqrestore(&recv_list_lock, flags);
+			return 1;
+		}
+	}
+	spin_unlock_irqrestore(&recv_list_lock, flags);
+	return 0;
+}
+
+/* try to add the packet to the vis_hash. return NULL if invalid (e.g. too old,
+ * broken.. ).	vis hash must be locked outside.  is_new is set when the packet
+ * is newer than old entries in the hash. */
+static struct vis_info *add_packet(struct vis_packet *vis_packet,
+				   int vis_info_len, int *is_new,
+				   int make_broadcast)
+{
+	struct vis_info *info, *old_info;
+	struct vis_info search_elem;
+
+	*is_new = 0;
+	/* sanity check */
+	if (vis_hash == NULL)
+		return NULL;
+
+	/* see if the packet is already in vis_hash */
+	memcpy(search_elem.packet.vis_orig, vis_packet->vis_orig, ETH_ALEN);
+	old_info = hash_find(vis_hash, &search_elem);
+
+	if (old_info != NULL) {
+		if (!seq_after(ntohl(vis_packet->seqno),
+				ntohl(old_info->packet.seqno))) {
+			if (old_info->packet.seqno == vis_packet->seqno) {
+				recv_list_add(&old_info->recv_list,
+					      vis_packet->sender_orig);
+				return old_info;
+			} else {
+				/* newer packet is already in hash. */
+				return NULL;
+			}
+		}
+		/* remove old entry */
+		hash_remove(vis_hash, old_info);
+		send_list_del(old_info);
+		kref_put(&old_info->refcount, free_info);
+	}
+
+	info = kmalloc(sizeof(struct vis_info) + vis_info_len, GFP_ATOMIC);
+	if (info == NULL)
+		return NULL;
+
+	kref_init(&info->refcount);
+	INIT_LIST_HEAD(&info->send_list);
+	INIT_LIST_HEAD(&info->recv_list);
+	info->first_seen = jiffies;
+	memcpy(&info->packet, vis_packet,
+	       sizeof(struct vis_packet) + vis_info_len);
+
+	/* initialize and add new packet. */
+	*is_new = 1;
+
+	/* Make it a broadcast packet, if required */
+	if (make_broadcast)
+		memcpy(info->packet.target_orig, broadcast_addr, ETH_ALEN);
+
+	/* repair if entries is longer than packet. */
+	if (info->packet.entries * sizeof(struct vis_info_entry) > vis_info_len)
+		info->packet.entries = vis_info_len /
+			sizeof(struct vis_info_entry);
+
+	recv_list_add(&info->recv_list, info->packet.sender_orig);
+
+	/* try to add it */
+	if (hash_add(vis_hash, info) < 0) {
+		/* did not work (for some reason) */
+		kref_put(&old_info->refcount, free_info);
+		info = NULL;
+	}
+
+	return info;
+}
+
+/* handle the server sync packet, forward if needed. */
+void receive_server_sync_packet(struct bat_priv *bat_priv,
+				struct vis_packet *vis_packet,
+				int vis_info_len)
+{
+	struct vis_info *info;
+	int is_new, make_broadcast;
+	unsigned long flags;
+	int vis_server = atomic_read(&bat_priv->vis_mode);
+
+	make_broadcast = (vis_server == VIS_TYPE_SERVER_SYNC);
+
+	spin_lock_irqsave(&vis_hash_lock, flags);
+	info = add_packet(vis_packet, vis_info_len, &is_new, make_broadcast);
+	if (info == NULL)
+		goto end;
+
+	/* only if we are server ourselves and packet is newer than the one in
+	 * hash.*/
+	if (vis_server == VIS_TYPE_SERVER_SYNC && is_new)
+		send_list_add(info);
+end:
+	spin_unlock_irqrestore(&vis_hash_lock, flags);
+}
+
+/* handle an incoming client update packet and schedule forward if needed. */
+void receive_client_update_packet(struct bat_priv *bat_priv,
+				  struct vis_packet *vis_packet,
+				  int vis_info_len)
+{
+	struct vis_info *info;
+	int is_new;
+	unsigned long flags;
+	int vis_server = atomic_read(&bat_priv->vis_mode);
+	int are_target = 0;
+
+	/* clients shall not broadcast. */
+	if (is_bcast(vis_packet->target_orig))
+		return;
+
+	/* Are we the target for this VIS packet? */
+	if (vis_server == VIS_TYPE_SERVER_SYNC	&&
+	    is_my_mac(vis_packet->target_orig))
+		are_target = 1;
+
+	spin_lock_irqsave(&vis_hash_lock, flags);
+	info = add_packet(vis_packet, vis_info_len, &is_new, are_target);
+	if (info == NULL)
+		goto end;
+	/* note that outdated packets will be dropped at this point. */
+
+
+	/* send only if we're the target server or ... */
+	if (are_target && is_new) {
+		info->packet.vis_type = VIS_TYPE_SERVER_SYNC;	/* upgrade! */
+		send_list_add(info);
+
+		/* ... we're not the recipient (and thus need to forward). */
+	} else if (!is_my_mac(info->packet.target_orig)) {
+		send_list_add(info);
+	}
+end:
+	spin_unlock_irqrestore(&vis_hash_lock, flags);
+}
+
+/* Walk the originators and find the VIS server with the best tq. Set the packet
+ * address to its address and return the best_tq.
+ *
+ * Must be called with the originator hash locked */
+static int find_best_vis_server(struct vis_info *info)
+{
+	HASHIT(hashit);
+	struct orig_node *orig_node;
+	int best_tq = -1;
+
+	while (hash_iterate(orig_hash, &hashit)) {
+		orig_node = hashit.bucket->data;
+		if ((orig_node != NULL) &&
+		    (orig_node->router != NULL) &&
+		    (orig_node->flags & VIS_SERVER) &&
+		    (orig_node->router->tq_avg > best_tq)) {
+			best_tq = orig_node->router->tq_avg;
+			memcpy(info->packet.target_orig, orig_node->orig,
+			       ETH_ALEN);
+		}
+	}
+	return best_tq;
+}
+
+/* Return true if the vis packet is full. */
+static bool vis_packet_full(struct vis_info *info)
+{
+	if (info->packet.entries + 1 >
+	    (1000 - sizeof(struct vis_info)) / sizeof(struct vis_info_entry))
+		return true;
+	return false;
+}
+
+/* generates a packet of own vis data,
+ * returns 0 on success, -1 if no packet could be generated */
+static int generate_vis_packet(struct bat_priv *bat_priv)
+{
+	HASHIT(hashit_local);
+	HASHIT(hashit_global);
+	struct orig_node *orig_node;
+	struct vis_info *info = (struct vis_info *)my_vis_info;
+	struct vis_info_entry *entry, *entry_array;
+	struct hna_local_entry *hna_local_entry;
+	int best_tq = -1;
+	unsigned long flags;
+
+	info->first_seen = jiffies;
+	info->packet.vis_type = atomic_read(&bat_priv->vis_mode);
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	memcpy(info->packet.target_orig, broadcast_addr, ETH_ALEN);
+	info->packet.ttl = TTL;
+	info->packet.seqno = htonl(ntohl(info->packet.seqno) + 1);
+	info->packet.entries = 0;
+
+	if (info->packet.vis_type == VIS_TYPE_CLIENT_UPDATE) {
+		best_tq = find_best_vis_server(info);
+		if (best_tq < 0) {
+			spin_unlock_irqrestore(&orig_hash_lock, flags);
+			return -1;
+		}
+	}
+
+	entry_array = (struct vis_info_entry *)
+		((char *)info + sizeof(struct vis_info));
+
+	while (hash_iterate(orig_hash, &hashit_global)) {
+		orig_node = hashit_global.bucket->data;
+		if (orig_node->router != NULL
+			&& compare_orig(orig_node->router->addr,
+					orig_node->orig)
+			&& (orig_node->router->if_incoming->if_status ==
+								IF_ACTIVE)
+		    && orig_node->router->tq_avg > 0) {
+
+			/* fill one entry into buffer. */
+			entry = &entry_array[info->packet.entries];
+			memcpy(entry->src,
+			     orig_node->router->if_incoming->net_dev->dev_addr,
+			       ETH_ALEN);
+			memcpy(entry->dest, orig_node->orig, ETH_ALEN);
+			entry->quality = orig_node->router->tq_avg;
+			info->packet.entries++;
+
+			if (vis_packet_full(info)) {
+				spin_unlock_irqrestore(&orig_hash_lock, flags);
+				return 0;
+			}
+		}
+	}
+
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+	spin_lock_irqsave(&hna_local_hash_lock, flags);
+	while (hash_iterate(hna_local_hash, &hashit_local)) {
+		hna_local_entry = hashit_local.bucket->data;
+		entry = &entry_array[info->packet.entries];
+		memset(entry->src, 0, ETH_ALEN);
+		memcpy(entry->dest, hna_local_entry->addr, ETH_ALEN);
+		entry->quality = 0; /* 0 means HNA */
+		info->packet.entries++;
+
+		if (vis_packet_full(info)) {
+			spin_unlock_irqrestore(&hna_local_hash_lock, flags);
+			return 0;
+		}
+	}
+	spin_unlock_irqrestore(&hna_local_hash_lock, flags);
+	return 0;
+}
+
+/* free old vis packets. Must be called with this vis_hash_lock
+ * held */
+static void purge_vis_packets(void)
+{
+	HASHIT(hashit);
+	struct vis_info *info;
+
+	while (hash_iterate(vis_hash, &hashit)) {
+		info = hashit.bucket->data;
+		if (info == my_vis_info)	/* never purge own data. */
+			continue;
+		if (time_after(jiffies,
+			       info->first_seen + VIS_TIMEOUT * HZ)) {
+			hash_remove_bucket(vis_hash, &hashit);
+			send_list_del(info);
+			kref_put(&info->refcount, free_info);
+		}
+	}
+}
+
+static void broadcast_vis_packet(struct vis_info *info, int packet_length)
+{
+	HASHIT(hashit);
+	struct orig_node *orig_node;
+	unsigned long flags;
+	struct batman_if *batman_if;
+	uint8_t dstaddr[ETH_ALEN];
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+
+	/* send to all routers in range. */
+	while (hash_iterate(orig_hash, &hashit)) {
+		orig_node = hashit.bucket->data;
+
+		/* if it's a vis server and reachable, send it. */
+		if ((!orig_node) || (!orig_node->router))
+			continue;
+		if (!(orig_node->flags & VIS_SERVER))
+			continue;
+		/* don't send it if we already received the packet from
+		 * this node. */
+		if (recv_list_is_in(&info->recv_list, orig_node->orig))
+			continue;
+
+		memcpy(info->packet.target_orig, orig_node->orig, ETH_ALEN);
+		batman_if = orig_node->router->if_incoming;
+		memcpy(dstaddr, orig_node->router->addr, ETH_ALEN);
+		spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+		send_raw_packet((unsigned char *)&info->packet,
+				packet_length, batman_if, dstaddr);
+
+		spin_lock_irqsave(&orig_hash_lock, flags);
+
+	}
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+	memcpy(info->packet.target_orig, broadcast_addr, ETH_ALEN);
+}
+
+static void unicast_vis_packet(struct vis_info *info, int packet_length)
+{
+	struct orig_node *orig_node;
+	unsigned long flags;
+	struct batman_if *batman_if;
+	uint8_t dstaddr[ETH_ALEN];
+
+	spin_lock_irqsave(&orig_hash_lock, flags);
+	orig_node = ((struct orig_node *)
+		     hash_find(orig_hash, info->packet.target_orig));
+
+	if ((!orig_node) || (!orig_node->router))
+		goto out;
+
+	/* don't lock while sending the packets ... we therefore
+	 * copy the required data before sending */
+	batman_if = orig_node->router->if_incoming;
+	memcpy(dstaddr, orig_node->router->addr, ETH_ALEN);
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+
+	send_raw_packet((unsigned char *)&info->packet,
+			packet_length, batman_if, dstaddr);
+	return;
+
+out:
+	spin_unlock_irqrestore(&orig_hash_lock, flags);
+}
+
+/* only send one vis packet. called from send_vis_packets() */
+static void send_vis_packet(struct vis_info *info)
+{
+	int packet_length;
+
+	if (info->packet.ttl < 2) {
+		printk(KERN_WARNING "batman-adv: Error - can't send vis packet: ttl exceeded\n");
+		return;
+	}
+
+	memcpy(info->packet.sender_orig, main_if_addr, ETH_ALEN);
+	info->packet.ttl--;
+
+	packet_length = sizeof(struct vis_packet) +
+		info->packet.entries * sizeof(struct vis_info_entry);
+
+	if (is_bcast(info->packet.target_orig))
+		broadcast_vis_packet(info, packet_length);
+	else
+		unicast_vis_packet(info, packet_length);
+	info->packet.ttl++; /* restore TTL */
+}
+
+/* called from timer; send (and maybe generate) vis packet. */
+static void send_vis_packets(struct work_struct *work)
+{
+	struct vis_info *info, *temp;
+	unsigned long flags;
+	/* FIXME: each batman_if will be attached to a softif */
+	struct bat_priv *bat_priv = netdev_priv(soft_device);
+
+	spin_lock_irqsave(&vis_hash_lock, flags);
+
+	purge_vis_packets();
+
+	if (generate_vis_packet(bat_priv) == 0) {
+		/* schedule if generation was successful */
+		send_list_add(my_vis_info);
+	}
+
+	list_for_each_entry_safe(info, temp, &send_list, send_list) {
+
+		kref_get(&info->refcount);
+		spin_unlock_irqrestore(&vis_hash_lock, flags);
+
+		send_vis_packet(info);
+
+		spin_lock_irqsave(&vis_hash_lock, flags);
+		send_list_del(info);
+		kref_put(&info->refcount, free_info);
+	}
+	spin_unlock_irqrestore(&vis_hash_lock, flags);
+	start_vis_timer();
+}
+static DECLARE_DELAYED_WORK(vis_timer_wq, send_vis_packets);
+
+/* init the vis server. this may only be called when if_list is already
+ * initialized (e.g. bat0 is initialized, interfaces have been added) */
+int vis_init(void)
+{
+	unsigned long flags;
+	if (vis_hash)
+		return 1;
+
+	spin_lock_irqsave(&vis_hash_lock, flags);
+
+	vis_hash = hash_new(256, vis_info_cmp, vis_info_choose);
+	if (!vis_hash) {
+		printk(KERN_ERR "batman-adv:Can't initialize vis_hash\n");
+		goto err;
+	}
+
+	my_vis_info = kmalloc(1000, GFP_ATOMIC);
+	if (!my_vis_info) {
+		printk(KERN_ERR "batman-adv:Can't initialize vis packet\n");
+		goto err;
+	}
+
+	/* prefill the vis info */
+	my_vis_info->first_seen = jiffies - msecs_to_jiffies(VIS_INTERVAL);
+	INIT_LIST_HEAD(&my_vis_info->recv_list);
+	INIT_LIST_HEAD(&my_vis_info->send_list);
+	kref_init(&my_vis_info->refcount);
+	my_vis_info->packet.version = COMPAT_VERSION;
+	my_vis_info->packet.packet_type = BAT_VIS;
+	my_vis_info->packet.ttl = TTL;
+	my_vis_info->packet.seqno = 0;
+	my_vis_info->packet.entries = 0;
+
+	INIT_LIST_HEAD(&send_list);
+
+	memcpy(my_vis_info->packet.vis_orig, main_if_addr, ETH_ALEN);
+	memcpy(my_vis_info->packet.sender_orig, main_if_addr, ETH_ALEN);
+
+	if (hash_add(vis_hash, my_vis_info) < 0) {
+		printk(KERN_ERR
+		       "batman-adv:Can't add own vis packet into hash\n");
+		/* not in hash, need to remove it manually. */
+		kref_put(&my_vis_info->refcount, free_info);
+		goto err;
+	}
+
+	spin_unlock_irqrestore(&vis_hash_lock, flags);
+	start_vis_timer();
+	return 1;
+
+err:
+	spin_unlock_irqrestore(&vis_hash_lock, flags);
+	vis_quit();
+	return 0;
+}
+
+/* Decrease the reference count on a hash item info */
+static void free_info_ref(void *data)
+{
+	struct vis_info *info = data;
+
+	send_list_del(info);
+	kref_put(&info->refcount, free_info);
+}
+
+/* shutdown vis-server */
+void vis_quit(void)
+{
+	unsigned long flags;
+	if (!vis_hash)
+		return;
+
+	cancel_delayed_work_sync(&vis_timer_wq);
+
+	spin_lock_irqsave(&vis_hash_lock, flags);
+	/* properly remove, kill timers ... */
+	hash_delete(vis_hash, free_info_ref);
+	vis_hash = NULL;
+	my_vis_info = NULL;
+	spin_unlock_irqrestore(&vis_hash_lock, flags);
+}
+
+/* schedule packets for (re)transmission */
+static void start_vis_timer(void)
+{
+	queue_delayed_work(bat_event_workqueue, &vis_timer_wq,
+			   (VIS_INTERVAL * HZ) / 1000);
+}
diff --git a/net/batman-adv/vis.h b/net/batman-adv/vis.h
new file mode 100644
index 0000000..bb13bf1
--- /dev/null
+++ b/net/batman-adv/vis.h
@@ -0,0 +1,60 @@
+/*
+ * Copyright (C) 2008-2010 B.A.T.M.A.N. contributors:
+ *
+ * Simon Wunderlich, Marek Lindner
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ *
+ */
+
+#ifndef _NET_BATMAN_ADV_VIS_H_
+#define _NET_BATMAN_ADV_VIS_H_
+
+#define VIS_TIMEOUT		200	/* timeout of vis packets in seconds */
+
+struct vis_info {
+	unsigned long       first_seen;
+	struct list_head    recv_list;
+			    /* list of server-neighbors we received a vis-packet
+			     * from.  we should not reply to them. */
+	struct list_head send_list;
+	struct kref refcount;
+	/* this packet might be part of the vis send queue. */
+	struct vis_packet packet;
+	/* vis_info may follow here*/
+} __attribute__((packed));
+
+struct vis_info_entry {
+	uint8_t  src[ETH_ALEN];
+	uint8_t  dest[ETH_ALEN];
+	uint8_t  quality;	/* quality = 0 means HNA */
+} __attribute__((packed));
+
+struct recvlist_node {
+	struct list_head list;
+	uint8_t mac[ETH_ALEN];
+};
+
+int vis_seq_print_text(struct seq_file *seq, void *offset);
+void receive_server_sync_packet(struct bat_priv *bat_priv,
+				struct vis_packet *vis_packet,
+				int vis_info_len);
+void receive_client_update_packet(struct bat_priv *bat_priv,
+				  struct vis_packet *vis_packet,
+				  int vis_info_len);
+int vis_init(void);
+void vis_quit(void);
+
+#endif /* _NET_BATMAN_ADV_VIS_H_ */
-- 
1.7.1


^ permalink raw reply related

* Reviewing batman-adv for net/
From: Sven Eckelmann @ 2010-06-26  0:14 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwlltHuzzzSOjJt

[-- Attachment #1: Type: Text/Plain, Size: 533 bytes --]

Hi,

batman-adv is a meshing protocol currently in drivers/staging/batman-adv. We 
asked GregKH recently [1] if he thinks that it is ok to send a patch to you 
for review. He acknowledged that request and now we would like to ask you to 
review it.

I will send a patch referencing this mail. I hope you find time to send us 
your opinion and ideas.

thanks,
	Sven

[1] https://lists.open-mesh.org/pipermail/b.a.t.m.a.n/2010-June/002881.html
[2] https://lists.open-mesh.org/pipermail/b.a.t.m.a.n/2010-June/002968.html

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH 2.6.35] bonding: prevent netpoll over bonded interfaces
From: Jay Vosburgh @ 2010-06-25 22:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andy Gospodarek, netdev, amwang
In-Reply-To: <87y6e2d9rd.fsf@basil.nowhere.org>

Andi Kleen <andi@firstfloor.org> wrote:

>Andy Gospodarek <andy@greyhouse.net> writes:
>
>> Support for netpoll over bonded interfaces was added here:
>>
>> 	commit f6dc31a85cd46a959bdd987adad14c3b645e03c1
>> 	Author: WANG Cong <amwang@redhat.com>
>> 	Date:   Thu May 6 00:48:51 2010 -0700
>>
>> 	    bonding: make bonding support netpoll
>>
>> but it is bad enough that we should probably just disable netpoll over
>> bonding until some of the locking logic in the bonding driver is changed
>> or converted completely to RCU.  Simple actions like changing the active
>> slave in active-backup mode will hang the box if a high enough printk
>> debugging level is enabled.
>
>Normally you just need to prevent the printks when called from poll
>context. That's not rocket science.

	The problem with bonding is that its own printks will come back
into bonding via netpoll and deadlock in various ways.  Without re-doing
the locking, the fix would really be to make every printk in bonding
happen without holding any locks.  Converting the driver to RCU is
probably the only way to resolve this, and that's at least verging on
rocket science.

	I'm in favor of the patch; netpoll over bonding is sufficiently
unstable that I'm comfortable requiring that it be explicitly enabled by
the user.  I'd also be comfortable with a patch that removes netpoll for
bonding, but I don't mind leaving the netpoll stuff there for use by
those who want to live dangerously.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>


	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [stable] Please apply ucc_geth patches to 2.6.32-stable
From: Greg KH @ 2010-06-25 22:52 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Petri Lehtinen, Linux Netdev List, David Miller, avorontsov,
	linux-kernel, stable
In-Reply-To: <AANLkTin_EvlveCvbCZ0vH54UEP479ufQ-irFqXSOnT0R@mail.gmail.com>

On Wed, Jun 09, 2010 at 11:02:55AM +0300, Pekka Enberg wrote:
> On Wed, Jun 9, 2010 at 10:16 AM, Petri Lehtinen <petri.lehtinen@inoi.fi> wrote:
> > The following three patches fix some serious flaws in the ucc_geth
> > network driver in 2.6.32. From oldest to newest:
> >
> > 7583605b6d29f1f7f6fc505b883328089f3485ad ucc_geth: Fix empty TX queue processing
> > 08b5e1c91ce95793c59a59529a362a1bcc81faae ucc_geth: Fix netdev watchdog triggering on link changes
> > 34692421bc7d6145ef383b014860f4fde10b7505 ucc_geth: Fix full TX queue processing
> >
> > They apply cleanly on top of 2.6.32-stable master. They were also
> > originally Cc'd to be included in 2.6.32-stable.
> 
> You forgot to CC stable@kernel.org, netdev, and the original author of
> those patches.

Wierd that our script didn't pick these up, sorry about that.  Now
applied.

greg k-h

^ permalink raw reply

* Re: [PATCH 2.6.35] bonding: prevent netpoll over bonded interfaces
From: Andi Kleen @ 2010-06-25 22:31 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: netdev, amwang, fubar
In-Reply-To: <20100625195044.GQ7497@gospo.rdu.redhat.com>

Andy Gospodarek <andy@greyhouse.net> writes:

> Support for netpoll over bonded interfaces was added here:
>
> 	commit f6dc31a85cd46a959bdd987adad14c3b645e03c1
> 	Author: WANG Cong <amwang@redhat.com>
> 	Date:   Thu May 6 00:48:51 2010 -0700
>
> 	    bonding: make bonding support netpoll
>
> but it is bad enough that we should probably just disable netpoll over
> bonding until some of the locking logic in the bonding driver is changed
> or converted completely to RCU.  Simple actions like changing the active
> slave in active-backup mode will hang the box if a high enough printk
> debugging level is enabled.

Normally you just need to prevent the printks when called from poll
context. That's not rocket science.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* [PATCH net-next-2.6 2/2] qlcnic: Add support for configuring eswitch and npars
From: Anirban Chakraborty @ 2010-06-25 22:23 UTC (permalink / raw)
  To: David Miller, netdev@vger.kernel.org
  Cc: Ameen Rahman, Rajesh Borundia, Dept_NX_Linux_NIC_Driver

From: Rajesh Borundia <rajesh.borundia@qlogic.com>

Following changes have been made:
1. Obtain capabilities of a NIC partition,
2. Configure tx bandwidth of particular NIC partition.
3. Configure the eswitch for setting port mirroring, enable mac
learning, promiscuous mode.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |   77 ++++++-
 drivers/net/qlcnic/qlcnic_ctx.c  |   90 ++------
 drivers/net/qlcnic/qlcnic_main.c |  450 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 542 insertions(+), 75 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 3675678..60ea7cb 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -959,8 +959,6 @@ struct qlcnic_adapter {
        u16 switch_mode;
        u16 max_tx_ques;
        u16 max_rx_ques;
-       u16 min_tx_bw;
-       u16 max_tx_bw;
        u16 max_mtu;

        u32 fw_hal_version;
@@ -984,7 +982,7 @@ struct qlcnic_adapter {

        u64 dev_rst_time;

-       struct qlcnic_pci_info *npars;
+       struct qlcnic_npar_info *npars;
        struct qlcnic_eswitch *eswitch;
        struct qlcnic_nic_template *nic_ops;

@@ -1042,6 +1040,18 @@ struct qlcnic_pci_info {
        u8      reserved2[106];
 };

+struct qlcnic_npar_info {
+       u16     vlan_id;
+       u8      phy_port;
+       u8      type;
+       u8      active;
+       u8      enable_pm;
+       u8      dest_npar;
+       u8      host_vlan_tag;
+       u8      promisc_mode;
+       u8      discard_tagged;
+       u8      mac_learning;
+};
 struct qlcnic_eswitch {
        u8      port;
        u8      active_vports;
@@ -1057,6 +1067,63 @@ struct qlcnic_eswitch {
 #define QLCNIC_SWITCH_PORT_MIRRORING   BIT_4
 };

+
+/* Return codes for Error handling */
+#define QL_STATUS_INVALID_PARAM        -1
+
+#define MAX_BW                 10000
+#define MIN_BW                 100
+#define MAX_VLAN_ID            4095
+#define MIN_VLAN_ID            2
+#define MAX_TX_QUEUES          1
+#define MAX_RX_QUEUES          4
+#define DEFAULT_MAC_LEARN      1
+
+#define IS_VALID_VLAN(vlan)    (vlan >= MIN_VLAN_ID && vlan <= MAX_VLAN_ID)
+#define IS_VALID_BW(bw)                (bw >= MIN_BW && bw <= MAX_BW \
+                                                       && (bw % 100) == 0)
+#define IS_VALID_TX_QUEUES(que)        (que > 0 && que <= MAX_TX_QUEUES)
+#define IS_VALID_RX_QUEUES(que)        (que > 0 && que <= MAX_RX_QUEUES)
+#define IS_VALID_MODE(mode)    (mode == 0 || mode == 1)
+
+struct qlcnic_pci_func_cfg {
+       u16     func_type;
+       u16     min_bw;
+       u16     max_bw;
+       u16     port_num;
+       u8      pci_func;
+       u8      func_state;
+       u8      def_mac_addr[6];
+};
+
+struct qlcnic_npar_func_cfg {
+       u32     fw_capab;
+       u16     port_num;
+       u16     min_bw;
+       u16     max_bw;
+       u16     max_tx_queues;
+       u16     max_rx_queues;
+       u8      pci_func;
+       u8      op_mode;
+};
+
+struct qlcnic_pm_func_cfg {
+       u8      pci_func;
+       u8      action;
+       u8      dest_npar;
+       u8      reserved[5];
+};
+
+struct qlcnic_esw_func_cfg {
+       u16     vlan_id;
+       u8      pci_func;
+       u8      host_vlan_tag;
+       u8      promisc_mode;
+       u8      discard_tagged;
+       u8      mac_learning;
+       u8      reserved;
+};
+
 int qlcnic_fw_cmd_query_phy(struct qlcnic_adapter *adapter, u32 reg, u32 *val);
 int qlcnic_fw_cmd_set_phy(struct qlcnic_adapter *adapter, u32 reg, u32 val);

@@ -1169,9 +1236,9 @@ void qlcnic_process_rcv_ring_diag(struct qlcnic_host_sds_ring *sds_ring);
 /* Management functions */
 int qlcnic_set_mac_address(struct qlcnic_adapter *, u8*);
 int qlcnic_get_mac_address(struct qlcnic_adapter *, u8*);
-int qlcnic_get_nic_info(struct qlcnic_adapter *, u8);
+int qlcnic_get_nic_info(struct qlcnic_adapter *, struct qlcnic_info *, u8);
 int qlcnic_set_nic_info(struct qlcnic_adapter *, struct qlcnic_info *);
-int qlcnic_get_pci_info(struct qlcnic_adapter *);
+int qlcnic_get_pci_info(struct qlcnic_adapter *, struct qlcnic_pci_info*);
 int qlcnic_reset_partition(struct qlcnic_adapter *, u8);

 /*  eSwitch management functions */
diff --git a/drivers/net/qlcnic/qlcnic_ctx.c b/drivers/net/qlcnic/qlcnic_ctx.c
index 7969a7a..cdd44b4 100644
--- a/drivers/net/qlcnic/qlcnic_ctx.c
+++ b/drivers/net/qlcnic/qlcnic_ctx.c
@@ -611,7 +611,8 @@ int qlcnic_get_mac_address(struct qlcnic_adapter *adapter, u8 *mac)
 }

 /* Get info of a NIC partition */
-int qlcnic_get_nic_info(struct qlcnic_adapter *adapter, u8 func_id)
+int qlcnic_get_nic_info(struct qlcnic_adapter *adapter,
+                               struct qlcnic_info *npar_info, u8 func_id)
 {
        int     err;
        dma_addr_t nic_dma_t;
@@ -635,29 +636,23 @@ int qlcnic_get_nic_info(struct qlcnic_adapter *adapter, u8 func_id)
                        QLCNIC_CDRP_CMD_GET_NIC_INFO);

        if (err == QLCNIC_RCODE_SUCCESS) {
-               adapter->physical_port = le16_to_cpu(nic_info->phys_port);
-               adapter->switch_mode = le16_to_cpu(nic_info->switch_mode);
-               adapter->max_tx_ques = le16_to_cpu(nic_info->max_tx_ques);
-               adapter->max_rx_ques = le16_to_cpu(nic_info->max_rx_ques);
-               adapter->min_tx_bw = le16_to_cpu(nic_info->min_tx_bw);
-               adapter->max_tx_bw = le16_to_cpu(nic_info->max_tx_bw);
-               adapter->max_mtu = le16_to_cpu(nic_info->max_mtu);
-               adapter->capabilities = le32_to_cpu(nic_info->capabilities);
-               adapter->max_mac_filters = nic_info->max_mac_filters;
-
-               if (adapter->capabilities & BIT_6)
-                       adapter->flags |= QLCNIC_ESWITCH_ENABLED;
-               else
-                       adapter->flags &= ~QLCNIC_ESWITCH_ENABLED;
+               npar_info->phys_port = le16_to_cpu(nic_info->phys_port);
+               npar_info->switch_mode = le16_to_cpu(nic_info->switch_mode);
+               npar_info->max_tx_ques = le16_to_cpu(nic_info->max_tx_ques);
+               npar_info->max_rx_ques = le16_to_cpu(nic_info->max_rx_ques);
+               npar_info->min_tx_bw = le16_to_cpu(nic_info->min_tx_bw);
+               npar_info->max_tx_bw = le16_to_cpu(nic_info->max_tx_bw);
+               npar_info->capabilities = le32_to_cpu(nic_info->capabilities);
+               npar_info->max_mtu = le16_to_cpu(nic_info->max_mtu);

                dev_info(&adapter->pdev->dev,
                        "phy port: %d switch_mode: %d,\n"
                        "\tmax_tx_q: %d max_rx_q: %d min_tx_bw: 0x%x,\n"
                        "\tmax_tx_bw: 0x%x max_mtu:0x%x, capabilities: 0x%x\n",
-                       adapter->physical_port, adapter->switch_mode,
-                       adapter->max_tx_ques, adapter->max_rx_ques,
-                       adapter->min_tx_bw, adapter->max_tx_bw,
-                       adapter->max_mtu, adapter->capabilities);
+                       npar_info->phys_port, npar_info->switch_mode,
+                       npar_info->max_tx_ques, npar_info->max_rx_ques,
+                       npar_info->min_tx_bw, npar_info->max_tx_bw,
+                       npar_info->max_mtu, npar_info->capabilities);
        } else {
                dev_err(&adapter->pdev->dev,
                        "Failed to get nic info%d\n", err);
@@ -672,7 +667,6 @@ int qlcnic_get_nic_info(struct qlcnic_adapter *adapter, u8 func_id)
 int qlcnic_set_nic_info(struct qlcnic_adapter *adapter, struct qlcnic_info *nic)
 {
        int err = -EIO;
-       u32 func_state;
        dma_addr_t nic_dma_t;
        void *nic_info_addr;
        struct qlcnic_info *nic_info;
@@ -681,17 +675,6 @@ int qlcnic_set_nic_info(struct qlcnic_adapter *adapter, struct qlcnic_info *nic)
        if (adapter->op_mode != QLCNIC_MGMT_FUNC)
                return err;

-       if (qlcnic_api_lock(adapter))
-               return err;
-
-       func_state = QLCRD32(adapter, QLCNIC_CRB_DEV_REF_COUNT);
-       if (QLC_DEV_CHECK_ACTIVE(func_state, nic->pci_func)) {
-               qlcnic_api_unlock(adapter);
-               return err;
-       }
-
-       qlcnic_api_unlock(adapter);
-
        nic_info_addr = pci_alloc_consistent(adapter->pdev, nic_size,
                        &nic_dma_t);
        if (!nic_info_addr)
@@ -716,7 +699,7 @@ int qlcnic_set_nic_info(struct qlcnic_adapter *adapter, struct qlcnic_info *nic)
                        adapter->fw_hal_version,
                        MSD(nic_dma_t),
                        LSD(nic_dma_t),
-                       nic_size,
+                       ((nic->pci_func << 16) | nic_size),
                        QLCNIC_CDRP_CMD_SET_NIC_INFO);

        if (err != QLCNIC_RCODE_SUCCESS) {
@@ -730,7 +713,8 @@ int qlcnic_set_nic_info(struct qlcnic_adapter *adapter, struct qlcnic_info *nic)
 }

 /* Get PCI Info of a partition */
-int qlcnic_get_pci_info(struct qlcnic_adapter *adapter)
+int qlcnic_get_pci_info(struct qlcnic_adapter *adapter,
+                               struct qlcnic_pci_info *pci_info)
 {
        int err = 0, i;
        dma_addr_t pci_info_dma_t;
@@ -745,21 +729,6 @@ int qlcnic_get_pci_info(struct qlcnic_adapter *adapter)
                return -ENOMEM;
        memset(pci_info_addr, 0, pci_size);

-       if (!adapter->npars)
-               adapter->npars = kzalloc(pci_size, GFP_KERNEL);
-       if (!adapter->npars) {
-               err = -ENOMEM;
-               goto err_npar;
-       }
-
-       if (!adapter->eswitch)
-               adapter->eswitch = kzalloc(sizeof(struct qlcnic_eswitch) *
-                               QLCNIC_NIU_MAX_XG_PORTS, GFP_KERNEL);
-       if (!adapter->eswitch) {
-               err = -ENOMEM;
-               goto err_eswitch;
-       }
-
        npar = (struct qlcnic_pci_info *) pci_info_addr;
        err = qlcnic_issue_cmd(adapter,
                        adapter->ahw.pci_func,
@@ -770,31 +739,24 @@ int qlcnic_get_pci_info(struct qlcnic_adapter *adapter)
                        QLCNIC_CDRP_CMD_GET_PCI_INFO);

        if (err == QLCNIC_RCODE_SUCCESS) {
-               for (i = 0; i < QLCNIC_MAX_PCI_FUNC; i++, npar++) {
-                       adapter->npars[i].id = le32_to_cpu(npar->id);
-                       adapter->npars[i].active = le32_to_cpu(npar->active);
-                       adapter->npars[i].type = le32_to_cpu(npar->type);
-                       adapter->npars[i].default_port =
+               for (i = 0; i < QLCNIC_MAX_PCI_FUNC; i++, npar++, pci_info++) {
+                       pci_info->id = le32_to_cpu(npar->id);
+                       pci_info->active = le32_to_cpu(npar->active);
+                       pci_info->type = le32_to_cpu(npar->type);
+                       pci_info->default_port =
                                le32_to_cpu(npar->default_port);
-                       adapter->npars[i].tx_min_bw =
+                       pci_info->tx_min_bw =
                                le32_to_cpu(npar->tx_min_bw);
-                       adapter->npars[i].tx_max_bw =
+                       pci_info->tx_max_bw =
                                le32_to_cpu(npar->tx_max_bw);
-                       memcpy(adapter->npars[i].mac, npar->mac, ETH_ALEN);
+                       memcpy(pci_info->mac, npar->mac, ETH_ALEN);
                }
        } else {
                dev_err(&adapter->pdev->dev,
                        "Failed to get PCI Info%d\n", err);
-               kfree(adapter->npars);
                err = -EIO;
        }
-       goto err_npar;
-
-err_eswitch:
-       kfree(adapter->npars);
-       adapter->npars = NULL;

-err_npar:
        pci_free_consistent(adapter->pdev, pci_size, pci_info_addr,
                pci_info_dma_t);
        return err;
@@ -1012,9 +974,7 @@ int qlcnic_config_switch_port(struct qlcnic_adapter *adapter, u8 id,
        if (err != QLCNIC_RCODE_SUCCESS) {
                dev_err(&adapter->pdev->dev,
                        "Failed to configure eswitch port%d\n", eswitch->port);
-               eswitch->flags |= QLCNIC_SWITCH_ENABLE;
        } else {
-               eswitch->flags &= ~QLCNIC_SWITCH_ENABLE;
                dev_info(&adapter->pdev->dev,
                        "Configured eSwitch for port %d\n", eswitch->port);
        }
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 981aa91..18e2b2e 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -476,6 +476,53 @@ qlcnic_cleanup_pci_map(struct qlcnic_adapter *adapter)
 }

 static int
+qlcnic_init_pci_info(struct qlcnic_adapter *adapter)
+{
+       struct qlcnic_pci_info pci_info[QLCNIC_MAX_PCI_FUNC];
+       int i, ret = 0, err;
+       u8 pfn;
+
+       if (!adapter->npars)
+               adapter->npars = kzalloc(sizeof(struct qlcnic_npar_info) *
+                               QLCNIC_MAX_PCI_FUNC, GFP_KERNEL);
+       if (!adapter->npars)
+               return -ENOMEM;
+
+       if (!adapter->eswitch)
+               adapter->eswitch = kzalloc(sizeof(struct qlcnic_eswitch) *
+                               QLCNIC_NIU_MAX_XG_PORTS, GFP_KERNEL);
+       if (!adapter->eswitch) {
+               err = -ENOMEM;
+               goto err_eswitch;
+       }
+
+       ret = qlcnic_get_pci_info(adapter, pci_info);
+       if (!ret) {
+               for (i = 0; i < QLCNIC_MAX_PCI_FUNC; i++) {
+                       pfn = pci_info[i].id;
+                       if (pfn > QLCNIC_MAX_PCI_FUNC)
+                               return QL_STATUS_INVALID_PARAM;
+                       adapter->npars[pfn].active = pci_info[i].active;
+                       adapter->npars[pfn].type = pci_info[i].type;
+                       adapter->npars[pfn].phy_port = pci_info[i].default_port;
+                       adapter->npars[pfn].mac_learning = DEFAULT_MAC_LEARN;
+               }
+
+               for (i = 0; i < QLCNIC_NIU_MAX_XG_PORTS; i++)
+                       adapter->eswitch[i].flags |= QLCNIC_SWITCH_ENABLE;
+
+               return ret;
+       }
+
+       kfree(adapter->eswitch);
+       adapter->eswitch = NULL;
+err_eswitch:
+       kfree(adapter->npars);
+
+       return ret;
+}
+
+static int
 qlcnic_set_function_modes(struct qlcnic_adapter *adapter)
 {
        u8 id;
@@ -494,7 +541,7 @@ qlcnic_set_function_modes(struct qlcnic_adapter *adapter)

        if (qlcnic_config_npars) {
                for (i = 0; i < QLCNIC_MAX_PCI_FUNC; i++) {
-                       id = adapter->npars[i].id;
+                       id = i;
                        if (adapter->npars[i].type != QLCNIC_TYPE_NIC ||
                                id == adapter->ahw.pci_func)
                                continue;
@@ -519,6 +566,7 @@ qlcnic_get_driver_mode(struct qlcnic_adapter *adapter)
 {
        void __iomem *msix_base_addr;
        void __iomem *priv_op;
+       struct qlcnic_info nic_info;
        u32 func;
        u32 msix_base;
        u32 op_mode, priv_level;
@@ -533,7 +581,14 @@ qlcnic_get_driver_mode(struct qlcnic_adapter *adapter)
        func = (func - msix_base)/QLCNIC_MSIX_TBL_PGSIZE;
        adapter->ahw.pci_func = func;

-       qlcnic_get_nic_info(adapter, adapter->ahw.pci_func);
+       if (!qlcnic_get_nic_info(adapter, &nic_info, adapter->ahw.pci_func)) {
+               adapter->capabilities = nic_info.capabilities;
+
+               if (adapter->capabilities & BIT_6)
+                       adapter->flags |= QLCNIC_ESWITCH_ENABLED;
+               else
+                       adapter->flags &= ~QLCNIC_ESWITCH_ENABLED;
+       }

        if (!(adapter->flags & QLCNIC_ESWITCH_ENABLED)) {
                adapter->nic_ops = &qlcnic_ops;
@@ -552,7 +607,7 @@ qlcnic_get_driver_mode(struct qlcnic_adapter *adapter)
        case QLCNIC_MGMT_FUNC:
                adapter->op_mode = QLCNIC_MGMT_FUNC;
                adapter->nic_ops = &qlcnic_ops;
-               qlcnic_get_pci_info(adapter);
+               qlcnic_init_pci_info(adapter);
                /* Set privilege level for other functions */
                qlcnic_set_function_modes(adapter);
                dev_info(&adapter->pdev->dev,
@@ -654,7 +709,7 @@ qlcnic_check_options(struct qlcnic_adapter *adapter)
        int i, offset, val;
        int *ptr32;
        struct pci_dev *pdev = adapter->pdev;
-
+       struct qlcnic_info nic_info;
        adapter->driver_mismatch = 0;

        ptr32 = (int *)&serial_num;
@@ -696,7 +751,15 @@ qlcnic_check_options(struct qlcnic_adapter *adapter)
                adapter->num_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_1G;
        }

-       qlcnic_get_nic_info(adapter, adapter->ahw.pci_func);
+       if (!qlcnic_get_nic_info(adapter, &nic_info, adapter->ahw.pci_func)) {
+               adapter->physical_port = nic_info.phys_port;
+               adapter->switch_mode = nic_info.switch_mode;
+               adapter->max_tx_ques = nic_info.max_tx_ques;
+               adapter->max_rx_ques = nic_info.max_rx_ques;
+               adapter->capabilities = nic_info.capabilities;
+               adapter->max_mac_filters = nic_info.max_mac_filters;
+               adapter->max_mtu = nic_info.max_mtu;
+       }

        adapter->msix_supported = !!use_msi_x;
        adapter->rss_supported = !!use_msi_x;
@@ -2822,6 +2885,364 @@ static struct bin_attribute bin_attr_mem = {
        .write = qlcnic_sysfs_write_mem,
 };

+int
+validate_pm_config(struct qlcnic_adapter *adapter,
+                       struct qlcnic_pm_func_cfg *pm_cfg, int count)
+{
+
+       u8 src_pci_func, s_esw_id, d_esw_id;
+       u8 dest_pci_func;
+       int i;
+
+       for (i = 0; i < count; i++) {
+               src_pci_func = pm_cfg[i].pci_func;
+               dest_pci_func = pm_cfg[i].dest_npar;
+               if (src_pci_func >= QLCNIC_MAX_PCI_FUNC
+                               || dest_pci_func >= QLCNIC_MAX_PCI_FUNC)
+                       return QL_STATUS_INVALID_PARAM;
+
+               if (adapter->npars[src_pci_func].type != QLCNIC_TYPE_NIC)
+                       return QL_STATUS_INVALID_PARAM;
+
+               if (adapter->npars[dest_pci_func].type != QLCNIC_TYPE_NIC)
+                       return QL_STATUS_INVALID_PARAM;
+
+               if (!IS_VALID_MODE(pm_cfg[i].action))
+                       return QL_STATUS_INVALID_PARAM;
+
+               s_esw_id = adapter->npars[src_pci_func].phy_port;
+               d_esw_id = adapter->npars[dest_pci_func].phy_port;
+
+               if (s_esw_id != d_esw_id)
+                       return QL_STATUS_INVALID_PARAM;
+
+       }
+       return 0;
+
+}
+
+static ssize_t
+qlcnic_sysfs_write_pm_config(struct file *filp, struct kobject *kobj,
+       struct bin_attribute *attr, char *buf, loff_t offset, size_t size)
+{
+       struct device *dev = container_of(kobj, struct device, kobj);
+       struct qlcnic_adapter *adapter = dev_get_drvdata(dev);
+       struct qlcnic_pm_func_cfg *pm_cfg;
+       u32 id, action, pci_func;
+       int count, rem, i, ret;
+
+       count   = size / sizeof(struct qlcnic_pm_func_cfg);
+       rem     = size % sizeof(struct qlcnic_pm_func_cfg);
+       if (rem)
+               return QL_STATUS_INVALID_PARAM;
+
+       pm_cfg = (struct qlcnic_pm_func_cfg *) buf;
+
+       ret = validate_pm_config(adapter, pm_cfg, count);
+       if (ret)
+               return ret;
+       for (i = 0; i < count; i++) {
+               pci_func = pm_cfg[i].pci_func;
+               action = pm_cfg[i].action;
+               id = adapter->npars[pci_func].phy_port;
+               ret = qlcnic_config_port_mirroring(adapter, id,
+                                               action, pci_func);
+               if (ret)
+                       return ret;
+       }
+
+       for (i = 0; i < count; i++) {
+               pci_func = pm_cfg[i].pci_func;
+               id = adapter->npars[pci_func].phy_port;
+               adapter->npars[pci_func].enable_pm = pm_cfg[i].action;
+               adapter->npars[pci_func].dest_npar = id;
+       }
+       return size;
+}
+
+static ssize_t
+qlcnic_sysfs_read_pm_config(struct file *filp, struct kobject *kobj,
+       struct bin_attribute *attr, char *buf, loff_t offset, size_t size)
+{
+       struct device *dev = container_of(kobj, struct device, kobj);
+       struct qlcnic_adapter *adapter = dev_get_drvdata(dev);
+       struct qlcnic_pm_func_cfg pm_cfg[QLCNIC_MAX_PCI_FUNC];
+       int i;
+
+       if (size != sizeof(pm_cfg))
+               return QL_STATUS_INVALID_PARAM;
+
+       for (i = 0; i < QLCNIC_MAX_PCI_FUNC; i++) {
+               if (adapter->npars[i].type != QLCNIC_TYPE_NIC)
+                       continue;
+               pm_cfg[i].action = adapter->npars[i].enable_pm;
+               pm_cfg[i].dest_npar = 0;
+               pm_cfg[i].pci_func = i;
+       }
+       memcpy(buf, &pm_cfg, size);
+
+       return size;
+}
+
+int
+validate_esw_config(struct qlcnic_adapter *adapter,
+                       struct qlcnic_esw_func_cfg *esw_cfg, int count)
+{
+       u8 pci_func;
+       int i;
+
+       for (i = 0; i < count; i++) {
+               pci_func = esw_cfg[i].pci_func;
+               if (pci_func >= QLCNIC_MAX_PCI_FUNC)
+                       return QL_STATUS_INVALID_PARAM;
+
+               if (adapter->npars[i].type != QLCNIC_TYPE_NIC)
+                       return QL_STATUS_INVALID_PARAM;
+
+               if (esw_cfg->host_vlan_tag == 1)
+                       if (!IS_VALID_VLAN(esw_cfg[i].vlan_id))
+                               return QL_STATUS_INVALID_PARAM;
+
+               if (!IS_VALID_MODE(esw_cfg[i].promisc_mode)
+                               || !IS_VALID_MODE(esw_cfg[i].host_vlan_tag)
+                               || !IS_VALID_MODE(esw_cfg[i].mac_learning)
+                               || !IS_VALID_MODE(esw_cfg[i].discard_tagged))
+                       return QL_STATUS_INVALID_PARAM;
+       }
+
+       return 0;
+}
+
+static ssize_t
+qlcnic_sysfs_write_esw_config(struct file *file, struct kobject *kobj,
+       struct bin_attribute *attr, char *buf, loff_t offset, size_t size)
+{
+       struct device *dev = container_of(kobj, struct device, kobj);
+       struct qlcnic_adapter *adapter = dev_get_drvdata(dev);
+       struct qlcnic_esw_func_cfg *esw_cfg;
+       u8 id, discard_tagged, promsc_mode, mac_learn;
+       u8 vlan_tagging, pci_func, vlan_id;
+       int count, rem, i, ret;
+
+       count   = size / sizeof(struct qlcnic_esw_func_cfg);
+       rem     = size % sizeof(struct qlcnic_esw_func_cfg);
+       if (rem)
+               return QL_STATUS_INVALID_PARAM;
+
+       esw_cfg = (struct qlcnic_esw_func_cfg *) buf;
+       ret = validate_esw_config(adapter, esw_cfg, count);
+       if (ret)
+               return ret;
+
+       for (i = 0; i < count; i++) {
+               pci_func = esw_cfg[i].pci_func;
+               id = adapter->npars[pci_func].phy_port;
+               vlan_tagging = esw_cfg[i].host_vlan_tag;
+               promsc_mode = esw_cfg[i].promisc_mode;
+               mac_learn = esw_cfg[i].mac_learning;
+               vlan_id = esw_cfg[i].vlan_id;
+               discard_tagged = esw_cfg[i].discard_tagged;
+               ret = qlcnic_config_switch_port(adapter, id, vlan_tagging,
+                                               discard_tagged,
+                                               promsc_mode,
+                                               mac_learn,
+                                               pci_func,
+                                               vlan_id);
+               if (ret)
+                       return ret;
+       }
+
+       for (i = 0; i < count; i++) {
+               pci_func = esw_cfg[i].pci_func;
+               adapter->npars[pci_func].promisc_mode = esw_cfg[i].promisc_mode;
+               adapter->npars[pci_func].mac_learning = esw_cfg[i].mac_learning;
+               adapter->npars[pci_func].vlan_id = esw_cfg[i].vlan_id;
+               adapter->npars[pci_func].discard_tagged =
+                                               esw_cfg[i].discard_tagged;
+               adapter->npars[pci_func].host_vlan_tag =
+                                               esw_cfg[i].host_vlan_tag;
+       }
+
+       return size;
+}
+
+static ssize_t
+qlcnic_sysfs_read_esw_config(struct file *file, struct kobject *kobj,
+       struct bin_attribute *attr, char *buf, loff_t offset, size_t size)
+{
+       struct device *dev = container_of(kobj, struct device, kobj);
+       struct qlcnic_adapter *adapter = dev_get_drvdata(dev);
+       struct qlcnic_esw_func_cfg esw_cfg[QLCNIC_MAX_PCI_FUNC];
+       int i;
+
+       if (size != sizeof(esw_cfg))
+               return QL_STATUS_INVALID_PARAM;
+
+       for (i = 0; i < QLCNIC_MAX_PCI_FUNC; i++) {
+               if (adapter->npars[i].type != QLCNIC_TYPE_NIC)
+                       continue;
+
+               esw_cfg[i].host_vlan_tag = adapter->npars[i].host_vlan_tag;
+               esw_cfg[i].promisc_mode = adapter->npars[i].promisc_mode;
+               esw_cfg[i].discard_tagged = adapter->npars[i].discard_tagged;
+               esw_cfg[i].vlan_id = adapter->npars[i].vlan_id;
+               esw_cfg[i].mac_learning = adapter->npars[i].mac_learning;
+       }
+       memcpy(buf, &esw_cfg, size);
+
+       return size;
+}
+
+int
+validate_npar_config(struct qlcnic_adapter *adapter,
+                               struct qlcnic_npar_func_cfg *np_cfg, int count)
+{
+       u8 pci_func, i;
+
+       for (i = 0; i < count; i++) {
+               pci_func = np_cfg[i].pci_func;
+               if (pci_func >= QLCNIC_MAX_PCI_FUNC)
+                       return QL_STATUS_INVALID_PARAM;
+
+               if (adapter->npars[pci_func].type != QLCNIC_TYPE_NIC)
+                       return QL_STATUS_INVALID_PARAM;
+
+               if (!IS_VALID_BW(np_cfg[i].min_bw)
+                               || !IS_VALID_BW(np_cfg[i].max_bw)
+                               || !IS_VALID_RX_QUEUES(np_cfg[i].max_rx_queues)
+                               || !IS_VALID_TX_QUEUES(np_cfg[i].max_tx_queues))
+                       return QL_STATUS_INVALID_PARAM;
+       }
+       return 0;
+}
+
+static ssize_t
+qlcnic_sysfs_write_npar_config(struct file *file, struct kobject *kobj,
+       struct bin_attribute *attr, char *buf, loff_t offset, size_t size)
+{
+       struct device *dev = container_of(kobj, struct device, kobj);
+       struct qlcnic_adapter *adapter = dev_get_drvdata(dev);
+       struct qlcnic_info nic_info;
+       struct qlcnic_npar_func_cfg *np_cfg;
+       int i, count, rem, ret;
+       u8 pci_func;
+
+       count   = size / sizeof(struct qlcnic_npar_func_cfg);
+       rem     = size % sizeof(struct qlcnic_npar_func_cfg);
+       if (rem)
+               return QL_STATUS_INVALID_PARAM;
+
+       np_cfg = (struct qlcnic_npar_func_cfg *) buf;
+       ret = validate_npar_config(adapter, np_cfg, count);
+       if (ret)
+               return ret;
+
+       for (i = 0; i < count ; i++) {
+               pci_func = np_cfg[i].pci_func;
+               ret = qlcnic_get_nic_info(adapter, &nic_info, pci_func);
+               if (ret)
+                       return ret;
+               nic_info.pci_func = pci_func;
+               nic_info.min_tx_bw = np_cfg[i].min_bw;
+               nic_info.max_tx_bw = np_cfg[i].max_bw;
+               ret = qlcnic_set_nic_info(adapter, &nic_info);
+               if (ret)
+                       return ret;
+       }
+
+       return size;
+
+}
+static ssize_t
+qlcnic_sysfs_read_npar_config(struct file *file, struct kobject *kobj,
+       struct bin_attribute *attr, char *buf, loff_t offset, size_t size)
+{
+       struct device *dev = container_of(kobj, struct device, kobj);
+       struct qlcnic_adapter *adapter = dev_get_drvdata(dev);
+       struct qlcnic_info nic_info;
+       struct qlcnic_npar_func_cfg np_cfg[QLCNIC_MAX_PCI_FUNC];
+       int i, ret;
+
+       if (size != sizeof(np_cfg))
+               return QL_STATUS_INVALID_PARAM;
+
+       for (i = 0; i < QLCNIC_MAX_PCI_FUNC ; i++) {
+               if (adapter->npars[i].type != QLCNIC_TYPE_NIC)
+                       continue;
+               ret = qlcnic_get_nic_info(adapter, &nic_info, i);
+               if (ret)
+                       return ret;
+
+               np_cfg[i].pci_func = i;
+               np_cfg[i].op_mode = nic_info.op_mode;
+               np_cfg[i].port_num = nic_info.phys_port;
+               np_cfg[i].fw_capab = nic_info.capabilities;
+               np_cfg[i].min_bw = nic_info.min_tx_bw ;
+               np_cfg[i].max_bw = nic_info.max_tx_bw;
+               np_cfg[i].max_tx_queues = nic_info.max_tx_ques;
+               np_cfg[i].max_rx_queues = nic_info.max_rx_ques;
+       }
+       memcpy(buf, &np_cfg, size);
+       return size;
+}
+
+static ssize_t
+qlcnic_sysfs_read_pci_config(struct file *file, struct kobject *kobj,
+       struct bin_attribute *attr, char *buf, loff_t offset, size_t size)
+{
+       struct device *dev = container_of(kobj, struct device, kobj);
+       struct qlcnic_adapter *adapter = dev_get_drvdata(dev);
+       struct qlcnic_pci_func_cfg pci_cfg[QLCNIC_MAX_PCI_FUNC];
+       struct qlcnic_pci_info  pci_info[QLCNIC_MAX_PCI_FUNC];
+       int i, ret;
+
+       if (size != sizeof(pci_cfg))
+               return QL_STATUS_INVALID_PARAM;
+
+       ret = qlcnic_get_pci_info(adapter, pci_info);
+       if (ret)
+               return ret;
+
+       for (i = 0; i < QLCNIC_MAX_PCI_FUNC ; i++) {
+               pci_cfg[i].pci_func = pci_info[i].id;
+               pci_cfg[i].func_type = pci_info[i].type;
+               pci_cfg[i].port_num = pci_info[i].default_port;
+               pci_cfg[i].min_bw = pci_info[i].tx_min_bw;
+               pci_cfg[i].max_bw = pci_info[i].tx_max_bw;
+               memcpy(&pci_cfg[i].def_mac_addr, &pci_info[i].mac, ETH_ALEN);
+       }
+       memcpy(buf, &pci_cfg, size);
+       return size;
+
+}
+static struct bin_attribute bin_attr_npar_config = {
+       .attr = {.name = "npar_config", .mode = (S_IRUGO | S_IWUSR)},
+       .size = 0,
+       .read = qlcnic_sysfs_read_npar_config,
+       .write = qlcnic_sysfs_write_npar_config,
+};
+
+static struct bin_attribute bin_attr_pci_config = {
+       .attr = {.name = "pci_config", .mode = (S_IRUGO | S_IWUSR)},
+       .size = 0,
+       .read = qlcnic_sysfs_read_pci_config,
+       .write = NULL,
+};
+
+static struct bin_attribute bin_attr_esw_config = {
+       .attr = {.name = "esw_config", .mode = (S_IRUGO | S_IWUSR)},
+       .size = 0,
+       .read = qlcnic_sysfs_read_esw_config,
+       .write = qlcnic_sysfs_write_esw_config,
+};
+
+static struct bin_attribute bin_attr_pm_config = {
+       .attr = {.name = "pm_config", .mode = (S_IRUGO | S_IWUSR)},
+       .size = 0,
+       .read = qlcnic_sysfs_read_pm_config,
+       .write = qlcnic_sysfs_write_pm_config,
+};
+
 static void
 qlcnic_create_sysfs_entries(struct qlcnic_adapter *adapter)
 {
@@ -2853,6 +3274,18 @@ qlcnic_create_diag_entries(struct qlcnic_adapter *adapter)
                dev_info(dev, "failed to create crb sysfs entry\n");
        if (device_create_bin_file(dev, &bin_attr_mem))
                dev_info(dev, "failed to create mem sysfs entry\n");
+       if (!(adapter->flags & QLCNIC_ESWITCH_ENABLED) ||
+                       adapter->op_mode != QLCNIC_MGMT_FUNC)
+               return;
+       if (device_create_bin_file(dev, &bin_attr_pci_config))
+               dev_info(dev, "failed to create pci config sysfs entry");
+       if (device_create_bin_file(dev, &bin_attr_npar_config))
+               dev_info(dev, "failed to create npar config sysfs entry");
+       if (device_create_bin_file(dev, &bin_attr_esw_config))
+               dev_info(dev, "failed to create esw config sysfs entry");
+       if (device_create_bin_file(dev, &bin_attr_pm_config))
+               dev_info(dev, "failed to create pm config sysfs entry");
+
 }


@@ -2864,6 +3297,13 @@ qlcnic_remove_diag_entries(struct qlcnic_adapter *adapter)
        device_remove_file(dev, &dev_attr_diag_mode);
        device_remove_bin_file(dev, &bin_attr_crb);
        device_remove_bin_file(dev, &bin_attr_mem);
+       if (!(adapter->flags & QLCNIC_ESWITCH_ENABLED) ||
+                       adapter->op_mode != QLCNIC_MGMT_FUNC)
+               return;
+       device_remove_bin_file(dev, &bin_attr_pci_config);
+       device_remove_bin_file(dev, &bin_attr_npar_config);
+       device_remove_bin_file(dev, &bin_attr_esw_config);
+       device_remove_bin_file(dev, &bin_attr_pm_config);
 }

 #ifdef CONFIG_INET
--
1.6.0.2


^ permalink raw reply related

* [PATCH net-next-2.6 1/2] qlcnic: Remove stale code for old FW
From: Anirban Chakraborty @ 2010-06-25 22:23 UTC (permalink / raw)
  To: David Miller, netdev@vger.kernel.org
  Cc: Dept_NX_Linux_NIC_Driver, Ameen Rahman

Current driver uses FW API version 2 and thus the code that used to work with FW 
API version 1 is no longer required.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 drivers/net/qlcnic/qlcnic_ctx.c  |   26 ++++----------------------
 drivers/net/qlcnic/qlcnic_hdr.h  |    6 ------
 drivers/net/qlcnic/qlcnic_init.c |   16 +++++++---------
 drivers/net/qlcnic/qlcnic_main.c |   22 ++--------------------
 4 files changed, 13 insertions(+), 57 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic_ctx.c b/drivers/net/qlcnic/qlcnic_ctx.c
index 941cd08..7969a7a 100644
--- a/drivers/net/qlcnic/qlcnic_ctx.c
+++ b/drivers/net/qlcnic/qlcnic_ctx.c
@@ -224,12 +224,7 @@ qlcnic_fw_cmd_create_rx_ctx(struct qlcnic_adapter *adapter)
 		rds_ring = &recv_ctx->rds_rings[i];
 
 		reg = le32_to_cpu(prsp_rds[i].host_producer_crb);
-		if (adapter->fw_hal_version == QLCNIC_FW_BASE)
-			rds_ring->crb_rcv_producer = qlcnic_get_ioaddr(adapter,
-				QLCNIC_REG(reg - 0x200));
-		else
-			rds_ring->crb_rcv_producer = adapter->ahw.pci_base0 +
-				reg;
+		rds_ring->crb_rcv_producer = adapter->ahw.pci_base0 + reg;
 	}
 
 	prsp_sds = ((struct qlcnic_cardrsp_sds_ring *)
@@ -241,16 +236,8 @@ qlcnic_fw_cmd_create_rx_ctx(struct qlcnic_adapter *adapter)
 		reg = le32_to_cpu(prsp_sds[i].host_consumer_crb);
 		reg2 = le32_to_cpu(prsp_sds[i].interrupt_crb);
 
-		if (adapter->fw_hal_version == QLCNIC_FW_BASE) {
-			sds_ring->crb_sts_consumer = qlcnic_get_ioaddr(adapter,
-				QLCNIC_REG(reg - 0x200));
-			sds_ring->crb_intr_mask = qlcnic_get_ioaddr(adapter,
-				QLCNIC_REG(reg2 - 0x200));
-		} else {
-			sds_ring->crb_sts_consumer = adapter->ahw.pci_base0 +
-				reg;
-			sds_ring->crb_intr_mask = adapter->ahw.pci_base0 + reg2;
-		}
+		sds_ring->crb_sts_consumer = adapter->ahw.pci_base0 + reg;
+		sds_ring->crb_intr_mask = adapter->ahw.pci_base0 + reg2;
 	}
 
 	recv_ctx->state = le32_to_cpu(prsp->host_ctx_state);
@@ -352,12 +339,7 @@ qlcnic_fw_cmd_create_tx_ctx(struct qlcnic_adapter *adapter)
 
 	if (err == QLCNIC_RCODE_SUCCESS) {
 		temp = le32_to_cpu(prsp->cds_ring.host_producer_crb);
-		if (adapter->fw_hal_version == QLCNIC_FW_BASE)
-			tx_ring->crb_cmd_producer = qlcnic_get_ioaddr(adapter,
-				QLCNIC_REG(temp - 0x200));
-		else
-			tx_ring->crb_cmd_producer = adapter->ahw.pci_base0 +
-				temp;
+		tx_ring->crb_cmd_producer = adapter->ahw.pci_base0 + temp;
 
 		adapter->tx_context_id =
 			le16_to_cpu(prsp->context_id);
diff --git a/drivers/net/qlcnic/qlcnic_hdr.h b/drivers/net/qlcnic/qlcnic_hdr.h
index 7b81cab..15fc320 100644
--- a/drivers/net/qlcnic/qlcnic_hdr.h
+++ b/drivers/net/qlcnic/qlcnic_hdr.h
@@ -778,12 +778,6 @@ enum {
 	QLCNIC_NON_PRIV_FUNC	= 2
 };
 
-/* FW HAL api version */
-enum {
-	QLCNIC_FW_BASE	= 1,
-	QLCNIC_FW_NPAR	= 2
-};
-
 #define QLC_DEV_DRV_DEFAULT 0x11111111
 
 #define LSB(x)	((uint8_t)(x))
diff --git a/drivers/net/qlcnic/qlcnic_init.c b/drivers/net/qlcnic/qlcnic_init.c
index 6678127..75ba744 100644
--- a/drivers/net/qlcnic/qlcnic_init.c
+++ b/drivers/net/qlcnic/qlcnic_init.c
@@ -548,16 +548,14 @@ qlcnic_setup_idc_param(struct qlcnic_adapter *adapter) {
 	int timeo;
 	u32 val;
 
-	if (adapter->fw_hal_version == QLCNIC_FW_BASE) {
-		val = QLCRD32(adapter, QLCNIC_CRB_DEV_PARTITION_INFO);
-		val = QLC_DEV_GET_DRV(val, adapter->portnum);
-		if ((val & 0x3) != QLCNIC_TYPE_NIC) {
-			dev_err(&adapter->pdev->dev,
-				"Not an Ethernet NIC func=%u\n", val);
-			return -EIO;
-		}
-		adapter->physical_port = (val >> 2);
+	val = QLCRD32(adapter, QLCNIC_CRB_DEV_PARTITION_INFO);
+	val = QLC_DEV_GET_DRV(val, adapter->portnum);
+	if ((val & 0x3) != QLCNIC_TYPE_NIC) {
+		dev_err(&adapter->pdev->dev,
+			"Not an Ethernet NIC func=%u\n", val);
+		return -EIO;
 	}
+	adapter->physical_port = (val >> 2);
 	if (qlcnic_rom_fast_read(adapter, QLCNIC_ROM_DEV_INIT_TIMEOUT, &timeo))
 		timeo = 30;
 
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 3b71dfc..981aa91 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -377,15 +377,6 @@ static const struct net_device_ops qlcnic_netdev_ops = {
 };
 
 static struct qlcnic_nic_template qlcnic_ops = {
-	.get_mac_addr = qlcnic_get_mac_addr,
-	.config_bridged_mode = qlcnic_config_bridged_mode,
-	.config_led = qlcnic_config_led,
-	.set_ilb_mode = qlcnic_set_ilb_mode,
-	.clear_ilb_mode = qlcnic_clear_ilb_mode,
-	.start_firmware = qlcnic_start_firmware
-};
-
-static struct qlcnic_nic_template qlcnic_pf_ops = {
 	.get_mac_addr = qlcnic_get_mac_address,
 	.config_bridged_mode = qlcnic_config_bridged_mode,
 	.config_led = qlcnic_config_led,
@@ -534,15 +525,6 @@ qlcnic_get_driver_mode(struct qlcnic_adapter *adapter)
 
 	/* Determine FW API version */
 	adapter->fw_hal_version = readl(adapter->ahw.pci_base0 + QLCNIC_FW_API);
-	if (adapter->fw_hal_version == ~0) {
-		adapter->nic_ops = &qlcnic_ops;
-		adapter->fw_hal_version = QLCNIC_FW_BASE;
-		adapter->ahw.pci_func = PCI_FUNC(adapter->pdev->devfn);
-		adapter->capabilities = QLCRD32(adapter, CRB_FW_CAPABILITIES_1);
-		dev_info(&adapter->pdev->dev,
-			"FW does not support nic partion\n");
-		return adapter->fw_hal_version;
-	}
 
 	/* Find PCI function number */
 	pci_read_config_dword(adapter->pdev, QLCNIC_MSIX_TABLE_OFFSET, &func);
@@ -569,7 +551,7 @@ qlcnic_get_driver_mode(struct qlcnic_adapter *adapter)
 	switch (priv_level) {
 	case QLCNIC_MGMT_FUNC:
 		adapter->op_mode = QLCNIC_MGMT_FUNC;
-		adapter->nic_ops = &qlcnic_pf_ops;
+		adapter->nic_ops = &qlcnic_ops;
 		qlcnic_get_pci_info(adapter);
 		/* Set privilege level for other functions */
 		qlcnic_set_function_modes(adapter);
@@ -582,7 +564,7 @@ qlcnic_get_driver_mode(struct qlcnic_adapter *adapter)
 		dev_info(&adapter->pdev->dev,
 			"HAL Version: %d, Privileged function\n",
 			adapter->fw_hal_version);
-		adapter->nic_ops = &qlcnic_pf_ops;
+		adapter->nic_ops = &qlcnic_ops;
 		break;
 	case QLCNIC_NON_PRIV_FUNC:
 		adapter->op_mode = QLCNIC_NON_PRIV_FUNC;
-- 
1.6.0.2


^ permalink raw reply related

* [PATCH net-next-2.6 0/2] qlcnic: Driver fixes
From: Anirban Chakraborty @ 2010-06-25 22:23 UTC (permalink / raw)
  To: David Miller, netdev@vger.kernel.org
  Cc: Dept_NX_Linux_NIC_Driver, Ameen Rahman, Rajesh Borundia

Please apply the following two patches.

thanks,
Anirban

^ permalink raw reply

* [PATCH 9/9] cxgb4vf: Stitch new T4 PCI-E SR-IOV Virtual Function driver into the build
From: Casey Leedom @ 2010-06-25 22:15 UTC (permalink / raw)
  To: netdev

Stitch new T4 PCI-E SR-IOV Virtual Function driver into the build.

Signed-off-by: Casey Leedom
---
 drivers/net/Kconfig  |   23 +++++++++++++++++++++++
 drivers/net/Makefile |    1 +
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 71e6f8f..ec7f8b9 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2602,6 +2602,29 @@ config CHELSIO_T4
 	  To compile this driver as a module choose M here; the module
 	  will be called cxgb4.
 
+config CHELSIO_T4VF_DEPENDS
+	tristate
+	depends on PCI && INET
+	default y
+
+config CHELSIO_T4VF
+	tristate "Chelsio Communications T4 Virtual Function Ethernet support"
+	depends on CHELSIO_T4VF_DEPENDS
+	help
+	  This driver supports Chelsio T4-based gigabit and 10Gb Ethernet
+	  adapters with PCI-E SR-IOV Virtual Functions.
+
+	  For general information about Chelsio and our products, visit
+	  our website at <http://www.chelsio.com>.
+
+	  For customer support, please visit our customer support page at
+	  <http://www.chelsio.com/support.htm>.
+
+	  Please send feedback to <linux-bugs@chelsio.com>.
+
+	  To compile this driver as a module choose M here; the module
+	  will be called cxgb4vf.
+
 config EHEA
 	tristate "eHEA Ethernet support"
 	depends on IBMEBUS && INET && SPARSEMEM
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 0a0512a..49384ca 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_IP1000) += ipg.o
 obj-$(CONFIG_CHELSIO_T1) += chelsio/
 obj-$(CONFIG_CHELSIO_T3) += cxgb3/
 obj-$(CONFIG_CHELSIO_T4) += cxgb4/
+obj-$(CONFIG_CHELSIO_T4VF) += cxgb4vf/
 obj-$(CONFIG_EHEA) += ehea/
 obj-$(CONFIG_CAN) += can/
 obj-$(CONFIG_BONDING) += bonding/

^ permalink raw reply related

* [PATCH 8/9] cxgb4vf: Add new Makefile for T4 PCI-E SR-IOV Virtual Function driver cxgb4vf
From: Casey Leedom @ 2010-06-25 22:14 UTC (permalink / raw)
  To: netdev

Add new Makefile for T4 PCI-E SR-IOV Virtual Function driver "cxgb4vf".

Signed-off-by: Casey Leedom
---
 drivers/net/cxgb4vf/Makefile |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/cxgb4vf/Makefile b/drivers/net/cxgb4vf/Makefile
new file mode 100644
index 0000000..d72ee26
--- /dev/null
+++ b/drivers/net/cxgb4vf/Makefile
@@ -0,0 +1,7 @@
+#
+# Chelsio T4 SR-IOV Virtual Function Driver
+#
+
+obj-$(CONFIG_CHELSIO_T4VF) += cxgb4vf.o
+
+cxgb4vf-objs := cxgb4vf_main.o t4vf_hw.o sge.o

^ permalink raw reply related

* [PATCH 7/9] cxgb4vf: Add main T4 PCI-E SR-IOV Virtual Function driver for cxgb4vf
From: Casey Leedom @ 2010-06-25 22:14 UTC (permalink / raw)
  To: netdev

Add main T4 PCI-E SR-IOV Virtual Function driver for "cxgb4vf".

Signed-off-by: Casey Leedom
---
 drivers/net/cxgb4vf/adapter.h      |  540 +++++++
 drivers/net/cxgb4vf/cxgb4vf_main.c | 2906 ++++++++++++++++++++++++++++++++++++
 2 files changed, 3446 insertions(+), 0 deletions(-)

diff --git a/drivers/net/cxgb4vf/adapter.h b/drivers/net/cxgb4vf/adapter.h
new file mode 100644
index 0000000..8ea0196
--- /dev/null
+++ b/drivers/net/cxgb4vf/adapter.h
@@ -0,0 +1,540 @@
+/*
+ * This file is part of the Chelsio T4 PCI-E SR-IOV Virtual Function Ethernet
+ * driver for Linux.
+ *
+ * Copyright (c) 2009-2010 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * This file should not be included directly.  Include t4vf_common.h instead.
+ */
+
+#ifndef __CXGB4VF_ADAPTER_H__
+#define __CXGB4VF_ADAPTER_H__
+
+#include <linux/pci.h>
+#include <linux/spinlock.h>
+#include <linux/skbuff.h>
+#include <linux/if_ether.h>
+#include <linux/netdevice.h>
+
+#include "../cxgb4/t4_hw.h"
+
+/*
+ * Constants of the implementation.
+ */
+enum {
+	MAX_NPORTS	= 1,		/* max # of "ports" */
+	MAX_PORT_QSETS	= 8,		/* max # of Queue Sets / "port" */
+	MAX_ETH_QSETS	= MAX_NPORTS*MAX_PORT_QSETS,
+
+	/*
+	 * MSI-X interrupt index usage.
+	 */
+	MSIX_FW		= 0,		/* MSI-X index for firmware Q */
+	MSIX_NIQFLINT	= 1,		/* MSI-X index base for Ingress Qs */
+	MSIX_EXTRAS	= 1,
+	MSIX_ENTRIES	= MAX_ETH_QSETS + MSIX_EXTRAS,
+
+	/*
+	 * The maximum number of Ingress and Egress Queues is determined by
+	 * the maximum number of "Queue Sets" which we support plus any
+	 * ancillary queues.  Each "Queue Set" requires one Ingress Queue
+	 * for RX Packet Ingress Event notifications and two Egress Queues for
+	 * a Free List and an Ethernet TX list.
+	 */
+	INGQ_EXTRAS	= 2,		/* firmware event queue and */
+					/*   forwarded interrupts */
+	MAX_INGQ	= MAX_ETH_QSETS+INGQ_EXTRAS,
+	MAX_EGRQ	= MAX_ETH_QSETS*2,
+};
+
+/*
+ * Forward structure definition references.
+ */
+struct adapter;
+struct sge_eth_rxq;
+struct sge_rspq;
+
+/*
+ * Per-"port" information.  This is really per-Virtual Interface information
+ * but the use of the "port" nomanclature makes it easier to go back and forth
+ * between the PF and VF drivers ...
+ */
+struct port_info {
+	struct adapter *adapter;	/* our adapter */
+	struct vlan_group *vlan_grp;	/* out VLAN group */
+	u16 viid;			/* virtual interface ID */
+	s16 xact_addr_filt;		/* index of our MAC address filter */
+	u16 rss_size;			/* size of VI's RSS table slice */
+	u8 pidx;			/* index into adapter port[] */
+	u8 port_id;			/* physical port ID */
+	u8 rx_offload;			/* CSO, etc. */
+	u8 nqsets;			/* # of "Queue Sets" */
+	u8 first_qset;			/* index of first "Queue Set" */
+	struct link_config link_cfg;	/* physical port configuration */
+};
+
+/* port_info.rx_offload flags */
+enum {
+	RX_CSO = 1 << 0,
+};
+
+/*
+ * Scatter Gather Engine resources for the "adapter".  Our ingress and egress
+ * queues are organized into "Queue Sets" with one ingress and one egress
+ * queue per Queue Set.  These Queue Sets are aportionable between the "ports"
+ * (Virtual Interfaces).  One extra ingress queue is used to receive
+ * asynchronous messages from the firmware.  Note that the "Queue IDs" that we
+ * use here are really "Relative Queue IDs" which are returned as part of the
+ * firmware command to allocate queues.  These queue IDs are relative to the
+ * absolute Queue ID base of the section of the Queue ID space allocated to
+ * the PF/VF.
+ */
+
+/*
+ * SGE free-list queue state.
+ */
+struct rx_sw_desc;
+struct sge_fl {
+	unsigned int avail;		/* # of available RX buffers */
+	unsigned int pend_cred;		/* new buffers since last FL DB ring */
+	unsigned int cidx;		/* consumer index */
+	unsigned int pidx;		/* producer index */
+	unsigned long alloc_failed;	/* # of buffer allocation failures */
+	unsigned long large_alloc_failed;
+	unsigned long starving;		/* # of times FL was found starving */
+
+	/*
+	 * Write-once/infrequently fields.
+	 * -------------------------------
+	 */
+
+	unsigned int cntxt_id;		/* SGE relative QID for the free list */
+	unsigned int abs_id;		/* SGE absolute QID for the free list */
+	unsigned int size;		/* capacity of free list */
+	struct rx_sw_desc *sdesc;	/* address of SW RX descriptor ring */
+	__be64 *desc;			/* address of HW RX descriptor ring */
+	dma_addr_t addr;		/* PCI bus address of hardware ring */
+};
+
+/*
+ * An ingress packet gather list.
+ */
+struct pkt_gl {
+	skb_frag_t frags[MAX_SKB_FRAGS];
+	void *va;			/* virtual address of first byte */
+	unsigned int nfrags;		/* # of fragments */
+	unsigned int tot_len;		/* total length of fragments */
+};
+
+typedef int (*rspq_handler_t)(struct sge_rspq *, const __be64 *,
+			      const struct pkt_gl *);
+
+/*
+ * State for an SGE Response Queue.
+ */
+struct sge_rspq {
+	struct napi_struct napi;	/* NAPI scheduling control */
+	const __be64 *cur_desc;		/* current descriptor in queue */
+	unsigned int cidx;		/* consumer index */
+	u8 gen;				/* current generation bit */
+	u8 next_intr_params;		/* holdoff params for next interrupt */
+	int offset;			/* offset into current FL buffer */
+
+	unsigned int unhandled_irqs;	/* bogus interrupts */
+
+	/*
+	 * Write-once/infrequently fields.
+	 * -------------------------------
+	 */
+
+	u8 intr_params;			/* interrupt holdoff parameters */
+	u8 pktcnt_idx;			/* interrupt packet threshold */
+	u8 idx;				/* queue index within its group */
+	u16 cntxt_id;			/* SGE rel QID for the response Q */
+	u16 abs_id;			/* SGE abs QID for the response Q */
+	__be64 *desc;			/* address of hardware response ring */
+	dma_addr_t phys_addr;		/* PCI bus address of ring */
+	unsigned int iqe_len;		/* entry size */
+	unsigned int size;		/* capcity of response Q */
+	struct adapter *adapter;	/* our adapter */
+	struct net_device *netdev;	/* associated net device */
+	rspq_handler_t handler;		/* the handler for this response Q */
+};
+
+/*
+ * Ethernet queue statistics
+ */
+struct sge_eth_stats {
+	unsigned long pkts;		/* # of ethernet packets */
+	unsigned long lro_pkts;		/* # of LRO super packets */
+	unsigned long lro_merged;	/* # of wire packets merged by LRO */
+	unsigned long rx_cso;		/* # of Rx checksum offloads */
+	unsigned long vlan_ex;		/* # of Rx VLAN extractions */
+	unsigned long rx_drops;		/* # of packets dropped due to no mem */
+};
+
+/*
+ * State for an Ethernet Receive Queue.
+ */
+struct sge_eth_rxq {
+	struct sge_rspq rspq;		/* Response Queue */
+	struct sge_fl fl;		/* Free List */
+	struct sge_eth_stats stats;	/* receive statistics */
+};
+
+/*
+ * SGE Transmit Queue state.  This contains all of the resources associated
+ * with the hardware status of a TX Queue which is a circular ring of hardware
+ * TX Descriptors.  For convenience, it also contains a pointer to a parallel
+ * "Software Descriptor" array but we don't know anything about it here other
+ * than its type name.
+ */
+struct tx_desc {
+	/*
+	 * Egress Queues are measured in units of SGE_EQ_IDXSIZE by the
+	 * hardware: Sizes, Producer and Consumer indices, etc.
+	 */
+	__be64 flit[SGE_EQ_IDXSIZE/sizeof(__be64)];
+};
+struct tx_sw_desc;
+struct sge_txq {
+	unsigned int in_use;		/* # of in-use TX descriptors */
+	unsigned int size;		/* # of descriptors */
+	unsigned int cidx;		/* SW consumer index */
+	unsigned int pidx;		/* producer index */
+	unsigned long stops;		/* # of times queue has been stopped */
+	unsigned long restarts;		/* # of queue restarts */
+
+	/*
+	 * Write-once/infrequently fields.
+	 * -------------------------------
+	 */
+
+	unsigned int cntxt_id;		/* SGE relative QID for the TX Q */
+	unsigned int abs_id;		/* SGE absolute QID for the TX Q */
+	struct tx_desc *desc;		/* address of HW TX descriptor ring */
+	struct tx_sw_desc *sdesc;	/* address of SW TX descriptor ring */
+	struct sge_qstat *stat;		/* queue status entry */
+	dma_addr_t phys_addr;		/* PCI bus address of hardware ring */
+};
+
+/*
+ * State for an Ethernet Transmit Queue.
+ */
+struct sge_eth_txq {
+	struct sge_txq q;		/* SGE TX Queue */
+	struct netdev_queue *txq;	/* associated netdev TX queue */
+	unsigned long tso;		/* # of TSO requests */
+	unsigned long tx_cso;		/* # of TX checksum offloads */
+	unsigned long vlan_ins;		/* # of TX VLAN insertions */
+	unsigned long mapping_err;	/* # of I/O MMU packet mapping errors */
+};
+
+/*
+ * The complete set of Scatter/Gather Engine resources.
+ */
+struct sge {
+	/*
+	 * Our "Queue Sets" ...
+	 */
+	struct sge_eth_txq ethtxq[MAX_ETH_QSETS];
+	struct sge_eth_rxq ethrxq[MAX_ETH_QSETS];
+
+	/*
+	 * Extra ingress queues for asynchronous firmware events and
+	 * forwarded interrupts (when in MSI mode).
+	 */
+	struct sge_rspq fw_evtq ____cacheline_aligned_in_smp;
+
+	struct sge_rspq intrq ____cacheline_aligned_in_smp;
+	spinlock_t intrq_lock;
+
+	/*
+	 * State for managing "starving Free Lists" -- Free Lists which have
+	 * fallen below a certain threshold of buffers available to the
+	 * hardware and attempts to refill them up to that threshold have
+	 * failed.  We have a regular "slow tick" timer process which will
+	 * make periodic attempts to refill these starving Free Lists ...
+	 */
+	DECLARE_BITMAP(starving_fl, MAX_EGRQ);
+	struct timer_list rx_timer;
+
+	/*
+	 * State for cleaning up completed TX descriptors.
+	 */
+	struct timer_list tx_timer;
+
+	/*
+	 * Write-once/infrequently fields.
+	 * -------------------------------
+	 */
+
+	u16 max_ethqsets;		/* # of available Ethernet queue sets */
+	u16 ethqsets;			/* # of active Ethernet queue sets */
+	u16 ethtxq_rover;		/* Tx queue to clean up next */
+	u16 timer_val[SGE_NTIMERS];	/* interrupt holdoff timer array */
+	u8 counter_val[SGE_NCOUNTERS];	/* interrupt RX threshold array */
+
+	/*
+	 * Reverse maps from Absolute Queue IDs to associated queue pointers.
+	 * The absolute Queue IDs are in a compact range which start at a
+	 * [potentially large] Base Queue ID.  We perform the reverse map by
+	 * first converting the Absolute Queue ID into a Relative Queue ID by
+	 * subtracting off the Base Queue ID and then use a Relative Queue ID
+	 * indexed table to get the pointer to the corresponding software
+	 * queue structure.
+	 */
+	unsigned int egr_base;
+	unsigned int ingr_base;
+	void *egr_map[MAX_EGRQ];
+	struct sge_rspq *ingr_map[MAX_INGQ];
+};
+
+/*
+ * Utility macros to convert Absolute- to Relative-Queue indices and Egress-
+ * and Ingress-Queues.  The EQ_MAP() and IQ_MAP() macros which provide
+ * pointers to Ingress- and Egress-Queues can be used as both L- and R-values
+ */
+#define EQ_IDX(s, abs_id) ((unsigned int)((abs_id) - (s)->egr_base))
+#define IQ_IDX(s, abs_id) ((unsigned int)((abs_id) - (s)->ingr_base))
+
+#define EQ_MAP(s, abs_id) ((s)->egr_map[EQ_IDX(s, abs_id)])
+#define IQ_MAP(s, abs_id) ((s)->ingr_map[IQ_IDX(s, abs_id)])
+
+/*
+ * Macro to iterate across Queue Sets ("rxq" is a historic misnomer).
+ */
+#define for_each_ethrxq(sge, iter) \
+	for (iter = 0; iter < (sge)->ethqsets; iter++)
+
+/*
+ * Per-"adapter" (Virtual Function) information.
+ */
+struct adapter {
+	/* PCI resources */
+	void __iomem *regs;
+	struct pci_dev *pdev;
+	struct device *pdev_dev;
+
+	/* "adapter" resources */
+	unsigned long registered_device_map;
+	unsigned long open_device_map;
+	unsigned long flags;
+	struct adapter_params params;
+
+	/* queue and interrupt resources */
+	struct {
+		unsigned short vec;
+		char desc[22];
+	} msix_info[MSIX_ENTRIES];
+	struct sge sge;
+
+	/* Linux network device resources */
+	struct net_device *port[MAX_NPORTS];
+	const char *name;
+	unsigned int msg_enable;
+
+	/* debugfs resources */
+	struct dentry *debugfs_root;
+
+	/* various locks */
+	spinlock_t stats_lock;
+};
+
+enum { /* adapter flags */
+	FULL_INIT_DONE     = (1UL << 0),
+	USING_MSI          = (1UL << 1),
+	USING_MSIX         = (1UL << 2),
+	QUEUES_BOUND       = (1UL << 3),
+};
+
+/*
+ * The following register read/write routine definitions are required by
+ * the common code.
+ */
+
+/**
+ * t4_read_reg - read a HW register
+ * @adapter: the adapter
+ * @reg_addr: the register address
+ *
+ * Returns the 32-bit value of the given HW register.
+ */
+static inline u32 t4_read_reg(struct adapter *adapter, u32 reg_addr)
+{
+	return readl(adapter->regs + reg_addr);
+}
+
+/**
+ * t4_write_reg - write a HW register
+ * @adapter: the adapter
+ * @reg_addr: the register address
+ * @val: the value to write
+ *
+ * Write a 32-bit value into the given HW register.
+ */
+static inline void t4_write_reg(struct adapter *adapter, u32 reg_addr, u32 val)
+{
+	writel(val, adapter->regs + reg_addr);
+}
+
+#ifndef readq
+static inline u64 readq(const volatile void __iomem *addr)
+{
+	return readl(addr) + ((u64)readl(addr + 4) << 32);
+}
+
+static inline void writeq(u64 val, volatile void __iomem *addr)
+{
+	writel(val, addr);
+	writel(val >> 32, addr + 4);
+}
+#endif
+
+/**
+ * t4_read_reg64 - read a 64-bit HW register
+ * @adapter: the adapter
+ * @reg_addr: the register address
+ *
+ * Returns the 64-bit value of the given HW register.
+ */
+static inline u64 t4_read_reg64(struct adapter *adapter, u32 reg_addr)
+{
+	return readq(adapter->regs + reg_addr);
+}
+
+/**
+ * t4_write_reg64 - write a 64-bit HW register
+ * @adapter: the adapter
+ * @reg_addr: the register address
+ * @val: the value to write
+ *
+ * Write a 64-bit value into the given HW register.
+ */
+static inline void t4_write_reg64(struct adapter *adapter, u32 reg_addr,
+				  u64 val)
+{
+	writeq(val, adapter->regs + reg_addr);
+}
+
+/**
+ * port_name - return the string name of a port
+ * @adapter: the adapter
+ * @pidx: the port index
+ *
+ * Return the string name of the selected port.
+ */
+static inline const char *port_name(struct adapter *adapter, int pidx)
+{
+	return adapter->port[pidx]->name;
+}
+
+/**
+ * t4_os_set_hw_addr - store a port's MAC address in SW
+ * @adapter: the adapter
+ * @pidx: the port index
+ * @hw_addr: the Ethernet address
+ *
+ * Store the Ethernet address of the given port in SW.  Called by the common
+ * code when it retrieves a port's Ethernet address from EEPROM.
+ */
+static inline void t4_os_set_hw_addr(struct adapter *adapter, int pidx,
+				     u8 hw_addr[])
+{
+	memcpy(adapter->port[pidx]->dev_addr, hw_addr, ETH_ALEN);
+	memcpy(adapter->port[pidx]->perm_addr, hw_addr, ETH_ALEN);
+}
+
+/**
+ * netdev2pinfo - return the port_info structure associated with a net_device
+ * @dev: the netdev
+ *
+ * Return the struct port_info associated with a net_device
+ */
+static inline struct port_info *netdev2pinfo(const struct net_device *dev)
+{
+	return netdev_priv(dev);
+}
+
+/**
+ * adap2pinfo - return the port_info of a port
+ * @adap: the adapter
+ * @pidx: the port index
+ *
+ * Return the port_info structure for the adapter.
+ */
+static inline struct port_info *adap2pinfo(struct adapter *adapter, int pidx)
+{
+	return netdev_priv(adapter->port[pidx]);
+}
+
+/**
+ * netdev2adap - return the adapter structure associated with a net_device
+ * @dev: the netdev
+ *
+ * Return the struct adapter associated with a net_device
+ */
+static inline struct adapter *netdev2adap(const struct net_device *dev)
+{
+	return netdev2pinfo(dev)->adapter;
+}
+
+/*
+ * OS "Callback" function declarations.  These are functions that the OS code
+ * is "contracted" to provide for the common code.
+ */
+void t4vf_os_link_changed(struct adapter *, int, int);
+
+/*
+ * SGE function prototype declarations.
+ */
+int t4vf_sge_alloc_rxq(struct adapter *, struct sge_rspq *, bool,
+		       struct net_device *, int,
+		       struct sge_fl *, rspq_handler_t);
+int t4vf_sge_alloc_eth_txq(struct adapter *, struct sge_eth_txq *,
+			   struct net_device *, struct netdev_queue *,
+			   unsigned int);
+void t4vf_free_sge_resources(struct adapter *);
+
+int t4vf_eth_xmit(struct sk_buff *, struct net_device *);
+int t4vf_ethrx_handler(struct sge_rspq *, const __be64 *,
+		       const struct pkt_gl *);
+
+irq_handler_t t4vf_intr_handler(struct adapter *);
+irqreturn_t t4vf_sge_intr_msix(int, void *);
+
+int t4vf_sge_init(struct adapter *);
+void t4vf_sge_start(struct adapter *);
+void t4vf_sge_stop(struct adapter *);
+
+#endif /* __CXGB4VF_ADAPTER_H__ */
diff --git a/drivers/net/cxgb4vf/cxgb4vf_main.c 
b/drivers/net/cxgb4vf/cxgb4vf_main.c
new file mode 100644
index 0000000..bd73ff5
--- /dev/null
+++ b/drivers/net/cxgb4vf/cxgb4vf_main.c
@@ -0,0 +1,2906 @@
+/*
+ * This file is part of the Chelsio T4 PCI-E SR-IOV Virtual Function Ethernet
+ * driver for Linux.
+ *
+ * Copyright (c) 2009-2010 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/init.h>
+#include <linux/pci.h>
+#include <linux/dma-mapping.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/debugfs.h>
+#include <linux/ethtool.h>
+
+#include "t4vf_common.h"
+#include "t4vf_defs.h"
+
+#include "../cxgb4/t4_regs.h"
+#include "../cxgb4/t4_msg.h"
+
+/*
+ * Generic information about the driver.
+ */
+#define DRV_VERSION "1.0.0"
+#define DRV_DESC "Chelsio T4 Virtual Function (VF) Network Driver"
+
+/*
+ * Module Parameters.
+ * ==================
+ */
+
+/*
+ * Default ethtool "message level" for adapters.
+ */
+#define DFLT_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK | \
+			 NETIF_MSG_TIMER | NETIF_MSG_IFDOWN | NETIF_MSG_IFUP |\
+			 NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
+
+static int dflt_msg_enable = DFLT_MSG_ENABLE;
+
+module_param(dflt_msg_enable, int, 0644);
+MODULE_PARM_DESC(dflt_msg_enable,
+		 "default adapter ethtool message level bitmap");
+
+/*
+ * The driver uses the best interrupt scheme available on a platform in the
+ * order MSI-X then MSI.  This parameter determines which of these schemes the
+ * driver may consider as follows:
+ *
+ *     msi = 2: choose from among MSI-X and MSI
+ *     msi = 1: only consider MSI interrupts
+ *
+ * Note that unlike the Physical Function driver, this Virtual Function driver
+ * does _not_ support legacy INTx interrupts (this limitation is mandated by
+ * the PCI-E SR-IOV standard).
+ */
+#define MSI_MSIX	2
+#define MSI_MSI		1
+#define MSI_DEFAULT	MSI_MSIX
+
+static int msi = MSI_DEFAULT;
+
+module_param(msi, int, 0644);
+MODULE_PARM_DESC(msi, "whether to use MSI-X or MSI");
+
+/*
+ * Fundamental constants.
+ * ======================
+ */
+
+enum {
+	MAX_TXQ_ENTRIES		= 16384,
+	MAX_RSPQ_ENTRIES	= 16384,
+	MAX_RX_BUFFERS		= 16384,
+
+	MIN_TXQ_ENTRIES		= 32,
+	MIN_RSPQ_ENTRIES	= 128,
+	MIN_FL_ENTRIES		= 16,
+
+	/*
+	 * For purposes of manipulating the Free List size we need to
+	 * recognize that Free Lists are actually Egress Queues (the host
+	 * produces free buffers which the hardware consumes), Egress Queues
+	 * indices are all in units of Egress Context Units bytes, and free
+	 * list entries are 64-bit PCI DMA addresses.  And since the state of
+	 * the Producer Index == the Consumer Index implies an EMPTY list, we
+	 * always have at least one Egress Unit's worth of Free List entries
+	 * unused.  See sge.c for more details ...
+	 */
+	EQ_UNIT = SGE_EQ_IDXSIZE,
+	FL_PER_EQ_UNIT = EQ_UNIT / sizeof(__be64),
+	MIN_FL_RESID = FL_PER_EQ_UNIT,
+};
+
+/*
+ * Global driver state.
+ * ====================
+ */
+
+static struct dentry *cxgb4vf_debugfs_root;
+
+/*
+ * OS "Callback" functions.
+ * ========================
+ */
+
+/*
+ * The link status has changed on the indicated "port" (Virtual Interface).
+ */
+void t4vf_os_link_changed(struct adapter *adapter, int pidx, int link_ok)
+{
+	struct net_device *dev = adapter->port[pidx];
+
+	/*
+	 * If the port is disabled or the current recorded "link up"
+	 * status matches the new status, just return.
+	 */
+	if (!netif_running(dev) || link_ok == netif_carrier_ok(dev))
+		return;
+
+	/*
+	 * Tell the OS that the link status has changed and print a short
+	 * informative message on the console about the event.
+	 */
+	if (link_ok) {
+		const char *s;
+		const char *fc;
+		const struct port_info *pi = netdev_priv(dev);
+
+		netif_carrier_on(dev);
+
+		switch (pi->link_cfg.speed) {
+		case SPEED_10000:
+			s = "10Gbps";
+			break;
+
+		case SPEED_1000:
+			s = "1000Mbps";
+			break;
+
+		case SPEED_100:
+			s = "100Mbps";
+			break;
+
+		default:
+			s = "unknown";
+			break;
+		}
+
+		switch (pi->link_cfg.fc) {
+		case PAUSE_RX:
+			fc = "RX";
+			break;
+
+		case PAUSE_TX:
+			fc = "TX";
+			break;
+
+		case PAUSE_RX|PAUSE_TX:
+			fc = "RX/TX";
+			break;
+
+		default:
+			fc = "no";
+			break;
+		}
+
+		printk(KERN_INFO "%s: link up, %s, full-duplex, %s PAUSE\n",
+		       dev->name, s, fc);
+	} else {
+		netif_carrier_off(dev);
+		printk(KERN_INFO "%s: link down\n", dev->name);
+	}
+}
+
+/*
+ * Net device operations.
+ * ======================
+ */
+
+/*
+ * Record our new VLAN Group and enable/disable hardware VLAN Tag extraction
+ * based on whether the specified VLAN Group pointer is NULL or not.
+ */
+static void cxgb4vf_vlan_rx_register(struct net_device *dev,
+				     struct vlan_group *grp)
+{
+	struct port_info *pi = netdev_priv(dev);
+
+	pi->vlan_grp = grp;
+	t4vf_set_rxmode(pi->adapter, pi->viid, -1, -1, -1, -1, grp != NULL, 0);
+}
+
+/*
+ * Perform the MAC and PHY actions needed to enable a "port" (Virtual
+ * Interface).
+ */
+static int link_start(struct net_device *dev)
+{
+	int ret;
+	struct port_info *pi = netdev_priv(dev);
+
+	/*
+	 * We do not set address filters and promiscuity here, the stack does
+	 * that step explicitly.
+	 */
+	ret = t4vf_set_rxmode(pi->adapter, pi->viid, dev->mtu, -1, -1, -1, -1,
+			      true);
+	if (ret == 0) {
+		ret = t4vf_change_mac(pi->adapter, pi->viid,
+				      pi->xact_addr_filt, dev->dev_addr, true);
+		if (ret >= 0) {
+			pi->xact_addr_filt = ret;
+			ret = 0;
+		}
+	}
+
+	/*
+	 * We don't need to actually "start the link" itself since the
+	 * firmware will do that for us when the first Virtual Interface
+	 * is enabled on a port.
+	 */
+	if (ret == 0)
+		ret = t4vf_enable_vi(pi->adapter, pi->viid, true, true);
+	return ret;
+}
+
+/*
+ * Name the MSI-X interrupts.
+ */
+static void name_msix_vecs(struct adapter *adapter)
+{
+	int namelen = sizeof(adapter->msix_info[0].desc) - 1;
+	int pidx;
+
+	/*
+	 * Firmware events.
+	 */
+	snprintf(adapter->msix_info[MSIX_FW].desc, namelen,
+		 "%s-FWeventq", adapter->name);
+	adapter->msix_info[MSIX_FW].desc[namelen] = 0;
+
+	/*
+	 * Ethernet queues.
+	 */
+	for_each_port(adapter, pidx) {
+		struct net_device *dev = adapter->port[pidx];
+		const struct port_info *pi = netdev_priv(dev);
+		int qs, msi;
+
+		for (qs = 0, msi = MSIX_NIQFLINT;
+		     qs < pi->nqsets;
+		     qs++, msi++) {
+			snprintf(adapter->msix_info[msi].desc, namelen,
+				 "%s-%d", dev->name, qs);
+			adapter->msix_info[msi].desc[namelen] = 0;
+		}
+	}
+}
+
+/*
+ * Request all of our MSI-X resources.
+ */
+static int request_msix_queue_irqs(struct adapter *adapter)
+{
+	struct sge *s = &adapter->sge;
+	int rxq, msi, err;
+
+	/*
+	 * Firmware events.
+	 */
+	err = request_irq(adapter->msix_info[MSIX_FW].vec, t4vf_sge_intr_msix,
+			  0, adapter->msix_info[MSIX_FW].desc, &s->fw_evtq);
+	if (err)
+		return err;
+
+	/*
+	 * Ethernet queues.
+	 */
+	msi = MSIX_NIQFLINT;
+	for_each_ethrxq(s, rxq) {
+		err = request_irq(adapter->msix_info[msi].vec,
+				  t4vf_sge_intr_msix, 0,
+				  adapter->msix_info[msi].desc,
+				  &s->ethrxq[rxq].rspq);
+		if (err)
+			goto err_free_irqs;
+		msi++;
+	}
+	return 0;
+
+err_free_irqs:
+	while (--rxq >= 0)
+		free_irq(adapter->msix_info[--msi].vec, &s->ethrxq[rxq].rspq);
+	free_irq(adapter->msix_info[MSIX_FW].vec, &s->fw_evtq);
+	return err;
+}
+
+/*
+ * Free our MSI-X resources.
+ */
+static void free_msix_queue_irqs(struct adapter *adapter)
+{
+	struct sge *s = &adapter->sge;
+	int rxq, msi;
+
+	free_irq(adapter->msix_info[MSIX_FW].vec, &s->fw_evtq);
+	msi = MSIX_NIQFLINT;
+	for_each_ethrxq(s, rxq)
+		free_irq(adapter->msix_info[msi++].vec,
+			 &s->ethrxq[rxq].rspq);
+}
+
+/*
+ * Turn on NAPI and start up interrupts on a response queue.
+ */
+static void qenable(struct sge_rspq *rspq)
+{
+	napi_enable(&rspq->napi);
+
+	/*
+	 * 0-increment the Going To Sleep register to start the timer and
+	 * enable interrupts.
+	 */
+	t4_write_reg(rspq->adapter, T4VF_SGE_BASE_ADDR + SGE_VF_GTS,
+		     CIDXINC(0) |
+		     SEINTARM(rspq->intr_params) |
+		     INGRESSQID(rspq->cntxt_id));
+}
+
+/*
+ * Enable NAPI scheduling and interrupt generation for all Receive Queues.
+ */
+static void enable_rx(struct adapter *adapter)
+{
+	int rxq;
+	struct sge *s = &adapter->sge;
+
+	for_each_ethrxq(s, rxq)
+		qenable(&s->ethrxq[rxq].rspq);
+	qenable(&s->fw_evtq);
+
+	/*
+	 * The interrupt queue doesn't use NAPI so we do the 0-increment of
+	 * its Going To Sleep register here to get it started.
+	 */
+	if (adapter->flags & USING_MSI)
+		t4_write_reg(adapter, T4VF_SGE_BASE_ADDR + SGE_VF_GTS,
+			     CIDXINC(0) |
+			     SEINTARM(s->intrq.intr_params) |
+			     INGRESSQID(s->intrq.cntxt_id));
+
+}
+
+/*
+ * Wait until all NAPI handlers are descheduled.
+ */
+static void quiesce_rx(struct adapter *adapter)
+{
+	struct sge *s = &adapter->sge;
+	int rxq;
+
+	for_each_ethrxq(s, rxq)
+		napi_disable(&s->ethrxq[rxq].rspq.napi);
+	napi_disable(&s->fw_evtq.napi);
+}
+
+/*
+ * Response queue handler for the firmware event queue.
+ */
+static int fwevtq_handler(struct sge_rspq *rspq, const __be64 *rsp,
+			  const struct pkt_gl *gl)
+{
+	/*
+	 * Extract response opcode and get pointer to CPL message body.
+	 */
+	struct adapter *adapter = rspq->adapter;
+	u8 opcode = ((const struct rss_header *)rsp)->opcode;
+	void *cpl = (void *)(rsp + 1);
+
+	switch (opcode) {
+	case CPL_FW6_MSG: {
+		/*
+		 * We've received an asynchronous message from the firmware.
+		 */
+		const struct cpl_fw6_msg *fw_msg = cpl;
+		if (fw_msg->type == FW6_TYPE_CMD_RPL)
+			t4vf_handle_fw_rpl(adapter, fw_msg->data);
+		break;
+	}
+
+	case CPL_SGE_EGR_UPDATE: {
+		/*
+		 * We've received an Egress Queue status update message.
+		 * We get these, as the SGE is currently configured, when
+		 * the firmware passes certain points in processing our
+		 * TX Ethernet Queue.  We use these updates to determine
+		 * when we may need to restart a TX Ethernet Queue which
+		 * was stopped for lack of free slots ...
+		 */
+		const struct cpl_sge_egr_update *p = (void *)cpl;
+		unsigned int qid = EGR_QID(be32_to_cpu(p->opcode_qid));
+		struct sge *s = &adapter->sge;
+		struct sge_txq *tq;
+		struct sge_eth_txq *txq;
+		unsigned int eq_idx;
+		int hw_cidx, reclaimable, in_use;
+
+		/*
+		 * Perform sanity checking on the Queue ID to make sure it
+		 * really refers to one of our TX Ethernet Egress Queues which
+		 * is active and matches the queue's ID.  None of these error
+		 * conditions should ever happen so we may want to either make
+		 * them fatal and/or conditionalized under DEBUG.
+		 */
+		eq_idx = EQ_IDX(s, qid);
+		if (unlikely(eq_idx >= MAX_EGRQ)) {
+			dev_err(adapter->pdev_dev,
+				"Egress Update QID %d out of range\n", qid);
+			break;
+		}
+		tq = s->egr_map[eq_idx];
+		if (unlikely(tq == NULL)) {
+			dev_err(adapter->pdev_dev,
+				"Egress Update QID %d TXQ=NULL\n", qid);
+			break;
+		}
+		txq = container_of(tq, struct sge_eth_txq, q);
+		if (unlikely(tq->abs_id != qid)) {
+			dev_err(adapter->pdev_dev,
+				"Egress Update QID %d refers to TXQ %d\n",
+				qid, tq->abs_id);
+			break;
+		}
+
+		/*
+		 * Skip TX Queues which aren't stopped.
+		 */
+		if (likely(!netif_tx_queue_stopped(txq->txq)))
+			break;
+
+		/*
+		 * Skip stopped TX Queues which have more than half of their
+		 * DMA rings occupied with unacknowledged writes.
+		 */
+		hw_cidx = be16_to_cpu(txq->q.stat->cidx);
+		reclaimable = hw_cidx - txq->q.cidx;
+		if (reclaimable < 0)
+			reclaimable += txq->q.size;
+		in_use = txq->q.in_use - reclaimable;
+		if (in_use >= txq->q.size/2)
+			break;
+
+		/*
+		 * Restart a stopped TX Queue which has less than half of its
+		 * TX ring in use ...
+		 */
+		txq->q.restarts++;
+		netif_tx_wake_queue(txq->txq);
+		break;
+	}
+
+	default:
+		dev_err(adapter->pdev_dev,
+			"unexpected CPL %#x on FW event queue\n", opcode);
+	}
+
+	return 0;
+}
+
+/*
+ * Allocate SGE TX/RX response queues.  Determine how many sets of SGE queues
+ * to use and initializes them.  We support multiple "Queue Sets" per port if
+ * we have MSI-X, otherwise just one queue set per port.
+ */
+static int setup_sge_queues(struct adapter *adapter)
+{
+	struct sge *s = &adapter->sge;
+	int err, pidx, msix;
+
+	/*
+	 * Clear "Queue Set" Free List Starving and TX Queue Mapping Error
+	 * state.
+	 */
+	bitmap_zero(s->starving_fl, MAX_EGRQ);
+
+	/*
+	 * If we're using MSI interrupt mode we need to set up a "forwarded
+	 * interrupt" queue which we'll set up with our MSI vector.  The rest
+	 * of the ingress queues will be set up to forward their interrupts to
+	 * this queue ...  This must be first since t4vf_sge_alloc_rxq() uses
+	 * the intrq's queue ID as the interrupt forwarding queue for the
+	 * subsequent calls ...
+	 */
+	if (adapter->flags & USING_MSI) {
+		err = t4vf_sge_alloc_rxq(adapter, &s->intrq, false,
+					 adapter->port[0], 0, NULL, NULL);
+		if (err)
+			goto err_free_queues;
+	}
+
+	/*
+	 * Allocate our ingress queue for asynchronous firmware messages.
+	 */
+	err = t4vf_sge_alloc_rxq(adapter, &s->fw_evtq, true, adapter->port[0],
+				 MSIX_FW, NULL, fwevtq_handler);
+	if (err)
+		goto err_free_queues;
+
+	/*
+	 * Allocate each "port"'s initial Queue Sets.  These can be changed
+	 * later on ... up to the point where any interface on the adapter is
+	 * brought up at which point lots of things get nailed down
+	 * permanently ...
+	 */
+	msix = MSIX_NIQFLINT;
+	for_each_port(adapter, pidx) {
+		struct net_device *dev = adapter->port[pidx];
+		struct port_info *pi = netdev_priv(dev);
+		struct sge_eth_rxq *rxq = &s->ethrxq[pi->first_qset];
+		struct sge_eth_txq *txq = &s->ethtxq[pi->first_qset];
+		int nqsets = (adapter->flags & USING_MSIX) ? pi->nqsets : 1;
+		int qs;
+
+		for (qs = 0; qs < nqsets; qs++, rxq++, txq++) {
+			err = t4vf_sge_alloc_rxq(adapter, &rxq->rspq, false,
+						 dev, msix++,
+						 &rxq->fl, t4vf_ethrx_handler);
+			if (err)
+				goto err_free_queues;
+
+			err = t4vf_sge_alloc_eth_txq(adapter, txq, dev,
+					     netdev_get_tx_queue(dev, qs),
+					     s->fw_evtq.cntxt_id);
+			if (err)
+				goto err_free_queues;
+
+			rxq->rspq.idx = qs;
+			memset(&rxq->stats, 0, sizeof(rxq->stats));
+		}
+	}
+
+	/*
+	 * Create the reverse mappings for the queues.
+	 */
+	s->egr_base = s->ethtxq[0].q.abs_id - s->ethtxq[0].q.cntxt_id;
+	s->ingr_base = s->ethrxq[0].rspq.abs_id - s->ethrxq[0].rspq.cntxt_id;
+	IQ_MAP(s, s->fw_evtq.abs_id) = &s->fw_evtq;
+	for_each_port(adapter, pidx) {
+		struct net_device *dev = adapter->port[pidx];
+		struct port_info *pi = netdev_priv(dev);
+		struct sge_eth_rxq *rxq = &s->ethrxq[pi->first_qset];
+		struct sge_eth_txq *txq = &s->ethtxq[pi->first_qset];
+		int nqsets = (adapter->flags & USING_MSIX) ? pi->nqsets : 1;
+		int qs;
+
+		for (qs = 0; qs < nqsets; qs++, rxq++, txq++) {
+			IQ_MAP(s, rxq->rspq.abs_id) = &rxq->rspq;
+			EQ_MAP(s, txq->q.abs_id) = &txq->q;
+
+			/*
+			 * The FW_IQ_CMD doesn't return the Absolute Queue IDs
+			 * for Free Lists but since all of the Egress Queues
+			 * (including Free Lists) have Relative Queue IDs
+			 * which are computed as Absolute - Base Queue ID, we
+			 * can synthesize the Absolute Queue IDs for the Free
+			 * Lists.  This is useful for debugging purposes when
+			 * we want to dump Queue Contexts via the PF Driver.
+			 */
+			rxq->fl.abs_id = rxq->fl.cntxt_id + s->egr_base;
+			EQ_MAP(s, rxq->fl.abs_id) = &rxq->fl;
+		}
+	}
+	return 0;
+
+err_free_queues:
+	t4vf_free_sge_resources(adapter);
+	return err;
+}
+
+/*
+ * Set up Receive Side Scaling (RSS) to distribute packets to multiple receive
+ * queues.  We configure the RSS CPU lookup table to distribute to the number
+ * of HW receive queues, and the response queue lookup table to narrow that
+ * down to the response queues actually configured for each "port" (Virtual
+ * Interface).  We always configure the RSS mapping for all ports since the
+ * mapping table has plenty of entries.
+ */
+static int setup_rss(struct adapter *adapter)
+{
+	int pidx;
+
+	for_each_port(adapter, pidx) {
+		struct port_info *pi = adap2pinfo(adapter, pidx);
+		struct sge_eth_rxq *rxq = &adapter->sge.ethrxq[pi->first_qset];
+		u16 rss[MAX_PORT_QSETS];
+		int qs, err;
+
+		for (qs = 0; qs < pi->nqsets; qs++)
+			rss[qs] = rxq[qs].rspq.abs_id;
+
+		err = t4vf_config_rss_range(adapter, pi->viid,
+					    0, pi->rss_size, rss, pi->nqsets);
+		if (err)
+			return err;
+
+		/*
+		 * Perform Global RSS Mode-specific initialization.
+		 */
+		switch (adapter->params.rss.mode) {
+		case FW_RSS_GLB_CONFIG_CMD_MODE_BASICVIRTUAL:
+			/*
+			 * If Tunnel All Lookup isn't specified in the global
+			 * RSS Configuration, then we need to specify a
+			 * default Ingress Queue for any ingress packets which
+			 * aren't hashed.  We'll use our first ingress queue
+			 * ...
+			 */
+			if (!adapter->params.rss.u.basicvirtual.tnlalllookup) {
+				union rss_vi_config config;
+				err = t4vf_read_rss_vi_config(adapter,
+							      pi->viid,
+							      &config);
+				if (err)
+					return err;
+				config.basicvirtual.defaultq =
+					rxq[0].rspq.abs_id;
+				err = t4vf_write_rss_vi_config(adapter,
+							       pi->viid,
+							       &config);
+				if (err)
+					return err;
+			}
+			break;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Bring the adapter up.  Called whenever we go from no "ports" open to having
+ * one open.  This function performs the actions necessary to make an adapter
+ * operational, such as completing the initialization of HW modules, and
+ * enabling interrupts.  Must be called with the rtnl lock held.  (Note that
+ * this is called "cxgb_up" in the PF Driver.)
+ */
+static int adapter_up(struct adapter *adapter)
+{
+	int err;
+
+	/*
+	 * If this is the first time we've been called, perform basic
+	 * adapter setup.  Once we've done this, many of our adapter
+	 * parameters can no longer be changed ...
+	 */
+	if ((adapter->flags & FULL_INIT_DONE) == 0) {
+		err = setup_sge_queues(adapter);
+		if (err)
+			return err;
+		err = setup_rss(adapter);
+		if (err) {
+			t4vf_free_sge_resources(adapter);
+			return err;
+		}
+
+		if (adapter->flags & USING_MSIX)
+			name_msix_vecs(adapter);
+		adapter->flags |= FULL_INIT_DONE;
+	}
+
+	/*
+	 * Acquire our interrupt resources.  We only support MSI-X and MSI.
+	 */
+	BUG_ON((adapter->flags & (USING_MSIX|USING_MSI)) == 0);
+	if (adapter->flags & USING_MSIX)
+		err = request_msix_queue_irqs(adapter);
+	else
+		err = request_irq(adapter->pdev->irq,
+				  t4vf_intr_handler(adapter), 0,
+				  adapter->name, adapter);
+	if (err) {
+		dev_err(adapter->pdev_dev, "request_irq failed, err %d\n",
+			err);
+		return err;
+	}
+
+	/*
+	 * Enable NAPI ingress processing and return success.
+	 */
+	enable_rx(adapter);
+	t4vf_sge_start(adapter);
+	return 0;
+}
+
+/*
+ * Bring the adapter down.  Called whenever the last "port" (Virtual
+ * Interface) closed.  (Note that this routine is called "cxgb_down" in the PF
+ * Driver.)
+ */
+static void adapter_down(struct adapter *adapter)
+{
+	/*
+	 * Free interrupt resources.
+	 */
+	if (adapter->flags & USING_MSIX)
+		free_msix_queue_irqs(adapter);
+	else
+		free_irq(adapter->pdev->irq, adapter);
+
+	/*
+	 * Wait for NAPI handlers to finish.
+	 */
+	quiesce_rx(adapter);
+}
+
+/*
+ * Start up a net device.
+ */
+static int cxgb4vf_open(struct net_device *dev)
+{
+	int err;
+	struct port_info *pi = netdev_priv(dev);
+	struct adapter *adapter = pi->adapter;
+
+	/*
+	 * If this is the first interface that we're opening on the "adapter",
+	 * bring the "adapter" up now.
+	 */
+	if (adapter->open_device_map == 0) {
+		err = adapter_up(adapter);
+		if (err)
+			return err;
+	}
+
+	/*
+	 * Note that this interface is up and start everything up ...
+	 */
+	dev->real_num_tx_queues = pi->nqsets;
+	set_bit(pi->port_id, &adapter->open_device_map);
+	link_start(dev);
+	netif_tx_start_all_queues(dev);
+	return 0;
+}
+
+/*
+ * Shut down a net device.  This routine is called "cxgb_close" in the PF
+ * Driver ...
+ */
+static int cxgb4vf_stop(struct net_device *dev)
+{
+	int ret;
+	struct port_info *pi = netdev_priv(dev);
+	struct adapter *adapter = pi->adapter;
+
+	netif_tx_stop_all_queues(dev);
+	netif_carrier_off(dev);
+	ret = t4vf_enable_vi(adapter, pi->viid, false, false);
+	pi->link_cfg.link_ok = 0;
+
+	clear_bit(pi->port_id, &adapter->open_device_map);
+	if (adapter->open_device_map == 0)
+		adapter_down(adapter);
+	return 0;
+}
+
+/*
+ * Translate our basic statistics into the standard "ifconfig" statistics.
+ */
+static struct net_device_stats *cxgb4vf_get_stats(struct net_device *dev)
+{
+	struct t4vf_port_stats stats;
+	struct port_info *pi = netdev2pinfo(dev);
+	struct adapter *adapter = pi->adapter;
+	struct net_device_stats *ns = &dev->stats;
+	int err;
+
+	spin_lock(&adapter->stats_lock);
+	err = t4vf_get_port_stats(adapter, pi->pidx, &stats);
+	spin_unlock(&adapter->stats_lock);
+
+	memset(ns, 0, sizeof(*ns));
+	if (err)
+		return ns;
+
+	ns->tx_bytes = (stats.tx_bcast_bytes + stats.tx_mcast_bytes +
+			stats.tx_ucast_bytes + stats.tx_offload_bytes);
+	ns->tx_packets = (stats.tx_bcast_frames + stats.tx_mcast_frames +
+			  stats.tx_ucast_frames + stats.tx_offload_frames);
+	ns->rx_bytes = (stats.rx_bcast_bytes + stats.rx_mcast_bytes +
+			stats.rx_ucast_bytes);
+	ns->rx_packets = (stats.rx_bcast_frames + stats.rx_mcast_frames +
+			  stats.rx_ucast_frames);
+	ns->multicast = stats.rx_mcast_frames;
+	ns->tx_errors = stats.tx_drop_frames;
+	ns->rx_errors = stats.rx_err_frames;
+
+	return ns;
+}
+
+/*
+ * Collect up to maxaddrs worth of a netdevice's unicast addresses into an
+ * array of addrss pointers and return the number collected.
+ */
+static inline int collect_netdev_uc_list_addrs(const struct net_device *dev,
+					       const u8 **addr,
+					       unsigned int maxaddrs)
+{
+	unsigned int naddr = 0;
+	const struct netdev_hw_addr *ha;
+
+	for_each_dev_addr(dev, ha) {
+		addr[naddr++] = ha->addr;
+		if (naddr >= maxaddrs)
+			break;
+	}
+	return naddr;
+}
+
+/*
+ * Collect up to maxaddrs worth of a netdevice's multicast addresses into an
+ * array of addrss pointers and return the number collected.
+ */
+static inline int collect_netdev_mc_list_addrs(const struct net_device *dev,
+					       const u8 **addr,
+					       unsigned int maxaddrs)
+{
+	unsigned int naddr = 0;
+	const struct netdev_hw_addr *ha;
+
+	netdev_for_each_mc_addr(ha, dev) {
+		addr[naddr++] = ha->addr;
+		if (naddr >= maxaddrs)
+			break;
+	}
+	return naddr;
+}
+
+/*
+ * Configure the exact and hash address filters to handle a port's multicast
+ * and secondary unicast MAC addresses.
+ */
+static int set_addr_filters(const struct net_device *dev, bool sleep)
+{
+	u64 mhash = 0;
+	u64 uhash = 0;
+	bool free = true;
+	u16 filt_idx[7];
+	const u8 *addr[7];
+	int ret, naddr = 0;
+	const struct port_info *pi = netdev_priv(dev);
+
+	/* first do the secondary unicast addresses */
+	naddr = collect_netdev_uc_list_addrs(dev, addr, ARRAY_SIZE(addr));
+	if (naddr > 0) {
+		ret = t4vf_alloc_mac_filt(pi->adapter, pi->viid, free,
+					  naddr, addr, filt_idx, &uhash, sleep);
+		if (ret < 0)
+			return ret;
+
+		free = false;
+	}
+
+	/* next set up the multicast addresses */
+	naddr = collect_netdev_mc_list_addrs(dev, addr, ARRAY_SIZE(addr));
+	if (naddr > 0) {
+		ret = t4vf_alloc_mac_filt(pi->adapter, pi->viid, free,
+					  naddr, addr, filt_idx, &mhash, sleep);
+		if (ret < 0)
+			return ret;
+	}
+
+	return t4vf_set_addr_hash(pi->adapter, pi->viid, uhash != 0,
+				  uhash | mhash, sleep);
+}
+
+/*
+ * Set RX properties of a port, such as promiscruity, address filters, and MTU.
+ * If @mtu is -1 it is left unchanged.
+ */
+static int set_rxmode(struct net_device *dev, int mtu, bool sleep_ok)
+{
+	int ret;
+	struct port_info *pi = netdev_priv(dev);
+
+	ret = set_addr_filters(dev, sleep_ok);
+	if (ret == 0)
+		ret = t4vf_set_rxmode(pi->adapter, pi->viid, -1,
+				      (dev->flags & IFF_PROMISC) != 0,
+				      (dev->flags & IFF_ALLMULTI) != 0,
+				      1, -1, sleep_ok);
+	return ret;
+}
+
+/*
+ * Set the current receive modes on the device.
+ */
+static void cxgb4vf_set_rxmode(struct net_device *dev)
+{
+	/* unfortunately we can't return errors to the stack */
+	set_rxmode(dev, -1, false);
+}
+
+/*
+ * Find the entry in the interrupt holdoff timer value array which comes
+ * closest to the specified interrupt holdoff value.
+ */
+static int closest_timer(const struct sge *s, int us)
+{
+	int i, timer_idx = 0, min_delta = INT_MAX;
+
+	for (i = 0; i < ARRAY_SIZE(s->timer_val); i++) {
+		int delta = us - s->timer_val[i];
+		if (delta < 0)
+			delta = -delta;
+		if (delta < min_delta) {
+			min_delta = delta;
+			timer_idx = i;
+		}
+	}
+	return timer_idx;
+}
+
+static int closest_thres(const struct sge *s, int thres)
+{
+	int i, delta, pktcnt_idx = 0, min_delta = INT_MAX;
+
+	for (i = 0; i < ARRAY_SIZE(s->counter_val); i++) {
+		delta = thres - s->counter_val[i];
+		if (delta < 0)
+			delta = -delta;
+		if (delta < min_delta) {
+			min_delta = delta;
+			pktcnt_idx = i;
+		}
+	}
+	return pktcnt_idx;
+}
+
+/*
+ * Return a queue's interrupt hold-off time in us.  0 means no timer.
+ */
+static unsigned int qtimer_val(const struct adapter *adapter,
+			       const struct sge_rspq *rspq)
+{
+	unsigned int timer_idx = QINTR_TIMER_IDX_GET(rspq->intr_params);
+
+	return timer_idx < SGE_NTIMERS
+		? adapter->sge.timer_val[timer_idx]
+		: 0;
+}
+
+/**
+ *	set_rxq_intr_params - set a queue's interrupt holdoff parameters
+ *	@adapter: the adapter
+ *	@rspq: the RX response queue
+ *	@us: the hold-off time in us, or 0 to disable timer
+ *	@cnt: the hold-off packet count, or 0 to disable counter
+ *
+ *	Sets an RX response queue's interrupt hold-off time and packet count.
+ *	At least one of the two needs to be enabled for the queue to generate
+ *	interrupts.
+ */
+static int set_rxq_intr_params(struct adapter *adapter, struct sge_rspq *rspq,
+			       unsigned int us, unsigned int cnt)
+{
+	unsigned int timer_idx;
+
+	/*
+	 * If both the interrupt holdoff timer and count are specified as
+	 * zero, default to a holdoff count of 1 ...
+	 */
+	if ((us | cnt) == 0)
+		cnt = 1;
+
+	/*
+	 * If an interrupt holdoff count has been specified, then find the
+	 * closest configured holdoff count and use that.  If the response
+	 * queue has already been created, then update its queue context
+	 * parameters ...
+	 */
+	if (cnt) {
+		int err;
+		u32 v, pktcnt_idx;
+
+		pktcnt_idx = closest_thres(&adapter->sge, cnt);
+		if (rspq->desc && rspq->pktcnt_idx != pktcnt_idx) {
+			v = FW_PARAMS_MNEM(FW_PARAMS_MNEM_DMAQ) |
+			    FW_PARAMS_PARAM_X(
+					FW_PARAMS_PARAM_DMAQ_IQ_INTCNTTHRESH) |
+			    FW_PARAMS_PARAM_YZ(rspq->cntxt_id);
+			err = t4vf_set_params(adapter, 1, &v, &pktcnt_idx);
+			if (err)
+				return err;
+		}
+		rspq->pktcnt_idx = pktcnt_idx;
+	}
+
+	/*
+	 * Compute the closest holdoff timer index from the supplied holdoff
+	 * timer value.
+	 */
+	timer_idx = (us == 0
+		     ? SGE_TIMER_RSTRT_CNTR
+		     : closest_timer(&adapter->sge, us));
+
+	/*
+	 * Update the response queue's interrupt coalescing parameters and
+	 * return success.
+	 */
+	rspq->intr_params = (QINTR_TIMER_IDX(timer_idx) |
+			     (cnt > 0 ? QINTR_CNT_EN : 0));
+	return 0;
+}
+
+/*
+ * Return a version number to identify the type of adapter.  The scheme is:
+ * - bits 0..9: chip version
+ * - bits 10..15: chip revision
+ */
+static inline unsigned int mk_adap_vers(const struct adapter *adapter)
+{
+	/*
+	 * Chip version 4, revision 0x3f (cxgb4vf).
+	 */
+	return 4 | (0x3f << 10);
+}
+
+/*
+ * Execute the specified ioctl command.
+ */
+static int cxgb4vf_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
+{
+	int ret = 0;
+
+	switch (cmd) {
+	    /*
+	     * The VF Driver doesn't have access to any of the other
+	     * common Ethernet device ioctl()'s (like reading/writing
+	     * PHY registers, etc.
+	     */
+
+	default:
+		ret = -EOPNOTSUPP;
+		break;
+	}
+	return ret;
+}
+
+/*
+ * Change the device's MTU.
+ */
+static int cxgb4vf_change_mtu(struct net_device *dev, int new_mtu)
+{
+	int ret;
+	struct port_info *pi = netdev_priv(dev);
+
+	/* accommodate SACK */
+	if (new_mtu < 81)
+		return -EINVAL;
+
+	ret = t4vf_set_rxmode(pi->adapter, pi->viid, new_mtu,
+			      -1, -1, -1, -1, true);
+	if (!ret)
+		dev->mtu = new_mtu;
+	return ret;
+}
+
+/*
+ * Change the devices MAC address.
+ */
+static int cxgb4vf_set_mac_addr(struct net_device *dev, void *_addr)
+{
+	int ret;
+	struct sockaddr *addr = _addr;
+	struct port_info *pi = netdev_priv(dev);
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EINVAL;
+
+	ret = t4vf_change_mac(pi->adapter, pi->viid, pi->xact_addr_filt,
+			      addr->sa_data, true);
+	if (ret < 0)
+		return ret;
+
+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+	pi->xact_addr_filt = ret;
+	return 0;
+}
+
+/*
+ * Return a TX Queue on which to send the specified skb.
+ */
+static u16 cxgb4vf_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	/*
+	 * XXX For now just use the default hash but we probably want to
+	 * XXX look at other possibilities ...
+	 */
+	return skb_tx_hash(dev, skb);
+}
+
+#ifdef CONFIG_NET_POLL_CONTROLLER
+/*
+ * Poll all of our receive queues.  This is called outside of normal interrupt
+ * context.
+ */
+static void cxgb4vf_poll_controller(struct net_device *dev)
+{
+	struct port_info *pi = netdev_priv(dev);
+	struct adapter *adapter = pi->adapter;
+
+	if (adapter->flags & USING_MSIX) {
+		struct sge_eth_rxq *rxq;
+		int nqsets;
+
+		rxq = &adapter->sge.ethrxq[pi->first_qset];
+		for (nqsets = pi->nqsets; nqsets; nqsets--) {
+			t4vf_sge_intr_msix(0, &rxq->rspq);
+			rxq++;
+		}
+	} else
+		t4vf_intr_handler(adapter)(0, adapter);
+}
+#endif
+
+/*
+ * Ethtool operations.
+ * ===================
+ *
+ * Note that we don't support any ethtool operations which change the physical
+ * state of the port to which we're linked.
+ */
+
+/*
+ * Return current port link settings.
+ */
+static int cxgb4vf_get_settings(struct net_device *dev,
+				struct ethtool_cmd *cmd)
+{
+	const struct port_info *pi = netdev_priv(dev);
+
+	cmd->supported = pi->link_cfg.supported;
+	cmd->advertising = pi->link_cfg.advertising;
+	cmd->speed = netif_carrier_ok(dev) ? pi->link_cfg.speed : -1;
+	cmd->duplex = DUPLEX_FULL;
+
+	cmd->port = (cmd->supported & SUPPORTED_TP) ? PORT_TP : PORT_FIBRE;
+	cmd->phy_address = pi->port_id;
+	cmd->transceiver = XCVR_EXTERNAL;
+	cmd->autoneg = pi->link_cfg.autoneg;
+	cmd->maxtxpkt = 0;
+	cmd->maxrxpkt = 0;
+	return 0;
+}
+
+/*
+ * Return our driver information.
+ */
+static void cxgb4vf_get_drvinfo(struct net_device *dev,
+				struct ethtool_drvinfo *drvinfo)
+{
+	struct adapter *adapter = netdev2adap(dev);
+
+	strcpy(drvinfo->driver, KBUILD_MODNAME);
+	strcpy(drvinfo->version, DRV_VERSION);
+	strcpy(drvinfo->bus_info, pci_name(to_pci_dev(dev->dev.parent)));
+	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
+		 "%u.%u.%u.%u, TP %u.%u.%u.%u",
+		 FW_HDR_FW_VER_MAJOR_GET(adapter->params.dev.fwrev),
+		 FW_HDR_FW_VER_MINOR_GET(adapter->params.dev.fwrev),
+		 FW_HDR_FW_VER_MICRO_GET(adapter->params.dev.fwrev),
+		 FW_HDR_FW_VER_BUILD_GET(adapter->params.dev.fwrev),
+		 FW_HDR_FW_VER_MAJOR_GET(adapter->params.dev.tprev),
+		 FW_HDR_FW_VER_MINOR_GET(adapter->params.dev.tprev),
+		 FW_HDR_FW_VER_MICRO_GET(adapter->params.dev.tprev),
+		 FW_HDR_FW_VER_BUILD_GET(adapter->params.dev.tprev));
+}
+
+/*
+ * Return current adapter message level.
+ */
+static u32 cxgb4vf_get_msglevel(struct net_device *dev)
+{
+	return netdev2adap(dev)->msg_enable;
+}
+
+/*
+ * Set current adapter message level.
+ */
+static void cxgb4vf_set_msglevel(struct net_device *dev, u32 msglevel)
+{
+	netdev2adap(dev)->msg_enable = msglevel;
+}
+
+/*
+ * Return the device's current Queue Set ring size parameters along with the
+ * allowed maximum values.  Since ethtool doesn't understand the concept of
+ * multi-queue devices, we just return the current values associated with the
+ * first Queue Set.
+ */
+static void cxgb4vf_get_ringparam(struct net_device *dev,
+				  struct ethtool_ringparam *rp)
+{
+	const struct port_info *pi = netdev_priv(dev);
+	const struct sge *s = &pi->adapter->sge;
+
+	rp->rx_max_pending = MAX_RX_BUFFERS;
+	rp->rx_mini_max_pending = MAX_RSPQ_ENTRIES;
+	rp->rx_jumbo_max_pending = 0;
+	rp->tx_max_pending = MAX_TXQ_ENTRIES;
+
+	rp->rx_pending = s->ethrxq[pi->first_qset].fl.size - MIN_FL_RESID;
+	rp->rx_mini_pending = s->ethrxq[pi->first_qset].rspq.size;
+	rp->rx_jumbo_pending = 0;
+	rp->tx_pending = s->ethtxq[pi->first_qset].q.size;
+}
+
+/*
+ * Set the Queue Set ring size parameters for the device.  Again, since
+ * ethtool doesn't allow for the concept of multiple queues per device, we'll
+ * apply these new values across all of the Queue Sets associated with the
+ * device -- after vetting them of course!
+ */
+static int cxgb4vf_set_ringparam(struct net_device *dev,
+				 struct ethtool_ringparam *rp)
+{
+	const struct port_info *pi = netdev_priv(dev);
+	struct adapter *adapter = pi->adapter;
+	struct sge *s = &adapter->sge;
+	int qs;
+
+	if (rp->rx_pending > MAX_RX_BUFFERS ||
+	    rp->rx_jumbo_pending ||
+	    rp->tx_pending > MAX_TXQ_ENTRIES ||
+	    rp->rx_mini_pending > MAX_RSPQ_ENTRIES ||
+	    rp->rx_mini_pending < MIN_RSPQ_ENTRIES ||
+	    rp->rx_pending < MIN_FL_ENTRIES ||
+	    rp->tx_pending < MIN_TXQ_ENTRIES)
+		return -EINVAL;
+
+	if (adapter->flags & FULL_INIT_DONE)
+		return -EBUSY;
+
+	for (qs = pi->first_qset; qs < pi->first_qset + pi->nqsets; qs++) {
+		s->ethrxq[qs].fl.size = rp->rx_pending + MIN_FL_RESID;
+		s->ethrxq[qs].rspq.size = rp->rx_mini_pending;
+		s->ethtxq[qs].q.size = rp->tx_pending;
+	}
+	return 0;
+}
+
+/*
+ * Return the interrupt holdoff timer and count for the first Queue Set on the
+ * device.  Our extension ioctl() (the cxgbtool interface) allows the
+ * interrupt holdoff timer to be read on all of the device's Queue Sets.
+ */
+static int cxgb4vf_get_coalesce(struct net_device *dev,
+				struct ethtool_coalesce *coalesce)
+{
+	const struct port_info *pi = netdev_priv(dev);
+	const struct adapter *adapter = pi->adapter;
+	const struct sge_rspq *rspq = &adapter->sge.ethrxq[pi->first_qset].rspq;
+
+	coalesce->rx_coalesce_usecs = qtimer_val(adapter, rspq);
+	coalesce->rx_max_coalesced_frames =
+		((rspq->intr_params & QINTR_CNT_EN)
+		 ? adapter->sge.counter_val[rspq->pktcnt_idx]
+		 : 0);
+	return 0;
+}
+
+/*
+ * Set the RX interrupt holdoff timer and count for the first Queue Set on the
+ * interface.  Our extension ioctl() (the cxgbtool interface) allows us to set
+ * the interrupt holdoff timer on any of the device's Queue Sets.
+ */
+static int cxgb4vf_set_coalesce(struct net_device *dev,
+				struct ethtool_coalesce *coalesce)
+{
+	const struct port_info *pi = netdev_priv(dev);
+	struct adapter *adapter = pi->adapter;
+
+	return set_rxq_intr_params(adapter,
+				   &adapter->sge.ethrxq[pi->first_qset].rspq,
+				   coalesce->rx_coalesce_usecs,
+				   coalesce->rx_max_coalesced_frames);
+}
+
+/*
+ * Report current port link pause parameter settings.
+ */
+static void cxgb4vf_get_pauseparam(struct net_device *dev,
+				   struct ethtool_pauseparam *pauseparam)
+{
+	struct port_info *pi = netdev_priv(dev);
+
+	pauseparam->autoneg = (pi->link_cfg.requested_fc & PAUSE_AUTONEG) != 0;
+	pauseparam->rx_pause = (pi->link_cfg.fc & PAUSE_RX) != 0;
+	pauseparam->tx_pause = (pi->link_cfg.fc & PAUSE_TX) != 0;
+}
+
+/*
+ * Return whether RX Checksum Offloading is currently enabled for the device.
+ */
+static u32 cxgb4vf_get_rx_csum(struct net_device *dev)
+{
+	struct port_info *pi = netdev_priv(dev);
+
+	return (pi->rx_offload & RX_CSO) != 0;
+}
+
+/*
+ * Turn RX Checksum Offloading on or off for the device.
+ */
+static int cxgb4vf_set_rx_csum(struct net_device *dev, u32 csum)
+{
+	struct port_info *pi = netdev_priv(dev);
+
+	if (csum)
+		pi->rx_offload |= RX_CSO;
+	else
+		pi->rx_offload &= ~RX_CSO;
+	return 0;
+}
+
+/*
+ * Identify the port by blinking the port's LED.
+ */
+static int cxgb4vf_phys_id(struct net_device *dev, u32 id)
+{
+	struct port_info *pi = netdev_priv(dev);
+
+	return t4vf_identify_port(pi->adapter, pi->viid, 5);
+}
+
+/*
+ * Port stats maintained per queue of the port.
+ */
+struct queue_port_stats {
+	u64 tso;
+	u64 tx_csum;
+	u64 rx_csum;
+	u64 vlan_ex;
+	u64 vlan_ins;
+};
+
+/*
+ * Strings for the ETH_SS_STATS statistics set ("ethtool -S").  Note that
+ * these need to match the order of statistics returned by
+ * t4vf_get_port_stats().
+ */
+static const char stats_strings[][ETH_GSTRING_LEN] = {
+	/*
+	 * These must match the layout of the t4vf_port_stats structure.
+	 */
+	"TxBroadcastBytes  ",
+	"TxBroadcastFrames ",
+	"TxMulticastBytes  ",
+	"TxMulticastFrames ",
+	"TxUnicastBytes    ",
+	"TxUnicastFrames   ",
+	"TxDroppedFrames   ",
+	"TxOffloadBytes    ",
+	"TxOffloadFrames   ",
+	"RxBroadcastBytes  ",
+	"RxBroadcastFrames ",
+	"RxMulticastBytes  ",
+	"RxMulticastFrames ",
+	"RxUnicastBytes    ",
+	"RxUnicastFrames   ",
+	"RxErrorFrames     ",
+
+	/*
+	 * These are accumulated per-queue statistics and must match the
+	 * order of the fields in the queue_port_stats structure.
+	 */
+	"TSO               ",
+	"TxCsumOffload     ",
+	"RxCsumGood        ",
+	"VLANextractions   ",
+	"VLANinsertions    ",
+};
+
+/*
+ * Return the number of statistics in the specified statistics set.
+ */
+static int cxgb4vf_get_sset_count(struct net_device *dev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return ARRAY_SIZE(stats_strings);
+	default:
+		return -EOPNOTSUPP;
+	}
+	/*NOTREACHED*/
+}
+
+/*
+ * Return the strings for the specified statistics set.
+ */
+static void cxgb4vf_get_strings(struct net_device *dev,
+				u32 sset,
+				u8 *data)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		memcpy(data, stats_strings, sizeof(stats_strings));
+		break;
+	}
+}
+
+/*
+ * Small utility routine to accumulate queue statistics across the queues of
+ * a "port".
+ */
+static void collect_sge_port_stats(const struct adapter *adapter,
+				   const struct port_info *pi,
+				   struct queue_port_stats *stats)
+{
+	const struct sge_eth_txq *txq = &adapter->sge.ethtxq[pi->first_qset];
+	const struct sge_eth_rxq *rxq = &adapter->sge.ethrxq[pi->first_qset];
+	int qs;
+
+	memset(stats, 0, sizeof(*stats));
+	for (qs = 0; qs < pi->nqsets; qs++, rxq++, txq++) {
+		stats->tso += txq->tso;
+		stats->tx_csum += txq->tx_cso;
+		stats->rx_csum += rxq->stats.rx_cso;
+		stats->vlan_ex += rxq->stats.vlan_ex;
+		stats->vlan_ins += txq->vlan_ins;
+	}
+}
+
+/*
+ * Return the ETH_SS_STATS statistics set.
+ */
+static void cxgb4vf_get_ethtool_stats(struct net_device *dev,
+				      struct ethtool_stats *stats,
+				      u64 *data)
+{
+	struct port_info *pi = netdev2pinfo(dev);
+	struct adapter *adapter = pi->adapter;
+	int err = t4vf_get_port_stats(adapter, pi->pidx,
+				      (struct t4vf_port_stats *)data);
+	if (err)
+		memset(data, 0, sizeof(struct t4vf_port_stats));
+
+	data += sizeof(struct t4vf_port_stats) / sizeof(u64);
+	collect_sge_port_stats(adapter, pi, (struct queue_port_stats *)data);
+}
+
+/*
+ * Return the size of our register map.
+ */
+static int cxgb4vf_get_regs_len(struct net_device *dev)
+{
+	return T4VF_REGMAP_SIZE;
+}
+
+/*
+ * Dump a block of registers, start to end inclusive, into a buffer.
+ */
+static void reg_block_dump(struct adapter *adapter, void *regbuf,
+			   unsigned int start, unsigned int end)
+{
+	u32 *bp = regbuf + start - T4VF_REGMAP_START;
+
+	for ( ; start <= end; start += sizeof(u32)) {
+		/*
+		 * Avoid reading the Mailbox Control register since that
+		 * can trigger a Mailbox Ownership Arbitration cycle and
+		 * interfere with communication with the firmware.
+		 */
+		if (start == T4VF_CIM_BASE_ADDR + CIM_VF_EXT_MAILBOX_CTRL)
+			*bp++ = 0xffff;
+		else
+			*bp++ = t4_read_reg(adapter, start);
+	}
+}
+
+/*
+ * Copy our entire register map into the provided buffer.
+ */
+static void cxgb4vf_get_regs(struct net_device *dev,
+			     struct ethtool_regs *regs,
+			     void *regbuf)
+{
+	struct adapter *adapter = netdev2adap(dev);
+
+	regs->version = mk_adap_vers(adapter);
+
+	/*
+	 * Fill in register buffer with our register map.
+	 */
+	memset(regbuf, 0, T4VF_REGMAP_SIZE);
+
+	reg_block_dump(adapter, regbuf,
+		       T4VF_SGE_BASE_ADDR + T4VF_MOD_MAP_SGE_FIRST,
+		       T4VF_SGE_BASE_ADDR + T4VF_MOD_MAP_SGE_LAST);
+	reg_block_dump(adapter, regbuf,
+		       T4VF_MPS_BASE_ADDR + T4VF_MOD_MAP_MPS_FIRST,
+		       T4VF_MPS_BASE_ADDR + T4VF_MOD_MAP_MPS_LAST);
+	reg_block_dump(adapter, regbuf,
+		       T4VF_PL_BASE_ADDR + T4VF_MOD_MAP_PL_FIRST,
+		       T4VF_PL_BASE_ADDR + T4VF_MOD_MAP_PL_LAST);
+	reg_block_dump(adapter, regbuf,
+		       T4VF_CIM_BASE_ADDR + T4VF_MOD_MAP_CIM_FIRST,
+		       T4VF_CIM_BASE_ADDR + T4VF_MOD_MAP_CIM_LAST);
+
+	reg_block_dump(adapter, regbuf,
+		       T4VF_MBDATA_BASE_ADDR + T4VF_MBDATA_FIRST,
+		       T4VF_MBDATA_BASE_ADDR + T4VF_MBDATA_LAST);
+}
+
+/*
+ * Report current Wake On LAN settings.
+ */
+static void cxgb4vf_get_wol(struct net_device *dev,
+			    struct ethtool_wolinfo *wol)
+{
+	wol->supported = 0;
+	wol->wolopts = 0;
+	memset(&wol->sopass, 0, sizeof(wol->sopass));
+}
+
+/*
+ * Set TCP Segmentation Offloading feature capabilities.
+ */
+static int cxgb4vf_set_tso(struct net_device *dev, u32 tso)
+{
+	if (tso)
+		dev->features |= NETIF_F_TSO | NETIF_F_TSO6;
+	else
+		dev->features &= ~(NETIF_F_TSO | NETIF_F_TSO6);
+	return 0;
+}
+
+static struct ethtool_ops cxgb4vf_ethtool_ops = {
+	.get_settings		= cxgb4vf_get_settings,
+	.get_drvinfo		= cxgb4vf_get_drvinfo,
+	.get_msglevel		= cxgb4vf_get_msglevel,
+	.set_msglevel		= cxgb4vf_set_msglevel,
+	.get_ringparam		= cxgb4vf_get_ringparam,
+	.set_ringparam		= cxgb4vf_set_ringparam,
+	.get_coalesce		= cxgb4vf_get_coalesce,
+	.set_coalesce		= cxgb4vf_set_coalesce,
+	.get_pauseparam		= cxgb4vf_get_pauseparam,
+	.get_rx_csum		= cxgb4vf_get_rx_csum,
+	.set_rx_csum		= cxgb4vf_set_rx_csum,
+	.set_tx_csum		= ethtool_op_set_tx_ipv6_csum,
+	.set_sg			= ethtool_op_set_sg,
+	.get_link		= ethtool_op_get_link,
+	.get_strings		= cxgb4vf_get_strings,
+	.phys_id		= cxgb4vf_phys_id,
+	.get_sset_count		= cxgb4vf_get_sset_count,
+	.get_ethtool_stats	= cxgb4vf_get_ethtool_stats,
+	.get_regs_len		= cxgb4vf_get_regs_len,
+	.get_regs		= cxgb4vf_get_regs,
+	.get_wol		= cxgb4vf_get_wol,
+	.set_tso		= cxgb4vf_set_tso,
+};
+
+/*
+ * /sys/kernel/debug/cxgb4vf support code and data.
+ * ================================================
+ */
+
+/*
+ * Show SGE Queue Set information.  We display QPL Queues Sets per line.
+ */
+#define QPL	4
+
+static int sge_qinfo_show(struct seq_file *seq, void *v)
+{
+	struct adapter *adapter = seq->private;
+	int eth_entries = DIV_ROUND_UP(adapter->sge.ethqsets, QPL);
+	int qs, r = (uintptr_t)v - 1;
+
+	if (r)
+		seq_putc(seq, '\n');
+
+	#define S3(fmt_spec, s, v) \
+		do {\
+			seq_printf(seq, "%-12s", s); \
+			for (qs = 0; qs < n; ++qs) \
+				seq_printf(seq, " %16" fmt_spec, v); \
+			seq_putc(seq, '\n'); \
+		} while (0)
+	#define S(s, v)		S3("s", s, v)
+	#define T(s, v)		S3("u", s, txq[qs].v)
+	#define R(s, v)		S3("u", s, rxq[qs].v)
+
+	if (r < eth_entries) {
+		const struct sge_eth_rxq *rxq = &adapter->sge.ethrxq[r * QPL];
+		const struct sge_eth_txq *txq = &adapter->sge.ethtxq[r * QPL];
+		int n = min(QPL, adapter->sge.ethqsets - QPL * r);
+
+		S("QType:", "Ethernet");
+		S("Interface:",
+		  (rxq[qs].rspq.netdev
+		   ? rxq[qs].rspq.netdev->name
+		   : "N/A"));
+		S3("d", "Port:",
+		   (rxq[qs].rspq.netdev
+		    ? ((struct port_info *)
+		       netdev_priv(rxq[qs].rspq.netdev))->port_id
+		    : -1));
+		T("TxQ ID:", q.abs_id);
+		T("TxQ size:", q.size);
+		T("TxQ inuse:", q.in_use);
+		T("TxQ PIdx:", q.pidx);
+		T("TxQ CIdx:", q.cidx);
+		R("RspQ ID:", rspq.abs_id);
+		R("RspQ size:", rspq.size);
+		R("RspQE size:", rspq.iqe_len);
+		S3("u", "Intr delay:", qtimer_val(adapter, &rxq[qs].rspq));
+		S3("u", "Intr pktcnt:",
+		   adapter->sge.counter_val[rxq[qs].rspq.pktcnt_idx]);
+		R("RspQ CIdx:", rspq.cidx);
+		R("RspQ Gen:", rspq.gen);
+		R("FL ID:", fl.abs_id);
+		R("FL size:", fl.size - MIN_FL_RESID);
+		R("FL avail:", fl.avail);
+		R("FL PIdx:", fl.pidx);
+		R("FL CIdx:", fl.cidx);
+		return 0;
+	}
+
+	r -= eth_entries;
+	if (r == 0) {
+		const struct sge_rspq *evtq = &adapter->sge.fw_evtq;
+
+		seq_printf(seq, "%-12s %16s\n", "QType:", "FW event queue");
+		seq_printf(seq, "%-12s %16u\n", "RspQ ID:", evtq->abs_id);
+		seq_printf(seq, "%-12s %16u\n", "Intr delay:",
+			   qtimer_val(adapter, evtq));
+		seq_printf(seq, "%-12s %16u\n", "Intr pktcnt:",
+			   adapter->sge.counter_val[evtq->pktcnt_idx]);
+		seq_printf(seq, "%-12s %16u\n", "RspQ Cidx:", evtq->cidx);
+		seq_printf(seq, "%-12s %16u\n", "RspQ Gen:", evtq->gen);
+	} else if (r == 1) {
+		const struct sge_rspq *intrq = &adapter->sge.intrq;
+
+		seq_printf(seq, "%-12s %16s\n", "QType:", "Interrupt Queue");
+		seq_printf(seq, "%-12s %16u\n", "RspQ ID:", intrq->abs_id);
+		seq_printf(seq, "%-12s %16u\n", "Intr delay:",
+			   qtimer_val(adapter, intrq));
+		seq_printf(seq, "%-12s %16u\n", "Intr pktcnt:",
+			   adapter->sge.counter_val[intrq->pktcnt_idx]);
+		seq_printf(seq, "%-12s %16u\n", "RspQ Cidx:", intrq->cidx);
+		seq_printf(seq, "%-12s %16u\n", "RspQ Gen:", intrq->gen);
+	}
+
+	#undef R
+	#undef T
+	#undef S
+	#undef S3
+
+	return 0;
+}
+
+/*
+ * Return the number of "entries" in our "file".  We group the multi-Queue
+ * sections with QPL Queue Sets per "entry".  The sections of the output are:
+ *
+ *     Ethernet RX/TX Queue Sets
+ *     Firmware Event Queue
+ *     Forwarded Interrupt Queue (if in MSI mode)
+ */
+static int sge_queue_entries(const struct adapter *adapter)
+{
+	return DIV_ROUND_UP(adapter->sge.ethqsets, QPL) + 1 +
+		((adapter->flags & USING_MSI) != 0);
+}
+
+static void *sge_queue_start(struct seq_file *seq, loff_t *pos)
+{
+	int entries = sge_queue_entries(seq->private);
+
+	return *pos < entries ? (void *)((uintptr_t)*pos + 1) : NULL;
+}
+
+static void sge_queue_stop(struct seq_file *seq, void *v)
+{
+}
+
+static void *sge_queue_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	int entries = sge_queue_entries(seq->private);
+
+	++*pos;
+	return *pos < entries ? (void *)((uintptr_t)*pos + 1) : NULL;
+}
+
+static const struct seq_operations sge_qinfo_seq_ops = {
+	.start = sge_queue_start,
+	.next  = sge_queue_next,
+	.stop  = sge_queue_stop,
+	.show  = sge_qinfo_show
+};
+
+static int sge_qinfo_open(struct inode *inode, struct file *file)
+{
+	int res = seq_open(file, &sge_qinfo_seq_ops);
+
+	if (!res) {
+		struct seq_file *seq = file->private_data;
+		seq->private = inode->i_private;
+	}
+	return res;
+}
+
+static const struct file_operations sge_qinfo_debugfs_fops = {
+	.owner   = THIS_MODULE,
+	.open    = sge_qinfo_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = seq_release,
+};
+
+/*
+ * Show SGE Queue Set statistics.  We display QPL Queues Sets per line.
+ */
+#define QPL	4
+
+static int sge_qstats_show(struct seq_file *seq, void *v)
+{
+	struct adapter *adapter = seq->private;
+	int eth_entries = DIV_ROUND_UP(adapter->sge.ethqsets, QPL);
+	int qs, r = (uintptr_t)v - 1;
+
+	if (r)
+		seq_putc(seq, '\n');
+
+	#define S3(fmt, s, v) \
+		do { \
+			seq_printf(seq, "%-16s", s); \
+			for (qs = 0; qs < n; ++qs) \
+				seq_printf(seq, " %8" fmt, v); \
+			seq_putc(seq, '\n'); \
+		} while (0)
+	#define S(s, v)		S3("s", s, v)
+
+	#define T3(fmt, s, v)	S3(fmt, s, txq[qs].v)
+	#define T(s, v)		T3("lu", s, v)
+
+	#define R3(fmt, s, v)	S3(fmt, s, rxq[qs].v)
+	#define R(s, v)		R3("lu", s, v)
+
+	if (r < eth_entries) {
+		const struct sge_eth_rxq *rxq = &adapter->sge.ethrxq[r * QPL];
+		const struct sge_eth_txq *txq = &adapter->sge.ethtxq[r * QPL];
+		int n = min(QPL, adapter->sge.ethqsets - QPL * r);
+
+		S("QType:", "Ethernet");
+		S("Interface:",
+		  (rxq[qs].rspq.netdev
+		   ? rxq[qs].rspq.netdev->name
+		   : "N/A"));
+		R3("u", "RspQNullInts", rspq.unhandled_irqs);
+		R("RxPackets:", stats.pkts);
+		R("RxCSO:", stats.rx_cso);
+		R("VLANxtract:", stats.vlan_ex);
+		R("LROmerged:", stats.lro_merged);
+		R("LROpackets:", stats.lro_pkts);
+		R("RxDrops:", stats.rx_drops);
+		T("TSO:", tso);
+		T("TxCSO:", tx_cso);
+		T("VLANins:", vlan_ins);
+		T("TxQFull:", q.stops);
+		T("TxQRestarts:", q.restarts);
+		T("TxMapErr:", mapping_err);
+		R("FLAllocErr:", fl.alloc_failed);
+		R("FLLrgAlcErr:", fl.large_alloc_failed);
+		R("FLStarving:", fl.starving);
+		return 0;
+	}
+
+	r -= eth_entries;
+	if (r == 0) {
+		const struct sge_rspq *evtq = &adapter->sge.fw_evtq;
+
+		seq_printf(seq, "%-8s %16s\n", "QType:", "FW event queue");
+		/* no real response queue statistics available to display */
+		seq_printf(seq, "%-16s %8u\n", "RspQ CIdx:", evtq->cidx);
+		seq_printf(seq, "%-16s %8u\n", "RspQ Gen:", evtq->gen);
+	} else if (r == 1) {
+		const struct sge_rspq *intrq = &adapter->sge.intrq;
+
+		seq_printf(seq, "%-8s %16s\n", "QType:", "Interrupt Queue");
+		/* no real response queue statistics available to display */
+		seq_printf(seq, "%-16s %8u\n", "RspQ CIdx:", intrq->cidx);
+		seq_printf(seq, "%-16s %8u\n", "RspQ Gen:", intrq->gen);
+	}
+
+	#undef R
+	#undef T
+	#undef S
+	#undef R3
+	#undef T3
+	#undef S3
+
+	return 0;
+}
+
+/*
+ * Return the number of "entries" in our "file".  We group the multi-Queue
+ * sections with QPL Queue Sets per "entry".  The sections of the output are:
+ *
+ *     Ethernet RX/TX Queue Sets
+ *     Firmware Event Queue
+ *     Forwarded Interrupt Queue (if in MSI mode)
+ */
+static int sge_qstats_entries(const struct adapter *adapter)
+{
+	return DIV_ROUND_UP(adapter->sge.ethqsets, QPL) + 1 +
+		((adapter->flags & USING_MSI) != 0);
+}
+
+static void *sge_qstats_start(struct seq_file *seq, loff_t *pos)
+{
+	int entries = sge_qstats_entries(seq->private);
+
+	return *pos < entries ? (void *)((uintptr_t)*pos + 1) : NULL;
+}
+
+static void sge_qstats_stop(struct seq_file *seq, void *v)
+{
+}
+
+static void *sge_qstats_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	int entries = sge_qstats_entries(seq->private);
+
+	(*pos)++;
+	return *pos < entries ? (void *)((uintptr_t)*pos + 1) : NULL;
+}
+
+static const struct seq_operations sge_qstats_seq_ops = {
+	.start = sge_qstats_start,
+	.next  = sge_qstats_next,
+	.stop  = sge_qstats_stop,
+	.show  = sge_qstats_show
+};
+
+static int sge_qstats_open(struct inode *inode, struct file *file)
+{
+	int res = seq_open(file, &sge_qstats_seq_ops);
+
+	if (res == 0) {
+		struct seq_file *seq = file->private_data;
+		seq->private = inode->i_private;
+	}
+	return res;
+}
+
+static const struct file_operations sge_qstats_proc_fops = {
+	.owner   = THIS_MODULE,
+	.open    = sge_qstats_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = seq_release,
+};
+
+/*
+ * Show PCI-E SR-IOV Virtual Function Resource Limits.
+ */
+static int resources_show(struct seq_file *seq, void *v)
+{
+	struct adapter *adapter = seq->private;
+	struct vf_resources *vfres = &adapter->params.vfres;
+
+	#define S(desc, fmt, var) \
+		seq_printf(seq, "%-60s " fmt "\n", \
+			   desc " (" #var "):", vfres->var)
+
+	S("Virtual Interfaces", "%d", nvi);
+	S("Egress Queues", "%d", neq);
+	S("Ethernet Control", "%d", nethctrl);
+	S("Ingress Queues/w Free Lists/Interrupts", "%d", niqflint);
+	S("Ingress Queues", "%d", niq);
+	S("Traffic Class", "%d", tc);
+	S("Port Access Rights Mask", "%#x", pmask);
+	S("MAC Address Filters", "%d", nexactf);
+	S("Firmware Command Read Capabilities", "%#x", r_caps);
+	S("Firmware Command Write/Execute Capabilities", "%#x", wx_caps);
+
+	#undef S
+
+	return 0;
+}
+
+static int resources_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, resources_show, inode->i_private);
+}
+
+static const struct file_operations resources_proc_fops = {
+	.owner   = THIS_MODULE,
+	.open    = resources_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = single_release,
+};
+
+/*
+ * Show Virtual Interfaces.
+ */
+static int interfaces_show(struct seq_file *seq, void *v)
+{
+	if (v == SEQ_START_TOKEN) {
+		seq_puts(seq, "Interface  Port   VIID\n");
+	} else {
+		struct adapter *adapter = seq->private;
+		int pidx = (uintptr_t)v - 2;
+		struct net_device *dev = adapter->port[pidx];
+		struct port_info *pi = netdev_priv(dev);
+
+		seq_printf(seq, "%9s  %4d  %#5x\n",
+			   dev->name, pi->port_id, pi->viid);
+	}
+	return 0;
+}
+
+static inline void *interfaces_get_idx(struct adapter *adapter, loff_t pos)
+{
+	return pos <= adapter->params.nports
+		? (void *)(uintptr_t)(pos + 1)
+		: NULL;
+}
+
+static void *interfaces_start(struct seq_file *seq, loff_t *pos)
+{
+	return *pos
+		? interfaces_get_idx(seq->private, *pos)
+		: SEQ_START_TOKEN;
+}
+
+static void *interfaces_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	(*pos)++;
+	return interfaces_get_idx(seq->private, *pos);
+}
+
+static void interfaces_stop(struct seq_file *seq, void *v)
+{
+}
+
+static const struct seq_operations interfaces_seq_ops = {
+	.start = interfaces_start,
+	.next  = interfaces_next,
+	.stop  = interfaces_stop,
+	.show  = interfaces_show
+};
+
+static int interfaces_open(struct inode *inode, struct file *file)
+{
+	int res = seq_open(file, &interfaces_seq_ops);
+
+	if (res == 0) {
+		struct seq_file *seq = file->private_data;
+		seq->private = inode->i_private;
+	}
+	return res;
+}
+
+static const struct file_operations interfaces_proc_fops = {
+	.owner   = THIS_MODULE,
+	.open    = interfaces_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = seq_release,
+};
+
+/*
+ * /sys/kernel/debugfs/cxgb4vf/ files list.
+ */
+struct cxgb4vf_debugfs_entry {
+	const char *name;		/* name of debugfs node */
+	mode_t mode;			/* file system mode */
+	const struct file_operations *fops;
+};
+
+static struct cxgb4vf_debugfs_entry debugfs_files[] = {
+	{ "sge_qinfo",  S_IRUGO, &sge_qinfo_debugfs_fops },
+	{ "sge_qstats", S_IRUGO, &sge_qstats_proc_fops },
+	{ "resources",  S_IRUGO, &resources_proc_fops },
+	{ "interfaces", S_IRUGO, &interfaces_proc_fops },
+};
+
+/*
+ * Module and device initialization and cleanup code.
+ * ==================================================
+ */
+
+/*
+ * Set up out /sys/kernel/debug/cxgb4vf sub-nodes.  We assume that the
+ * directory (debugfs_root) has already been set up.
+ */
+static int __devinit setup_debugfs(struct adapter *adapter)
+{
+	int i;
+
+	BUG_ON(adapter->debugfs_root == NULL);
+
+	/*
+	 * Debugfs support is best effort.
+	 */
+	for (i = 0; i < ARRAY_SIZE(debugfs_files); i++)
+		(void)debugfs_create_file(debugfs_files[i].name,
+				  debugfs_files[i].mode,
+				  adapter->debugfs_root,
+				  (void *)adapter,
+				  debugfs_files[i].fops);
+
+	return 0;
+}
+
+/*
+ * Tear down the /sys/kernel/debug/cxgb4vf sub-nodes created above.  We leave
+ * it to our caller to tear down the directory (debugfs_root).
+ */
+static void __devexit cleanup_debugfs(struct adapter *adapter)
+{
+	BUG_ON(adapter->debugfs_root == NULL);
+
+	/*
+	 * Unlike our sister routine cleanup_proc(), we don't need to remove
+	 * individual entries because a call will be made to
+	 * debugfs_remove_recursive().  We just need to clean up any ancillary
+	 * persistent state.
+	 */
+	/* nothing to do */
+}
+
+/*
+ * Perform early "adapter" initialization.  This is where we discover what
+ * adapter parameters we're going to be using and initialize basic adapter
+ * hardware support.
+ */
+static int adap_init0(struct adapter *adapter)
+{
+	struct vf_resources *vfres = &adapter->params.vfres;
+	struct sge_params *sge_params = &adapter->params.sge;
+	struct sge *s = &adapter->sge;
+	unsigned int ethqsets;
+	int err;
+
+	/*
+	 * Wait for the device to become ready before proceeding ...
+	 */
+	err = t4vf_wait_dev_ready(adapter);
+	if (err) {
+		dev_err(adapter->pdev_dev, "device didn't become ready:"
+			" err=%d\n", err);
+		return err;
+	}
+
+	/*
+	 * Grab basic operational parameters.  These will predominantly have
+	 * been set up by the Physical Function Driver or will be hard coded
+	 * into the adapter.  We just have to live with them ...  Note that
+	 * we _must_ get our VPD parameters before our SGE parameters because
+	 * we need to know the adapter's core clock from the VPD in order to
+	 * properly decode the SGE Timer Values.
+	 */
+	err = t4vf_get_dev_params(adapter);
+	if (err) {
+		dev_err(adapter->pdev_dev, "unable to retrieve adapter"
+			" device parameters: err=%d\n", err);
+		return err;
+	}
+	err = t4vf_get_vpd_params(adapter);
+	if (err) {
+		dev_err(adapter->pdev_dev, "unable to retrieve adapter"
+			" VPD parameters: err=%d\n", err);
+		return err;
+	}
+	err = t4vf_get_sge_params(adapter);
+	if (err) {
+		dev_err(adapter->pdev_dev, "unable to retrieve adapter"
+			" SGE parameters: err=%d\n", err);
+		return err;
+	}
+	err = t4vf_get_rss_glb_config(adapter);
+	if (err) {
+		dev_err(adapter->pdev_dev, "unable to retrieve adapter"
+			" RSS parameters: err=%d\n", err);
+		return err;
+	}
+	if (adapter->params.rss.mode !=
+	    FW_RSS_GLB_CONFIG_CMD_MODE_BASICVIRTUAL) {
+		dev_err(adapter->pdev_dev, "unable to operate with global RSS"
+			" mode %d\n", adapter->params.rss.mode);
+		return -EINVAL;
+	}
+	err = t4vf_sge_init(adapter);
+	if (err) {
+		dev_err(adapter->pdev_dev, "unable to use adapter parameters:"
+			" err=%d\n", err);
+		return err;
+	}
+
+	/*
+	 * Retrieve our RX interrupt holdoff timer values and counter
+	 * threshold values from the SGE parameters.
+	 */
+	s->timer_val[0] = core_ticks_to_us(adapter,
+		TIMERVALUE0_GET(sge_params->sge_timer_value_0_and_1));
+	s->timer_val[1] = core_ticks_to_us(adapter,
+		TIMERVALUE1_GET(sge_params->sge_timer_value_0_and_1));
+	s->timer_val[2] = core_ticks_to_us(adapter,
+		TIMERVALUE0_GET(sge_params->sge_timer_value_2_and_3));
+	s->timer_val[3] = core_ticks_to_us(adapter,
+		TIMERVALUE1_GET(sge_params->sge_timer_value_2_and_3));
+	s->timer_val[4] = core_ticks_to_us(adapter,
+		TIMERVALUE0_GET(sge_params->sge_timer_value_4_and_5));
+	s->timer_val[5] = core_ticks_to_us(adapter,
+		TIMERVALUE1_GET(sge_params->sge_timer_value_4_and_5));
+
+	s->counter_val[0] =
+		THRESHOLD_0_GET(sge_params->sge_ingress_rx_threshold);
+	s->counter_val[1] =
+		THRESHOLD_1_GET(sge_params->sge_ingress_rx_threshold);
+	s->counter_val[2] =
+		THRESHOLD_2_GET(sge_params->sge_ingress_rx_threshold);
+	s->counter_val[3] =
+		THRESHOLD_3_GET(sge_params->sge_ingress_rx_threshold);
+
+	/*
+	 * Grab our Virtual Interface resource allocation, extract the
+	 * features that we're interested in and do a bit of sanity testing on
+	 * what we discover.
+	 */
+	err = t4vf_get_vfres(adapter);
+	if (err) {
+		dev_err(adapter->pdev_dev, "unable to get virtual interface"
+			" resources: err=%d\n", err);
+		return err;
+	}
+
+	/*
+	 * The number of "ports" which we support is equal to the number of
+	 * Virtual Interfaces with which we've been provisioned.
+	 */
+	adapter->params.nports = vfres->nvi;
+	if (adapter->params.nports > MAX_NPORTS) {
+		dev_warn(adapter->pdev_dev, "only using %d of %d allowed"
+			 " virtual interfaces\n", MAX_NPORTS,
+			 adapter->params.nports);
+		adapter->params.nports = MAX_NPORTS;
+	}
+
+	/*
+	 * We need to reserve a number of the ingress queues with Free List
+	 * and Interrupt capabilities for special interrupt purposes (like
+	 * asynchronous firmware messages, or forwarded interrupts if we're
+	 * using MSI).  The rest of the FL/Intr-capable ingress queues will be
+	 * matched up one-for-one with Ethernet/Control egress queues in order
+	 * to form "Queue Sets" which will be aportioned between the "ports".
+	 * For each Queue Set, we'll need the ability to allocate two Egress
+	 * Contexts -- one for the Ingress Queue Free List and one for the TX
+	 * Ethernet Queue.
+	 */
+	ethqsets = vfres->niqflint - INGQ_EXTRAS;
+	if (vfres->nethctrl != ethqsets) {
+		dev_warn(adapter->pdev_dev, "unequal number of [available]"
+			 " ingress/egress queues (%d/%d); using minimum for"
+			 " number of Queue Sets\n", ethqsets, vfres->nethctrl);
+		ethqsets = min(vfres->nethctrl, ethqsets);
+	}
+	if (vfres->neq < ethqsets*2) {
+		dev_warn(adapter->pdev_dev, "Not enough Egress Contexts (%d)"
+			 " to support Queue Sets (%d); reducing allowed Queue"
+			 " Sets\n", vfres->neq, ethqsets);
+		ethqsets = vfres->neq/2;
+	}
+	if (ethqsets > MAX_ETH_QSETS) {
+		dev_warn(adapter->pdev_dev, "only using %d of %d allowed Queue"
+			 " Sets\n", MAX_ETH_QSETS, adapter->sge.max_ethqsets);
+		ethqsets = MAX_ETH_QSETS;
+	}
+	if (vfres->niq != 0 || vfres->neq > ethqsets*2) {
+		dev_warn(adapter->pdev_dev, "unused resources niq/neq (%d/%d)"
+			 " ignored\n", vfres->niq, vfres->neq - ethqsets*2);
+	}
+	adapter->sge.max_ethqsets = ethqsets;
+
+	/*
+	 * Check for various parameter sanity issues.  Most checks simply
+	 * result in us using fewer resources than our provissioning but we
+	 * do need at least  one "port" with which to work ...
+	 */
+	if (adapter->sge.max_ethqsets < adapter->params.nports) {
+		dev_warn(adapter->pdev_dev, "only using %d of %d available"
+			 " virtual interfaces (too few Queue Sets)\n",
+			 adapter->sge.max_ethqsets, adapter->params.nports);
+		adapter->params.nports = adapter->sge.max_ethqsets;
+	}
+	if (adapter->params.nports == 0) {
+		dev_err(adapter->pdev_dev, "no virtual interfaces configured/"
+			"usable!\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static inline void init_rspq(struct sge_rspq *rspq, u8 timer_idx,
+			     u8 pkt_cnt_idx, unsigned int size,
+			     unsigned int iqe_size)
+{
+	rspq->intr_params = (QINTR_TIMER_IDX(timer_idx) |
+			     (pkt_cnt_idx < SGE_NCOUNTERS ? QINTR_CNT_EN : 0));
+	rspq->pktcnt_idx = (pkt_cnt_idx < SGE_NCOUNTERS
+			    ? pkt_cnt_idx
+			    : 0);
+	rspq->iqe_len = iqe_size;
+	rspq->size = size;
+}
+
+/*
+ * Perform default configuration of DMA queues depending on the number and
+ * type of ports we found and the number of available CPUs.  Most settings can
+ * be modified by the admin via ethtool and cxgbtool prior to the adapter
+ * being brought up for the first time.
+ */
+static void __devinit cfg_queues(struct adapter *adapter)
+{
+	struct sge *s = &adapter->sge;
+	int q10g, n10g, qidx, pidx, qs;
+
+	/*
+	 * We should not be called till we know how many Queue Sets we can
+	 * support.  In particular, this means that we need to know what kind
+	 * of interrupts we'll be using ...
+	 */
+	BUG_ON((adapter->flags & (USING_MSIX|USING_MSI)) == 0);
+
+	/*
+	 * Count the number of 10GbE Virtual Interfaces that we have.
+	 */
+	n10g = 0;
+	for_each_port(adapter, pidx)
+		n10g += is_10g_port(&adap2pinfo(adapter, pidx)->link_cfg);
+
+	/*
+	 * We default to 1 queue per non-10G port and up to # of cores queues
+	 * per 10G port.
+	 */
+	if (n10g == 0)
+		q10g = 0;
+	else {
+		int n1g = (adapter->params.nports - n10g);
+		q10g = (adapter->sge.max_ethqsets - n1g) / n10g;
+		if (q10g > num_online_cpus())
+			q10g = num_online_cpus();
+	}
+
+	/*
+	 * Allocate the "Queue Sets" to the various Virtual Interfaces.
+	 * The layout will be established in setup_sge_queues() when the
+	 * adapter is brough up for the first time.
+	 */
+	qidx = 0;
+	for_each_port(adapter, pidx) {
+		struct port_info *pi = adap2pinfo(adapter, pidx);
+
+		pi->first_qset = qidx;
+		pi->nqsets = is_10g_port(&pi->link_cfg) ? q10g : 1;
+		qidx += pi->nqsets;
+	}
+	s->ethqsets = qidx;
+
+	/*
+	 * Set up default Queue Set parameters ...  Start off with the
+	 * shortest interrupt holdoff timer.
+	 */
+	for (qs = 0; qs < s->max_ethqsets; qs++) {
+		struct sge_eth_rxq *rxq = &s->ethrxq[qs];
+		struct sge_eth_txq *txq = &s->ethtxq[qs];
+
+		init_rspq(&rxq->rspq, 0, 0, 1024, L1_CACHE_BYTES);
+		rxq->fl.size = 72;
+		txq->q.size = 1024;
+	}
+
+	/*
+	 * The firmware event queue is used for link state changes and
+	 * notifications of TX DMA completions.
+	 */
+	init_rspq(&s->fw_evtq, SGE_TIMER_RSTRT_CNTR, 0, 512,
+		  L1_CACHE_BYTES);
+
+	/*
+	 * The forwarded interrupt queue is used when we're in MSI interrupt
+	 * mode.  In this mode all interrupts associated with RX queues will
+	 * be forwarded to a single queue which we'll associate with our MSI
+	 * interrupt vector.  The messages dropped in the forwarded interrupt
+	 * queue will indicate which ingress queue needs servicing ...  This
+	 * queue needs to be large enough to accommodate all of the ingress
+	 * queues which are forwarding their interrupt (+1 to prevent the PIDX
+	 * from equalling the CIDX if every ingress queue has an outstanding
+	 * interrupt).  The queue doesn't need to be any larger because no
+	 * ingress queue will ever have more than one outstanding interrupt at
+	 * any time ...
+	 */
+	init_rspq(&s->intrq, SGE_TIMER_RSTRT_CNTR, 0, MSIX_ENTRIES + 1,
+		  L1_CACHE_BYTES);
+}
+
+/*
+ * Reduce the number of Ethernet queues across all ports to at most n.
+ * n provides at least one queue per port.
+ */
+static void __devinit reduce_ethqs(struct adapter *adapter, int n)
+{
+	int i;
+	struct port_info *pi;
+
+	/*
+	 * While we have too many active Ether Queue Sets, interate across the
+	 * "ports" and reduce their individual Queue Set allocations.
+	 */
+	BUG_ON(n < adapter->params.nports);
+	while (n < adapter->sge.ethqsets)
+		for_each_port(adapter, i) {
+			pi = adap2pinfo(adapter, i);
+			if (pi->nqsets > 1) {
+				pi->nqsets--;
+				adapter->sge.ethqsets--;
+				if (adapter->sge.ethqsets <= n)
+					break;
+			}
+		}
+
+	/*
+	 * Reassign the starting Queue Sets for each of the "ports" ...
+	 */
+	n = 0;
+	for_each_port(adapter, i) {
+		pi = adap2pinfo(adapter, i);
+		pi->first_qset = n;
+		n += pi->nqsets;
+	}
+}
+
+/*
+ * We need to grab enough MSI-X vectors to cover our interrupt needs.  Ideally
+ * we get a separate MSI-X vector for every "Queue Set" plus any extras we
+ * need.  Minimally we need one for every Virtual Interface plus those needed
+ * for our "extras".  Note that this process may lower the maximum number of
+ * allowed Queue Sets ...
+ */
+static int __devinit enable_msix(struct adapter *adapter)
+{
+	int i, err, want, need;
+	struct msix_entry entries[MSIX_ENTRIES];
+	struct sge *s = &adapter->sge;
+
+	for (i = 0; i < MSIX_ENTRIES; ++i)
+		entries[i].entry = i;
+
+	/*
+	 * We _want_ enough MSI-X interrupts to cover all of our "Queue Sets"
+	 * plus those needed for our "extras" (for example, the firmware
+	 * message queue).  We _need_ at least one "Queue Set" per Virtual
+	 * Interface plus those needed for our "extras".  So now we get to see
+	 * if the song is right ...
+	 */
+	want = s->max_ethqsets + MSIX_EXTRAS;
+	need = adapter->params.nports + MSIX_EXTRAS;
+	while ((err = pci_enable_msix(adapter->pdev, entries, want)) >= need)
+		want = err;
+
+	if (err == 0) {
+		int nqsets = want - MSIX_EXTRAS;
+		if (nqsets < s->max_ethqsets) {
+			dev_warn(adapter->pdev_dev, "only enough MSI-X vectors"
+				 " for %d Queue Sets\n", nqsets);
+			s->max_ethqsets = nqsets;
+			if (nqsets < s->ethqsets)
+				reduce_ethqs(adapter, nqsets);
+		}
+		for (i = 0; i < want; ++i)
+			adapter->msix_info[i].vec = entries[i].vector;
+	} else if (err > 0) {
+		pci_disable_msix(adapter->pdev);
+		dev_info(adapter->pdev_dev, "only %d MSI-X vectors left,"
+			 " not using MSI-X\n", err);
+	}
+	return err;
+}
+
+#ifdef HAVE_NET_DEVICE_OPS
+static const struct net_device_ops cxgb4vf_netdev_ops	= {
+	.ndo_open		= cxgb4vf_open,
+	.ndo_stop		= cxgb4vf_stop,
+	.ndo_start_xmit		= t4vf_eth_xmit,
+	.ndo_get_stats		= cxgb4vf_get_stats,
+	.ndo_set_rx_mode	= cxgb4vf_set_rxmode,
+	.ndo_set_mac_address	= cxgb4vf_set_mac_addr,
+	.ndo_select_queue	= cxgb4vf_select_queue,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_do_ioctl		= cxgb4vf_do_ioctl,
+	.ndo_change_mtu		= cxgb4vf_change_mtu,
+	.ndo_vlan_rx_register	= cxgb4vf_vlan_rx_register,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_poll_controller	= cxgb4vf_poll_controller,
+#endif
+};
+#endif
+
+/*
+ * "Probe" a device: initialize a device and construct all kernel and driver
+ * state needed to manage the device.  This routine is called "init_one" in
+ * the PF Driver ...
+ */
+static int __devinit cxgb4vf_pci_probe(struct pci_dev *pdev,
+				       const struct pci_device_id *ent)
+{
+	static int version_printed;
+
+	int pci_using_dac;
+	int err, pidx;
+	unsigned int pmask;
+	struct adapter *adapter;
+	struct port_info *pi;
+	struct net_device *netdev;
+
+	/*
+	 * Vet our module parameters.
+	 */
+	if (msi != MSI_MSIX && msi != MSI_MSI) {
+		dev_err(&pdev->dev, "bad module parameter msi=%d; must be %d"
+			" (MSI-X or MSI) or %d (MSI)\n", msi, MSI_MSIX,
+			MSI_MSI);
+		err = -EINVAL;
+		goto err_out;
+	}
+
+	/*
+	 * Print our driver banner the first time we're called to initialize a
+	 * device.
+	 */
+	if (version_printed == 0) {
+		printk(KERN_INFO "%s - version %s\n", DRV_DESC, DRV_VERSION);
+		version_printed = 1;
+	}
+
+	/*
+	 * Reserve PCI resources for the device.  If we can't get them some
+	 * other driver may have already claimed the device ...
+	 */
+	err = pci_request_regions(pdev, KBUILD_MODNAME);
+	if (err) {
+		dev_err(&pdev->dev, "cannot obtain PCI resources\n");
+		return err;
+	}
+
+	/*
+	 * Initialize generic PCI device state.
+	 */
+	err = pci_enable_device(pdev);
+	if (err) {
+		dev_err(&pdev->dev, "cannot enable PCI device\n");
+		goto err_release_regions;
+	}
+
+	/*
+	 * Set up our DMA mask: try for 64-bit address masking first and
+	 * fall back to 32-bit if we can't get 64 bits ...
+	 */
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (err == 0) {
+		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+		if (err) {
+			dev_err(&pdev->dev, "unable to obtain 64-bit DMA for"
+				" coherent allocations\n");
+			goto err_disable_device;
+		}
+		pci_using_dac = 1;
+	} else {
+		err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+		if (err != 0) {
+			dev_err(&pdev->dev, "no usable DMA configuration\n");
+			goto err_disable_device;
+		}
+		pci_using_dac = 0;
+	}
+
+	/*
+	 * Enable bus mastering for the device ...
+	 */
+	pci_set_master(pdev);
+
+	/*
+	 * Allocate our adapter data structure and attach it to the device.
+	 */
+	adapter = kzalloc(sizeof(*adapter), GFP_KERNEL);
+	if (!adapter) {
+		err = -ENOMEM;
+		goto err_disable_device;
+	}
+	pci_set_drvdata(pdev, adapter);
+	adapter->pdev = pdev;
+	adapter->pdev_dev = &pdev->dev;
+
+	/*
+	 * Initialize SMP data synchronization resources.
+	 */
+	spin_lock_init(&adapter->stats_lock);
+
+	/*
+	 * Map our I/O registers in BAR0.
+	 */
+	adapter->regs = pci_ioremap_bar(pdev, 0);
+	if (!adapter->regs) {
+		dev_err(&pdev->dev, "cannot map device registers\n");
+		err = -ENOMEM;
+		goto err_free_adapter;
+	}
+
+	/*
+	 * Initialize adapter level features.
+	 */
+	adapter->name = pci_name(pdev);
+	adapter->msg_enable = dflt_msg_enable;
+	err = adap_init0(adapter);
+	if (err)
+		goto err_unmap_bar;
+
+	/*
+	 * Allocate our "adapter ports" and stitch everything together.
+	 */
+	pmask = adapter->params.vfres.pmask;
+	for_each_port(adapter, pidx) {
+		int port_id, viid;
+
+		/*
+		 * We simplistically allocate our virtual interfaces
+		 * sequentially across the port numbers to which we have
+		 * access rights.  This should be configurable in some manner
+		 * ...
+		 */
+		if (pmask == 0)
+			break;
+		port_id = ffs(pmask) - 1;
+		pmask &= ~(1 << port_id);
+		viid = t4vf_alloc_vi(adapter, port_id);
+		if (viid < 0) {
+			dev_err(&pdev->dev, "cannot allocate VI for port %d:"
+				" err=%d\n", port_id, viid);
+			err = viid;
+			goto err_free_dev;
+		}
+
+		/*
+		 * Allocate our network device and stitch things together.
+		 */
+		netdev = alloc_etherdev_mq(sizeof(struct port_info),
+					   MAX_PORT_QSETS);
+		if (netdev == NULL) {
+			dev_err(&pdev->dev, "cannot allocate netdev for"
+				" port %d\n", port_id);
+			t4vf_free_vi(adapter, viid);
+			err = -ENOMEM;
+			goto err_free_dev;
+		}
+		adapter->port[pidx] = netdev;
+		SET_NETDEV_DEV(netdev, &pdev->dev);
+		pi = netdev_priv(netdev);
+		pi->adapter = adapter;
+		pi->pidx = pidx;
+		pi->port_id = port_id;
+		pi->viid = viid;
+
+		/*
+		 * Initialize the starting state of our "port" and register
+		 * it.
+		 */
+		pi->xact_addr_filt = -1;
+		pi->rx_offload = RX_CSO;
+		netif_carrier_off(netdev);
+		netif_tx_stop_all_queues(netdev);
+		netdev->irq = pdev->irq;
+
+		netdev->features = (NETIF_F_SG | NETIF_F_TSO | NETIF_F_TSO6 |
+				    NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
+				    NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX |
+				    NETIF_F_GRO);
+		if (pci_using_dac)
+			netdev->features |= NETIF_F_HIGHDMA;
+		netdev->vlan_features =
+			(netdev->features &
+			 ~(NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX));
+
+#ifdef HAVE_NET_DEVICE_OPS
+		netdev->netdev_ops = &cxgb4vf_netdev_ops;
+#else
+		netdev->vlan_rx_register = cxgb4vf_vlan_rx_register;
+		netdev->open = cxgb4vf_open;
+		netdev->stop = cxgb4vf_stop;
+		netdev->hard_start_xmit = t4vf_eth_xmit;
+		netdev->get_stats = cxgb4vf_get_stats;
+		netdev->set_rx_mode = cxgb4vf_set_rxmode;
+		netdev->do_ioctl = cxgb4vf_do_ioctl;
+		netdev->change_mtu = cxgb4vf_change_mtu;
+		netdev->set_mac_address = cxgb4vf_set_mac_addr;
+		netdev->select_queue = cxgb4vf_select_queue;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+		netdev->poll_controller = cxgb4vf_poll_controller;
+#endif
+#endif
+		SET_ETHTOOL_OPS(netdev, &cxgb4vf_ethtool_ops);
+
+		/*
+		 * Initialize the hardware/software state for the port.
+		 */
+		err = t4vf_port_init(adapter, pidx);
+		if (err) {
+			dev_err(&pdev->dev, "cannot initialize port %d\n",
+				pidx);
+			goto err_free_dev;
+		}
+	}
+
+	/*
+	 * The "card" is now ready to go.  If any errors occur during device
+	 * registration we do not fail the whole "card" but rather proceed
+	 * only with the ports we manage to register successfully.  However we
+	 * must register at least one net device.
+	 */
+	for_each_port(adapter, pidx) {
+		netdev = adapter->port[pidx];
+		if (netdev == NULL)
+			continue;
+
+		err = register_netdev(netdev);
+		if (err) {
+			dev_warn(&pdev->dev, "cannot register net device %s,"
+				 " skipping\n", netdev->name);
+			continue;
+		}
+
+		set_bit(pidx, &adapter->registered_device_map);
+	}
+	if (adapter->registered_device_map == 0) {
+		dev_err(&pdev->dev, "could not register any net devices\n");
+		goto err_free_dev;
+	}
+
+	/*
+	 * Set up our debugfs entries.
+	 */
+	if (cxgb4vf_debugfs_root) {
+		adapter->debugfs_root =
+			debugfs_create_dir(pci_name(pdev),
+					   cxgb4vf_debugfs_root);
+		if (adapter->debugfs_root == NULL)
+			dev_warn(&pdev->dev, "could not create debugfs"
+				 " directory");
+		else
+			setup_debugfs(adapter);
+	}
+
+	/*
+	 * See what interrupts we'll be using.  If we've been configured to
+	 * use MSI-X interrupts, try to enable them but fall back to using
+	 * MSI interrupts if we can't enable MSI-X interrupts.  If we can't
+	 * get MSI interrupts we bail with the error.
+	 */
+	if (msi == MSI_MSIX && enable_msix(adapter) == 0)
+		adapter->flags |= USING_MSIX;
+	else {
+		err = pci_enable_msi(pdev);
+		if (err) {
+			dev_err(&pdev->dev, "Unable to allocate %s interrupts;"
+				" err=%d\n",
+				msi == MSI_MSIX ? "MSI-X or MSI" : "MSI", err);
+			goto err_free_debugfs;
+		}
+		adapter->flags |= USING_MSI;
+	}
+
+	/*
+	 * Now that we know how many "ports" we have and what their types are,
+	 * and how many Queue Sets we can support, we can configure our queue
+	 * resources.
+	 */
+	cfg_queues(adapter);
+
+	/*
+	 * Print a short notice on the existance and configuration of the new
+	 * VF network device ...
+	 */
+	for_each_port(adapter, pidx) {
+		dev_info(adapter->pdev_dev, "%s: Chelsio VF NIC PCIe %s\n",
+			 adapter->port[pidx]->name,
+			 (adapter->flags & USING_MSIX) ? "MSI-X" :
+			 (adapter->flags & USING_MSI)  ? "MSI" : "");
+	}
+
+	/*
+	 * Return success!
+	 */
+	return 0;
+
+	/*
+	 * Error recovery and exit code.  Unwind state that's been created
+	 * so far and return the error.
+	 */
+
+err_free_debugfs:
+	if (adapter->debugfs_root) {
+		cleanup_debugfs(adapter);
+		debugfs_remove_recursive(adapter->debugfs_root);
+	}
+
+err_free_dev:
+	for_each_port(adapter, pidx) {
+		netdev = adapter->port[pidx];
+		if (netdev == NULL)
+			continue;
+		pi = netdev_priv(netdev);
+		t4vf_free_vi(adapter, pi->viid);
+		if (test_bit(pidx, &adapter->registered_device_map))
+			unregister_netdev(netdev);
+		free_netdev(netdev);
+	}
+
+err_unmap_bar:
+	iounmap(adapter->regs);
+
+err_free_adapter:
+	kfree(adapter);
+	pci_set_drvdata(pdev, NULL);
+
+err_disable_device:
+	pci_disable_device(pdev);
+	pci_clear_master(pdev);
+
+err_release_regions:
+	pci_release_regions(pdev);
+	pci_set_drvdata(pdev, NULL);
+
+err_out:
+	return err;
+}
+
+/*
+ * "Remove" a device: tear down all kernel and driver state created in the
+ * "probe" routine and quiesce the device (disable interrupts, etc.).  (Note
+ * that this is called "remove_one" in the PF Driver.)
+ */
+static void __devexit cxgb4vf_pci_remove(struct pci_dev *pdev)
+{
+	struct adapter *adapter = pci_get_drvdata(pdev);
+
+	/*
+	 * Tear down driver state associated with device.
+	 */
+	if (adapter) {
+		int pidx;
+
+		/*
+		 * Stop all of our activity.  Unregister network port,
+		 * disable interrupts, etc.
+		 */
+		for_each_port(adapter, pidx)
+			if (test_bit(pidx, &adapter->registered_device_map))
+				unregister_netdev(adapter->port[pidx]);
+		t4vf_sge_stop(adapter);
+		if (adapter->flags & USING_MSIX) {
+			pci_disable_msix(adapter->pdev);
+			adapter->flags &= ~USING_MSIX;
+		} else if (adapter->flags & USING_MSI) {
+			pci_disable_msi(adapter->pdev);
+			adapter->flags &= ~USING_MSI;
+		}
+
+		/*
+		 * Tear down our debugfs entries.
+		 */
+		if (adapter->debugfs_root) {
+			cleanup_debugfs(adapter);
+			debugfs_remove_recursive(adapter->debugfs_root);
+		}
+
+		/*
+		 * Free all of the various resources which we've acquired ...
+		 */
+		t4vf_free_sge_resources(adapter);
+		for_each_port(adapter, pidx) {
+			struct net_device *netdev = adapter->port[pidx];
+			struct port_info *pi;
+
+			if (netdev == NULL)
+				continue;
+
+			pi = netdev_priv(netdev);
+			t4vf_free_vi(adapter, pi->viid);
+			free_netdev(netdev);
+		}
+		iounmap(adapter->regs);
+		kfree(adapter);
+		pci_set_drvdata(pdev, NULL);
+	}
+
+	/*
+	 * Disable the device and release its PCI resources.
+	 */
+	pci_disable_device(pdev);
+	pci_clear_master(pdev);
+	pci_release_regions(pdev);
+}
+
+/*
+ * PCI Device registration data structures.
+ */
+#define CH_DEVICE(devid, idx) \
+	{ PCI_VENDOR_ID_CHELSIO, devid, PCI_ANY_ID, PCI_ANY_ID, 0, 0, idx }
+
+static struct pci_device_id cxgb4vf_pci_tbl[] = {
+	CH_DEVICE(0xb000, 0),	/* PE10K FPGA */
+	CH_DEVICE(0x4800, 0),	/* T440-dbg */
+	CH_DEVICE(0x4801, 0),	/* T420-cr */
+	CH_DEVICE(0x4802, 0),	/* T422-cr */
+	{ 0, }
+};
+
+MODULE_DESCRIPTION(DRV_DESC);
+MODULE_AUTHOR("Chelsio Communications");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION(DRV_VERSION);
+MODULE_DEVICE_TABLE(pci, cxgb4vf_pci_tbl);
+
+static struct pci_driver cxgb4vf_driver = {
+	.name		= KBUILD_MODNAME,
+	.id_table	= cxgb4vf_pci_tbl,
+	.probe		= cxgb4vf_pci_probe,
+	.remove		= __devexit_p(cxgb4vf_pci_remove),
+};
+
+/*
+ * Initialize global driver state.
+ */
+static int __init cxgb4vf_module_init(void)
+{
+	int ret;
+
+	/* Debugfs support is optional, just warn if this fails */
+	cxgb4vf_debugfs_root = debugfs_create_dir(KBUILD_MODNAME, NULL);
+	if (!cxgb4vf_debugfs_root)
+		printk(KERN_WARNING KBUILD_MODNAME ": could not create"
+		       " debugfs entry, continuing\n");
+
+	ret = pci_register_driver(&cxgb4vf_driver);
+	if (ret < 0)
+		debugfs_remove(cxgb4vf_debugfs_root);
+	return ret;
+}
+
+/*
+ * Tear down global driver state.
+ */
+static void __exit cxgb4vf_module_exit(void)
+{
+	pci_unregister_driver(&cxgb4vf_driver);
+	debugfs_remove(cxgb4vf_debugfs_root);
+}
+
+module_init(cxgb4vf_module_init);
+module_exit(cxgb4vf_module_exit);

^ permalink raw reply related

* [PATCH 6/9] cxgb4vf: Add T4 Virtual Function Scatter-Gather Engine DMA code
From: Casey Leedom @ 2010-06-25 22:13 UTC (permalink / raw)
  To: netdev

Add T4 Virtual Function Scatter-Gather Engine DMA code.

Signed-off-by: Casey Leedom
---
 drivers/net/cxgb4vf/sge.c | 2460 
+++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 2460 insertions(+), 0 deletions(-)

diff --git a/drivers/net/cxgb4vf/sge.c b/drivers/net/cxgb4vf/sge.c
new file mode 100644
index 0000000..f857d20
--- /dev/null
+++ b/drivers/net/cxgb4vf/sge.c
@@ -0,0 +1,2460 @@
+/*
+ * This file is part of the Chelsio T4 PCI-E SR-IOV Virtual Function Ethernet
+ * driver for Linux.
+ *
+ * Copyright (c) 2009-2010 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/if_vlan.h>
+#include <linux/ip.h>
+#include <net/ipv6.h>
+#include <net/tcp.h>
+#include <linux/dma-mapping.h>
+
+#include "t4vf_common.h"
+#include "t4vf_defs.h"
+
+#include "../cxgb4/t4_regs.h"
+#include "../cxgb4/t4fw_api.h"
+#include "../cxgb4/t4_msg.h"
+
+/*
+ * Decoded Adapter Parameters.
+ */
+static u32 FL_PG_ORDER;		/* large page allocation size */
+static u32 STAT_LEN;		/* length of status page at ring end */
+static u32 PKTSHIFT;		/* padding between CPL and packet data */
+static u32 FL_ALIGN;		/* response queue message alignment */
+
+/*
+ * Constants ...
+ */
+enum {
+	/*
+	 * Egress Queue sizes, producer and consumer indices are all in units
+	 * of Egress Context Units bytes.  Note that as far as the hardware is
+	 * concerned, the free list is an Egress Queue (the host produces free
+	 * buffers which the hardware consumes) and free list entries are
+	 * 64-bit PCI DMA addresses.
+	 */
+	EQ_UNIT = SGE_EQ_IDXSIZE,
+	FL_PER_EQ_UNIT = EQ_UNIT / sizeof(__be64),
+	TXD_PER_EQ_UNIT = EQ_UNIT / sizeof(__be64),
+
+	/*
+	 * Max number of TX descriptors we clean up at a time.  Should be
+	 * modest as freeing skbs isn't cheap and it happens while holding
+	 * locks.  We just need to free packets faster than they arrive, we
+	 * eventually catch up and keep the amortized cost reasonable.
+	 */
+	MAX_TX_RECLAIM = 16,
+
+	/*
+	 * Max number of Rx buffers we replenish at a time.  Again keep this
+	 * modest, allocating buffers isn't cheap either.
+	 */
+	MAX_RX_REFILL = 16,
+
+	/*
+	 * Period of the Rx queue check timer.  This timer is infrequent as it
+	 * has something to do only when the system experiences severe memory
+	 * shortage.
+	 */
+	RX_QCHECK_PERIOD = (HZ / 2),
+
+	/*
+	 * Period of the TX queue check timer and the maximum number of TX
+	 * descriptors to be reclaimed by the TX timer.
+	 */
+	TX_QCHECK_PERIOD = (HZ / 2),
+	MAX_TIMER_TX_RECLAIM = 100,
+
+	/*
+	 * An FL with <= FL_STARVE_THRES buffers is starving and a periodic
+	 * timer will attempt to refill it.
+	 */
+	FL_STARVE_THRES = 4,
+
+	/*
+	 * Suspend an Ethernet TX queue with fewer available descriptors than
+	 * this.  We always want to have room for a maximum sized packet:
+	 * inline immediate data + MAX_SKB_FRAGS. This is the same as
+	 * calc_tx_flits() for a TSO packet with nr_frags == MAX_SKB_FRAGS
+	 * (see that function and its helpers for a description of the
+	 * calculation).
+	 */
+	ETHTXQ_MAX_FRAGS = MAX_SKB_FRAGS + 1,
+	ETHTXQ_MAX_SGL_LEN = ((3 * (ETHTXQ_MAX_FRAGS-1))/2 +
+				   ((ETHTXQ_MAX_FRAGS-1) & 1) +
+				   2),
+	ETHTXQ_MAX_HDR = (sizeof(struct fw_eth_tx_pkt_vm_wr) +
+			  sizeof(struct cpl_tx_pkt_lso_core) +
+			  sizeof(struct cpl_tx_pkt_core)) / sizeof(__be64),
+	ETHTXQ_MAX_FLITS = ETHTXQ_MAX_SGL_LEN + ETHTXQ_MAX_HDR,
+
+	ETHTXQ_STOP_THRES = 1 + DIV_ROUND_UP(ETHTXQ_MAX_FLITS, TXD_PER_EQ_UNIT),
+
+	/*
+	 * Max TX descriptor space we allow for an Ethernet packet to be
+	 * inlined into a WR.  This is limited by the maximum value which
+	 * we can specify for immediate data in the firmware Ethernet TX
+	 * Work Request.
+	 */
+	MAX_IMM_TX_PKT_LEN = FW_WR_IMMDLEN_MASK,
+
+	/*
+	 * Max size of a WR sent through a control TX queue.
+	 */
+	MAX_CTRL_WR_LEN = 256,
+
+	/*
+	 * Maximum amount of data which we'll ever need to inline into a
+	 * TX ring: max(MAX_IMM_TX_PKT_LEN, MAX_CTRL_WR_LEN).
+	 */
+	MAX_IMM_TX_LEN = (MAX_IMM_TX_PKT_LEN > MAX_CTRL_WR_LEN
+			  ? MAX_IMM_TX_PKT_LEN
+			  : MAX_CTRL_WR_LEN),
+
+	/*
+	 * For incoming packets less than RX_COPY_THRES, we copy the data into
+	 * an skb rather than referencing the data.  We allocate enough
+	 * in-line room in skb's to accommodate pulling in RX_PULL_LEN bytes
+	 * of the data (header).
+	 */
+	RX_COPY_THRES = 256,
+	RX_PULL_LEN = 128,
+};
+
+/*
+ * Can't define this in the above enum because PKTSHIFT isn't a constant in
+ * the VF Driver ...
+ */
+#define RX_PKT_PULL_LEN (RX_PULL_LEN + PKTSHIFT)
+
+/*
+ * Software state per TX descriptor.
+ */
+struct tx_sw_desc {
+	struct sk_buff *skb;		/* socket buffer of TX data source */
+	struct ulptx_sgl *sgl;		/* scatter/gather list in TX Queue */
+};
+
+/*
+ * Software state per RX Free List descriptor.  We keep track of the allocated
+ * FL page, its size, and its PCI DMA address (if the page is mapped).  The FL
+ * page size and its PCI DMA mapped state are stored in the low bits of the
+ * PCI DMA address as per below.
+ */
+struct rx_sw_desc {
+	struct page *page;		/* Free List page buffer */
+	dma_addr_t dma_addr;		/* PCI DMA address (if mapped) */
+					/*   and flags (see below) */
+};
+
+/*
+ * The low bits of rx_sw_desc.dma_addr have special meaning.  Note that the
+ * SGE also uses the low 4 bits to determine the size of the buffer.  It uses
+ * those bits to index into the SGE_FL_BUFFER_SIZE[index] register array.
+ * Since we only use SGE_FL_BUFFER_SIZE0 and SGE_FL_BUFFER_SIZE1, these low 4
+ * bits can only contain a 0 or a 1 to indicate which size buffer we're giving
+ * to the SGE.  Thus, our software state of "is the buffer mapped for DMA" is
+ * maintained in an inverse sense so the hardware never sees that bit high.
+ */
+enum {
+	RX_LARGE_BUF    = 1 << 0,	/* buffer is SGE_FL_BUFFER_SIZE[1] */
+	RX_UNMAPPED_BUF = 1 << 1,	/* buffer is not mapped */
+};
+
+/**
+ *	get_buf_addr - return DMA buffer address of software descriptor
+ *	@sdesc: pointer to the software buffer descriptor
+ *
+ *	Return the DMA buffer address of a software descriptor (stripping out
+ *	our low-order flag bits).
+ */
+static inline dma_addr_t get_buf_addr(const struct rx_sw_desc *sdesc)
+{
+	return sdesc->dma_addr & ~(dma_addr_t)(RX_LARGE_BUF | RX_UNMAPPED_BUF);
+}
+
+/**
+ *	is_buf_mapped - is buffer mapped for DMA?
+ *	@sdesc: pointer to the software buffer descriptor
+ *
+ *	Determine whether the buffer associated with a software descriptor in
+ *	mapped for DMA or not.
+ */
+static inline bool is_buf_mapped(const struct rx_sw_desc *sdesc)
+{
+	return !(sdesc->dma_addr & RX_UNMAPPED_BUF);
+}
+
+/**
+ *	need_skb_unmap - does the platform need unmapping of sk_buffs?
+ *
+ *	Returns true if the platfrom needs sk_buff unmapping.  The compiler
+ *	optimizes away unecessary code if this returns true.
+ */
+static inline int need_skb_unmap(void)
+{
+	/*
+	 * This structure is used to tell if the platfrom needs buffer
+	 * unmapping by checking if DECLARE_PCI_UNMAP_ADDR defines anything.
+	 */
+	struct dummy {
+		DECLARE_PCI_UNMAP_ADDR(addr);
+	};
+
+	return sizeof(struct dummy) != 0;
+}
+
+/**
+ *	txq_avail - return the number of available slots in a TX queue
+ *	@tq: the TX queue
+ *
+ *	Returns the number of available descriptors in a TX queue.
+ */
+static inline unsigned int txq_avail(const struct sge_txq *tq)
+{
+	return tq->size - 1 - tq->in_use;
+}
+
+/**
+ *	fl_cap - return the capacity of a Free List
+ *	@fl: the Free List
+ *
+ *	Returns the capacity of a Free List.  The capacity is less than the
+ *	size because an Egress Queue Index Unit worth of descriptors needs to
+ *	be left unpopulated, otherwise the Producer and Consumer indices PIDX
+ *	and CIDX will match and the hardware will think the FL is empty.
+ */
+static inline unsigned int fl_cap(const struct sge_fl *fl)
+{
+	return fl->size - FL_PER_EQ_UNIT;
+}
+
+/**
+ *	fl_starving - return whether a Free List is starving.
+ *	@fl: the Free List
+ *
+ *	Tests specified Free List to see whether the number of buffers
+ *	available to the hardware has falled below our "starvation"
+ *	threshhold.
+ */
+static inline bool fl_starving(const struct sge_fl *fl)
+{
+	return fl->avail - fl->pend_cred <= FL_STARVE_THRES;
+}
+
+/**
+ *	map_skb -  map an skb for DMA to the device
+ *	@dev: the egress net device
+ *	@skb: the packet to map
+ *	@addr: a pointer to the base of the DMA mapping array
+ *
+ *	Map an skb for DMA to the device and return an array of DMA addresses.
+ */
+static int map_skb(struct device *dev, const struct sk_buff *skb,
+		   dma_addr_t *addr)
+{
+	const skb_frag_t *fp, *end;
+	const struct skb_shared_info *si;
+
+	*addr = dma_map_single(dev, skb->data, skb_headlen(skb), DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, *addr))
+		goto out_err;
+
+	si = skb_shinfo(skb);
+	end = &si->frags[si->nr_frags];
+	for (fp = si->frags; fp < end; fp++) {
+		*++addr = dma_map_page(dev, fp->page, fp->page_offset, fp->size,
+				       DMA_TO_DEVICE);
+		if (dma_mapping_error(dev, *addr))
+			goto unwind;
+	}
+	return 0;
+
+unwind:
+	while (fp-- > si->frags)
+		dma_unmap_page(dev, *--addr, fp->size, DMA_TO_DEVICE);
+	dma_unmap_single(dev, addr[-1], skb_headlen(skb), DMA_TO_DEVICE);
+
+out_err:
+	return -ENOMEM;
+}
+
+static void unmap_sgl(struct device *dev, const struct sk_buff *skb,
+		      const struct ulptx_sgl *sgl, const struct sge_txq *tq)
+{
+	const struct ulptx_sge_pair *p;
+	unsigned int nfrags = skb_shinfo(skb)->nr_frags;
+
+	if (likely(skb_headlen(skb)))
+		dma_unmap_single(dev, be64_to_cpu(sgl->addr0),
+				 be32_to_cpu(sgl->len0), DMA_TO_DEVICE);
+	else {
+		dma_unmap_page(dev, be64_to_cpu(sgl->addr0),
+			       be32_to_cpu(sgl->len0), DMA_TO_DEVICE);
+		nfrags--;
+	}
+
+	/*
+	 * the complexity below is because of the possibility of a wrap-around
+	 * in the middle of an SGL
+	 */
+	for (p = sgl->sge; nfrags >= 2; nfrags -= 2) {
+		if (likely((u8 *)(p + 1) <= (u8 *)tq->stat)) {
+unmap:
+			dma_unmap_page(dev, be64_to_cpu(p->addr[0]),
+				       be32_to_cpu(p->len[0]), DMA_TO_DEVICE);
+			dma_unmap_page(dev, be64_to_cpu(p->addr[1]),
+				       be32_to_cpu(p->len[1]), DMA_TO_DEVICE);
+			p++;
+		} else if ((u8 *)p == (u8 *)tq->stat) {
+			p = (const struct ulptx_sge_pair *)tq->desc;
+			goto unmap;
+		} else if ((u8 *)p + 8 == (u8 *)tq->stat) {
+			const __be64 *addr = (const __be64 *)tq->desc;
+
+			dma_unmap_page(dev, be64_to_cpu(addr[0]),
+				       be32_to_cpu(p->len[0]), DMA_TO_DEVICE);
+			dma_unmap_page(dev, be64_to_cpu(addr[1]),
+				       be32_to_cpu(p->len[1]), DMA_TO_DEVICE);
+			p = (const struct ulptx_sge_pair *)&addr[2];
+		} else {
+			const __be64 *addr = (const __be64 *)tq->desc;
+
+			dma_unmap_page(dev, be64_to_cpu(p->addr[0]),
+				       be32_to_cpu(p->len[0]), DMA_TO_DEVICE);
+			dma_unmap_page(dev, be64_to_cpu(addr[0]),
+				       be32_to_cpu(p->len[1]), DMA_TO_DEVICE);
+			p = (const struct ulptx_sge_pair *)&addr[1];
+		}
+	}
+	if (nfrags) {
+		__be64 addr;
+
+		if ((u8 *)p == (u8 *)tq->stat)
+			p = (const struct ulptx_sge_pair *)tq->desc;
+		addr = ((u8 *)p + 16 <= (u8 *)tq->stat
+			? p->addr[0]
+			: *(const __be64 *)tq->desc);
+		dma_unmap_page(dev, be64_to_cpu(addr), be32_to_cpu(p->len[0]),
+			       DMA_TO_DEVICE);
+	}
+}
+
+/**
+ *	free_tx_desc - reclaims TX descriptors and their buffers
+ *	@adapter: the adapter
+ *	@tq: the TX queue to reclaim descriptors from
+ *	@n: the number of descriptors to reclaim
+ *	@unmap: whether the buffers should be unmapped for DMA
+ *
+ *	Reclaims TX descriptors from an SGE TX queue and frees the associated
+ *	TX buffers.  Called with the TX queue lock held.
+ */
+static void free_tx_desc(struct adapter *adapter, struct sge_txq *tq,
+			 unsigned int n, bool unmap)
+{
+	struct tx_sw_desc *sdesc;
+	unsigned int cidx = tq->cidx;
+	struct device *dev = adapter->pdev_dev;
+
+	const int need_unmap = need_skb_unmap() && unmap;
+
+	sdesc = &tq->sdesc[cidx];
+	while (n--) {
+		/*
+		 * If we kept a reference to the original TX skb, we need to
+		 * unmap it from PCI DMA space (if required) and free it.
+		 */
+		if (sdesc->skb) {
+			if (need_unmap)
+				unmap_sgl(dev, sdesc->skb, sdesc->sgl, tq);
+			kfree_skb(sdesc->skb);
+			sdesc->skb = NULL;
+		}
+
+		sdesc++;
+		if (++cidx == tq->size) {
+			cidx = 0;
+			sdesc = tq->sdesc;
+		}
+	}
+	tq->cidx = cidx;
+}
+
+/*
+ * Return the number of reclaimable descriptors in a TX queue.
+ */
+static inline int reclaimable(const struct sge_txq *tq)
+{
+	int hw_cidx = be16_to_cpu(tq->stat->cidx);
+	int reclaimable = hw_cidx - tq->cidx;
+	if (reclaimable < 0)
+		reclaimable += tq->size;
+	return reclaimable;
+}
+
+/**
+ *	reclaim_completed_tx - reclaims completed TX descriptors
+ *	@adapter: the adapter
+ *	@tq: the TX queue to reclaim completed descriptors from
+ *	@unmap: whether the buffers should be unmapped for DMA
+ *
+ *	Reclaims TX descriptors that the SGE has indicated it has processed,
+ *	and frees the associated buffers if possible.  Called with the TX
+ *	queue locked.
+ */
+static inline void reclaim_completed_tx(struct adapter *adapter,
+					struct sge_txq *tq,
+					bool unmap)
+{
+	int avail = reclaimable(tq);
+
+	if (avail) {
+		/*
+		 * Limit the amount of clean up work we do at a time to keep
+		 * the TX lock hold time O(1).
+		 */
+		if (avail > MAX_TX_RECLAIM)
+			avail = MAX_TX_RECLAIM;
+
+		free_tx_desc(adapter, tq, avail, unmap);
+		tq->in_use -= avail;
+	}
+}
+
+/**
+ *	get_buf_size - return the size of an RX Free List buffer.
+ *	@sdesc: pointer to the software buffer descriptor
+ */
+static inline int get_buf_size(const struct rx_sw_desc *sdesc)
+{
+	return FL_PG_ORDER > 0 && (sdesc->dma_addr & RX_LARGE_BUF)
+		? (PAGE_SIZE << FL_PG_ORDER)
+		: PAGE_SIZE;
+}
+
+/**
+ *	free_rx_bufs - free RX buffers on an SGE Free List
+ *	@adapter: the adapter
+ *	@fl: the SGE Free List to free buffers from
+ *	@n: how many buffers to free
+ *
+ *	Release the next @n buffers on an SGE Free List RX queue.   The
+ *	buffers must be made inaccessible to hardware before calling this
+ *	function.
+ */
+static void free_rx_bufs(struct adapter *adapter, struct sge_fl *fl, int n)
+{
+	while (n--) {
+		struct rx_sw_desc *sdesc = &fl->sdesc[fl->cidx];
+
+		if (is_buf_mapped(sdesc))
+			dma_unmap_page(adapter->pdev_dev, get_buf_addr(sdesc),
+				       get_buf_size(sdesc), PCI_DMA_FROMDEVICE);
+		put_page(sdesc->page);
+		sdesc->page = NULL;
+		if (++fl->cidx == fl->size)
+			fl->cidx = 0;
+		fl->avail--;
+	}
+}
+
+/**
+ *	unmap_rx_buf - unmap the current RX buffer on an SGE Free List
+ *	@adapter: the adapter
+ *	@fl: the SGE Free List
+ *
+ *	Unmap the current buffer on an SGE Free List RX queue.   The
+ *	buffer must be made inaccessible to HW before calling this function.
+ *
+ *	This is similar to @free_rx_bufs above but does not free the buffer.
+ *	Do note that the FL still loses any further access to the buffer.
+ *	This is used predominantly to "transfer ownership" of an FL buffer
+ *	to another entity (typically an skb's fragment list).
+ */
+static void unmap_rx_buf(struct adapter *adapter, struct sge_fl *fl)
+{
+	struct rx_sw_desc *sdesc = &fl->sdesc[fl->cidx];
+
+	if (is_buf_mapped(sdesc))
+		dma_unmap_page(adapter->pdev_dev, get_buf_addr(sdesc),
+			       get_buf_size(sdesc), PCI_DMA_FROMDEVICE);
+	sdesc->page = NULL;
+	if (++fl->cidx == fl->size)
+		fl->cidx = 0;
+	fl->avail--;
+}
+
+/**
+ *	ring_fl_db - righ doorbell on free list
+ *	@adapter: the adapter
+ *	@fl: the Free List whose doorbell should be rung ...
+ *
+ *	Tell the Scatter Gather Engine that there are new free list entries
+ *	available.
+ */
+static inline void ring_fl_db(struct adapter *adapter, struct sge_fl *fl)
+{
+	/*
+	 * The SGE keeps track of its Producer and Consumer Indices in terms
+	 * of Egress Queue Units so we can only tell it about integral numbers
+	 * of multiples of Free List Entries per Egress Queue Units ...
+	 */
+	if (fl->pend_cred >= FL_PER_EQ_UNIT) {
+		wmb();
+		t4_write_reg(adapter, T4VF_SGE_BASE_ADDR + SGE_VF_KDOORBELL,
+			     DBPRIO |
+			     QID(fl->cntxt_id) |
+			     PIDX(fl->pend_cred / FL_PER_EQ_UNIT));
+		fl->pend_cred %= FL_PER_EQ_UNIT;
+	}
+}
+
+/**
+ *	set_rx_sw_desc - initialize software RX buffer descriptor
+ *	@sdesc: pointer to the softwore RX buffer descriptor
+ *	@page: pointer to the page data structure backing the RX buffer
+ *	@dma_addr: PCI DMA address (possibly with low-bit flags)
+ */
+static inline void set_rx_sw_desc(struct rx_sw_desc *sdesc, struct page *page,
+				  dma_addr_t dma_addr)
+{
+	sdesc->page = page;
+	sdesc->dma_addr = dma_addr;
+}
+
+/*
+ * Support for poisoning RX buffers ...
+ */
+#define POISON_BUF_VAL -1
+
+static inline void poison_buf(struct page *page, size_t sz)
+{
+#if POISON_BUF_VAL >= 0
+	memset(page_address(page), POISON_BUF_VAL, sz);
+#endif
+}
+
+/**
+ *	refill_fl - refill an SGE RX buffer ring
+ *	@adapter: the adapter
+ *	@fl: the Free List ring to refill
+ *	@n: the number of new buffers to allocate
+ *	@gfp: the gfp flags for the allocations
+ *
+ *	(Re)populate an SGE free-buffer queue with up to @n new packet buffers,
+ *	allocated with the supplied gfp flags.  The caller must assure that
+ *	@n does not exceed the queue's capacity -- i.e. (cidx == pidx) _IN
+ *	EGRESS QUEUE UNITS_ indicates an empty Free List!  Returns the number
+ *	of buffers allocated.  If afterwards the queue is found critically low,
+ *	mark it as starving in the bitmap of starving FLs.
+ */
+static unsigned int refill_fl(struct adapter *adapter, struct sge_fl *fl,
+			      int n, gfp_t gfp)
+{
+	struct page *page;
+	dma_addr_t dma_addr;
+	unsigned int cred = fl->avail;
+	__be64 *d = &fl->desc[fl->pidx];
+	struct rx_sw_desc *sdesc = &fl->sdesc[fl->pidx];
+
+	/*
+	 * Sanity: ensure that the result of adding n Free List buffers
+	 * won't result in wrapping the SGE's Producer Index around to
+	 * it's Consumer Index thereby indicating an empty Free List ...
+	 */
+	BUG_ON(fl->avail + n > fl->size - FL_PER_EQ_UNIT);
+
+	/*
+	 * If we support large pages, prefer large buffers and fail over to
+	 * small pages if we can't allocate large pages to satisfy the refill.
+	 * If we don't support large pages, drop directly into the small page
+	 * allocation code.
+	 */
+	if (FL_PG_ORDER == 0)
+		goto alloc_small_pages;
+
+	while (n) {
+		page = alloc_pages(gfp | __GFP_COMP | __GFP_NOWARN,
+				   FL_PG_ORDER);
+		if (unlikely(!page)) {
+			/*
+			 * We've failed inour attempt to allocate a "large
+			 * page".  Fail over to the "small page" allocation
+			 * below.
+			 */
+			fl->large_alloc_failed++;
+			break;
+		}
+		poison_buf(page, PAGE_SIZE << FL_PG_ORDER);
+
+		dma_addr = dma_map_page(adapter->pdev_dev, page, 0,
+					PAGE_SIZE << FL_PG_ORDER,
+					PCI_DMA_FROMDEVICE);
+		if (unlikely(dma_mapping_error(adapter->pdev_dev, dma_addr))) {
+			/*
+			 * We've run out of DMA mapping space.  Free up the
+			 * buffer and return with what we've managed to put
+			 * into the free list.  We don't want to fail over to
+			 * the small page allocation below in this case
+			 * because DMA mapping resources are typically
+			 * critical resources once they become scarse.
+			 */
+			__free_pages(page, FL_PG_ORDER);
+			goto out;
+		}
+		dma_addr |= RX_LARGE_BUF;
+		*d++ = cpu_to_be64(dma_addr);
+
+		set_rx_sw_desc(sdesc, page, dma_addr);
+		sdesc++;
+
+		fl->avail++;
+		if (++fl->pidx == fl->size) {
+			fl->pidx = 0;
+			sdesc = fl->sdesc;
+			d = fl->desc;
+		}
+		n--;
+	}
+
+alloc_small_pages:
+	while (n--) {
+		page = __netdev_alloc_page(adapter->port[0],
+					   gfp | __GFP_NOWARN);
+		if (unlikely(!page)) {
+			fl->alloc_failed++;
+			break;
+		}
+		poison_buf(page, PAGE_SIZE);
+
+		dma_addr = dma_map_page(adapter->pdev_dev, page, 0, PAGE_SIZE,
+				       PCI_DMA_FROMDEVICE);
+		if (unlikely(dma_mapping_error(adapter->pdev_dev, dma_addr))) {
+			netdev_free_page(adapter->port[0], page);
+			break;
+		}
+		*d++ = cpu_to_be64(dma_addr);
+
+		set_rx_sw_desc(sdesc, page, dma_addr);
+		sdesc++;
+
+		fl->avail++;
+		if (++fl->pidx == fl->size) {
+			fl->pidx = 0;
+			sdesc = fl->sdesc;
+			d = fl->desc;
+		}
+	}
+
+out:
+	/*
+	 * Update our accounting state to incorporate the new Free List
+	 * buffers, tell the hardware about them and return the number of
+	 * bufers which we were able to allocate.
+	 */
+	cred = fl->avail - cred;
+	fl->pend_cred += cred;
+	ring_fl_db(adapter, fl);
+
+	if (unlikely(fl_starving(fl))) {
+		smp_wmb();
+		set_bit(fl->cntxt_id, adapter->sge.starving_fl);
+	}
+
+	return cred;
+}
+
+/*
+ * Refill a Free List to its capacity or the Maximum Refill Increment,
+ * whichever is smaller ...
+ */
+static inline void __refill_fl(struct adapter *adapter, struct sge_fl *fl)
+{
+	refill_fl(adapter, fl,
+		  min((unsigned int)MAX_RX_REFILL, fl_cap(fl) - fl->avail),
+		  GFP_ATOMIC);
+}
+
+/**
+ *	alloc_ring - allocate resources for an SGE descriptor ring
+ *	@dev: the PCI device's core device
+ *	@nelem: the number of descriptors
+ *	@hwsize: the size of each hardware descriptor
+ *	@swsize: the size of each software descriptor
+ *	@busaddrp: the physical PCI bus address of the allocated ring
+ *	@swringp: return address pointer for software ring
+ *	@stat_size: extra space in hardware ring for status information
+ *
+ *	Allocates resources for an SGE descriptor ring, such as TX queues,
+ *	free buffer lists, response queues, etc.  Each SGE ring requires
+ *	space for its hardware descriptors plus, optionally, space for software
+ *	state associated with each hardware entry (the metadata).  The function
+ *	returns three values: the virtual address for the hardware ring (the
+ *	return value of the function), the PCI bus address of the hardware
+ *	ring (in *busaddrp), and the address of the software ring (in swringp).
+ *	Both the hardware and software rings are returned zeroed out.
+ */
+static void *alloc_ring(struct device *dev, size_t nelem, size_t hwsize,
+			size_t swsize, dma_addr_t *busaddrp, void *swringp,
+			size_t stat_size)
+{
+	/*
+	 * Allocate the hardware ring and PCI DMA bus address space for said.
+	 */
+	size_t hwlen = nelem * hwsize + stat_size;
+	void *hwring = dma_alloc_coherent(dev, hwlen, busaddrp, GFP_KERNEL);
+
+	if (!hwring)
+		return NULL;
+
+	/*
+	 * If the caller wants a software ring, allocate it and return a
+	 * pointer to it in *swringp.
+	 */
+	BUG_ON((swsize != 0) != (swringp != NULL));
+	if (swsize) {
+		void *swring = kcalloc(nelem, swsize, GFP_KERNEL);
+
+		if (!swring) {
+			dma_free_coherent(dev, hwlen, hwring, *busaddrp);
+			return NULL;
+		}
+		*(void **)swringp = swring;
+	}
+
+	/*
+	 * Zero out the hardware ring and return its address as our function
+	 * value.
+	 */
+	memset(hwring, 0, hwlen);
+	return hwring;
+}
+
+/**
+ *	sgl_len - calculates the size of an SGL of the given capacity
+ *	@n: the number of SGL entries
+ *
+ *	Calculates the number of flits (8-byte units) needed for a Direct
+ *	Scatter/Gather List that can hold the given number of entries.
+ */
+static inline unsigned int sgl_len(unsigned int n)
+{
+	/*
+	 * A Direct Scatter Gather List uses 32-bit lengths and 64-bit PCI DMA
+	 * addresses.  The DSGL Work Request starts off with a 32-bit DSGL
+	 * ULPTX header, then Length0, then Address0, then, for 1 <= i <= N,
+	 * repeated sequences of { Length[i], Length[i+1], Address[i],
+	 * Address[i+1] } (this ensures that all addresses are on 64-bit
+	 * boundaries).  If N is even, then Length[N+1] should be set to 0 and
+	 * Address[N+1] is omitted.
+	 *
+	 * The following calculation incorporates all of the above.  It's
+	 * somewhat hard to follow but, briefly: the "+2" accounts for the
+	 * first two flits which include the DSGL header, Length0 and
+	 * Address0; the "(3*(n-1))/2" covers the main body of list entries (3
+	 * flits for every pair of the remaining N) +1 if (n-1) is odd; and
+	 * finally the "+((n-1)&1)" adds the one remaining flit needed if
+	 * (n-1) is odd ...
+	 */
+	n--;
+	return (3 * n) / 2 + (n & 1) + 2;
+}
+
+/**
+ *	flits_to_desc - returns the num of TX descriptors for the given flits
+ *	@flits: the number of flits
+ *
+ *	Returns the number of TX descriptors needed for the supplied number
+ *	of flits.
+ */
+static inline unsigned int flits_to_desc(unsigned int flits)
+{
+	BUG_ON(flits > SGE_MAX_WR_LEN / sizeof(__be64));
+	return DIV_ROUND_UP(flits, TXD_PER_EQ_UNIT);
+}
+
+/**
+ *	is_eth_imm - can an Ethernet packet be sent as immediate data?
+ *	@skb: the packet
+ *
+ *	Returns whether an Ethernet packet is small enough to fit completely as
+ *	immediate data.
+ */
+static inline int is_eth_imm(const struct sk_buff *skb)
+{
+	/*
+	 * The VF Driver uses the FW_ETH_TX_PKT_VM_WR firmware Work Request
+	 * which does not accommodate immediate data.  We could dike out all
+	 * of the support code for immediate data but that would tie our hands
+	 * too much if we ever want to enhace the firmware.  It would also
+	 * create more differences between the PF and VF Drivers.
+	 */
+	return false;
+}
+
+/**
+ *	calc_tx_flits - calculate the number of flits for a packet TX WR
+ *	@skb: the packet
+ *
+ *	Returns the number of flits needed for a TX Work Request for the
+ *	given Ethernet packet, including the needed WR and CPL headers.
+ */
+static inline unsigned int calc_tx_flits(const struct sk_buff *skb)
+{
+	unsigned int flits;
+
+	/*
+	 * If the skb is small enough, we can pump it out as a work request
+	 * with only immediate data.  In that case we just have to have the
+	 * TX Packet header plus the skb data in the Work Request.
+	 */
+	if (is_eth_imm(skb))
+		return DIV_ROUND_UP(skb->len + sizeof(struct cpl_tx_pkt),
+				    sizeof(__be64));
+
+	/*
+	 * Otherwise, we're going to have to construct a Scatter gather list
+	 * of the skb body and fragments.  We also include the flits necessary
+	 * for the TX Packet Work Request and CPL.  We always have a firmware
+	 * Write Header (incorporated as part of the cpl_tx_pkt_lso and
+	 * cpl_tx_pkt structures), followed by either a TX Packet Write CPL
+	 * message or, if we're doing a Large Send Offload, an LSO CPL message
+	 * with an embeded TX Packet Write CPL message.
+	 */
+	flits = sgl_len(skb_shinfo(skb)->nr_frags + 1);
+	if (skb_shinfo(skb)->gso_size)
+		flits += (sizeof(struct fw_eth_tx_pkt_vm_wr) +
+			  sizeof(struct cpl_tx_pkt_lso_core) +
+			  sizeof(struct cpl_tx_pkt_core)) / sizeof(__be64);
+	else
+		flits += (sizeof(struct fw_eth_tx_pkt_vm_wr) +
+			  sizeof(struct cpl_tx_pkt_core)) / sizeof(__be64);
+	return flits;
+}
+
+/**
+ *	write_sgl - populate a Scatter/Gather List for a packet
+ *	@skb: the packet
+ *	@tq: the TX queue we are writing into
+ *	@sgl: starting location for writing the SGL
+ *	@end: points right after the end of the SGL
+ *	@start: start offset into skb main-body data to include in the SGL
+ *	@addr: the list of DMA bus addresses for the SGL elements
+ *
+ *	Generates a Scatter/Gather List for the buffers that make up a packet.
+ *	The caller must provide adequate space for the SGL that will be written.
+ *	The SGL includes all of the packet's page fragments and the data in its
+ *	main body except for the first @start bytes.  @pos must be 16-byte
+ *	aligned and within a TX descriptor with available space.  @end points
+ *	write after the end of the SGL but does not account for any potential
+ *	wrap around, i.e., @end > @tq->stat.
+ */
+static void write_sgl(const struct sk_buff *skb, struct sge_txq *tq,
+		      struct ulptx_sgl *sgl, u64 *end, unsigned int start,
+		      const dma_addr_t *addr)
+{
+	unsigned int i, len;
+	struct ulptx_sge_pair *to;
+	const struct skb_shared_info *si = skb_shinfo(skb);
+	unsigned int nfrags = si->nr_frags;
+	struct ulptx_sge_pair buf[MAX_SKB_FRAGS / 2 + 1];
+
+	len = skb_headlen(skb) - start;
+	if (likely(len)) {
+		sgl->len0 = htonl(len);
+		sgl->addr0 = cpu_to_be64(addr[0] + start);
+		nfrags++;
+	} else {
+		sgl->len0 = htonl(si->frags[0].size);
+		sgl->addr0 = cpu_to_be64(addr[1]);
+	}
+
+	sgl->cmd_nsge = htonl(ULPTX_CMD(ULP_TX_SC_DSGL) |
+			      ULPTX_NSGE(nfrags));
+	if (likely(--nfrags == 0))
+		return;
+	/*
+	 * Most of the complexity below deals with the possibility we hit the
+	 * end of the queue in the middle of writing the SGL.  For this case
+	 * only we create the SGL in a temporary buffer and then copy it.
+	 */
+	to = (u8 *)end > (u8 *)tq->stat ? buf : sgl->sge;
+
+	for (i = (nfrags != si->nr_frags); nfrags >= 2; nfrags -= 2, to++) {
+		to->len[0] = cpu_to_be32(si->frags[i].size);
+		to->len[1] = cpu_to_be32(si->frags[++i].size);
+		to->addr[0] = cpu_to_be64(addr[i]);
+		to->addr[1] = cpu_to_be64(addr[++i]);
+	}
+	if (nfrags) {
+		to->len[0] = cpu_to_be32(si->frags[i].size);
+		to->len[1] = cpu_to_be32(0);
+		to->addr[0] = cpu_to_be64(addr[i + 1]);
+	}
+	if (unlikely((u8 *)end > (u8 *)tq->stat)) {
+		unsigned int part0 = (u8 *)tq->stat - (u8 *)sgl->sge, part1;
+
+		if (likely(part0))
+			memcpy(sgl->sge, buf, part0);
+		part1 = (u8 *)end - (u8 *)tq->stat;
+		memcpy(tq->desc, (u8 *)buf + part0, part1);
+		end = (void *)tq->desc + part1;
+	}
+	if ((uintptr_t)end & 8)           /* 0-pad to multiple of 16 */
+		*(u64 *)end = 0;
+}
+
+/**
+ *	check_ring_tx_db - check and potentially ring a TX queue's doorbell
+ *	@adapter: the adapter
+ *	@tq: the TX queue
+ *	@n: number of new descriptors to give to HW
+ *
+ *	Ring the doorbel for a TX queue.
+ */
+static inline void ring_tx_db(struct adapter *adapter, struct sge_txq *tq,
+			      int n)
+{
+	/*
+	 * Warn if we write doorbells with the wrong priority and write
+	 * descriptors before telling HW.
+	 */
+	WARN_ON((QID(tq->cntxt_id) | PIDX(n)) & DBPRIO);
+	wmb();
+	t4_write_reg(adapter, T4VF_SGE_BASE_ADDR + SGE_VF_KDOORBELL,
+		     QID(tq->cntxt_id) | PIDX(n));
+}
+
+/**
+ *	inline_tx_skb - inline a packet's data into TX descriptors
+ *	@skb: the packet
+ *	@tq: the TX queue where the packet will be inlined
+ *	@pos: starting position in the TX queue to inline the packet
+ *
+ *	Inline a packet's contents directly into TX descriptors, starting at
+ *	the given position within the TX DMA ring.
+ *	Most of the complexity of this operation is dealing with wrap arounds
+ *	in the middle of the packet we want to inline.
+ */
+static void inline_tx_skb(const struct sk_buff *skb, const struct sge_txq *tq,
+			  void *pos)
+{
+	u64 *p;
+	int left = (void *)tq->stat - pos;
+
+	if (likely(skb->len <= left)) {
+		if (likely(!skb->data_len))
+			skb_copy_from_linear_data(skb, pos, skb->len);
+		else
+			skb_copy_bits(skb, 0, pos, skb->len);
+		pos += skb->len;
+	} else {
+		skb_copy_bits(skb, 0, pos, left);
+		skb_copy_bits(skb, left, tq->desc, skb->len - left);
+		pos = (void *)tq->desc + (skb->len - left);
+	}
+
+	/* 0-pad to multiple of 16 */
+	p = PTR_ALIGN(pos, 8);
+	if ((uintptr_t)p & 8)
+		*p = 0;
+}
+
+/*
+ * Figure out what HW csum a packet wants and return the appropriate control
+ * bits.
+ */
+static u64 hwcsum(const struct sk_buff *skb)
+{
+	int csum_type;
+	const struct iphdr *iph = ip_hdr(skb);
+
+	if (iph->version == 4) {
+		if (iph->protocol == IPPROTO_TCP)
+			csum_type = TX_CSUM_TCPIP;
+		else if (iph->protocol == IPPROTO_UDP)
+			csum_type = TX_CSUM_UDPIP;
+		else {
+nocsum:
+			/*
+			 * unknown protocol, disable HW csum
+			 * and hope a bad packet is detected
+			 */
+			return TXPKT_L4CSUM_DIS;
+		}
+	} else {
+		/*
+		 * this doesn't work with extension headers
+		 */
+		const struct ipv6hdr *ip6h = (const struct ipv6hdr *)iph;
+
+		if (ip6h->nexthdr == IPPROTO_TCP)
+			csum_type = TX_CSUM_TCPIP6;
+		else if (ip6h->nexthdr == IPPROTO_UDP)
+			csum_type = TX_CSUM_UDPIP6;
+		else
+			goto nocsum;
+	}
+
+	if (likely(csum_type >= TX_CSUM_TCPIP))
+		return TXPKT_CSUM_TYPE(csum_type) |
+			TXPKT_IPHDR_LEN(skb_network_header_len(skb)) |
+			TXPKT_ETHHDR_LEN(skb_network_offset(skb) - ETH_HLEN);
+	else {
+		int start = skb_transport_offset(skb);
+
+		return TXPKT_CSUM_TYPE(csum_type) |
+			TXPKT_CSUM_START(start) |
+			TXPKT_CSUM_LOC(start + skb->csum_offset);
+	}
+}
+
+/*
+ * Stop an Ethernet TX queue and record that state change.
+ */
+static void txq_stop(struct sge_eth_txq *txq)
+{
+	netif_tx_stop_queue(txq->txq);
+	txq->q.stops++;
+}
+
+/*
+ * Advance our software state for a TX queue by adding n in use descriptors.
+ */
+static inline void txq_advance(struct sge_txq *tq, unsigned int n)
+{
+	tq->in_use += n;
+	tq->pidx += n;
+	if (tq->pidx >= tq->size)
+		tq->pidx -= tq->size;
+}
+
+/**
+ *	t4vf_eth_xmit - add a packet to an Ethernet TX queue
+ *	@skb: the packet
+ *	@dev: the egress net device
+ *
+ *	Add a packet to an SGE Ethernet TX queue.  Runs with softirqs disabled.
+ */
+int t4vf_eth_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	u64 cntrl, *end;
+	int qidx, credits;
+	unsigned int flits, ndesc;
+	struct adapter *adapter;
+	struct sge_eth_txq *txq;
+	const struct port_info *pi;
+	struct fw_eth_tx_pkt_vm_wr *wr;
+	struct cpl_tx_pkt_core *cpl;
+	const struct skb_shared_info *ssi;
+	dma_addr_t addr[MAX_SKB_FRAGS + 1];
+	const size_t fw_hdr_copy_len = (sizeof(wr->ethmacdst) +
+					sizeof(wr->ethmacsrc) +
+					sizeof(wr->ethtype) +
+					sizeof(wr->vlantci));
+
+	/*
+	 * The chip minimum packet length is 10 octets but the firmware
+	 * command that we are using requires that we copy the Ethernet header
+	 * (including the VLAN tag) into the header so we reject anything
+	 * smaller than that ...
+	 */
+	if (unlikely(skb->len < fw_hdr_copy_len))
+		goto out_free;
+
+	/*
+	 * Figure out which TX Queue we're going to use.
+	 */
+	pi = netdev_priv(dev);
+	adapter = pi->adapter;
+	qidx = skb_get_queue_mapping(skb);
+	BUG_ON(qidx >= pi->nqsets);
+	txq = &adapter->sge.ethtxq[pi->first_qset + qidx];
+
+	/*
+	 * Take this opportunity to reclaim any TX Descriptors whose DMA
+	 * transfers have completed.
+	 */
+	reclaim_completed_tx(adapter, &txq->q, true);
+
+	/*
+	 * Calculate the number of flits and TX Descriptors we're going to
+	 * need along with how many TX Descriptors will be left over after
+	 * we inject our Work Request.
+	 */
+	flits = calc_tx_flits(skb);
+	ndesc = flits_to_desc(flits);
+	credits = txq_avail(&txq->q) - ndesc;
+
+	if (unlikely(credits < 0)) {
+		/*
+		 * Not enough room for this packet's Work Request.  Stop the
+		 * TX Queue and return a "busy" condition.  The queue will get
+		 * started later on when the firmware informs us that space
+		 * has opened up.
+		 */
+		txq_stop(txq);
+		dev_err(adapter->pdev_dev,
+			"%s: TX ring %u full while queue awake!\n",
+			dev->name, qidx);
+		return NETDEV_TX_BUSY;
+	}
+
+	if (!is_eth_imm(skb) &&
+	    unlikely(map_skb(adapter->pdev_dev, skb, addr) < 0)) {
+		/*
+		 * We need to map the skb into PCI DMA space (because it can't
+		 * be in-lined directly into the Work Request) and the mapping
+		 * operation failed.  Record the error and drop the packet.
+		 */
+		txq->mapping_err++;
+		goto out_free;
+	}
+
+	if (unlikely(credits < ETHTXQ_STOP_THRES)) {
+		/*
+		 * After we're done injecting the Work Request for this
+		 * packet, we'll be below our "stop threshhold" so stop the TX
+		 * Queue now.  The queue will get started later on when the
+		 * firmware informs us that space has opened up.
+		 */
+		txq_stop(txq);
+	}
+
+	/*
+	 * Start filling in our Work Request.  Note that we do _not_ handle
+	 * the WR Header wrapping around the TX Descriptor Ring.  If our
+	 * maximum header size ever exceeds one TX Descriptor, we'll need to
+	 * do something else here.
+	 */
+	BUG_ON(DIV_ROUND_UP(ETHTXQ_MAX_HDR, TXD_PER_EQ_UNIT) > 1);
+	wr = (void *)&txq->q.desc[txq->q.pidx];
+	wr->equiq_to_len16 = cpu_to_be32(FW_WR_LEN16(DIV_ROUND_UP(flits, 2)));
+	wr->r3[0] = cpu_to_be64(0);
+	wr->r3[1] = cpu_to_be64(0);
+	skb_copy_from_linear_data(skb, (void *)wr->ethmacdst, fw_hdr_copy_len);
+	end = (u64 *)wr + flits;
+
+	/*
+	 * If this is a Large Send Offload packet we'll put in an LSO CPL
+	 * message with an encapsulated TX Packet CPL message.  Otherwise we
+	 * just use a TX Packet CPL message.
+	 */
+	ssi = skb_shinfo(skb);
+	if (ssi->gso_size) {
+		struct cpl_tx_pkt_lso_core *lso = (void *)(wr + 1);
+		bool v6 = (ssi->gso_type & SKB_GSO_TCPV6) != 0;
+		int l3hdr_len = skb_network_header_len(skb);
+		int eth_xtra_len = skb_network_offset(skb) - ETH_HLEN;
+
+		wr->op_immdlen =
+			cpu_to_be32(FW_WR_OP(FW_ETH_TX_PKT_VM_WR) |
+				    FW_WR_IMMDLEN(sizeof(*lso) +
+						  sizeof(*cpl)));
+		/*
+		 * Fill in the LSO CPL message.
+		 */
+		lso->lso_ctrl =
+			cpu_to_be32(LSO_OPCODE(CPL_TX_PKT_LSO) |
+				    LSO_FIRST_SLICE |
+				    LSO_LAST_SLICE |
+				    LSO_IPV6(v6) |
+				    LSO_ETHHDR_LEN(eth_xtra_len/4) |
+				    LSO_IPHDR_LEN(l3hdr_len/4) |
+				    LSO_TCPHDR_LEN(tcp_hdr(skb)->doff));
+		lso->ipid_ofst = cpu_to_be16(0);
+		lso->mss = cpu_to_be16(ssi->gso_size);
+		lso->seqno_offset = cpu_to_be32(0);
+		lso->len = cpu_to_be32(skb->len);
+
+		/*
+		 * Set up TX Packet CPL pointer, control word and perform
+		 * accounting.
+		 */
+		cpl = (void *)(lso + 1);
+		cntrl = (TXPKT_CSUM_TYPE(v6 ? TX_CSUM_TCPIP6 : TX_CSUM_TCPIP) |
+			 TXPKT_IPHDR_LEN(l3hdr_len) |
+			 TXPKT_ETHHDR_LEN(eth_xtra_len));
+		txq->tso++;
+		txq->tx_cso += ssi->gso_segs;
+	} else {
+		int len;
+
+		len = is_eth_imm(skb) ? skb->len + sizeof(*cpl) : sizeof(*cpl);
+		wr->op_immdlen =
+			cpu_to_be32(FW_WR_OP(FW_ETH_TX_PKT_VM_WR) |
+				    FW_WR_IMMDLEN(len));
+
+		/*
+		 * Set up TX Packet CPL pointer, control word and perform
+		 * accounting.
+		 */
+		cpl = (void *)(wr + 1);
+		if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			cntrl = hwcsum(skb) | TXPKT_IPCSUM_DIS;
+			txq->tx_cso++;
+		} else
+			cntrl = TXPKT_L4CSUM_DIS | TXPKT_IPCSUM_DIS;
+	}
+
+	/*
+	 * If there's a VLAN tag present, add that to the list of things to
+	 * do in this Work Request.
+	 */
+	if (vlan_tx_tag_present(skb)) {
+		txq->vlan_ins++;
+		cntrl |= TXPKT_VLAN_VLD | TXPKT_VLAN(vlan_tx_tag_get(skb));
+	}
+
+	/*
+	 * Fill in the TX Packet CPL message header.
+	 */
+	cpl->ctrl0 = cpu_to_be32(TXPKT_OPCODE(CPL_TX_PKT_XT) |
+				 TXPKT_INTF(pi->port_id) |
+				 TXPKT_PF(0));
+	cpl->pack = cpu_to_be16(0);
+	cpl->len = cpu_to_be16(skb->len);
+	cpl->ctrl1 = cpu_to_be64(cntrl);
+
+#ifdef T4_TRACE
+	T4_TRACE5(adapter->tb[txq->q.cntxt_id & 7],
+		  "eth_xmit: ndesc %u, credits %u, pidx %u, len %u, frags %u",
+		  ndesc, credits, txq->q.pidx, skb->len, ssi->nr_frags);
+#endif
+
+	/*
+	 * Fill in the body of the TX Packet CPL message with either in-lined
+	 * data or a Scatter/Gather List.
+	 */
+	if (is_eth_imm(skb)) {
+		/*
+		 * In-line the packet's data and free the skb since we don't
+		 * need it any longer.
+		 */
+		inline_tx_skb(skb, &txq->q, cpl + 1);
+		dev_kfree_skb(skb);
+	} else {
+		/*
+		 * Write the skb's Scatter/Gather list into the TX Packet CPL
+		 * message and retain a pointer to the skb so we can free it
+		 * later when its DMA completes.  (We store the skb pointer
+		 * in the Software Descriptor corresponding to the last TX
+		 * Descriptor used by the Work Request.)
+		 *
+		 * The retained skb will be freed when the corresponding TX
+		 * Descriptors are reclaimed after their DMAs complete.
+		 * However, this could take quite a while since, in general,
+		 * the hardware is set up to be lazy about sending DMA
+		 * completion notifications to us and we mostly perform TX
+		 * reclaims in the transmit routine.
+		 *
+		 * This is good for performamce but means that we rely on new
+		 * TX packets arriving to run the destructors of completed
+		 * packets, which open up space in their sockets' send queues.
+		 * Sometimes we do not get such new packets causing TX to
+		 * stall.  A single UDP transmitter is a good example of this
+		 * situation.  We have a clean up timer that periodically
+		 * reclaims completed packets but it doesn't run often enough
+		 * (nor do we want it to) to prevent lengthy stalls.  A
+		 * solution to this problem is to run the destructor early,
+		 * after the packet is queued but before it's DMAd.  A con is
+		 * that we lie to socket memory accounting, but the amount of
+		 * extra memory is reasonable (limited by the number of TX
+		 * descriptors), the packets do actually get freed quickly by
+		 * new packets almost always, and for protocols like TCP that
+		 * wait for acks to really free up the data the extra memory
+		 * is even less.  On the positive side we run the destructors
+		 * on the sending CPU rather than on a potentially different
+		 * completing CPU, usually a good thing.  We also run them
+		 * without holding our TX queue lock, unlike what
+		 * reclaim_completed_tx() would otherwise do.
+		 *
+		 * XXX Actually the above is somewhat incorrect since we don't
+		 * XXX yet have a periodic timer which reclaims TX Descriptors.
+		 * XXX What's our plan for this?
+		 * XXX
+		 * XXX Also, we don't currently have a TX Queue lock but
+		 * XXX that may be the result of not having any current
+		 * XXX asynchronous path for reclaiming completed TX
+		 * XXX Descriptors ...
+		 *
+		 * Run the destructor before telling the DMA engine about the
+		 * packet to make sure it doesn't complete and get freed
+		 * prematurely.
+		 */
+		struct ulptx_sgl *sgl = (struct ulptx_sgl *)(cpl + 1);
+		struct sge_txq *tq = &txq->q;
+		int last_desc;
+
+		/*
+		 * If the Work Request header was an exact multiple of our TX
+		 * Descriptor length, then it's possible that the starting SGL
+		 * pointer lines up exactly with the end of our TX Descriptor
+		 * ring.  If that's the case, wrap around to the beginning
+		 * here ...
+		 */
+		if (unlikely((void *)sgl == (void *)tq->stat)) {
+			sgl = (void *)tq->desc;
+			end = (void *)((void *)tq->desc +
+				       ((void *)end - (void *)tq->stat));
+		}
+
+		write_sgl(skb, tq, sgl, end, 0, addr);
+		skb_orphan(skb);
+
+		last_desc = tq->pidx + ndesc - 1;
+		if (last_desc >= tq->size)
+			last_desc -= tq->size;
+		tq->sdesc[last_desc].skb = skb;
+		tq->sdesc[last_desc].sgl = sgl;
+	}
+
+	/*
+	 * Advance our internal TX Queue state, tell the hardware about
+	 * the new TX descriptors and return success.
+	 */
+	txq_advance(&txq->q, ndesc);
+	dev->trans_start = jiffies;
+	ring_tx_db(adapter, &txq->q, ndesc);
+	return NETDEV_TX_OK;
+
+out_free:
+	/*
+	 * An error of some sort happened.  Free the TX skb and tell the
+	 * OS that we've "dealt" with the packet ...
+	 */
+	dev_kfree_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+/**
+ *	t4vf_pktgl_free - free a packet gather list
+ *	@gl: the gather list
+ *
+ *	Releases the pages of a packet gather list.  We do not own the last
+ *	page on the list and do not free it.
+ */
+void t4vf_pktgl_free(const struct pkt_gl *gl)
+{
+	int frag;
+
+	frag = gl->nfrags - 1;
+	while (frag--)
+		put_page(gl->frags[frag].page);
+}
+
+/**
+ *	copy_frags - copy fragments from gather list into skb_shared_info
+ *	@si: destination skb shared info structure
+ *	@gl: source internal packet gather list
+ *	@offset: packet start offset in first page
+ *
+ *	Copy an internal packet gather list into a Linux skb_shared_info
+ *	structure.
+ */
+static inline void copy_frags(struct skb_shared_info *si,
+			      const struct pkt_gl *gl,
+			      unsigned int offset)
+{
+	unsigned int n;
+
+	/* usually there's just one frag */
+	si->frags[0].page = gl->frags[0].page;
+	si->frags[0].page_offset = gl->frags[0].page_offset + offset;
+	si->frags[0].size = gl->frags[0].size - offset;
+	si->nr_frags = gl->nfrags;
+
+	n = gl->nfrags - 1;
+	if (n)
+		memcpy(&si->frags[1], &gl->frags[1], n * sizeof(skb_frag_t));
+
+	/* get a reference to the last page, we don't own it */
+	get_page(gl->frags[n].page);
+}
+
+/**
+ *	do_gro - perform Generic Receive Offload ingress packet processing
+ *	@rxq: ingress RX Ethernet Queue
+ *	@gl: gather list for ingress packet
+ *	@pkt: CPL header for last packet fragment
+ *
+ *	Perform Generic Receive Offload (GRO) ingress packet processing.
+ *	We use the standard Linux GRO interfaces for this.
+ */
+static void do_gro(struct sge_eth_rxq *rxq, const struct pkt_gl *gl,
+		   const struct cpl_rx_pkt *pkt)
+{
+	int ret;
+	struct sk_buff *skb;
+
+	skb = napi_get_frags(&rxq->rspq.napi);
+	if (unlikely(!skb)) {
+		t4vf_pktgl_free(gl);
+		rxq->stats.rx_drops++;
+		return;
+	}
+
+	copy_frags(skb_shinfo(skb), gl, PKTSHIFT);
+	skb->len = gl->tot_len - PKTSHIFT;
+	skb->data_len = skb->len;
+	skb->truesize += skb->data_len;
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
+	skb_record_rx_queue(skb, rxq->rspq.idx);
+
+	if (unlikely(pkt->vlan_ex)) {
+		struct port_info *pi = netdev_priv(rxq->rspq.netdev);
+		struct vlan_group *grp = pi->vlan_grp;
+
+		rxq->stats.vlan_ex++;
+		if (likely(grp)) {
+			ret = vlan_gro_frags(&rxq->rspq.napi, grp,
+					     be16_to_cpu(pkt->vlan));
+			goto stats;
+		}
+	}
+	ret = napi_gro_frags(&rxq->rspq.napi);
+
+stats:
+	if (ret == GRO_HELD)
+		rxq->stats.lro_pkts++;
+	else if (ret == GRO_MERGED || ret == GRO_MERGED_FREE)
+		rxq->stats.lro_merged++;
+	rxq->stats.pkts++;
+	rxq->stats.rx_cso++;
+}
+
+/**
+ *	t4vf_ethrx_handler - process an ingress ethernet packet
+ *	@rspq: the response queue that received the packet
+ *	@rsp: the response queue descriptor holding the RX_PKT message
+ *	@gl: the gather list of packet fragments
+ *
+ *	Process an ingress ethernet packet and deliver it to the stack.
+ */
+int t4vf_ethrx_handler(struct sge_rspq *rspq, const __be64 *rsp,
+		       const struct pkt_gl *gl)
+{
+	struct sk_buff *skb;
+	struct port_info *pi;
+	struct skb_shared_info *ssi;
+	const struct cpl_rx_pkt *pkt = (void *)&rsp[1];
+	bool csum_ok = pkt->csum_calc && !pkt->err_vec;
+	unsigned int len = be16_to_cpu(pkt->len);
+	struct sge_eth_rxq *rxq = container_of(rspq, struct sge_eth_rxq, rspq);
+
+	/*
+	 * If this is a good TCP packet and we have Generic Receive Offload
+	 * enabled, handle the packet in the GRO path.
+	 */
+	if ((pkt->l2info & cpu_to_be32(RXF_TCP)) &&
+	    (rspq->netdev->features & NETIF_F_GRO) && csum_ok &&
+	    !pkt->ip_frag) {
+		do_gro(rxq, gl, pkt);
+		return 0;
+	}
+
+	/*
+	 * If the ingress packet is small enough, allocate an skb large enough
+	 * for all of the data and copy it inline.  Otherwise, allocate an skb
+	 * with enough room to pull in the header and reference the rest of
+	 * the data via the skb fragment list.
+	 */
+	if (len <= RX_COPY_THRES) {
+		/* small packets have only one fragment */
+		skb = alloc_skb(gl->frags[0].size, GFP_ATOMIC);
+		if (!skb)
+			goto nomem;
+		__skb_put(skb, gl->frags[0].size);
+		skb_copy_to_linear_data(skb, gl->va, gl->frags[0].size);
+	} else {
+		skb = alloc_skb(RX_PKT_PULL_LEN, GFP_ATOMIC);
+		if (!skb)
+			goto nomem;
+		__skb_put(skb, RX_PKT_PULL_LEN);
+		skb_copy_to_linear_data(skb, gl->va, RX_PKT_PULL_LEN);
+
+		ssi = skb_shinfo(skb);
+		ssi->frags[0].page = gl->frags[0].page;
+		ssi->frags[0].page_offset = (gl->frags[0].page_offset +
+					     RX_PKT_PULL_LEN);
+		ssi->frags[0].size = gl->frags[0].size - RX_PKT_PULL_LEN;
+		if (gl->nfrags > 1)
+			memcpy(&ssi->frags[1], &gl->frags[1],
+			       (gl->nfrags-1) * sizeof(skb_frag_t));
+		ssi->nr_frags = gl->nfrags;
+		skb->len = len + PKTSHIFT;
+		skb->data_len = skb->len - RX_PKT_PULL_LEN;
+		skb->truesize += skb->data_len;
+
+		/* Get a reference for the last page, we don't own it */
+		get_page(gl->frags[gl->nfrags - 1].page);
+	}
+
+	__skb_pull(skb, PKTSHIFT);
+	skb->protocol = eth_type_trans(skb, rspq->netdev);
+	skb_record_rx_queue(skb, rspq->idx);
+	skb->dev->last_rx = jiffies;                  /* XXX removed 2.6.29 */
+	pi = netdev_priv(skb->dev);
+	rxq->stats.pkts++;
+
+	if (csum_ok && (pi->rx_offload & RX_CSO) && !pkt->err_vec &&
+	    (be32_to_cpu(pkt->l2info) & (RXF_UDP|RXF_TCP))) {
+		if (!pkt->ip_frag)
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+		else {
+			__sum16 c = (__force __sum16)pkt->csum;
+			skb->csum = csum_unfold(c);
+			skb->ip_summed = CHECKSUM_COMPLETE;
+		}
+		rxq->stats.rx_cso++;
+	} else
+		skb->ip_summed = CHECKSUM_NONE;
+
+	if (unlikely(pkt->vlan_ex)) {
+		struct vlan_group *grp = pi->vlan_grp;
+
+		rxq->stats.vlan_ex++;
+		if (likely(grp))
+			vlan_hwaccel_receive_skb(skb, grp,
+						 be16_to_cpu(pkt->vlan));
+		else
+			dev_kfree_skb_any(skb);
+	} else
+		netif_receive_skb(skb);
+
+	return 0;
+
+nomem:
+	t4vf_pktgl_free(gl);
+	rxq->stats.rx_drops++;
+	return 0;
+}
+
+/**
+ *	is_new_response - check if a response is newly written
+ *	@rc: the response control descriptor
+ *	@rspq: the response queue
+ *
+ *	Returns true if a response descriptor contains a yet unprocessed
+ *	response.
+ */
+static inline bool is_new_response(const struct rsp_ctrl *rc,
+				   const struct sge_rspq *rspq)
+{
+	return RSPD_GEN(rc->type_gen) == rspq->gen;
+}
+
+/**
+ *	restore_rx_bufs - put back a packet's RX buffers
+ *	@gl: the packet gather list
+ *	@fl: the SGE Free List
+ *	@nfrags: how many fragments in @si
+ *
+ *	Called when we find out that the current packet, @si, can't be
+ *	processed right away for some reason.  This is a very rare event and
+ *	there's no effort to make this suspension/resumption process
+ *	particularly efficient.
+ *
+ *	We implement the suspension by putting all of the RX buffers associated
+ *	with the current packet back on the original Free List.  The buffers
+ *	have already been unmapped and are left unmapped, we mark them as
+ *	unmapped in order to prevent further unmapping attempts.  (Effectively
+ *	this function undoes the series of @unmap_rx_buf calls which were done
+ *	to create the current packet's gather list.)  This leaves us ready to
+ *	restart processing of the packet the next time we start processing the
+ *	RX Queue ...
+ */
+static void restore_rx_bufs(const struct pkt_gl *gl, struct sge_fl *fl,
+			    int frags)
+{
+	struct rx_sw_desc *sdesc;
+
+	while (frags--) {
+		if (fl->cidx == 0)
+			fl->cidx = fl->size - 1;
+		else
+			fl->cidx--;
+		sdesc = &fl->sdesc[fl->cidx];
+		sdesc->page = gl->frags[frags].page;
+		sdesc->dma_addr |= RX_UNMAPPED_BUF;
+		fl->avail++;
+	}
+}
+
+/**
+ *	rspq_next - advance to the next entry in a response queue
+ *	@rspq: the queue
+ *
+ *	Updates the state of a response queue to advance it to the next entry.
+ */
+static inline void rspq_next(struct sge_rspq *rspq)
+{
+	rspq->cur_desc = (void *)rspq->cur_desc + rspq->iqe_len;
+	if (unlikely(++rspq->cidx == rspq->size)) {
+		rspq->cidx = 0;
+		rspq->gen ^= 1;
+		rspq->cur_desc = rspq->desc;
+	}
+}
+
+/**
+ *	process_responses - process responses from an SGE response queue
+ *	@rspq: the ingress response queue to process
+ *	@budget: how many responses can be processed in this round
+ *
+ *	Process responses from a Scatter Gather Engine response queue up to
+ *	the supplied budget.  Responses include received packets as well as
+ *	control messages from firmware or hardware.
+ *
+ *	Additionally choose the interrupt holdoff time for the next interrupt
+ *	on this queue.  If the system is under memory shortage use a fairly
+ *	long delay to help recovery.
+ */
+int process_responses(struct sge_rspq *rspq, int budget)
+{
+	struct sge_eth_rxq *rxq = container_of(rspq, struct sge_eth_rxq, rspq);
+	int budget_left = budget;
+
+	while (likely(budget_left)) {
+		int ret, rsp_type;
+		const struct rsp_ctrl *rc;
+
+		rc = (void *)rspq->cur_desc + (rspq->iqe_len - sizeof(*rc));
+		if (!is_new_response(rc, rspq))
+			break;
+
+		/*
+		 * Figure out what kind of response we've received from the
+		 * SGE.
+		 */
+		rmb();
+		rsp_type = RSPD_TYPE(rc->type_gen);
+		if (likely(rsp_type == RSP_TYPE_FLBUF)) {
+			skb_frag_t *fp;
+			struct pkt_gl gl;
+			const struct rx_sw_desc *sdesc;
+			u32 bufsz, frag;
+			u32 len = be32_to_cpu(rc->pldbuflen_qid);
+
+			/*
+			 * If we get a "new buffer" message from the SGE we
+			 * need to move on to the next Free List buffer.
+			 */
+			if (len & RSPD_NEWBUF) {
+				/*
+				 * We get one "new buffer" message when we
+				 * first start up a queue so we need to ignore
+				 * it when our offset into the buffer is 0.
+				 */
+				if (likely(rspq->offset > 0)) {
+					free_rx_bufs(rspq->adapter, &rxq->fl,
+						     1);
+					rspq->offset = 0;
+				}
+				len = RSPD_LEN(len);
+			}
+
+			/*
+			 * Gather packet fragments.
+			 */
+			for (frag = 0, fp = gl.frags; /**/; frag++, fp++) {
+				BUG_ON(frag >= MAX_SKB_FRAGS);
+				BUG_ON(rxq->fl.avail == 0);
+				sdesc = &rxq->fl.sdesc[rxq->fl.cidx];
+				bufsz = get_buf_size(sdesc);
+				fp->page = sdesc->page;
+				fp->page_offset = rspq->offset;
+				fp->size = min(bufsz, len);
+				len -= fp->size;
+				if (!len)
+					break;
+				unmap_rx_buf(rspq->adapter, &rxq->fl);
+			}
+			gl.nfrags = frag+1;
+
+			/*
+			 * Last buffer remains mapped so explicitly make it
+			 * coherent for CPU access and start preloading first
+			 * cache line ...
+			 */
+			dma_sync_single_for_cpu(rspq->adapter->pdev_dev,
+						get_buf_addr(sdesc),
+						fp->size, DMA_FROM_DEVICE);
+			gl.va = (page_address(gl.frags[0].page) +
+				 gl.frags[0].page_offset);
+			prefetch(gl.va);
+
+			/*
+			 * Hand the new ingress packet to the handler for
+			 * this Response Queue.
+			 */
+			ret = rspq->handler(rspq, rspq->cur_desc, &gl);
+			if (likely(ret == 0))
+				rspq->offset += ALIGN(fp->size, FL_ALIGN);
+			else
+				restore_rx_bufs(&gl, &rxq->fl, frag);
+		} else if (likely(rsp_type == RSP_TYPE_CPL)) {
+			ret = rspq->handler(rspq, rspq->cur_desc, NULL);
+		} else {
+			WARN_ON(rsp_type > RSP_TYPE_CPL);
+			ret = 0;
+		}
+
+		if (unlikely(ret)) {
+			/*
+			 * Couldn't process descriptor, back off for recovery.
+			 * We use the SGE's last timer which has the longest
+			 * interrupt coalescing value ...
+			 */
+			const int NOMEM_TIMER_IDX = SGE_NTIMERS-1;
+			rspq->next_intr_params =
+				QINTR_TIMER_IDX(NOMEM_TIMER_IDX);
+			break;
+		}
+
+		rspq_next(rspq);
+		budget_left--;
+	}
+
+	/*
+	 * If this is a Response Queue with an associated Free List and
+	 * at least two Egress Queue units available in the Free List
+	 * for new buffer pointers, refill the Free List.
+	 */
+	if (rspq->offset >= 0 &&
+	    rxq->fl.size - rxq->fl.avail >= 2*FL_PER_EQ_UNIT)
+		__refill_fl(rspq->adapter, &rxq->fl);
+	return budget - budget_left;
+}
+
+/**
+ *	napi_rx_handler - the NAPI handler for RX processing
+ *	@napi: the napi instance
+ *	@budget: how many packets we can process in this round
+ *
+ *	Handler for new data events when using NAPI.  This does not need any
+ *	locking or protection from interrupts as data interrupts are off at
+ *	this point and other adapter interrupts do not interfere (the latter
+ *	in not a concern at all with MSI-X as non-data interrupts then have
+ *	a separate handler).
+ */
+static int napi_rx_handler(struct napi_struct *napi, int budget)
+{
+	unsigned int intr_params;
+	struct sge_rspq *rspq = container_of(napi, struct sge_rspq, napi);
+	int work_done = process_responses(rspq, budget);
+
+	if (likely(work_done < budget)) {
+		napi_complete(napi);
+		intr_params = rspq->next_intr_params;
+		rspq->next_intr_params = rspq->intr_params;
+	} else
+		intr_params = QINTR_TIMER_IDX(SGE_TIMER_UPD_CIDX);
+
+	t4_write_reg(rspq->adapter,
+		     T4VF_SGE_BASE_ADDR + SGE_VF_GTS,
+		     CIDXINC(work_done) |
+		     INGRESSQID((u32)rspq->cntxt_id) |
+		     SEINTARM(intr_params));
+	return work_done;
+}
+
+/*
+ * The MSI-X interrupt handler for an SGE response queue for the NAPI case
+ * (i.e., response queue serviced by NAPI polling).
+ */
+irqreturn_t t4vf_sge_intr_msix(int irq, void *cookie)
+{
+	struct sge_rspq *rspq = cookie;
+
+	napi_schedule(&rspq->napi);
+	return IRQ_HANDLED;
+}
+
+/*
+ * Process the indirect interrupt entries in the interrupt queue and kick off
+ * NAPI for each queue that has generated an entry.
+ */
+static unsigned int process_intrq(struct adapter *adapter)
+{
+	struct sge *s = &adapter->sge;
+	struct sge_rspq *intrq = &s->intrq;
+	unsigned int work_done;
+
+	spin_lock(&adapter->sge.intrq_lock);
+	for (work_done = 0; ; work_done++) {
+		const struct rsp_ctrl *rc;
+		unsigned int qid, iq_idx;
+		struct sge_rspq *rspq;
+
+		/*
+		 * Grab the next response from the interrupt queue and bail
+		 * out if it's not a new response.
+		 */
+		rc = (void *)intrq->cur_desc + (intrq->iqe_len - sizeof(*rc));
+		if (!is_new_response(rc, intrq))
+			break;
+
+		/*
+		 * If the response isn't a forwarded interrupt message issue a
+		 * error and go on to the next response message.  This should
+		 * never happen ...
+		 */
+		rmb();
+		if (unlikely(RSPD_TYPE(rc->type_gen) != RSP_TYPE_INTR)) {
+			dev_err(adapter->pdev_dev,
+				"Unexpected INTRQ response type %d\n",
+				RSPD_TYPE(rc->type_gen));
+			continue;
+		}
+
+		/*
+		 * Extract the Queue ID from the interrupt message and perform
+		 * sanity checking to make sure it really refers to one of our
+		 * Ingress Queues which is active and matches the queue's ID.
+		 * None of these error conditions should ever happen so we may
+		 * want to either make them fatal and/or conditionalized under
+		 * DEBUG.
+		 */
+		qid = RSPD_QID(be32_to_cpu(rc->pldbuflen_qid));
+		iq_idx = IQ_IDX(s, qid);
+		if (unlikely(iq_idx >= MAX_INGQ)) {
+			dev_err(adapter->pdev_dev,
+				"Ingress QID %d out of range\n", qid);
+			continue;
+		}
+		rspq = s->ingr_map[iq_idx];
+		if (unlikely(rspq == NULL)) {
+			dev_err(adapter->pdev_dev,
+				"Ingress QID %d RSPQ=NULL\n", qid);
+			continue;
+		}
+		if (unlikely(rspq->abs_id != qid)) {
+			dev_err(adapter->pdev_dev,
+				"Ingress QID %d refers to RSPQ %d\n",
+				qid, rspq->abs_id);
+			continue;
+		}
+
+		/*
+		 * Schedule NAPI processing on the indicated Response Queue
+		 * and move on to the next entry in the Forwarded Interrupt
+		 * Queue.
+		 */
+		napi_schedule(&rspq->napi);
+		rspq_next(intrq);
+	}
+
+	t4_write_reg(adapter, T4VF_SGE_BASE_ADDR + SGE_VF_GTS,
+		     CIDXINC(work_done) |
+		     INGRESSQID(intrq->cntxt_id) |
+		     SEINTARM(intrq->intr_params));
+
+	spin_unlock(&adapter->sge.intrq_lock);
+
+	return work_done;
+}
+
+/*
+ * The MSI interrupt handler handles data events from SGE response queues as
+ * well as error and other async events as they all use the same MSI vector.
+ */
+irqreturn_t t4vf_intr_msi(int irq, void *cookie)
+{
+	struct adapter *adapter = cookie;
+
+	process_intrq(adapter);
+	return IRQ_HANDLED;
+}
+
+/**
+ *	t4vf_intr_handler - select the top-level interrupt handler
+ *	@adapter: the adapter
+ *
+ *	Selects the top-level interrupt handler based on the type of interrupts
+ *	(MSI-X or MSI).
+ */
+irq_handler_t t4vf_intr_handler(struct adapter *adapter)
+{
+	BUG_ON((adapter->flags & (USING_MSIX|USING_MSI)) == 0);
+	if (adapter->flags & USING_MSIX)
+		return t4vf_sge_intr_msix;
+	else
+		return t4vf_intr_msi;
+}
+
+/**
+ *	sge_rx_timer_cb - perform periodic maintenance of SGE RX queues
+ *	@data: the adapter
+ *
+ *	Runs periodically from a timer to perform maintenance of SGE RX queues.
+ *
+ *	a) Replenishes RX queues that have run out due to memory shortage.
+ *	Normally new RX buffers are added when existing ones are consumed but
+ *	when out of memory a queue can become empty.  We schedule NAPI to do
+ *	the actual refill.
+ */
+static void sge_rx_timer_cb(unsigned long data)
+{
+	struct adapter *adapter = (struct adapter *)data;
+	struct sge *s = &adapter->sge;
+	unsigned int i;
+
+	/*
+	 * Scan the "Starving Free Lists" flag array looking for any Free
+	 * Lists in need of more free buffers.  If we find one and it's not
+	 * being actively polled, then bump its "starving" counter and attempt
+	 * to refill it.  If we're successful in adding enough buffers to push
+	 * the Free List over the starving threshold, then we can clear its
+	 * "starving" status.
+	 */
+	for (i = 0; i < ARRAY_SIZE(s->starving_fl); i++) {
+		unsigned long m;
+
+		for (m = s->starving_fl[i]; m; m &= m - 1) {
+			unsigned int id = __ffs(m) + i * BITS_PER_LONG;
+			struct sge_fl *fl = s->egr_map[id];
+
+			clear_bit(id, s->starving_fl);
+			smp_mb__after_clear_bit();
+
+			/*
+			 * Since we are accessing fl without a lock there's a
+			 * small probability of a false positive where we
+			 * schedule napi but the FL is no longer starving.
+			 * No biggie.
+			 */
+			if (fl_starving(fl)) {
+				struct sge_eth_rxq *rxq;
+
+				rxq = container_of(fl, struct sge_eth_rxq, fl);
+				if (napi_reschedule(&rxq->rspq.napi))
+					fl->starving++;
+				else
+					set_bit(id, s->starving_fl);
+			}
+		}
+	}
+
+	/*
+	 * Reschedule the next scan for starving Free Lists ...
+	 */
+	mod_timer(&s->rx_timer, jiffies + RX_QCHECK_PERIOD);
+}
+
+/**
+ *	sge_tx_timer_cb - perform periodic maintenance of SGE Tx queues
+ *	@data: the adapter
+ *
+ *	Runs periodically from a timer to perform maintenance of SGE TX queues.
+ *
+ *	b) Reclaims completed Tx packets for the Ethernet queues.  Normally
+ *	packets are cleaned up by new Tx packets, this timer cleans up packets
+ *	when no new packets are being submitted.  This is essential for pktgen,
+ *	at least.
+ */
+static void sge_tx_timer_cb(unsigned long data)
+{
+	struct adapter *adapter = (struct adapter *)data;
+	struct sge *s = &adapter->sge;
+	unsigned int i, budget;
+
+	budget = MAX_TIMER_TX_RECLAIM;
+	i = s->ethtxq_rover;
+	do {
+		struct sge_eth_txq *txq = &s->ethtxq[i];
+
+		if (reclaimable(&txq->q) && __netif_tx_trylock(txq->txq)) {
+			int avail = reclaimable(&txq->q);
+
+			if (avail > budget)
+				avail = budget;
+
+			free_tx_desc(adapter, &txq->q, avail, true);
+			txq->q.in_use -= avail;
+			__netif_tx_unlock(txq->txq);
+
+			budget -= avail;
+			if (!budget)
+				break;
+		}
+
+		i++;
+		if (i >= s->ethqsets)
+			i = 0;
+	} while (i != s->ethtxq_rover);
+	s->ethtxq_rover = i;
+
+	/*
+	 * If we found too many reclaimable packets schedule a timer in the
+	 * near future to continue where we left off.  Otherwise the next timer
+	 * will be at its normal interval.
+	 */
+	mod_timer(&s->tx_timer, jiffies + (budget ? TX_QCHECK_PERIOD : 2));
+}
+
+/**
+ *	t4vf_sge_alloc_rxq - allocate an SGE RX Queue
+ *	@adapter: the adapter
+ *	@rspq: pointer to to the new rxq's Response Queue to be filled in
+ *	@iqasynch: if 0, a normal rspq; if 1, an asynchronous event queue
+ *	@dev: the network device associated with the new rspq
+ *	@intr_dest: MSI-X vector index (overriden in MSI mode)
+ *	@fl: pointer to the new rxq's Free List to be filled in
+ *	@hnd: the interrupt handler to invoke for the rspq
+ */
+int t4vf_sge_alloc_rxq(struct adapter *adapter, struct sge_rspq *rspq,
+		       bool iqasynch, struct net_device *dev,
+		       int intr_dest,
+		       struct sge_fl *fl, rspq_handler_t hnd)
+{
+	struct port_info *pi = netdev_priv(dev);
+	struct fw_iq_cmd cmd, rpl;
+	int ret, iqandst, flsz = 0;
+
+	/*
+	 * If we're using MSI interrupts and we're not initializing the
+	 * Forwarded Interrupt Queue itself, then set up this queue for
+	 * indirect interrupts to the Forwarded Interrupt Queue.  Obviously
+	 * the Forwarded Interrupt Queue must be set up before any other
+	 * ingress queue ...
+	 */
+	if ((adapter->flags & USING_MSI) && rspq != &adapter->sge.intrq) {
+		iqandst = SGE_INTRDST_IQ;
+		intr_dest = adapter->sge.intrq.abs_id;
+	} else
+		iqandst = SGE_INTRDST_PCI;
+
+	/*
+	 * Allocate the hardware ring for the Response Queue.  The size needs
+	 * to be a multiple of 16 which includes the mandatory status entry
+	 * (regardless of whether the Status Page capabilities are enabled or
+	 * not).
+	 */
+	rspq->size = roundup(rspq->size, 16);
+	rspq->desc = alloc_ring(adapter->pdev_dev, rspq->size, rspq->iqe_len,
+				0, &rspq->phys_addr, NULL, 0);
+	if (!rspq->desc)
+		return -ENOMEM;
+
+	/*
+	 * Fill in the Ingress Queue Command.  Note: Ideally this code would
+	 * be in t4vf_hw.c but there are so many parameters and dependencies
+	 * on our Linux SGE state that we would end up having to pass tons of
+	 * parameters.  We'll have to think about how this might be migrated
+	 * into OS-independent common code ...
+	 */
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_vfn = cpu_to_be32(FW_CMD_OP(FW_IQ_CMD) |
+				    FW_CMD_REQUEST |
+				    FW_CMD_WRITE |
+				    FW_CMD_EXEC);
+	cmd.alloc_to_len16 = cpu_to_be32(FW_IQ_CMD_ALLOC |
+					 FW_IQ_CMD_IQSTART(1) |
+					 FW_LEN16(cmd));
+	cmd.type_to_iqandstindex =
+		cpu_to_be32(FW_IQ_CMD_TYPE(FW_IQ_TYPE_FL_INT_CAP) |
+			    FW_IQ_CMD_IQASYNCH(iqasynch) |
+			    FW_IQ_CMD_VIID(pi->viid) |
+			    FW_IQ_CMD_IQANDST(iqandst) |
+			    FW_IQ_CMD_IQANUS(1) |
+			    FW_IQ_CMD_IQANUD(SGE_UPDATEDEL_INTR) |
+			    FW_IQ_CMD_IQANDSTINDEX(intr_dest));
+	cmd.iqdroprss_to_iqesize =
+		cpu_to_be16(FW_IQ_CMD_IQPCIECH(pi->port_id) |
+			    FW_IQ_CMD_IQGTSMODE |
+			    FW_IQ_CMD_IQINTCNTTHRESH(rspq->pktcnt_idx) |
+			    FW_IQ_CMD_IQESIZE(ilog2(rspq->iqe_len) - 4));
+	cmd.iqsize = cpu_to_be16(rspq->size);
+	cmd.iqaddr = cpu_to_be64(rspq->phys_addr);
+
+	if (fl) {
+		/*
+		 * Allocate the ring for the hardware free list (with space
+		 * for its status page) along with the associated software
+		 * descriptor ring.  The free list size needs to be a multiple
+		 * of the Egress Queue Unit.
+		 */
+		fl->size = roundup(fl->size, FL_PER_EQ_UNIT);
+		fl->desc = alloc_ring(adapter->pdev_dev, fl->size,
+				      sizeof(__be64), sizeof(struct rx_sw_desc),
+				      &fl->addr, &fl->sdesc, STAT_LEN);
+		if (!fl->desc) {
+			ret = -ENOMEM;
+			goto err;
+		}
+
+		/*
+		 * Calculate the size of the hardware free list ring plus
+		 * status page (which the SGE will place at the end of the
+		 * free list ring) in Egress Queue Units.
+		 */
+		flsz = (fl->size / FL_PER_EQ_UNIT +
+			STAT_LEN / EQ_UNIT);
+
+		/*
+		 * Fill in all the relevant firmware Ingress Queue Command
+		 * fields for the free list.
+		 */
+		cmd.iqns_to_fl0congen =
+			cpu_to_be32(
+				FW_IQ_CMD_FL0HOSTFCMODE(SGE_HOSTFCMODE_NONE) |
+				FW_IQ_CMD_FL0PACKEN |
+				FW_IQ_CMD_FL0PADEN);
+		cmd.fl0dcaen_to_fl0cidxfthresh =
+			cpu_to_be16(
+				FW_IQ_CMD_FL0FBMIN(SGE_FETCHBURSTMIN_64B) |
+				FW_IQ_CMD_FL0FBMAX(SGE_FETCHBURSTMAX_512B));
+		cmd.fl0size = cpu_to_be16(flsz);
+		cmd.fl0addr = cpu_to_be64(fl->addr);
+	}
+
+	/*
+	 * Issue the firmware Ingress Queue Command and extract the results if
+	 * it completes successfully.
+	 */
+	ret = t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), &rpl);
+	if (ret)
+		goto err;
+
+	netif_napi_add(dev, &rspq->napi, napi_rx_handler, 64);
+	rspq->cur_desc = rspq->desc;
+	rspq->cidx = 0;
+	rspq->gen = 1;
+	rspq->next_intr_params = rspq->intr_params;
+	rspq->cntxt_id = be16_to_cpu(rpl.iqid);
+	rspq->abs_id = be16_to_cpu(rpl.physiqid);
+	rspq->size--;			/* subtract status entry */
+	rspq->adapter = adapter;
+	rspq->netdev = dev;
+	rspq->handler = hnd;
+
+	/* set offset to -1 to distinguish ingress queues without FL */
+	rspq->offset = fl ? 0 : -1;
+
+	if (fl) {
+		fl->cntxt_id = be16_to_cpu(rpl.fl0id);
+		fl->avail = 0;
+		fl->pend_cred = 0;
+		fl->pidx = 0;
+		fl->cidx = 0;
+		fl->alloc_failed = 0;
+		fl->large_alloc_failed = 0;
+		fl->starving = 0;
+		refill_fl(adapter, fl, fl_cap(fl), GFP_KERNEL);
+	}
+
+	return 0;
+
+err:
+	/*
+	 * An error occurred.  Clean up our partial allocation state and
+	 * return the error.
+	 */
+	if (rspq->desc) {
+		dma_free_coherent(adapter->pdev_dev, rspq->size * rspq->iqe_len,
+				  rspq->desc, rspq->phys_addr);
+		rspq->desc = NULL;
+	}
+	if (fl && fl->desc) {
+		kfree(fl->sdesc);
+		fl->sdesc = NULL;
+		dma_free_coherent(adapter->pdev_dev, flsz * EQ_UNIT,
+				  fl->desc, fl->addr);
+		fl->desc = NULL;
+	}
+	return ret;
+}
+
+/**
+ *	t4vf_sge_alloc_eth_txq - allocate an SGE Ethernet TX Queue
+ *	@adapter: the adapter
+ *	@txq: pointer to the new txq to be filled in
+ *	@devq: the network TX queue associated with the new txq
+ *	@iqid: the relative ingress queue ID to which events relating to
+ *		the new txq should be directed
+ */
+int t4vf_sge_alloc_eth_txq(struct adapter *adapter, struct sge_eth_txq *txq,
+			   struct net_device *dev, struct netdev_queue *devq,
+			   unsigned int iqid)
+{
+	int ret, nentries;
+	struct fw_eq_eth_cmd cmd, rpl;
+	struct port_info *pi = netdev_priv(dev);
+
+	/*
+	 * Calculate the size of the hardware TX Queue (including the
+	 * status age on the end) in units of TX Descriptors.
+	 */
+	nentries = txq->q.size + STAT_LEN / sizeof(struct tx_desc);
+
+	/*
+	 * Allocate the hardware ring for the TX ring (with space for its
+	 * status page) along with the associated software descriptor ring.
+	 */
+	txq->q.desc = alloc_ring(adapter->pdev_dev, txq->q.size,
+				 sizeof(struct tx_desc),
+				 sizeof(struct tx_sw_desc),
+				 &txq->q.phys_addr, &txq->q.sdesc, STAT_LEN);
+	if (!txq->q.desc)
+		return -ENOMEM;
+
+	/*
+	 * Fill in the Egress Queue Command.  Note: As with the direct use of
+	 * the firmware Ingress Queue COmmand above in our RXQ allocation
+	 * routine, ideally, this code would be in t4vf_hw.c.  Again, we'll
+	 * have to see if there's some reasonable way to parameterize it
+	 * into the common code ...
+	 */
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_vfn = cpu_to_be32(FW_CMD_OP(FW_EQ_ETH_CMD) |
+				    FW_CMD_REQUEST |
+				    FW_CMD_WRITE |
+				    FW_CMD_EXEC);
+	cmd.alloc_to_len16 = cpu_to_be32(FW_EQ_ETH_CMD_ALLOC |
+					 FW_EQ_ETH_CMD_EQSTART |
+					 FW_LEN16(cmd));
+	cmd.viid_pkd = cpu_to_be32(FW_EQ_ETH_CMD_VIID(pi->viid));
+	cmd.fetchszm_to_iqid =
+		cpu_to_be32(FW_EQ_ETH_CMD_HOSTFCMODE(SGE_HOSTFCMODE_STPG) |
+			    FW_EQ_ETH_CMD_PCIECHN(pi->port_id) |
+			    FW_EQ_ETH_CMD_IQID(iqid));
+	cmd.dcaen_to_eqsize =
+		cpu_to_be32(FW_EQ_ETH_CMD_FBMIN(SGE_FETCHBURSTMIN_64B) |
+			    FW_EQ_ETH_CMD_FBMAX(SGE_FETCHBURSTMAX_512B) |
+			    FW_EQ_ETH_CMD_CIDXFTHRESH(SGE_CIDXFLUSHTHRESH_32) |
+			    FW_EQ_ETH_CMD_EQSIZE(nentries));
+	cmd.eqaddr = cpu_to_be64(txq->q.phys_addr);
+
+	/*
+	 * Issue the firmware Egress Queue Command and extract the results if
+	 * it completes successfully.
+	 */
+	ret = t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), &rpl);
+	if (ret) {
+		/*
+		 * The girmware Ingress Queue Command failed for some reason.
+		 * Free up our partial allocation state and return the error.
+		 */
+		kfree(txq->q.sdesc);
+		txq->q.sdesc = NULL;
+		dma_free_coherent(adapter->pdev_dev,
+				  nentries * sizeof(struct tx_desc),
+				  txq->q.desc, txq->q.phys_addr);
+		txq->q.desc = NULL;
+		return ret;
+	}
+
+	txq->q.in_use = 0;
+	txq->q.cidx = 0;
+	txq->q.pidx = 0;
+	txq->q.stat = (void *)&txq->q.desc[txq->q.size];
+	txq->q.cntxt_id = FW_EQ_ETH_CMD_EQID_GET(be32_to_cpu(rpl.eqid_pkd));
+	txq->q.abs_id =
+		FW_EQ_ETH_CMD_PHYSEQID_GET(be32_to_cpu(rpl.physeqid_pkd));
+	txq->txq = devq;
+	txq->tso = 0;
+	txq->tx_cso = 0;
+	txq->vlan_ins = 0;
+	txq->q.stops = 0;
+	txq->q.restarts = 0;
+	txq->mapping_err = 0;
+	return 0;
+}
+
+/*
+ * Free the DMA map resources associated with a TX queue.
+ */
+static void free_txq(struct adapter *adapter, struct sge_txq *tq)
+{
+	dma_free_coherent(adapter->pdev_dev,
+			  tq->size * sizeof(*tq->desc) + STAT_LEN,
+			  tq->desc, tq->phys_addr);
+	tq->cntxt_id = 0;
+	tq->sdesc = NULL;
+	tq->desc = NULL;
+}
+
+/*
+ * Free the resources associated with a response queue (possibly including a
+ * free list).
+ */
+static void free_rspq_fl(struct adapter *adapter, struct sge_rspq *rspq,
+			 struct sge_fl *fl)
+{
+	unsigned int flid = fl ? fl->cntxt_id : 0xffff;
+
+	t4vf_iq_free(adapter, FW_IQ_TYPE_FL_INT_CAP,
+		     rspq->cntxt_id, flid, 0xffff);
+	dma_free_coherent(adapter->pdev_dev, (rspq->size + 1) * rspq->iqe_len,
+			  rspq->desc, rspq->phys_addr);
+	netif_napi_del(&rspq->napi);
+	rspq->netdev = NULL;
+	rspq->cntxt_id = 0;
+	rspq->abs_id = 0;
+	rspq->desc = NULL;
+
+	if (fl) {
+		free_rx_bufs(adapter, fl, fl->avail);
+		dma_free_coherent(adapter->pdev_dev,
+				  fl->size * sizeof(*fl->desc) + STAT_LEN,
+				  fl->desc, fl->addr);
+		kfree(fl->sdesc);
+		fl->sdesc = NULL;
+		fl->cntxt_id = 0;
+		fl->desc = NULL;
+	}
+}
+
+/**
+ *	t4vf_free_sge_resources - free SGE resources
+ *	@adapter: the adapter
+ *
+ *	Frees resources used by the SGE queue sets.
+ */
+void t4vf_free_sge_resources(struct adapter *adapter)
+{
+	struct sge *s = &adapter->sge;
+	struct sge_eth_rxq *rxq = s->ethrxq;
+	struct sge_eth_txq *txq = s->ethtxq;
+	struct sge_rspq *evtq = &s->fw_evtq;
+	struct sge_rspq *intrq = &s->intrq;
+	int qs;
+
+	for (qs = 0; qs < adapter->sge.ethqsets; qs++) {
+		if (rxq->rspq.desc)
+			free_rspq_fl(adapter, &rxq->rspq, &rxq->fl);
+		if (txq->q.desc) {
+			t4vf_eth_eq_free(adapter, txq->q.cntxt_id);
+			free_tx_desc(adapter, &txq->q, txq->q.in_use, true);
+			kfree(txq->q.sdesc);
+			free_txq(adapter, &txq->q);
+		}
+	}
+	if (evtq->desc)
+		free_rspq_fl(adapter, evtq, NULL);
+	if (intrq->desc)
+		free_rspq_fl(adapter, intrq, NULL);
+}
+
+/**
+ *	t4vf_sge_start - enable SGE operation
+ *	@adapter: the adapter
+ *
+ *	Start tasklets and timers associated with the DMA engine.
+ */
+void t4vf_sge_start(struct adapter *adapter)
+{
+	adapter->sge.ethtxq_rover = 0;
+	mod_timer(&adapter->sge.rx_timer, jiffies + RX_QCHECK_PERIOD);
+	mod_timer(&adapter->sge.tx_timer, jiffies + TX_QCHECK_PERIOD);
+}
+
+/**
+ *	t4vf_sge_stop - disable SGE operation
+ *	@adapter: the adapter
+ *
+ *	Stop tasklets and timers associated with the DMA engine.  Note that
+ *	this is effective only if measures have been taken to disable any HW
+ *	events that may restart them.
+ */
+void t4vf_sge_stop(struct adapter *adapter)
+{
+	struct sge *s = &adapter->sge;
+
+	if (s->rx_timer.function)
+		del_timer_sync(&s->rx_timer);
+	if (s->tx_timer.function)
+		del_timer_sync(&s->tx_timer);
+}
+
+/**
+ *	t4vf_sge_init - initialize SGE
+ *	@adapter: the adapter
+ *
+ *	Performs SGE initialization needed every time after a chip reset.
+ *	We do not initialize any of the queue sets here, instead the driver
+ *	top-level must request those individually.  We also do not enable DMA
+ *	here, that should be done after the queues have been set up.
+ */
+int t4vf_sge_init(struct adapter *adapter)
+{
+	struct sge_params *sge_params = &adapter->params.sge;
+	u32 fl0 = sge_params->sge_fl_buffer_size[0];
+	u32 fl1 = sge_params->sge_fl_buffer_size[1];
+	struct sge *s = &adapter->sge;
+
+	/*
+	 * Start by vetting the basic SGE parameters which have been set up by
+	 * the Physical Function Driver.  Ideally we should be able to deal
+	 * with _any_ configuration.  Practice is different ...
+	 */
+	if (fl0 != PAGE_SIZE || (fl1 != 0 && fl1 <= fl0)) {
+		dev_err(adapter->pdev_dev, "bad SGE FL buffer sizes [%d, %d]\n",
+			fl0, fl1);
+		return -EINVAL;
+	}
+	if ((sge_params->sge_control & RXPKTCPLMODE) == 0) {
+		dev_err(adapter->pdev_dev, "bad SGE CPL MODE\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Now translate the adapter parameters into our internal forms.
+	 */
+	if (fl1)
+		FL_PG_ORDER = ilog2(fl1) - PAGE_SHIFT;
+	STAT_LEN = ((sge_params->sge_control & EGRSTATUSPAGESIZE) ? 128 : 64);
+	PKTSHIFT = PKTSHIFT_GET(sge_params->sge_control);
+	FL_ALIGN = 1 << (INGPADBOUNDARY_GET(sge_params->sge_control) +
+			 INGPADBOUNDARY_SHIFT);
+
+	/*
+	 * Set up tasklet timers.
+	 */
+	setup_timer(&s->rx_timer, sge_rx_timer_cb, (unsigned long)adapter);
+	setup_timer(&s->tx_timer, sge_tx_timer_cb, (unsigned long)adapter);
+
+	/*
+	 * Initialize Forwarded Interrupt Queue lock.
+	 */
+	spin_lock_init(&s->intrq_lock);
+
+	return 0;
+}

^ permalink raw reply related

* [PATCH 5/9] cxgb4vf: Add core T4 PCI-E SR-IOV Virtual Function hardware definitions and device communication code
From: Casey Leedom @ 2010-06-25 22:12 UTC (permalink / raw)
  To: netdev

Add core T4 PCI-E SR-IOV Virtual Function hardware definitions and device
communication code.

Signed-off-by: Casey Leedom
---
 drivers/net/cxgb4vf/t4vf_common.h |  273 ++++++++
 drivers/net/cxgb4vf/t4vf_defs.h   |  121 ++++
 drivers/net/cxgb4vf/t4vf_hw.c     | 1333 +++++++++++++++++++++++++++++++++++++
 3 files changed, 1727 insertions(+), 0 deletions(-)

diff --git a/drivers/net/cxgb4vf/t4vf_common.h 
b/drivers/net/cxgb4vf/t4vf_common.h
new file mode 100644
index 0000000..5c7bde7
--- /dev/null
+++ b/drivers/net/cxgb4vf/t4vf_common.h
@@ -0,0 +1,273 @@
+/*
+ * This file is part of the Chelsio T4 PCI-E SR-IOV Virtual Function Ethernet
+ * driver for Linux.
+ *
+ * Copyright (c) 2009-2010 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __T4VF_COMMON_H__
+#define __T4VF_COMMON_H__
+
+#include "../cxgb4/t4fw_api.h"
+
+/*
+ * The "len16" field of a Firmware Command Structure ...
+ */
+#define FW_LEN16(fw_struct) FW_CMD_LEN16(sizeof(fw_struct) / 16)
+
+/*
+ * Per-VF statistics.
+ */
+struct t4vf_port_stats {
+	/*
+	 * TX statistics.
+	 */
+	u64 tx_bcast_bytes;		/* broadcast */
+	u64 tx_bcast_frames;
+	u64 tx_mcast_bytes;		/* multicast */
+	u64 tx_mcast_frames;
+	u64 tx_ucast_bytes;		/* unicast */
+	u64 tx_ucast_frames;
+	u64 tx_drop_frames;		/* TX dropped frames */
+	u64 tx_offload_bytes;		/* offload */
+	u64 tx_offload_frames;
+
+	/*
+	 * RX statistics.
+	 */
+	u64 rx_bcast_bytes;		/* broadcast */
+	u64 rx_bcast_frames;
+	u64 rx_mcast_bytes;		/* multicast */
+	u64 rx_mcast_frames;
+	u64 rx_ucast_bytes;
+	u64 rx_ucast_frames;		/* unicast */
+
+	u64 rx_err_frames;		/* RX error frames */
+};
+
+/*
+ * Per-"port" (Virtual Interface) link configuration ...
+ */
+struct link_config {
+	unsigned int   supported;        /* link capabilities */
+	unsigned int   advertising;      /* advertised capabilities */
+	unsigned short requested_speed;  /* speed user has requested */
+	unsigned short speed;            /* actual link speed */
+	unsigned char  requested_fc;     /* flow control user has requested */
+	unsigned char  fc;               /* actual link flow control */
+	unsigned char  autoneg;          /* autonegotiating? */
+	unsigned char  link_ok;          /* link up? */
+};
+
+enum {
+	PAUSE_RX      = 1 << 0,
+	PAUSE_TX      = 1 << 1,
+	PAUSE_AUTONEG = 1 << 2
+};
+
+/*
+ * General device parameters ...
+ */
+struct dev_params {
+	u32 fwrev;			/* firmware version */
+	u32 tprev;			/* TP Microcode Version */
+};
+
+/*
+ * Scatter Gather Engine parameters.  These are almost all determined by the
+ * Physical Function Driver.  We just need to grab them to see within which
+ * environment we're playing ...
+ */
+struct sge_params {
+	u32 sge_control;		/* padding, boundaries, lengths, etc. */
+	u32 sge_host_page_size;		/* RDMA page sizes */
+	u32 sge_queues_per_page;	/* RDMA queues/page */
+	u32 sge_user_mode_limits;	/* limits for BAR2 user mode accesses */
+	u32 sge_fl_buffer_size[16];	/* free list buffer sizes */
+	u32 sge_ingress_rx_threshold;	/* RX counter interrupt threshold[4] */
+	u32 sge_timer_value_0_and_1;	/* interrupt coalescing timer values */
+	u32 sge_timer_value_2_and_3;
+	u32 sge_timer_value_4_and_5;
+};
+
+/*
+ * Vital Product Data parameters.
+ */
+struct vpd_params {
+	u32 cclk;			/* Core Clock (KHz) */
+};
+
+/*
+ * Global Receive Side Scaling (RSS) parameters in host-native format.
+ */
+struct rss_params {
+	unsigned int mode;		/* RSS mode */
+	union {
+	    struct {
+		int synmapen:1;		/* SYN Map Enable */
+		int syn4tupenipv6:1;	/* enable hashing 4-tuple IPv6 SYNs */
+		int syn2tupenipv6:1;	/* enable hashing 2-tuple IPv6 SYNs */
+		int syn4tupenipv4:1;	/* enable hashing 4-tuple IPv4 SYNs */
+		int syn2tupenipv4:1;	/* enable hashing 2-tuple IPv4 SYNs */
+		int ofdmapen:1;		/* Offload Map Enable */
+		int tnlmapen:1;		/* Tunnel Map Enable */
+		int tnlalllookup:1;	/* Tunnel All Lookup */
+		int hashtoeplitz:1;	/* use Toeplitz hash */
+	    } basicvirtual;
+	} u;
+};
+
+/*
+ * Virtual Interface RSS Configuration in host-native format.
+ */
+union rss_vi_config {
+    struct {
+	u16 defaultq;			/* Ingress Queue ID for !tnlalllookup */
+	int ip6fourtupen:1;		/* hash 4-tuple IPv6 ingress packets */
+	int ip6twotupen:1;		/* hash 2-tuple IPv6 ingress packets */
+	int ip4fourtupen:1;		/* hash 4-tuple IPv4 ingress packets */
+	int ip4twotupen:1;		/* hash 2-tuple IPv4 ingress packets */
+	int udpen;			/* hash 4-tuple UDP ingress packets */
+    } basicvirtual;
+};
+
+/*
+ * Maximum resources provisioned for a PCI VF.
+ */
+struct vf_resources {
+	unsigned int nvi;		/* N virtual interfaces */
+	unsigned int neq;		/* N egress Qs */
+	unsigned int nethctrl;		/* N egress ETH or CTRL Qs */
+	unsigned int niqflint;		/* N ingress Qs/w free list(s) & intr */
+	unsigned int niq;		/* N ingress Qs */
+	unsigned int tc;		/* PCI-E traffic class */
+	unsigned int pmask;		/* port access rights mask */
+	unsigned int nexactf;		/* N exact MPS filters */
+	unsigned int r_caps;		/* read capabilities */
+	unsigned int wx_caps;		/* write/execute capabilities */
+};
+
+/*
+ * Per-"adapter" (Virtual Function) parameters.
+ */
+struct adapter_params {
+	struct dev_params dev;		/* general device parameters */
+	struct sge_params sge;		/* Scatter Gather Engine */
+	struct vpd_params vpd;		/* Vital Product Data */
+	struct rss_params rss;		/* Receive Side Scaling */
+	struct vf_resources vfres;	/* Virtual Function Resource limits */
+	u8 nports;			/* # of Ethernet "ports" */
+};
+
+#include "adapter.h"
+
+#ifndef PCI_VENDOR_ID_CHELSIO
+# define PCI_VENDOR_ID_CHELSIO 0x1425
+#endif
+
+#define for_each_port(adapter, iter) \
+	for (iter = 0; iter < (adapter)->params.nports; iter++)
+
+static inline bool is_10g_port(const struct link_config *lc)
+{
+	return (lc->supported & SUPPORTED_10000baseT_Full) != 0;
+}
+
+static inline unsigned int core_ticks_per_usec(const struct adapter *adapter)
+{
+	return adapter->params.vpd.cclk / 1000;
+}
+
+static inline unsigned int us_to_core_ticks(const struct adapter *adapter,
+					    unsigned int us)
+{
+	return (us * adapter->params.vpd.cclk) / 1000;
+}
+
+static inline unsigned int core_ticks_to_us(const struct adapter *adapter,
+					    unsigned int ticks)
+{
+	return (ticks * 1000) / adapter->params.vpd.cclk;
+}
+
+int t4vf_wr_mbox_core(struct adapter *, const void *, int, void *, bool);
+
+static inline int t4vf_wr_mbox(struct adapter *adapter, const void *cmd,
+			       int size, void *rpl)
+{
+	return t4vf_wr_mbox_core(adapter, cmd, size, rpl, true);
+}
+
+static inline int t4vf_wr_mbox_ns(struct adapter *adapter, const void *cmd,
+				  int size, void *rpl)
+{
+	return t4vf_wr_mbox_core(adapter, cmd, size, rpl, false);
+}
+
+int __devinit t4vf_wait_dev_ready(struct adapter *);
+int __devinit t4vf_port_init(struct adapter *, int);
+
+int t4vf_query_params(struct adapter *, unsigned int, const u32 *, u32 *);
+int t4vf_set_params(struct adapter *, unsigned int, const u32 *, const u32 *);
+
+int t4vf_get_sge_params(struct adapter *);
+int t4vf_get_vpd_params(struct adapter *);
+int t4vf_get_dev_params(struct adapter *);
+int t4vf_get_rss_glb_config(struct adapter *);
+int t4vf_get_vfres(struct adapter *);
+
+int t4vf_read_rss_vi_config(struct adapter *, unsigned int,
+			    union rss_vi_config *);
+int t4vf_write_rss_vi_config(struct adapter *, unsigned int,
+			     union rss_vi_config *);
+int t4vf_config_rss_range(struct adapter *, unsigned int, int, int,
+			  const u16 *, int);
+
+int t4vf_alloc_vi(struct adapter *, int);
+int t4vf_free_vi(struct adapter *, int);
+int t4vf_enable_vi(struct adapter *, unsigned int, bool, bool);
+int t4vf_identify_port(struct adapter *, unsigned int, unsigned int);
+
+int t4vf_set_rxmode(struct adapter *, unsigned int, int, int, int, int, int,
+		    bool);
+int t4vf_alloc_mac_filt(struct adapter *, unsigned int, bool, unsigned int,
+			const u8 **, u16 *, u64 *, bool);
+int t4vf_change_mac(struct adapter *, unsigned int, int, const u8 *, bool);
+int t4vf_set_addr_hash(struct adapter *, unsigned int, bool, u64, bool);
+int t4vf_get_port_stats(struct adapter *, int, struct t4vf_port_stats *);
+
+int t4vf_iq_free(struct adapter *, unsigned int, unsigned int, unsigned int,
+		 unsigned int);
+int t4vf_eth_eq_free(struct adapter *, unsigned int);
+
+int t4vf_handle_fw_rpl(struct adapter *, const __be64 *);
+
+#endif /* __T4VF_COMMON_H__ */
diff --git a/drivers/net/cxgb4vf/t4vf_defs.h b/drivers/net/cxgb4vf/t4vf_defs.h
new file mode 100644
index 0000000..c7b127d
--- /dev/null
+++ b/drivers/net/cxgb4vf/t4vf_defs.h
@@ -0,0 +1,121 @@
+/*
+ * This file is part of the Chelsio T4 PCI-E SR-IOV Virtual Function Ethernet
+ * driver for Linux.
+ *
+ * Copyright (c) 2009-2010 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __T4VF_DEFS_H__
+#define __T4VF_DEFS_H__
+
+#include "../cxgb4/t4_regs.h"
+
+/*
+ * The VF Register Map.
+ *
+ * The Scatter Gather Engine (SGE), Multiport Support module (MPS), PIO Local
+ * bus module (PL) and CPU Interface Module (CIM) components are mapped via
+ * the Slice to Module Map Table (see below) in the Physical Function Register
+ * Map.  The Mail Box Data (MBDATA) range is mapped via the PCI-E Mailbox Base
+ * and Offset registers in the PF Register Map.  The MBDATA base address is
+ * quite constrained as it determines the Mailbox Data addresses for both PFs
+ * and VFs, and therefore must fit in both the VF and PF Register Maps without
+ * overlapping other registers.
+ */
+#define T4VF_SGE_BASE_ADDR	0x0000
+#define T4VF_MPS_BASE_ADDR	0x0100
+#define T4VF_PL_BASE_ADDR	0x0200
+#define T4VF_MBDATA_BASE_ADDR	0x0240
+#define T4VF_CIM_BASE_ADDR	0x0300
+
+#define T4VF_REGMAP_START	0x0000
+#define T4VF_REGMAP_SIZE	0x0400
+
+/*
+ * There's no hardware limitation which requires that the addresses of the
+ * Mailbox Data in the fixed CIM PF map and the programmable VF map must
+ * match.  However, it's a useful convention ...
+ */
+#if T4VF_MBDATA_BASE_ADDR != CIM_PF_MAILBOX_DATA
+#error T4VF_MBDATA_BASE_ADDR must match CIM_PF_MAILBOX_DATA!
+#endif
+
+/*
+ * Virtual Function "Slice to Module Map Table" definitions.
+ *
+ * This table allows us to map subsets of the various module register sets
+ * into the T4VF Register Map.  Each table entry identifies the index of the
+ * module whose registers are being mapped, the offset within the module's
+ * register set that the mapping should start at, the limit of the mapping,
+ * and the offset within the T4VF Register Map to which the module's registers
+ * are being mapped.  All addresses and qualtities are in terms of 32-bit
+ * words.  The "limit" value is also in terms of 32-bit words and is equal to
+ * the last address mapped in the T4VF Register Map 1 (i.e. it's a "<="
+ * relation rather than a "<").
+ */
+#define T4VF_MOD_MAP(module, index, first, last) \
+	T4VF_MOD_MAP_##module##_INDEX  = (index), \
+	T4VF_MOD_MAP_##module##_FIRST  = (first), \
+	T4VF_MOD_MAP_##module##_LAST   = (last), \
+	T4VF_MOD_MAP_##module##_OFFSET = ((first)/4), \
+	T4VF_MOD_MAP_##module##_BASE = \
+		(T4VF_##module##_BASE_ADDR/4 + (first)/4), \
+	T4VF_MOD_MAP_##module##_LIMIT = \
+		(T4VF_##module##_BASE_ADDR/4 + (last)/4),
+
+#define SGE_VF_KDOORBELL 0x0
+#define SGE_VF_GTS 0x4
+#define MPS_VF_CTL 0x0
+#define MPS_VF_STAT_RX_VF_ERR_FRAMES_H 0xfc
+#define PL_VF_WHOAMI 0x0
+#define CIM_VF_EXT_MAILBOX_CTRL 0x0
+#define CIM_VF_EXT_MAILBOX_STATUS 0x4
+
+enum {
+    T4VF_MOD_MAP(SGE, 2, SGE_VF_KDOORBELL, SGE_VF_GTS)
+    T4VF_MOD_MAP(MPS, 0, MPS_VF_CTL, MPS_VF_STAT_RX_VF_ERR_FRAMES_H)
+    T4VF_MOD_MAP(PL,  3, PL_VF_WHOAMI, PL_VF_WHOAMI)
+    T4VF_MOD_MAP(CIM, 1, CIM_VF_EXT_MAILBOX_CTRL, CIM_VF_EXT_MAILBOX_STATUS)
+};
+
+/*
+ * There isn't a Slice to Module Map Table entry for the Mailbox Data
+ * registers, but it's convenient to use similar names as above.  There are 8
+ * little-endian 64-bit Mailbox Data registers.  Note that the "instances"
+ * value below is in terms of 32-bit words which matches the "word" addressing
+ * space we use above for the Slice to Module Map Space.
+ */
+#define NUM_CIM_VF_MAILBOX_DATA_INSTANCES 16
+
+#define T4VF_MBDATA_FIRST	0
+#define T4VF_MBDATA_LAST	((NUM_CIM_VF_MAILBOX_DATA_INSTANCES-1)*4)
+
+#endif /* __T4T4VF_DEFS_H__ */
diff --git a/drivers/net/cxgb4vf/t4vf_hw.c b/drivers/net/cxgb4vf/t4vf_hw.c
new file mode 100644
index 0000000..1ef2528
--- /dev/null
+++ b/drivers/net/cxgb4vf/t4vf_hw.c
@@ -0,0 +1,1333 @@
+/*
+ * This file is part of the Chelsio T4 PCI-E SR-IOV Virtual Function Ethernet
+ * driver for Linux.
+ *
+ * Copyright (c) 2009-2010 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/version.h>
+#include <linux/pci.h>
+
+#include "t4vf_common.h"
+#include "t4vf_defs.h"
+
+#include "../cxgb4/t4_regs.h"
+#include "../cxgb4/t4fw_api.h"
+
+/*
+ * Wait for the device to become ready (signified by our "who am I" register
+ * returning a value other than all 1's).  Return an error if it doesn't
+ * become ready ...
+ */
+int __devinit t4vf_wait_dev_ready(struct adapter *adapter)
+{
+	const u32 whoami = T4VF_PL_BASE_ADDR + PL_VF_WHOAMI;
+	const u32 notready1 = 0xffffffff;
+	const u32 notready2 = 0xeeeeeeee;
+	u32 val;
+
+	val = t4_read_reg(adapter, whoami);
+	if (val != notready1 && val != notready2)
+		return 0;
+	msleep(500);
+	val = t4_read_reg(adapter, whoami);
+	if (val != notready1 && val != notready2)
+		return 0;
+	else
+		return -EIO;
+}
+
+/*
+ * Get the reply to a mailbox command and store it in @rpl in big-endian order
+ * (since the firmware data structures are specified in a big-endian layout).
+ */
+static void get_mbox_rpl(struct adapter *adapter, __be64 *rpl, int size,
+			 u32 mbox_data)
+{
+	for ( ; size; size -= 8, mbox_data += 8)
+		*rpl++ = cpu_to_be64(t4_read_reg64(adapter, mbox_data));
+}
+
+/*
+ * Dump contents of mailbox with a leading tag.
+ */
+static void dump_mbox(struct adapter *adapter, const char *tag, u32 mbox_data)
+{
+	dev_err(adapter->pdev_dev,
+		"mbox %s: %llx %llx %llx %llx %llx %llx %llx %llx\n", tag,
+		(unsigned long long)t4_read_reg64(adapter, mbox_data +  0),
+		(unsigned long long)t4_read_reg64(adapter, mbox_data +  8),
+		(unsigned long long)t4_read_reg64(adapter, mbox_data + 16),
+		(unsigned long long)t4_read_reg64(adapter, mbox_data + 24),
+		(unsigned long long)t4_read_reg64(adapter, mbox_data + 32),
+		(unsigned long long)t4_read_reg64(adapter, mbox_data + 40),
+		(unsigned long long)t4_read_reg64(adapter, mbox_data + 48),
+		(unsigned long long)t4_read_reg64(adapter, mbox_data + 56));
+}
+
+/**
+ *	t4vf_wr_mbox_core - send a command to FW through the mailbox
+ *	@adapter: the adapter
+ *	@cmd: the command to write
+ *	@size: command length in bytes
+ *	@rpl: where to optionally store the reply
+ *	@sleep_ok: if true we may sleep while awaiting command completion
+ *
+ *	Sends the given command to FW through the mailbox and waits for the
+ *	FW to execute the command.  If @rpl is not %NULL it is used to store
+ *	the FW's reply to the command.  The command and its optional reply
+ *	are of the same length.  FW can take up to 500 ms to respond.
+ *	@sleep_ok determines whether we may sleep while awaiting the response.
+ *	If sleeping is allowed we use progressive backoff otherwise we spin.
+ *
+ *	The return value is 0 on success or a negative errno on failure.  A
+ *	failure can happen either because we are not able to execute the
+ *	command or FW executes it but signals an error.  In the latter case
+ *	the return value is the error code indicated by FW (negated).
+ */
+int t4vf_wr_mbox_core(struct adapter *adapter, const void *cmd, int size,
+		      void *rpl, bool sleep_ok)
+{
+	static int delay[] = {
+		1, 1, 3, 5, 10, 10, 20, 50, 100
+	};
+
+	u32 v;
+	int i, ms, delay_idx;
+	const __be64 *p;
+	u32 mbox_data = T4VF_MBDATA_BASE_ADDR;
+	u32 mbox_ctl = T4VF_CIM_BASE_ADDR + CIM_VF_EXT_MAILBOX_CTRL;
+
+	/*
+	 * Commands must be multiples of 16 bytes in length and may not be
+	 * larger than the size of the Mailbox Data register array.
+	 */
+	if ((size % 16) != 0 ||
+	    size > NUM_CIM_VF_MAILBOX_DATA_INSTANCES * 4)
+		return -EINVAL;
+
+	/*
+	 * Loop trying to get ownership of the mailbox.  Return an error
+	 * if we can't gain ownership.
+	 */
+	v = MBOWNER_GET(t4_read_reg(adapter, mbox_ctl));
+	for (i = 0; v == MBOX_OWNER_NONE && i < 3; i++)
+		v = MBOWNER_GET(t4_read_reg(adapter, mbox_ctl));
+	if (v != MBOX_OWNER_DRV)
+		return v == MBOX_OWNER_FW ? -EBUSY : -ETIMEDOUT;
+
+	/*
+	 * Write the command array into the Mailbox Data register array and
+	 * transfer ownership of the mailbox to the firmware.
+	 */
+	for (i = 0, p = cmd; i < size; i += 8)
+		t4_write_reg64(adapter, mbox_data + i, be64_to_cpu(*p++));
+	t4_write_reg(adapter, mbox_ctl,
+		     MBMSGVALID | MBOWNER(MBOX_OWNER_FW));
+	t4_read_reg(adapter, mbox_ctl);          /* flush write */
+
+	/*
+	 * Spin waiting for firmware to acknowledge processing our command.
+	 */
+	delay_idx = 0;
+	ms = delay[0];
+
+	for (i = 0; i < 500; i += ms) {
+		if (sleep_ok) {
+			ms = delay[delay_idx];
+			if (delay_idx < ARRAY_SIZE(delay))
+				delay_idx++;
+			msleep(ms);
+		} else
+			mdelay(ms);
+
+		/*
+		 * If we're the owner, see if this is the reply we wanted.
+		 */
+		v = t4_read_reg(adapter, mbox_ctl);
+		if (MBOWNER_GET(v) == MBOX_OWNER_DRV) {
+			/*
+			 * If the Message Valid bit isn't on, revoke ownership
+			 * of the mailbox and continue waiting for our reply.
+			 */
+			if ((v & MBMSGVALID) == 0) {
+				t4_write_reg(adapter, mbox_ctl,
+					     MBOWNER(MBOX_OWNER_NONE));
+				continue;
+			}
+
+			/*
+			 * We now have our reply.  Extract the command return
+			 * value, copy the reply back to our caller's buffer
+			 * (if specified) and revoke ownership of the mailbox.
+			 * We return the (negated) firmware command return
+			 * code (this depends on FW_SUCCESS == 0).
+			 */
+
+			/* return value in low-order little-endian word */
+			v = t4_read_reg(adapter, mbox_data);
+			if (FW_CMD_RETVAL_GET(v))
+				dump_mbox(adapter, "FW Error", mbox_data);
+
+			if (rpl) {
+				/* request bit in high-order BE word */
+				WARN_ON((be32_to_cpu(*(const u32 *)cmd)
+					 & FW_CMD_REQUEST) == 0);
+				get_mbox_rpl(adapter, rpl, size, mbox_data);
+				WARN_ON((be32_to_cpu(*(u32 *)rpl)
+					 & FW_CMD_REQUEST) != 0);
+			}
+			t4_write_reg(adapter, mbox_ctl,
+				     MBOWNER(MBOX_OWNER_NONE));
+			return -FW_CMD_RETVAL_GET(v);
+		}
+	}
+
+	/*
+	 * We timed out.  Return the error ...
+	 */
+	dump_mbox(adapter, "FW Timeout", mbox_data);
+	return -ETIMEDOUT;
+}
+
+/**
+ *	hash_mac_addr - return the hash value of a MAC address
+ *	@addr: the 48-bit Ethernet MAC address
+ *
+ *	Hashes a MAC address according to the hash function used by hardware
+ *	inexact (hash) address matching.
+ */
+static int hash_mac_addr(const u8 *addr)
+{
+	u32 a = ((u32)addr[0] << 16) | ((u32)addr[1] << 8) | addr[2];
+	u32 b = ((u32)addr[3] << 16) | ((u32)addr[4] << 8) | addr[5];
+	a ^= b;
+	a ^= (a >> 12);
+	a ^= (a >> 6);
+	return a & 0x3f;
+}
+
+/**
+ *	init_link_config - initialize a link's SW state
+ *	@lc: structure holding the link state
+ *	@caps: link capabilities
+ *
+ *	Initializes the SW state maintained for each link, including the link's
+ *	capabilities and default speed/flow-control/autonegotiation settings.
+ */
+static void __devinit init_link_config(struct link_config *lc,
+				       unsigned int caps)
+{
+	lc->supported = caps;
+	lc->requested_speed = 0;
+	lc->speed = 0;
+	lc->requested_fc = lc->fc = PAUSE_RX | PAUSE_TX;
+	if (lc->supported & SUPPORTED_Autoneg) {
+		lc->advertising = lc->supported;
+		lc->autoneg = AUTONEG_ENABLE;
+		lc->requested_fc |= PAUSE_AUTONEG;
+	} else {
+		lc->advertising = 0;
+		lc->autoneg = AUTONEG_DISABLE;
+	}
+}
+
+/**
+ *	t4vf_port_init - initialize port hardware/software state
+ *	@adapter: the adapter
+ *	@pidx: the adapter port index
+ */
+int __devinit t4vf_port_init(struct adapter *adapter, int pidx)
+{
+	struct port_info *pi = adap2pinfo(adapter, pidx);
+	struct fw_vi_cmd vi_cmd, vi_rpl;
+	struct fw_port_cmd port_cmd, port_rpl;
+	int v;
+	u32 word;
+
+	/*
+	 * Execute a VI Read command to get our Virtual Interface information
+	 * like MAC address, etc.
+	 */
+	memset(&vi_cmd, 0, sizeof(vi_cmd));
+	vi_cmd.op_to_vfn = cpu_to_be32(FW_CMD_OP(FW_VI_CMD) |
+				       FW_CMD_REQUEST |
+				       FW_CMD_READ);
+	vi_cmd.alloc_to_len16 = cpu_to_be32(FW_LEN16(vi_cmd));
+	vi_cmd.type_viid = cpu_to_be16(FW_VI_CMD_VIID(pi->viid));
+	v = t4vf_wr_mbox(adapter, &vi_cmd, sizeof(vi_cmd), &vi_rpl);
+	if (v)
+		return v;
+
+	BUG_ON(pi->port_id != FW_VI_CMD_PORTID_GET(vi_rpl.portid_pkd));
+	pi->rss_size = FW_VI_CMD_RSSSIZE_GET(be16_to_cpu(vi_rpl.rsssize_pkd));
+	t4_os_set_hw_addr(adapter, pidx, vi_rpl.mac);
+
+	/*
+	 * If we don't have read access to our port information, we're done
+	 * now.  Otherwise, execute a PORT Read command to get it ...
+	 */
+	if (!(adapter->params.vfres.r_caps & FW_CMD_CAP_PORT))
+		return 0;
+
+	memset(&port_cmd, 0, sizeof(port_cmd));
+	port_cmd.op_to_portid = cpu_to_be32(FW_CMD_OP(FW_PORT_CMD) |
+					    FW_CMD_REQUEST |
+					    FW_CMD_READ |
+					    FW_PORT_CMD_PORTID(pi->port_id));
+	port_cmd.action_to_len16 =
+		cpu_to_be32(FW_PORT_CMD_ACTION(FW_PORT_ACTION_GET_PORT_INFO) |
+			    FW_LEN16(port_cmd));
+	v = t4vf_wr_mbox(adapter, &port_cmd, sizeof(port_cmd), &port_rpl);
+	if (v)
+		return v;
+
+	v = 0;
+	word = be16_to_cpu(port_rpl.u.info.pcap);
+	if (word & FW_PORT_CAP_SPEED_100M)
+		v |= SUPPORTED_100baseT_Full;
+	if (word & FW_PORT_CAP_SPEED_1G)
+		v |= SUPPORTED_1000baseT_Full;
+	if (word & FW_PORT_CAP_SPEED_10G)
+		v |= SUPPORTED_10000baseT_Full;
+	if (word & FW_PORT_CAP_ANEG)
+		v |= SUPPORTED_Autoneg;
+	init_link_config(&pi->link_cfg, v);
+
+	return 0;
+}
+
+/**
+ *	t4vf_query_params - query FW or device parameters
+ *	@adapter: the adapter
+ *	@nparams: the number of parameters
+ *	@params: the parameter names
+ *	@vals: the parameter values
+ *
+ *	Reads the values of firmware or device parameters.  Up to 7 parameters
+ *	can be queried at once.
+ */
+int t4vf_query_params(struct adapter *adapter, unsigned int nparams,
+		      const u32 *params, u32 *vals)
+{
+	int i, ret;
+	struct fw_params_cmd cmd, rpl;
+	struct fw_params_param *p;
+	size_t len16;
+
+	if (nparams > 7)
+		return -EINVAL;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_vfn = cpu_to_be32(FW_CMD_OP(FW_PARAMS_CMD) |
+				    FW_CMD_REQUEST |
+				    FW_CMD_READ);
+	len16 = DIV_ROUND_UP(offsetof(struct fw_params_cmd,
+				      param[nparams].mnem), 16);
+	cmd.retval_len16 = cpu_to_be32(FW_CMD_LEN16(len16));
+	for (i = 0, p = &cmd.param[0]; i < nparams; i++, p++)
+		p->mnem = htonl(*params++);
+
+	ret = t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), &rpl);
+	if (ret == 0)
+		for (i = 0, p = &rpl.param[0]; i < nparams; i++, p++)
+			*vals++ = be32_to_cpu(p->val);
+	return ret;
+}
+
+/**
+ *	t4vf_set_params - sets FW or device parameters
+ *	@adapter: the adapter
+ *	@nparams: the number of parameters
+ *	@params: the parameter names
+ *	@vals: the parameter values
+ *
+ *	Sets the values of firmware or device parameters.  Up to 7 parameters
+ *	can be specified at once.
+ */
+int t4vf_set_params(struct adapter *adapter, unsigned int nparams,
+		    const u32 *params, const u32 *vals)
+{
+	int i;
+	struct fw_params_cmd cmd;
+	struct fw_params_param *p;
+	size_t len16;
+
+	if (nparams > 7)
+		return -EINVAL;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_vfn = cpu_to_be32(FW_CMD_OP(FW_PARAMS_CMD) |
+				    FW_CMD_REQUEST |
+				    FW_CMD_WRITE);
+	len16 = DIV_ROUND_UP(offsetof(struct fw_params_cmd,
+				      param[nparams]), 16);
+	cmd.retval_len16 = cpu_to_be32(FW_CMD_LEN16(len16));
+	for (i = 0, p = &cmd.param[0]; i < nparams; i++, p++) {
+		p->mnem = cpu_to_be32(*params++);
+		p->val = cpu_to_be32(*vals++);
+	}
+
+	return t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), NULL);
+}
+
+/**
+ *	t4vf_get_sge_params - retrieve adapter Scatter gather Engine parameters
+ *	@adapter: the adapter
+ *
+ *	Retrieves various core SGE parameters in the form of hardware SGE
+ *	register values.  The caller is responsible for decoding these as
+ *	needed.  The SGE parameters are stored in @adapter->params.sge.
+ */
+int t4vf_get_sge_params(struct adapter *adapter)
+{
+	struct sge_params *sge_params = &adapter->params.sge;
+	u32 params[7], vals[7];
+	int v;
+
+	params[0] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_REG) |
+		     FW_PARAMS_PARAM_XYZ(SGE_CONTROL));
+	params[1] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_REG) |
+		     FW_PARAMS_PARAM_XYZ(SGE_HOST_PAGE_SIZE));
+	params[2] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_REG) |
+		     FW_PARAMS_PARAM_XYZ(SGE_FL_BUFFER_SIZE0));
+	params[3] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_REG) |
+		     FW_PARAMS_PARAM_XYZ(SGE_FL_BUFFER_SIZE1));
+	params[4] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_REG) |
+		     FW_PARAMS_PARAM_XYZ(SGE_TIMER_VALUE_0_AND_1));
+	params[5] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_REG) |
+		     FW_PARAMS_PARAM_XYZ(SGE_TIMER_VALUE_2_AND_3));
+	params[6] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_REG) |
+		     FW_PARAMS_PARAM_XYZ(SGE_TIMER_VALUE_4_AND_5));
+	v = t4vf_query_params(adapter, 7, params, vals);
+	if (v)
+		return v;
+	sge_params->sge_control = vals[0];
+	sge_params->sge_host_page_size = vals[1];
+	sge_params->sge_fl_buffer_size[0] = vals[2];
+	sge_params->sge_fl_buffer_size[1] = vals[3];
+	sge_params->sge_timer_value_0_and_1 = vals[4];
+	sge_params->sge_timer_value_2_and_3 = vals[5];
+	sge_params->sge_timer_value_4_and_5 = vals[6];
+
+	params[0] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_REG) |
+		     FW_PARAMS_PARAM_XYZ(SGE_INGRESS_RX_THRESHOLD));
+	v = t4vf_query_params(adapter, 1, params, vals);
+	if (v)
+		return v;
+	sge_params->sge_ingress_rx_threshold = vals[0];
+
+	return 0;
+}
+
+/**
+ *	t4vf_get_vpd_params - retrieve device VPD paremeters
+ *	@adapter: the adapter
+ *
+ *	Retrives various device Vital Product Data parameters.  The parameters
+ *	are stored in @adapter->params.vpd.
+ */
+int t4vf_get_vpd_params(struct adapter *adapter)
+{
+	struct vpd_params *vpd_params = &adapter->params.vpd;
+	u32 params[7], vals[7];
+	int v;
+
+	params[0] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) |
+		     FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_CCLK));
+	v = t4vf_query_params(adapter, 1, params, vals);
+	if (v)
+		return v;
+	vpd_params->cclk = vals[0];
+
+	return 0;
+}
+
+/**
+ *	t4vf_get_dev_params - retrieve device paremeters
+ *	@adapter: the adapter
+ *
+ *	Retrives various device parameters.  The parameters are stored in
+ *	@adapter->params.dev.
+ */
+int t4vf_get_dev_params(struct adapter *adapter)
+{
+	struct dev_params *dev_params = &adapter->params.dev;
+	u32 params[7], vals[7];
+	int v;
+
+	params[0] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) |
+		     FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_FWREV));
+	params[1] = (FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) |
+		     FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_TPREV));
+	v = t4vf_query_params(adapter, 2, params, vals);
+	if (v)
+		return v;
+	dev_params->fwrev = vals[0];
+	dev_params->tprev = vals[1];
+
+	return 0;
+}
+
+/**
+ *	t4vf_get_rss_glb_config - retrieve adapter RSS Global Configuration
+ *	@adapter: the adapter
+ *
+ *	Retrieves global RSS mode and parameters with which we have to live
+ *	and stores them in the @adapter's RSS parameters.
+ */
+int t4vf_get_rss_glb_config(struct adapter *adapter)
+{
+	struct rss_params *rss = &adapter->params.rss;
+	struct fw_rss_glb_config_cmd cmd, rpl;
+	int v;
+
+	/*
+	 * Execute an RSS Global Configuration read command to retrieve
+	 * our RSS configuration.
+	 */
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_write = cpu_to_be32(FW_CMD_OP(FW_RSS_GLB_CONFIG_CMD) |
+				      FW_CMD_REQUEST |
+				      FW_CMD_READ);
+	cmd.retval_len16 = cpu_to_be32(FW_LEN16(cmd));
+	v = t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), &rpl);
+	if (v)
+		return v;
+
+	/*
+	 * Transate the big-endian RSS Global Configuration into our
+	 * cpu-endian format based on the RSS mode.  We also do first level
+	 * filtering at this point to weed out modes which don't support
+	 * VF Drivers ...
+	 */
+	rss->mode = FW_RSS_GLB_CONFIG_CMD_MODE_GET(
+			be32_to_cpu(rpl.u.manual.mode_pkd));
+	switch (rss->mode) {
+	case FW_RSS_GLB_CONFIG_CMD_MODE_BASICVIRTUAL: {
+		u32 word = be32_to_cpu(
+				rpl.u.basicvirtual.synmapen_to_hashtoeplitz);
+
+		rss->u.basicvirtual.synmapen =
+			((word & FW_RSS_GLB_CONFIG_CMD_SYNMAPEN) != 0);
+		rss->u.basicvirtual.syn4tupenipv6 =
+			((word & FW_RSS_GLB_CONFIG_CMD_SYN4TUPENIPV6) != 0);
+		rss->u.basicvirtual.syn2tupenipv6 =
+			((word & FW_RSS_GLB_CONFIG_CMD_SYN2TUPENIPV6) != 0);
+		rss->u.basicvirtual.syn4tupenipv4 =
+			((word & FW_RSS_GLB_CONFIG_CMD_SYN4TUPENIPV4) != 0);
+		rss->u.basicvirtual.syn2tupenipv4 =
+			((word & FW_RSS_GLB_CONFIG_CMD_SYN2TUPENIPV4) != 0);
+
+		rss->u.basicvirtual.ofdmapen =
+			((word & FW_RSS_GLB_CONFIG_CMD_OFDMAPEN) != 0);
+
+		rss->u.basicvirtual.tnlmapen =
+			((word & FW_RSS_GLB_CONFIG_CMD_TNLMAPEN) != 0);
+		rss->u.basicvirtual.tnlalllookup =
+			((word  & FW_RSS_GLB_CONFIG_CMD_TNLALLLKP) != 0);
+
+		rss->u.basicvirtual.hashtoeplitz =
+			((word & FW_RSS_GLB_CONFIG_CMD_HASHTOEPLITZ) != 0);
+
+		/* we need at least Tunnel Map Enable to be set */
+		if (!rss->u.basicvirtual.tnlmapen)
+			return -EINVAL;
+		break;
+	}
+
+	default:
+		/* all unknown/unsupported RSS modes result in an error */
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
+ *	t4vf_get_vfres - retrieve VF resource limits
+ *	@adapter: the adapter
+ *
+ *	Retrieves configured resource limits and capabilities for a virtual
+ *	function.  The results are stored in @adapter->vfres.
+ */
+int t4vf_get_vfres(struct adapter *adapter)
+{
+	struct vf_resources *vfres = &adapter->params.vfres;
+	struct fw_pfvf_cmd cmd, rpl;
+	int v;
+	u32 word;
+
+	/*
+	 * Execute PFVF Read command to get VF resource limits; bail out early
+	 * with error on command failure.
+	 */
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_vfn = cpu_to_be32(FW_CMD_OP(FW_PFVF_CMD) |
+				    FW_CMD_REQUEST |
+				    FW_CMD_READ);
+	cmd.retval_len16 = cpu_to_be32(FW_LEN16(cmd));
+	v = t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), &rpl);
+	if (v)
+		return v;
+
+	/*
+	 * Extract VF resource limits and return success.
+	 */
+	word = be32_to_cpu(rpl.niqflint_niq);
+	vfres->niqflint = FW_PFVF_CMD_NIQFLINT_GET(word);
+	vfres->niq = FW_PFVF_CMD_NIQ_GET(word);
+
+	word = be32_to_cpu(rpl.type_to_neq);
+	vfres->neq = FW_PFVF_CMD_NEQ_GET(word);
+	vfres->pmask = FW_PFVF_CMD_PMASK_GET(word);
+
+	word = be32_to_cpu(rpl.tc_to_nexactf);
+	vfres->tc = FW_PFVF_CMD_TC_GET(word);
+	vfres->nvi = FW_PFVF_CMD_NVI_GET(word);
+	vfres->nexactf = FW_PFVF_CMD_NEXACTF_GET(word);
+
+	word = be32_to_cpu(rpl.r_caps_to_nethctrl);
+	vfres->r_caps = FW_PFVF_CMD_R_CAPS_GET(word);
+	vfres->wx_caps = FW_PFVF_CMD_WX_CAPS_GET(word);
+	vfres->nethctrl = FW_PFVF_CMD_NETHCTRL_GET(word);
+
+	return 0;
+}
+
+/**
+ *	t4vf_read_rss_vi_config - read a VI's RSS configuration
+ *	@adapter: the adapter
+ *	@viid: Virtual Interface ID
+ *	@config: pointer to host-native VI RSS Configuration buffer
+ *
+ *	Reads the Virtual Interface's RSS configuration information and
+ *	translates it into CPU-native format.
+ */
+int t4vf_read_rss_vi_config(struct adapter *adapter, unsigned int viid,
+			    union rss_vi_config *config)
+{
+	struct fw_rss_vi_config_cmd cmd, rpl;
+	int v;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_viid = cpu_to_be32(FW_CMD_OP(FW_RSS_VI_CONFIG_CMD) |
+				     FW_CMD_REQUEST |
+				     FW_CMD_READ |
+				     FW_RSS_VI_CONFIG_CMD_VIID(viid));
+	cmd.retval_len16 = cpu_to_be32(FW_LEN16(cmd));
+	v = t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), &rpl);
+	if (v)
+		return v;
+
+	switch (adapter->params.rss.mode) {
+	case FW_RSS_GLB_CONFIG_CMD_MODE_BASICVIRTUAL: {
+		u32 word = be32_to_cpu(rpl.u.basicvirtual.defaultq_to_udpen);
+
+		config->basicvirtual.ip6fourtupen =
+			((word & FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN) != 0);
+		config->basicvirtual.ip6twotupen =
+			((word & FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN) != 0);
+		config->basicvirtual.ip4fourtupen =
+			((word & FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN) != 0);
+		config->basicvirtual.ip4twotupen =
+			((word & FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN) != 0);
+		config->basicvirtual.udpen =
+			((word & FW_RSS_VI_CONFIG_CMD_UDPEN) != 0);
+		config->basicvirtual.defaultq =
+			FW_RSS_VI_CONFIG_CMD_DEFAULTQ_GET(word);
+		break;
+	}
+
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
+ *	t4vf_write_rss_vi_config - write a VI's RSS configuration
+ *	@adapter: the adapter
+ *	@viid: Virtual Interface ID
+ *	@config: pointer to host-native VI RSS Configuration buffer
+ *
+ *	Write the Virtual Interface's RSS configuration information
+ *	(translating it into firmware-native format before writing).
+ */
+int t4vf_write_rss_vi_config(struct adapter *adapter, unsigned int viid,
+			     union rss_vi_config *config)
+{
+	struct fw_rss_vi_config_cmd cmd, rpl;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_viid = cpu_to_be32(FW_CMD_OP(FW_RSS_VI_CONFIG_CMD) |
+				     FW_CMD_REQUEST |
+				     FW_CMD_WRITE |
+				     FW_RSS_VI_CONFIG_CMD_VIID(viid));
+	cmd.retval_len16 = cpu_to_be32(FW_LEN16(cmd));
+	switch (adapter->params.rss.mode) {
+	case FW_RSS_GLB_CONFIG_CMD_MODE_BASICVIRTUAL: {
+		u32 word = 0;
+
+		if (config->basicvirtual.ip6fourtupen)
+			word |= FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN;
+		if (config->basicvirtual.ip6twotupen)
+			word |= FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN;
+		if (config->basicvirtual.ip4fourtupen)
+			word |= FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN;
+		if (config->basicvirtual.ip4twotupen)
+			word |= FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN;
+		if (config->basicvirtual.udpen)
+			word |= FW_RSS_VI_CONFIG_CMD_UDPEN;
+		word |= FW_RSS_VI_CONFIG_CMD_DEFAULTQ(
+				config->basicvirtual.defaultq);
+		cmd.u.basicvirtual.defaultq_to_udpen = cpu_to_be32(word);
+		break;
+	}
+
+	default:
+		return -EINVAL;
+	}
+
+	return t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), &rpl);
+}
+
+/**
+ *	t4vf_config_rss_range - configure a portion of the RSS mapping table
+ *	@adapter: the adapter
+ *	@viid: Virtual Interface of RSS Table Slice
+ *	@start: starting entry in the table to write
+ *	@n: how many table entries to write
+ *	@rspq: values for the "Response Queue" (Ingress Queue) lookup table
+ *	@nrspq: number of values in @rspq
+ *
+ *	Programs the selected part of the VI's RSS mapping table with the
+ *	provided values.  If @nrspq < @n the supplied values are used repeatedly
+ *	until the full table range is populated.
+ *
+ *	The caller must ensure the values in @rspq are in the range 0..1023.
+ */
+int t4vf_config_rss_range(struct adapter *adapter, unsigned int viid,
+			  int start, int n, const u16 *rspq, int nrspq)
+{
+	const u16 *rsp = rspq;
+	const u16 *rsp_end = rspq+nrspq;
+	struct fw_rss_ind_tbl_cmd cmd;
+
+	/*
+	 * Initialize firmware command template to write the RSS table.
+	 */
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_viid = cpu_to_be32(FW_CMD_OP(FW_RSS_IND_TBL_CMD) |
+				     FW_CMD_REQUEST |
+				     FW_CMD_WRITE |
+				     FW_RSS_IND_TBL_CMD_VIID(viid));
+	cmd.retval_len16 = cpu_to_be32(FW_LEN16(cmd));
+
+	/*
+	 * Each firmware RSS command can accommodate up to 32 RSS Ingress
+	 * Queue Identifiers.  These Ingress Queue IDs are packed three to
+	 * a 32-bit word as 10-bit values with the upper remaining 2 bits
+	 * reserved.
+	 */
+	while (n > 0) {
+		__be32 *qp = &cmd.iq0_to_iq2;
+		int nq = min(n, 32);
+		int ret;
+
+		/*
+		 * Set up the firmware RSS command header to send the next
+		 * "nq" Ingress Queue IDs to the firmware.
+		 */
+		cmd.niqid = cpu_to_be16(nq);
+		cmd.startidx = cpu_to_be16(start);
+
+		/*
+		 * "nq" more done for the start of the next loop.
+		 */
+		start += nq;
+		n -= nq;
+
+		/*
+		 * While there are still Ingress Queue IDs to stuff into the
+		 * current firmware RSS command, retrieve them from the
+		 * Ingress Queue ID array and insert them into the command.
+		 */
+		while (nq > 0) {
+			/*
+			 * Grab up to the next 3 Ingress Queue IDs (wrapping
+			 * around the Ingress Queue ID array if necessary) and
+			 * insert them into the firmware RSS command at the
+			 * current 3-tuple position within the commad.
+			 */
+			u16 qbuf[3];
+			u16 *qbp = qbuf;
+			int nqbuf = min(3, nq);
+
+			nq -= nqbuf;
+			qbuf[0] = qbuf[1] = qbuf[2] = 0;
+			while (nqbuf) {
+				nqbuf--;
+				*qbp++ = *rsp++;
+				if (rsp >= rsp_end)
+					rsp = rspq;
+			}
+			*qp++ = cpu_to_be32(FW_RSS_IND_TBL_CMD_IQ0(qbuf[0]) |
+					    FW_RSS_IND_TBL_CMD_IQ1(qbuf[1]) |
+					    FW_RSS_IND_TBL_CMD_IQ2(qbuf[2]));
+		}
+
+		/*
+		 * Send this portion of the RRS table update to the firmware;
+		 * bail out on any errors.
+		 */
+		ret = t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), NULL);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+/**
+ *	t4vf_alloc_vi - allocate a virtual interface on a port
+ *	@adapter: the adapter
+ *	@port_id: physical port associated with the VI
+ *
+ *	Allocate a new Virtual Interface and bind it to the indicated
+ *	physical port.  Return the new Virtual Interface Identifier on
+ *	success, or a [negative] error number on failure.
+ */
+int t4vf_alloc_vi(struct adapter *adapter, int port_id)
+{
+	struct fw_vi_cmd cmd, rpl;
+	int v;
+
+	/*
+	 * Execute a VI command to allocate Virtual Interface and return its
+	 * VIID.
+	 */
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_vfn = cpu_to_be32(FW_CMD_OP(FW_VI_CMD) |
+				    FW_CMD_REQUEST |
+				    FW_CMD_WRITE |
+				    FW_CMD_EXEC);
+	cmd.alloc_to_len16 = cpu_to_be32(FW_LEN16(cmd) |
+					 FW_VI_CMD_ALLOC);
+	cmd.portid_pkd = FW_VI_CMD_PORTID(port_id);
+	v = t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), &rpl);
+	if (v)
+		return v;
+
+	return FW_VI_CMD_VIID_GET(be16_to_cpu(rpl.type_viid));
+}
+
+/**
+ *	t4vf_free_vi -- free a virtual interface
+ *	@adapter: the adapter
+ *	@viid: the virtual interface identifier
+ *
+ *	Free a previously allocated Virtual Interface.  Return an error on
+ *	failure.
+ */
+int t4vf_free_vi(struct adapter *adapter, int viid)
+{
+	struct fw_vi_cmd cmd;
+
+	/*
+	 * Execute a VI command to free the Virtual Interface.
+	 */
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_vfn = cpu_to_be32(FW_CMD_OP(FW_VI_CMD) |
+				    FW_CMD_REQUEST |
+				    FW_CMD_EXEC);
+	cmd.alloc_to_len16 = cpu_to_be32(FW_LEN16(cmd) |
+					 FW_VI_CMD_FREE);
+	cmd.type_viid = cpu_to_be16(FW_VI_CMD_VIID(viid));
+	return t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), NULL);
+}
+
+/**
+ *	t4vf_enable_vi - enable/disable a virtual interface
+ *	@adapter: the adapter
+ *	@viid: the Virtual Interface ID
+ *	@rx_en: 1=enable Rx, 0=disable Rx
+ *	@tx_en: 1=enable Tx, 0=disable Tx
+ *
+ *	Enables/disables a virtual interface.
+ */
+int t4vf_enable_vi(struct adapter *adapter, unsigned int viid,
+		   bool rx_en, bool tx_en)
+{
+	struct fw_vi_enable_cmd cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_viid = cpu_to_be32(FW_CMD_OP(FW_VI_ENABLE_CMD) |
+				     FW_CMD_REQUEST |
+				     FW_CMD_EXEC |
+				     FW_VI_ENABLE_CMD_VIID(viid));
+	cmd.ien_to_len16 = cpu_to_be32(FW_VI_ENABLE_CMD_IEN(rx_en) |
+				       FW_VI_ENABLE_CMD_EEN(tx_en) |
+				       FW_LEN16(cmd));
+	return t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), NULL);
+}
+
+/**
+ *	t4vf_identify_port - identify a VI's port by blinking its LED
+ *	@adapter: the adapter
+ *	@viid: the Virtual Interface ID
+ *	@nblinks: how many times to blink LED at 2.5 Hz
+ *
+ *	Identifies a VI's port by blinking its LED.
+ */
+int t4vf_identify_port(struct adapter *adapter, unsigned int viid,
+		       unsigned int nblinks)
+{
+	struct fw_vi_enable_cmd cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_viid = cpu_to_be32(FW_CMD_OP(FW_VI_ENABLE_CMD) |
+				     FW_CMD_REQUEST |
+				     FW_CMD_EXEC |
+				     FW_VI_ENABLE_CMD_VIID(viid));
+	cmd.ien_to_len16 = cpu_to_be32(FW_VI_ENABLE_CMD_LED |
+				       FW_LEN16(cmd));
+	cmd.blinkdur = cpu_to_be16(nblinks);
+	return t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), NULL);
+}
+
+/**
+ *	t4vf_set_rxmode - set Rx properties of a virtual interface
+ *	@adapter: the adapter
+ *	@viid: the VI id
+ *	@mtu: the new MTU or -1 for no change
+ *	@promisc: 1 to enable promiscuous mode, 0 to disable it, -1 no change
+ *	@all_multi: 1 to enable all-multi mode, 0 to disable it, -1 no change
+ *	@bcast: 1 to enable broadcast Rx, 0 to disable it, -1 no change
+ *	@vlanex: 1 to enable hardware VLAN Tag extraction, 0 to disable it,
+ *		-1 no change
+ *
+ *	Sets Rx properties of a virtual interface.
+ */
+int t4vf_set_rxmode(struct adapter *adapter, unsigned int viid,
+		    int mtu, int promisc, int all_multi, int bcast, int vlanex,
+		    bool sleep_ok)
+{
+	struct fw_vi_rxmode_cmd cmd;
+
+	/* convert to FW values */
+	if (mtu < 0)
+		mtu = FW_VI_RXMODE_CMD_MTU_MASK;
+	if (promisc < 0)
+		promisc = FW_VI_RXMODE_CMD_PROMISCEN_MASK;
+	if (all_multi < 0)
+		all_multi = FW_VI_RXMODE_CMD_ALLMULTIEN_MASK;
+	if (bcast < 0)
+		bcast = FW_VI_RXMODE_CMD_BROADCASTEN_MASK;
+	if (vlanex < 0)
+		vlanex = FW_VI_RXMODE_CMD_VLANEXEN_MASK;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_viid = cpu_to_be32(FW_CMD_OP(FW_VI_RXMODE_CMD) |
+				     FW_CMD_REQUEST |
+				     FW_CMD_WRITE |
+				     FW_VI_RXMODE_CMD_VIID(viid));
+	cmd.retval_len16 = cpu_to_be32(FW_LEN16(cmd));
+	cmd.mtu_to_vlanexen =
+		cpu_to_be32(FW_VI_RXMODE_CMD_MTU(mtu) |
+			    FW_VI_RXMODE_CMD_PROMISCEN(promisc) |
+			    FW_VI_RXMODE_CMD_ALLMULTIEN(all_multi) |
+			    FW_VI_RXMODE_CMD_BROADCASTEN(bcast) |
+			    FW_VI_RXMODE_CMD_VLANEXEN(vlanex));
+	return t4vf_wr_mbox_core(adapter, &cmd, sizeof(cmd), NULL, sleep_ok);
+}
+
+/**
+ *	t4vf_alloc_mac_filt - allocates exact-match filters for MAC addresses
+ *	@adapter: the adapter
+ *	@viid: the Virtual Interface Identifier
+ *	@free: if true any existing filters for this VI id are first removed
+ *	@naddr: the number of MAC addresses to allocate filters for (up to 7)
+ *	@addr: the MAC address(es)
+ *	@idx: where to store the index of each allocated filter
+ *	@hash: pointer to hash address filter bitmap
+ *	@sleep_ok: call is allowed to sleep
+ *
+ *	Allocates an exact-match filter for each of the supplied addresses and
+ *	sets it to the corresponding address.  If @idx is not %NULL it should
+ *	have at least @naddr entries, each of which will be set to the index of
+ *	the filter allocated for the corresponding MAC address.  If a filter
+ *	could not be allocated for an address its index is set to 0xffff.
+ *	If @hash is not %NULL addresses that fail to allocate an exact filter
+ *	are hashed and update the hash filter bitmap pointed at by @hash.
+ *
+ *	Returns a negative error number or the number of filters allocated.
+ */
+int t4vf_alloc_mac_filt(struct adapter *adapter, unsigned int viid, bool free,
+			unsigned int naddr, const u8 **addr, u16 *idx,
+			u64 *hash, bool sleep_ok)
+{
+	int i, ret;
+	struct fw_vi_mac_cmd cmd, rpl;
+	struct fw_vi_mac_exact *p;
+	size_t len16;
+
+	if (naddr > ARRAY_SIZE(cmd.u.exact))
+		return -EINVAL;
+	len16 = DIV_ROUND_UP(offsetof(struct fw_vi_mac_cmd,
+				      u.exact[naddr]), 16);
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_viid = cpu_to_be32(FW_CMD_OP(FW_VI_MAC_CMD) |
+				     FW_CMD_REQUEST |
+				     FW_CMD_WRITE |
+				     (free ? FW_CMD_EXEC : 0) |
+				     FW_VI_MAC_CMD_VIID(viid));
+	cmd.freemacs_to_len16 = cpu_to_be32(FW_VI_MAC_CMD_FREEMACS(free) |
+					    FW_CMD_LEN16(len16));
+
+	for (i = 0, p = cmd.u.exact; i < naddr; i++, p++) {
+		p->valid_to_idx =
+			cpu_to_be16(FW_VI_MAC_CMD_VALID |
+				    FW_VI_MAC_CMD_IDX(FW_VI_MAC_ADD_MAC));
+		memcpy(p->macaddr, addr[i], sizeof(p->macaddr));
+	}
+
+	ret = t4vf_wr_mbox_core(adapter, &cmd, sizeof(cmd), &rpl, sleep_ok);
+	if (ret)
+		return ret;
+
+	for (i = 0, p = rpl.u.exact; i < naddr; i++, p++) {
+		u16 index = FW_VI_MAC_CMD_IDX_GET(be16_to_cpu(p->valid_to_idx));
+
+		if (idx)
+			idx[i] = (index >= FW_CLS_TCAM_NUM_ENTRIES
+				  ? 0xffff
+				  : index);
+		if (index < FW_CLS_TCAM_NUM_ENTRIES)
+			ret++;
+		else if (hash)
+			*hash |= (1 << hash_mac_addr(addr[i]));
+	}
+	return ret;
+}
+
+/**
+ *	t4vf_change_mac - modifies the exact-match filter for a MAC address
+ *	@adapter: the adapter
+ *	@viid: the Virtual Interface ID
+ *	@idx: index of existing filter for old value of MAC address, or -1
+ *	@addr: the new MAC address value
+ *	@persist: if idx < 0, the new MAC allocation should be persistent
+ *
+ *	Modifies an exact-match filter and sets it to the new MAC address.
+ *	Note that in general it is not possible to modify the value of a given
+ *	filter so the generic way to modify an address filter is to free the
+ *	one being used by the old address value and allocate a new filter for
+ *	the new address value.  @idx can be -1 if the address is a new
+ *	addition.
+ *
+ *	Returns a negative error number or the index of the filter with the new
+ *	MAC value.
+ */
+int t4vf_change_mac(struct adapter *adapter, unsigned int viid,
+		    int idx, const u8 *addr, bool persist)
+{
+	int ret;
+	struct fw_vi_mac_cmd cmd, rpl;
+	struct fw_vi_mac_exact *p = &cmd.u.exact[0];
+	size_t len16 = DIV_ROUND_UP(offsetof(struct fw_vi_mac_cmd,
+					     u.exact[1]), 16);
+
+	/*
+	 * If this is a new allocation, determine whether it should be
+	 * persistent (across a "freemacs" operation) or not.
+	 */
+	if (idx < 0)
+		idx = persist ? FW_VI_MAC_ADD_PERSIST_MAC : FW_VI_MAC_ADD_MAC;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_viid = cpu_to_be32(FW_CMD_OP(FW_VI_MAC_CMD) |
+				     FW_CMD_REQUEST |
+				     FW_CMD_WRITE |
+				     FW_VI_MAC_CMD_VIID(viid));
+	cmd.freemacs_to_len16 = cpu_to_be32(FW_CMD_LEN16(len16));
+	p->valid_to_idx = cpu_to_be16(FW_VI_MAC_CMD_VALID |
+				      FW_VI_MAC_CMD_IDX(idx));
+	memcpy(p->macaddr, addr, sizeof(p->macaddr));
+
+	ret = t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), &rpl);
+	if (ret == 0) {
+		p = &rpl.u.exact[0];
+		ret = FW_VI_MAC_CMD_IDX_GET(be16_to_cpu(p->valid_to_idx));
+		if (ret >= FW_CLS_TCAM_NUM_ENTRIES)
+			ret = -ENOMEM;
+	}
+	return ret;
+}
+
+/**
+ *	t4vf_set_addr_hash - program the MAC inexact-match hash filter
+ *	@adapter: the adapter
+ *	@viid: the Virtual Interface Identifier
+ *	@ucast: whether the hash filter should also match unicast addresses
+ *	@vec: the value to be written to the hash filter
+ *	@sleep_ok: call is allowed to sleep
+ *
+ *	Sets the 64-bit inexact-match hash filter for a virtual interface.
+ */
+int t4vf_set_addr_hash(struct adapter *adapter, unsigned int viid,
+		       bool ucast, u64 vec, bool sleep_ok)
+{
+	struct fw_vi_mac_cmd cmd;
+	size_t len16 = DIV_ROUND_UP(offsetof(struct fw_vi_mac_cmd,
+					     u.exact[0]), 16);
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_viid = cpu_to_be32(FW_CMD_OP(FW_VI_MAC_CMD) |
+				     FW_CMD_REQUEST |
+				     FW_CMD_WRITE |
+				     FW_VI_ENABLE_CMD_VIID(viid));
+	cmd.freemacs_to_len16 = cpu_to_be32(FW_VI_MAC_CMD_HASHVECEN |
+					    FW_VI_MAC_CMD_HASHUNIEN(ucast) |
+					    FW_CMD_LEN16(len16));
+	cmd.u.hash.hashvec = cpu_to_be64(vec);
+	return t4vf_wr_mbox_core(adapter, &cmd, sizeof(cmd), NULL, sleep_ok);
+}
+
+/**
+ *	t4vf_get_port_stats - collect "port" statistics
+ *	@adapter: the adapter
+ *	@pidx: the port index
+ *	@s: the stats structure to fill
+ *
+ *	Collect statistics for the "port"'s Virtual Interface.
+ */
+int t4vf_get_port_stats(struct adapter *adapter, int pidx,
+			struct t4vf_port_stats *s)
+{
+	struct port_info *pi = adap2pinfo(adapter, pidx);
+	struct fw_vi_stats_vf fwstats;
+	unsigned int rem = VI_VF_NUM_STATS;
+	__be64 *fwsp = (__be64 *)&fwstats;
+
+	/*
+	 * Grab the Virtual Interface statistics a chunk at a time via mailbox
+	 * commands.  We could use a Work Request and get all of them at once
+	 * but that's an asynchronous interface which is awkward to use.
+	 */
+	while (rem) {
+		unsigned int ix = VI_VF_NUM_STATS - rem;
+		unsigned int nstats = min(6U, rem);
+		struct fw_vi_stats_cmd cmd, rpl;
+		size_t len = (offsetof(struct fw_vi_stats_cmd, u) +
+			      sizeof(struct fw_vi_stats_ctl));
+		size_t len16 = DIV_ROUND_UP(len, 16);
+		int ret;
+
+		memset(&cmd, 0, sizeof(cmd));
+		cmd.op_to_viid = cpu_to_be32(FW_CMD_OP(FW_VI_STATS_CMD) |
+					     FW_VI_STATS_CMD_VIID(pi->viid) |
+					     FW_CMD_REQUEST |
+					     FW_CMD_READ);
+		cmd.retval_len16 = cpu_to_be32(FW_CMD_LEN16(len16));
+		cmd.u.ctl.nstats_ix =
+			cpu_to_be16(FW_VI_STATS_CMD_IX(ix) |
+				    FW_VI_STATS_CMD_NSTATS(nstats));
+		ret = t4vf_wr_mbox_ns(adapter, &cmd, len, &rpl);
+		if (ret)
+			return ret;
+
+		memcpy(fwsp, &rpl.u.ctl.stat0, sizeof(__be64) * nstats);
+
+		rem -= nstats;
+		fwsp += nstats;
+	}
+
+	/*
+	 * Translate firmware statistics into host native statistics.
+	 */
+	s->tx_bcast_bytes = be64_to_cpu(fwstats.tx_bcast_bytes);
+	s->tx_bcast_frames = be64_to_cpu(fwstats.tx_bcast_frames);
+	s->tx_mcast_bytes = be64_to_cpu(fwstats.tx_mcast_bytes);
+	s->tx_mcast_frames = be64_to_cpu(fwstats.tx_mcast_frames);
+	s->tx_ucast_bytes = be64_to_cpu(fwstats.tx_ucast_bytes);
+	s->tx_ucast_frames = be64_to_cpu(fwstats.tx_ucast_frames);
+	s->tx_drop_frames = be64_to_cpu(fwstats.tx_drop_frames);
+	s->tx_offload_bytes = be64_to_cpu(fwstats.tx_offload_bytes);
+	s->tx_offload_frames = be64_to_cpu(fwstats.tx_offload_frames);
+
+	s->rx_bcast_bytes = be64_to_cpu(fwstats.rx_bcast_bytes);
+	s->rx_bcast_frames = be64_to_cpu(fwstats.rx_bcast_frames);
+	s->rx_mcast_bytes = be64_to_cpu(fwstats.rx_mcast_bytes);
+	s->rx_mcast_frames = be64_to_cpu(fwstats.rx_mcast_frames);
+	s->rx_ucast_bytes = be64_to_cpu(fwstats.rx_ucast_bytes);
+	s->rx_ucast_frames = be64_to_cpu(fwstats.rx_ucast_frames);
+
+	s->rx_err_frames = be64_to_cpu(fwstats.rx_err_frames);
+
+	return 0;
+}
+
+/**
+ *	t4vf_iq_free - free an ingress queue and its free lists
+ *	@adapter: the adapter
+ *	@iqtype: the ingress queue type (FW_IQ_TYPE_FL_INT_CAP, etc.)
+ *	@iqid: ingress queue ID
+ *	@fl0id: FL0 queue ID or 0xffff if no attached FL0
+ *	@fl1id: FL1 queue ID or 0xffff if no attached FL1
+ *
+ *	Frees an ingress queue and its associated free lists, if any.
+ */
+int t4vf_iq_free(struct adapter *adapter, unsigned int iqtype,
+		 unsigned int iqid, unsigned int fl0id, unsigned int fl1id)
+{
+	struct fw_iq_cmd cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_vfn = cpu_to_be32(FW_CMD_OP(FW_IQ_CMD) |
+				    FW_CMD_REQUEST |
+				    FW_CMD_EXEC);
+	cmd.alloc_to_len16 = cpu_to_be32(FW_IQ_CMD_FREE |
+					 FW_LEN16(cmd));
+	cmd.type_to_iqandstindex =
+		cpu_to_be32(FW_IQ_CMD_TYPE(iqtype));
+
+	cmd.iqid = cpu_to_be16(iqid);
+	cmd.fl0id = cpu_to_be16(fl0id);
+	cmd.fl1id = cpu_to_be16(fl1id);
+	return t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), NULL);
+}
+
+/**
+ *	t4vf_eth_eq_free - free an Ethernet egress queue
+ *	@adapter: the adapter
+ *	@eqid: egress queue ID
+ *
+ *	Frees an Ethernet egress queue.
+ */
+int t4vf_eth_eq_free(struct adapter *adapter, unsigned int eqid)
+{
+	struct fw_eq_eth_cmd cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.op_to_vfn = cpu_to_be32(FW_CMD_OP(FW_EQ_ETH_CMD) |
+				    FW_CMD_REQUEST |
+				    FW_CMD_EXEC);
+	cmd.alloc_to_len16 = cpu_to_be32(FW_EQ_ETH_CMD_FREE |
+					 FW_LEN16(cmd));
+	cmd.eqid_pkd = cpu_to_be32(FW_EQ_ETH_CMD_EQID(eqid));
+	return t4vf_wr_mbox(adapter, &cmd, sizeof(cmd), NULL);
+}
+
+/**
+ *	t4vf_handle_fw_rpl - process a firmware reply message
+ *	@adapter: the adapter
+ *	@rpl: start of the firmware message
+ *
+ *	Processes a firmware message, such as link state change messages.
+ */
+int t4vf_handle_fw_rpl(struct adapter *adapter, const __be64 *rpl)
+{
+	struct fw_cmd_hdr *cmd_hdr = (struct fw_cmd_hdr *)rpl;
+	u8 opcode = FW_CMD_OP_GET(be32_to_cpu(cmd_hdr->hi));
+
+	switch (opcode) {
+	case FW_PORT_CMD: {
+		/*
+		 * Link/module state change message.
+		 */
+		const struct fw_port_cmd *port_cmd = (void *)rpl;
+		u32 word;
+		int action, port_id, link_ok, speed, fc, pidx;
+
+		/*
+		 * Extract various fields from port status change message.
+		 */
+		action = FW_PORT_CMD_ACTION_GET(
+			be32_to_cpu(port_cmd->action_to_len16));
+		if (action != FW_PORT_ACTION_GET_PORT_INFO) {
+			dev_err(adapter->pdev_dev,
+				"Unknown firmware PORT reply action %x\n",
+				action);
+			break;
+		}
+
+		port_id = FW_PORT_CMD_PORTID_GET(
+			be32_to_cpu(port_cmd->op_to_portid));
+
+		word = be32_to_cpu(port_cmd->u.info.lstatus_to_modtype);
+		link_ok = (word & FW_PORT_CMD_LSTATUS) != 0;
+		speed = 0;
+		fc = 0;
+		if (word & FW_PORT_CMD_RXPAUSE)
+			fc |= PAUSE_RX;
+		if (word & FW_PORT_CMD_TXPAUSE)
+			fc |= PAUSE_TX;
+		if (word & FW_PORT_CMD_LSPEED(FW_PORT_CAP_SPEED_100M))
+			speed = SPEED_100;
+		else if (word & FW_PORT_CMD_LSPEED(FW_PORT_CAP_SPEED_1G))
+			speed = SPEED_1000;
+		else if (word & FW_PORT_CMD_LSPEED(FW_PORT_CAP_SPEED_10G))
+			speed = SPEED_10000;
+
+		/*
+		 * Scan all of our "ports" (Virtual Interfaces) looking for
+		 * those bound to the physical port which has changed.  If
+		 * our recorded state doesn't match the current state,
+		 * signal that change to the OS code.
+		 */
+		for_each_port(adapter, pidx) {
+			struct port_info *pi = adap2pinfo(adapter, pidx);
+			struct link_config *lc;
+
+			if (pi->port_id != port_id)
+				continue;
+
+			lc = &pi->link_cfg;
+			if (link_ok != lc->link_ok || speed != lc->speed ||
+			    fc != lc->fc) {
+				/* something changed */
+				lc->link_ok = link_ok;
+				lc->speed = speed;
+				lc->fc = fc;
+				t4vf_os_link_changed(adapter, pidx, link_ok);
+			}
+		}
+		break;
+	}
+
+	default:
+		dev_err(adapter->pdev_dev, "Unknown firmware reply %X\n",
+			opcode);
+	}
+	return 0;
+}

^ permalink raw reply related

* [PATCH 4/9] cxgb4vf: Add code to provision T4 PCI-E SR-IOV Virtual Functions with hardware resources
From: Casey Leedom @ 2010-06-25 22:11 UTC (permalink / raw)
  To: netdev

Add code to provision T4 PCI-E SR-IOV Virtual Functions with hardware
resources.

Signed-off-by: Casey Leedom
---
 drivers/net/cxgb4/cxgb4_main.c |  106 
++++++++++++++++++++++++++++++++++++++++
 1 files changed, 106 insertions(+), 0 deletions(-)

diff --git a/drivers/net/cxgb4/cxgb4_main.c b/drivers/net/cxgb4/cxgb4_main.c
index 27f65b5..6528167 100644
--- a/drivers/net/cxgb4/cxgb4_main.c
+++ b/drivers/net/cxgb4/cxgb4_main.c
@@ -77,6 +77,76 @@
  */
 #define MAX_SGE_TIMERVAL 200U
 
+#ifdef CONFIG_PCI_IOV
+/*
+ * Virtual Function provisioning constants.  We need two extra Ingress Queues
+ * with Interrupt capability to serve as the VF's Firmware Event Queue and
+ * Forwarded Interrupt Queue (when using MSI mode) -- neither will have Free
+ * Lists associated with them).  For each Ethernet/Control Egress Queue and
+ * for each Free List, we need an Egress Context.
+ */
+enum {
+	VFRES_NPORTS = 1,		/* # of "ports" per VF */
+	VFRES_NQSETS = 2,		/* # of "Queue Sets" per VF */
+
+	VFRES_NVI = VFRES_NPORTS,	/* # of Virtual Interfaces */
+	VFRES_NETHCTRL = VFRES_NQSETS,	/* # of EQs used for ETH or CTRL Qs */
+	VFRES_NIQFLINT = VFRES_NQSETS+2,/* # of ingress Qs/w Free List(s)/intr */
+	VFRES_NIQ = 0,			/* # of non-fl/int ingress queues */
+	VFRES_NEQ = VFRES_NQSETS*2,	/* # of egress queues */
+	VFRES_TC = 0,			/* PCI-E traffic class */
+	VFRES_NEXACTF = 16,		/* # of exact MPS filters */
+
+	VFRES_R_CAPS = FW_CMD_CAP_DMAQ|FW_CMD_CAP_VF|FW_CMD_CAP_PORT,
+	VFRES_WX_CAPS = FW_CMD_CAP_DMAQ|FW_CMD_CAP_VF,
+};
+
+/*
+ * Provide a Port Access Rights Mask for the specified PF/VF.  This is very
+ * static and likely not to be useful in the long run.  We really need to
+ * implement some form of persistent configuration which the firmware
+ * controls.
+ */
+static unsigned int pfvfres_pmask(struct adapter *adapter,
+				  unsigned int pf, unsigned int vf)
+{
+	unsigned int portn, portvec;
+
+	/*
+	 * Give PF's access to all of the ports.
+	 */
+	if (vf == 0)
+		return FW_PFVF_CMD_PMASK_MASK;
+
+	/*
+	 * For VFs, we'll assign them access to the ports based purely on the
+	 * PF.  We assign active ports in order, wrapping around if there are
+	 * fewer active ports than PFs: e.g. active port[pf % nports].
+	 * Unfortunately the adapter's port_info structs haven't been
+	 * initialized yet so we have to compute this.
+	 */
+	if (adapter->params.nports == 0)
+		return 0;
+
+	portn = pf % adapter->params.nports;
+	portvec = adapter->params.portvec;
+	for (;;) {
+		/*
+		 * Isolate the lowest set bit in the port vector.  If we're at
+		 * the port number that we want, return that as the pmask.
+		 * otherwise mask that bit out of the port vector and
+		 * decrement our port number ...
+		 */
+		unsigned int pmask = portvec ^ (portvec & (portvec-1));
+		if (portn == 0)
+			return pmask;
+		portn--;
+		portvec &= ~pmask;
+	}
+	/*NOTREACHED*/
+}
+#endif
+
 enum {
 	MEMWIN0_APERTURE = 65536,
 	MEMWIN0_BASE     = 0x30000,
@@ -2925,6 +2995,42 @@ static int adap_init0(struct adapter *adap)
 	t4_read_mtu_tbl(adap, adap->params.mtus, NULL);
 	t4_load_mtus(adap, adap->params.mtus, adap->params.a_wnd,
 		     adap->params.b_wnd);
+
+#ifdef CONFIG_PCI_IOV
+	/*
+	 * Provision resource limits for Virtual Functions.  We currently
+	 * grant them all the same static resource limits except for the Port
+	 * Access Rights Mask which we're assigning based on the PF.  All of
+	 * the static provisioning stuff for both the PF and VF really needs
+	 * to be managed in a persistent manner for each device which the
+	 * firmware controls.
+	 */
+	{
+		int pf, vf;
+
+		for (pf = 0; pf < ARRAY_SIZE(num_vf); pf++) {
+			if (num_vf[pf] <= 0)
+				continue;
+
+			/* VF numbering starts at 1! */
+			for (vf = 1; vf <= num_vf[pf]; vf++) {
+				ret = t4_cfg_pfvf(adap, 0, pf, vf,
+						  VFRES_NEQ, VFRES_NETHCTRL,
+						  VFRES_NIQFLINT, VFRES_NIQ,
+						  VFRES_TC, VFRES_NVI,
+						  FW_PFVF_CMD_CMASK_MASK,
+						  pfvfres_pmask(adap, pf, vf),
+						  VFRES_NEXACTF,
+						  VFRES_R_CAPS, VFRES_WX_CAPS);
+				if (ret < 0)
+					dev_warn(adap->pdev_dev, "failed to "
+						 "provision pf/vf=%d/%d; "
+						 "err=%d\n", pf, vf, ret);
+			}
+		}
+	}
+#endif
+
 	return 0;
 
 	/*

^ permalink raw reply related

* [PATCH 3/9] cxgb4vf: Add new macros and definitions for hardware constants
From: Casey Leedom @ 2010-06-25 22:11 UTC (permalink / raw)
  To: netdev

Add new macros and definitions for hardware constants.

Signed-off-by: Casey Leedom
---
 drivers/net/cxgb4/t4_hw.h   |   39 +++++++++++++++++++++++++++++++++++++++
 drivers/net/cxgb4/t4_regs.h |    3 +++
 2 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/drivers/net/cxgb4/t4_hw.h b/drivers/net/cxgb4/t4_hw.h
index 98c02be..e875d09 100644
--- a/drivers/net/cxgb4/t4_hw.h
+++ b/drivers/net/cxgb4/t4_hw.h
@@ -67,6 +67,45 @@ enum {
 	SGE_MAX_WR_LEN = 512,     /* max WR size in bytes */
 	SGE_NTIMERS = 6,          /* # of interrupt holdoff timer values */
 	SGE_NCOUNTERS = 4,        /* # of interrupt packet counter values */
+
+	SGE_TIMER_RSTRT_CNTR = 6, /* restart RX packet threshold counter */
+	SGE_TIMER_UPD_CIDX = 7,   /* update cidx only */
+
+	SGE_EQ_IDXSIZE = 64,      /* egress queue pidx/cidx unit size */
+
+	SGE_INTRDST_PCI = 0,      /* interrupt destination is PCI-E */
+	SGE_INTRDST_IQ = 1,       /*   destination is an ingress queue */
+
+	SGE_UPDATEDEL_NONE = 0,   /* ingress queue pidx update delivery */
+	SGE_UPDATEDEL_INTR = 1,   /*   interrupt */
+	SGE_UPDATEDEL_STPG = 2,   /*   status page */
+	SGE_UPDATEDEL_BOTH = 3,   /*   interrupt and status page */
+
+	SGE_HOSTFCMODE_NONE = 0,  /* egress queue cidx updates */
+	SGE_HOSTFCMODE_IQ = 1,    /*   sent to ingress queue */
+	SGE_HOSTFCMODE_STPG = 2,  /*   sent to status page */
+	SGE_HOSTFCMODE_BOTH = 3,  /*   ingress queue and status page */
+
+	SGE_FETCHBURSTMIN_16B = 0,/* egress queue descriptor fetch minimum */
+	SGE_FETCHBURSTMIN_32B = 1,
+	SGE_FETCHBURSTMIN_64B = 2,
+	SGE_FETCHBURSTMIN_128B = 3,
+
+	SGE_FETCHBURSTMAX_64B = 0,/* egress queue descriptor fetch maximum */
+	SGE_FETCHBURSTMAX_128B = 1,
+	SGE_FETCHBURSTMAX_256B = 2,
+	SGE_FETCHBURSTMAX_512B = 3,
+
+	SGE_CIDXFLUSHTHRESH_1 = 0,/* egress queue cidx flush threshold */
+	SGE_CIDXFLUSHTHRESH_2 = 1,
+	SGE_CIDXFLUSHTHRESH_4 = 2,
+	SGE_CIDXFLUSHTHRESH_8 = 3,
+	SGE_CIDXFLUSHTHRESH_16 = 4,
+	SGE_CIDXFLUSHTHRESH_32 = 5,
+	SGE_CIDXFLUSHTHRESH_64 = 6,
+	SGE_CIDXFLUSHTHRESH_128 = 7,
+
+	SGE_INGPADBOUNDARY_SHIFT = 5,/* ingress queue pad boundary */
 };
 
 struct sge_qstat {                /* data written to SGE queue status entries 
*/
diff --git a/drivers/net/cxgb4/t4_regs.h b/drivers/net/cxgb4/t4_regs.h
index 8fed46d..bf21c14 100644
--- a/drivers/net/cxgb4/t4_regs.h
+++ b/drivers/net/cxgb4/t4_regs.h
@@ -93,12 +93,15 @@
 #define  PKTSHIFT_MASK          0x00001c00U
 #define  PKTSHIFT_SHIFT         10
 #define  PKTSHIFT(x)            ((x) << PKTSHIFT_SHIFT)
+#define  PKTSHIFT_GET(x)	(((x) & PKTSHIFT_MASK) >> PKTSHIFT_SHIFT)
 #define  INGPCIEBOUNDARY_MASK   0x00000380U
 #define  INGPCIEBOUNDARY_SHIFT  7
 #define  INGPCIEBOUNDARY(x)     ((x) << INGPCIEBOUNDARY_SHIFT)
 #define  INGPADBOUNDARY_MASK    0x00000070U
 #define  INGPADBOUNDARY_SHIFT   4
 #define  INGPADBOUNDARY(x)      ((x) << INGPADBOUNDARY_SHIFT)
+#define  INGPADBOUNDARY_GET(x)	(((x) & INGPADBOUNDARY_MASK) \
+				 >> INGPADBOUNDARY_SHIFT)
 #define  EGRPCIEBOUNDARY_MASK   0x0000000eU
 #define  EGRPCIEBOUNDARY_SHIFT  1
 #define  EGRPCIEBOUNDARY(x)     ((x) << EGRPCIEBOUNDARY_SHIFT)

^ permalink raw reply related

* [PATCH 2/9] cxgb4vf: update to latest T4 firmware API file
From: Casey Leedom @ 2010-06-25 22:10 UTC (permalink / raw)
  To: netdev

Update to latest T4 firmware API file.

Signed-off-by: Casey Leedom
---
 drivers/net/cxgb4/t4_hw.c    |    2 +-
 drivers/net/cxgb4/t4fw_api.h |   27 +++++++++++++++++++++------
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/net/cxgb4/t4_hw.c b/drivers/net/cxgb4/t4_hw.c
index d92129b..3e63d14 100644
--- a/drivers/net/cxgb4/t4_hw.c
+++ b/drivers/net/cxgb4/t4_hw.c
@@ -2518,7 +2518,7 @@ int t4_cfg_pfvf(struct adapter *adap, unsigned int mbox, 
unsigned int pf,
 	c.retval_len16 = htonl(FW_LEN16(c));
 	c.niqflint_niq = htonl(FW_PFVF_CMD_NIQFLINT(rxqi) |
 			       FW_PFVF_CMD_NIQ(rxq));
-	c.cmask_to_neq = htonl(FW_PFVF_CMD_CMASK(cmask) |
+	c.type_to_neq = htonl(FW_PFVF_CMD_CMASK(cmask) |
 			       FW_PFVF_CMD_PMASK(pmask) |
 			       FW_PFVF_CMD_NEQ(txq));
 	c.tc_to_nexactf = htonl(FW_PFVF_CMD_TC(tc) | FW_PFVF_CMD_NVI(vi) |
diff --git a/drivers/net/cxgb4/t4fw_api.h b/drivers/net/cxgb4/t4fw_api.h
index 111c2a5..ca45df8 100644
--- a/drivers/net/cxgb4/t4fw_api.h
+++ b/drivers/net/cxgb4/t4fw_api.h
@@ -71,6 +71,7 @@ struct fw_wr_hdr {
 #define FW_WR_ATOMIC(x)	 ((x) << 23)
 #define FW_WR_FLUSH(x)   ((x) << 22)
 #define FW_WR_COMPL(x)   ((x) << 21)
+#define FW_WR_IMMDLEN_MASK 0xff
 #define FW_WR_IMMDLEN(x) ((x) << 0)
 
 #define FW_WR_EQUIQ	(1U << 31)
@@ -447,7 +448,9 @@ enum fw_params_param_dev {
 	FW_PARAMS_PARAM_DEV_INTVER_RI	= 0x07,
 	FW_PARAMS_PARAM_DEV_INTVER_ISCSIPDU = 0x08,
 	FW_PARAMS_PARAM_DEV_INTVER_ISCSI = 0x09,
-	FW_PARAMS_PARAM_DEV_INTVER_FCOE = 0x0A
+	FW_PARAMS_PARAM_DEV_INTVER_FCOE = 0x0A,
+	FW_PARAMS_PARAM_DEV_FWREV = 0x0B,
+	FW_PARAMS_PARAM_DEV_TPREV = 0x0C,
 };
 
 /*
@@ -518,7 +521,7 @@ struct fw_pfvf_cmd {
 	__be32 op_to_vfn;
 	__be32 retval_len16;
 	__be32 niqflint_niq;
-	__be32 cmask_to_neq;
+	__be32 type_to_neq;
 	__be32 tc_to_nexactf;
 	__be32 r_caps_to_nethctrl;
 	__be16 nricq;
@@ -535,11 +538,16 @@ struct fw_pfvf_cmd {
 #define FW_PFVF_CMD_NIQ(x) ((x) << 0)
 #define FW_PFVF_CMD_NIQ_GET(x) (((x) >> 0) & 0xfffff)
 
+#define FW_PFVF_CMD_TYPE (1 << 31)
+#define FW_PFVF_CMD_TYPE_GET(x) (((x) >> 31) & 0x1)
+
 #define FW_PFVF_CMD_CMASK(x) ((x) << 24)
-#define FW_PFVF_CMD_CMASK_GET(x) (((x) >> 24) & 0xf)
+#define FW_PFVF_CMD_CMASK_MASK 0xf
+#define FW_PFVF_CMD_CMASK_GET(x) (((x) >> 24) & FW_PFVF_CMD_CMASK_MASK)
 
 #define FW_PFVF_CMD_PMASK(x) ((x) << 20)
-#define FW_PFVF_CMD_PMASK_GET(x) (((x) >> 20) & 0xf)
+#define FW_PFVF_CMD_PMASK_MASK 0xf
+#define FW_PFVF_CMD_PMASK_GET(x) (((x) >> 20) & FW_PFVF_CMD_PMASK_MASK)
 
 #define FW_PFVF_CMD_NEQ(x) ((x) << 0)
 #define FW_PFVF_CMD_NEQ_GET(x) (((x) >> 0) & 0xfffff)
@@ -692,6 +700,7 @@ struct fw_eq_eth_cmd {
 #define FW_EQ_ETH_CMD_EQID(x) ((x) << 0)
 #define FW_EQ_ETH_CMD_EQID_GET(x) (((x) >> 0) & 0xfffff)
 #define FW_EQ_ETH_CMD_PHYSEQID(x) ((x) << 0)
+#define FW_EQ_ETH_CMD_PHYSEQID_GET(x) (((x) >> 0) & 0xfffff)
 
 #define FW_EQ_ETH_CMD_FETCHSZM(x) ((x) << 26)
 #define FW_EQ_ETH_CMD_STATUSPGNS(x) ((x) << 25)
@@ -832,12 +841,14 @@ struct fw_vi_cmd {
 #define FW_VI_CMD_VIID(x) ((x) << 0)
 #define FW_VI_CMD_VIID_GET(x) ((x) & 0xfff)
 #define FW_VI_CMD_PORTID(x) ((x) << 4)
+#define FW_VI_CMD_PORTID_GET(x) (((x) >> 4) & 0xf)
 #define FW_VI_CMD_RSSSIZE_GET(x) (((x) >> 0) & 0x7ff)
 
 /* Special VI_MAC command index ids */
 #define FW_VI_MAC_ADD_MAC		0x3FF
 #define FW_VI_MAC_ADD_PERSIST_MAC	0x3FE
 #define FW_VI_MAC_MAC_BASED_FREE	0x3FD
+#define FW_CLS_TCAM_NUM_ENTRIES		336
 
 enum fw_vi_mac_smac {
 	FW_VI_MAC_MPS_TCAM_ENTRY,
@@ -888,6 +899,7 @@ struct fw_vi_rxmode_cmd {
 };
 
 #define FW_VI_RXMODE_CMD_VIID(x) ((x) << 0)
+#define FW_VI_RXMODE_CMD_MTU_MASK 0xffff
 #define FW_VI_RXMODE_CMD_MTU(x) ((x) << 16)
 #define FW_VI_RXMODE_CMD_PROMISCEN_MASK 0x3
 #define FW_VI_RXMODE_CMD_PROMISCEN(x) ((x) << 14)
@@ -1173,6 +1185,7 @@ struct fw_port_cmd {
 #define FW_PORT_CMD_PORTID_GET(x) (((x) >> 0) & 0xf)
 
 #define FW_PORT_CMD_ACTION(x) ((x) << 16)
+#define FW_PORT_CMD_ACTION_GET(x) (((x) >> 16) & 0xffff)
 
 #define FW_PORT_CMD_CTLBF(x) ((x) << 10)
 #define FW_PORT_CMD_OVLAN3(x) ((x) << 7)
@@ -1487,6 +1500,7 @@ struct fw_rss_glb_config_cmd {
 };
 
 #define FW_RSS_GLB_CONFIG_CMD_MODE(x)	((x) << 28)
+#define FW_RSS_GLB_CONFIG_CMD_MODE_GET(x) (((x) >> 28) & 0xf)
 
 #define FW_RSS_GLB_CONFIG_CMD_MODE_MANUAL	0
 #define FW_RSS_GLB_CONFIG_CMD_MODE_BASICVIRTUAL	1
@@ -1503,13 +1517,14 @@ struct fw_rss_vi_config_cmd {
 		} manual;
 		struct fw_rss_vi_config_basicvirtual {
 			__be32 r6;
-			__be32 defaultq_to_ip4udpen;
+			__be32 defaultq_to_udpen;
 #define FW_RSS_VI_CONFIG_CMD_DEFAULTQ(x)  ((x) << 16)
+#define FW_RSS_VI_CONFIG_CMD_DEFAULTQ_GET(x) (((x) >> 16) & 0x3ff)
 #define FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN (1U << 4)
 #define FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN  (1U << 3)
 #define FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN (1U << 2)
 #define FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN  (1U << 1)
-#define FW_RSS_VI_CONFIG_CMD_IP4UDPEN     (1U << 0)
+#define FW_RSS_VI_CONFIG_CMD_UDPEN        (1U << 0)
 			__be64 r9;
 			__be64 r10;
 		} basicvirtual;

^ permalink raw reply related

* [PATCH 1/9] cxgb4vf: small changes to message processing structures/macros
From: Casey Leedom @ 2010-06-25 22:09 UTC (permalink / raw)
  To: netdev

Split cpl_tx_pkt_lso into core message structure and encapsulated message,
make RSPD_LEN macro match other response descriptor macros.

Signed-off-by: Casey Leedom
---
 drivers/net/cxgb4/sge.c    |    4 ++--
 drivers/net/cxgb4/t4_hw.h  |    4 +++-
 drivers/net/cxgb4/t4_msg.h |   14 ++++++++++++--
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/net/cxgb4/sge.c b/drivers/net/cxgb4/sge.c
index d1f8f22..4388f72 100644
--- a/drivers/net/cxgb4/sge.c
+++ b/drivers/net/cxgb4/sge.c
@@ -931,7 +931,7 @@ out_free:	dev_kfree_skb(skb);
 
 	ssi = skb_shinfo(skb);
 	if (ssi->gso_size) {
-		struct cpl_tx_pkt_lso *lso = (void *)wr;
+		struct cpl_tx_pkt_lso_core *lso = (void *)(wr + 1);
 		bool v6 = (ssi->gso_type & SKB_GSO_TCPV6) != 0;
 		int l3hdr_len = skb_network_header_len(skb);
 		int eth_xtra_len = skb_network_offset(skb) - ETH_HLEN;
@@ -1718,7 +1718,7 @@ static int process_responses(struct sge_rspq *q, int 
budget)
 					free_rx_bufs(q->adap, &rxq->fl, 1);
 					q->offset = 0;
 				}
-				len &= RSPD_LEN;
+				len = RSPD_LEN(len);
 			}
 			si.tot_len = len;
 
diff --git a/drivers/net/cxgb4/t4_hw.h b/drivers/net/cxgb4/t4_hw.h
index f886677..98c02be 100644
--- a/drivers/net/cxgb4/t4_hw.h
+++ b/drivers/net/cxgb4/t4_hw.h
@@ -88,11 +88,13 @@ struct rsp_ctrl {
 };
 
 #define RSPD_NEWBUF 0x80000000U
-#define RSPD_LEN    0x7fffffffU
+#define RSPD_LEN(x) (((x) >> 0) & 0x7fffffffU)
+#define RSPD_QID(x) RSPD_LEN(x)
 
 #define RSPD_GEN(x)  ((x) >> 7)
 #define RSPD_TYPE(x) (((x) >> 4) & 3)
 
 #define QINTR_CNT_EN       0x1
 #define QINTR_TIMER_IDX(x) ((x) << 1)
+#define QINTR_TIMER_IDX_GET(x) (((x) << 1) & 0x7)
 #endif /* __T4_HW_H */
diff --git a/drivers/net/cxgb4/t4_msg.h b/drivers/net/cxgb4/t4_msg.h
index 7a981b8..623932b 100644
--- a/drivers/net/cxgb4/t4_msg.h
+++ b/drivers/net/cxgb4/t4_msg.h
@@ -443,8 +443,7 @@ struct cpl_tx_pkt {
 
 #define cpl_tx_pkt_xt cpl_tx_pkt
 
-struct cpl_tx_pkt_lso {
-	WR_HDR;
+struct cpl_tx_pkt_lso_core {
 	__be32 lso_ctrl;
 #define LSO_TCPHDR_LEN(x) ((x) << 0)
 #define LSO_IPHDR_LEN(x)  ((x) << 4)
@@ -460,6 +459,12 @@ struct cpl_tx_pkt_lso {
 	/* encapsulated CPL (TX_PKT, TX_PKT_XT or TX_DATA) follows here */
 };
 
+struct cpl_tx_pkt_lso {
+	WR_HDR;
+	struct cpl_tx_pkt_lso_core c;
+	/* encapsulated CPL (TX_PKT, TX_PKT_XT or TX_DATA) follows here */
+};
+
 struct cpl_iscsi_hdr {
 	union opcode_tid ot;
 	__be16 pdu_len_ddp;
@@ -623,6 +628,11 @@ struct cpl_fw6_msg {
 	__be64 data[4];
 };
 
+/* cpl_fw6_msg.type values */
+enum {
+	FW6_TYPE_CMD_RPL = 0,
+};
+
 enum {
 	ULP_TX_MEM_READ = 2,
 	ULP_TX_MEM_WRITE = 3,

^ permalink raw reply related

* [PATCH 0/9] New cxgb4vf network driver for Chelsio T4 Virtual Function NIC
From: Casey Leedom @ 2010-06-25 22:07 UTC (permalink / raw)
  To: netdev

This patch series introduces a new network driver for the Chelsio T4 PCI-E
SR-IOV Virtual Function NIC.  The patches are against net-next as of commit
39c9cf07077146b14ab077a0e27c869c6f0e6199.

This driver "depends" on the cxgb4 NIC driver only in-so-far as the cxgb4
driver needs to "provision" chip resources for each of the Virtual Functions
and the cxgb4 driver must enable the PCI-E SR-IOV functionality.  After that
the new "cxgb4vf" driver operates completely independently.  (It will often
be the case that the cxgb4vf driver will be operating withing a Virtual
Machine via "PCI Pass Through."

This is my first submission to Linux so I apologize in advance for any
mistakes.  I looked at many submissions to see how to do this but I
anticipate that I've almost certainly missed something critical.  Please
correct me on anything I've done wrong and I will happily fix them.  Thanks
in advance for you patience and understanding.

There is one part of this submission where I needed to make a big guess
regarding driver policies and procedures so I'll draw your attention to
that.  There are several header files which both the cxgb4 and cxgb4vf
drivers need to operate (T4 hardware register definitions, message
structures, etc.)  This patch series leaves those files in place within the
cxgb4 driver directory (with a few tiny modifications to add new
definitions) and the cxgb4vf driver includes them via "../cxgb4/xxx.h" I
found a couple of similar usages but I couldn't determine what the official
policy was.

Again, thank you for your time and consideration.

Casey Leedom

 drivers/net/Kconfig                |   23 +
 drivers/net/Makefile               |    1 +
 drivers/net/cxgb4/cxgb4_main.c     |  106 ++
 drivers/net/cxgb4/t4_hw.c          |    2 +-
 drivers/net/cxgb4/t4_hw.h          |   39 +
 drivers/net/cxgb4/t4_regs.h        |    3 +
 drivers/net/cxgb4/t4fw_api.h       |   27 +-
 drivers/net/cxgb4vf/Makefile       |    7 +
 drivers/net/cxgb4vf/adapter.h      |  540 +++++++
 drivers/net/cxgb4vf/cxgb4vf_main.c | 2906 ++++++++++++++++++++++++++++++++++++
 drivers/net/cxgb4vf/sge.c          | 2460 ++++++++++++++++++++++++++++++
 drivers/net/cxgb4vf/t4vf_common.h  |  273 ++++
 drivers/net/cxgb4vf/t4vf_defs.h    |  121 ++
 drivers/net/cxgb4vf/t4vf_hw.c      | 1333 +++++++++++++++++
 14 files changed, 7834 insertions(+), 7 deletions(-)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox