Netdev List
 help / color / mirror / Atom feed
* [PATCH 4/6] driver/net/irda: Replace timeval with ktime_t in nsc-ircc
From: Chunyan Zhang @ 2015-01-07  3:39 UTC (permalink / raw)
  To: samuel; +Cc: arnd, zhang.lyra, netdev, linux-kernel
In-Reply-To: <1420601978-15866-1-git-send-email-zhang.chunyan@linaro.org>

This patch changes the 32-bit time type (timeval) to the 64-bit one
(ktime_t), since 32-bit time types will break in the year 2038.
So, I use ktime_t instead of all uses of timeval.

This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.

This patch use ktime_us_delta to get the elapsed time, and in this
way it no longer needs to check for the overflow, because
ktime_us_delta returns time difference of microsecond.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
---
 drivers/net/irda/nsc-ircc.c |    8 +++-----
 drivers/net/irda/nsc-ircc.h |    6 +++---
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/irda/nsc-ircc.c b/drivers/net/irda/nsc-ircc.c
index e7317b1..cabb82e 100644
--- a/drivers/net/irda/nsc-ircc.c
+++ b/drivers/net/irda/nsc-ircc.c
@@ -1501,10 +1501,8 @@ static netdev_tx_t nsc_ircc_hard_xmit_fir(struct sk_buff *skb,
 		mtt = irda_get_mtt(skb);
 		if (mtt) {
 			/* Check how much time we have used already */
-			do_gettimeofday(&self->now);
-			diff = self->now.tv_usec - self->stamp.tv_usec;
-			if (diff < 0) 
-				diff += 1000000;
+			self->now = ktime_get();
+			diff = ktime_us_delta(self->now, self->stamp);
 			
 			/* Check if the mtt is larger than the time we have
 			 * already used by all the protocol processing
@@ -1867,7 +1865,7 @@ static int nsc_ircc_dma_receive_complete(struct nsc_ircc_cb *self, int iobase)
 			 * reduce the min turn time a bit since we will know
 			 * how much time we have used for protocol processing
 			 */
-			do_gettimeofday(&self->stamp);
+			self->stamp = ktime_get();
 
 			skb = dev_alloc_skb(len+1);
 			if (skb == NULL)  {
diff --git a/drivers/net/irda/nsc-ircc.h b/drivers/net/irda/nsc-ircc.h
index 32fa582..dd2185a 100644
--- a/drivers/net/irda/nsc-ircc.h
+++ b/drivers/net/irda/nsc-ircc.h
@@ -28,7 +28,7 @@
 #ifndef NSC_IRCC_H
 #define NSC_IRCC_H
 
-#include <linux/time.h>
+#include <linux/ktime.h>
 
 #include <linux/spinlock.h>
 #include <linux/pm.h>
@@ -263,8 +263,8 @@ struct nsc_ircc_cb {
 
 	__u8 ier;                  /* Interrupt enable register */
 
-	struct timeval stamp;
-	struct timeval now;
+	ktime_t stamp;
+	ktime_t	now;
 
 	spinlock_t lock;           /* For serializing operations */
 	
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 3/6] driver/net/irda: Replace timeval with ktime_t in irda-usb
From: Chunyan Zhang @ 2015-01-07  3:39 UTC (permalink / raw)
  To: samuel; +Cc: arnd, zhang.lyra, netdev, linux-kernel
In-Reply-To: <1420601978-15866-1-git-send-email-zhang.chunyan@linaro.org>

This patch changes the 32-bit time type (timeval) to the 64-bit one
(ktime_t), since 32-bit time types will break in the year 2038.
So, I use ktime_t instead of all uses of timeval.

This patch also changes do_gettimeofday() to ktime_get() accordingly,
since ktime_get returns a ktime_t, but do_gettimeofday returns a
struct timeval, and the other reason is that ktime_get() uses
the monotonic clock.

This patch use ktime_us_delta to get the elapsed time, and in this
way it no longer needs to check for the overflow, because
ktime_us_delta returns time difference of microsecond.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
---
 drivers/net/irda/irda-usb.c |   11 +++--------
 drivers/net/irda/irda-usb.h |    6 +++---
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/net/irda/irda-usb.c b/drivers/net/irda/irda-usb.c
index 48b2f9a..be86840 100644
--- a/drivers/net/irda/irda-usb.c
+++ b/drivers/net/irda/irda-usb.c
@@ -495,18 +495,13 @@ static netdev_tx_t irda_usb_hard_xmit(struct sk_buff *skb,
 		mtt = irda_get_mtt(skb);
 		if (mtt) {
 			int diff;
-			do_gettimeofday(&self->now);
-			diff = self->now.tv_usec - self->stamp.tv_usec;
+			self->now = ktime_get();
+			diff = ktime_us_delta(self->now, self->stamp);
 #ifdef IU_USB_MIN_RTT
 			/* Factor in USB delays -> Get rid of udelay() that
 			 * would be lost in the noise - Jean II */
 			diff += IU_USB_MIN_RTT;
 #endif /* IU_USB_MIN_RTT */
-			/* If the usec counter did wraparound, the diff will
-			 * go negative (tv_usec is a long), so we need to
-			 * correct it by one second. Jean II */
-			if (diff < 0)
-				diff += 1000000;
 
 		        /* Check if the mtt is larger than the time we have
 			 * already used by all the protocol processing
@@ -869,7 +864,7 @@ static void irda_usb_receive(struct urb *urb)
 	 * reduce the min turn time a bit since we will know
 	 * how much time we have used for protocol processing
 	 */
-        do_gettimeofday(&self->stamp);
+	self->stamp = ktime_get();
 
 	/* Check if we need to copy the data to a new skb or not.
 	 * For most frames, we use ZeroCopy and pass the already
diff --git a/drivers/net/irda/irda-usb.h b/drivers/net/irda/irda-usb.h
index 58ddb52..8093216 100644
--- a/drivers/net/irda/irda-usb.h
+++ b/drivers/net/irda/irda-usb.h
@@ -29,7 +29,7 @@
  *
  *****************************************************************************/
 
-#include <linux/time.h>
+#include <linux/ktime.h>
 
 #include <net/irda/irda.h>
 #include <net/irda/irda_device.h>      /* struct irlap_cb */
@@ -157,8 +157,8 @@ struct irda_usb_cb {
 	char *speed_buff;		/* Buffer for speed changes */
 	char *tx_buff;
 
-	struct timeval stamp;
-	struct timeval now;
+	ktime_t stamp;
+	ktime_t now;
 
 	spinlock_t lock;		/* For serializing Tx operations */
 
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 1/6] driver/net/irda: Removed all unused timeval variables
From: Chunyan Zhang @ 2015-01-07  3:39 UTC (permalink / raw)
  To: samuel; +Cc: arnd, zhang.lyra, netdev, linux-kernel
In-Reply-To: <1420601978-15866-1-git-send-email-zhang.chunyan@linaro.org>

In the file au1k_ir.c & via-ircc.h, there were two unused
definitions of the timeval type members, so I removed them
entirely in this patch.

In other three files, the same problem is the rx_time
member is only ever written, never read, so removed it
entirely.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
---
 drivers/net/irda/au1k_ir.c     |    3 ---
 drivers/net/irda/kingsun-sir.c |    3 ---
 drivers/net/irda/ks959-sir.c   |    3 ---
 drivers/net/irda/mcs7780.c     |    2 --
 drivers/net/irda/mcs7780.h     |    1 -
 drivers/net/irda/via-ircc.h    |    4 ----
 6 files changed, 16 deletions(-)

diff --git a/drivers/net/irda/au1k_ir.c b/drivers/net/irda/au1k_ir.c
index e151205..44e4f38 100644
--- a/drivers/net/irda/au1k_ir.c
+++ b/drivers/net/irda/au1k_ir.c
@@ -24,7 +24,6 @@
 #include <linux/interrupt.h>
 #include <linux/platform_device.h>
 #include <linux/slab.h>
-#include <linux/time.h>
 #include <linux/types.h>
 #include <linux/ioport.h>
 
@@ -163,8 +162,6 @@ struct au1k_private {
 	iobuff_t rx_buff;
 
 	struct net_device *netdev;
-	struct timeval stamp;
-	struct timeval now;
 	struct qos_info qos;
 	struct irlap_cb *irlap;
 
diff --git a/drivers/net/irda/kingsun-sir.c b/drivers/net/irda/kingsun-sir.c
index e638893..fb5d162 100644
--- a/drivers/net/irda/kingsun-sir.c
+++ b/drivers/net/irda/kingsun-sir.c
@@ -114,7 +114,6 @@ struct kingsun_cb {
 					   (usually 8) */
 
 	iobuff_t  	  rx_buff;	/* receive unwrap state machine */
-	struct timeval	  rx_time;
 	spinlock_t lock;
 	int receiving;
 
@@ -235,7 +234,6 @@ static void kingsun_rcv_irq(struct urb *urb)
 						  &kingsun->netdev->stats,
 						  &kingsun->rx_buff, bytes[i]);
 			}
-			do_gettimeofday(&kingsun->rx_time);
 			kingsun->receiving =
 				(kingsun->rx_buff.state != OUTSIDE_FRAME)
 				? 1 : 0;
@@ -273,7 +271,6 @@ static int kingsun_net_open(struct net_device *netdev)
 
 	skb_reserve(kingsun->rx_buff.skb, 1);
 	kingsun->rx_buff.head = kingsun->rx_buff.skb->data;
-	do_gettimeofday(&kingsun->rx_time);
 
 	kingsun->rx_urb = usb_alloc_urb(0, GFP_KERNEL);
 	if (!kingsun->rx_urb)
diff --git a/drivers/net/irda/ks959-sir.c b/drivers/net/irda/ks959-sir.c
index e6b3804..8e6e0ed 100644
--- a/drivers/net/irda/ks959-sir.c
+++ b/drivers/net/irda/ks959-sir.c
@@ -187,7 +187,6 @@ struct ks959_cb {
 	__u8 *rx_buf;
 	__u8 rx_variable_xormask;
 	iobuff_t rx_unwrap_buff;
-	struct timeval rx_time;
 
 	struct usb_ctrlrequest *speed_setuprequest;
 	struct urb *speed_urb;
@@ -476,7 +475,6 @@ static void ks959_rcv_irq(struct urb *urb)
 						  bytes[i]);
 			}
 		}
-		do_gettimeofday(&kingsun->rx_time);
 		kingsun->receiving =
 		    (kingsun->rx_unwrap_buff.state != OUTSIDE_FRAME) ? 1 : 0;
 	}
@@ -514,7 +512,6 @@ static int ks959_net_open(struct net_device *netdev)
 
 	skb_reserve(kingsun->rx_unwrap_buff.skb, 1);
 	kingsun->rx_unwrap_buff.head = kingsun->rx_unwrap_buff.skb->data;
-	do_gettimeofday(&kingsun->rx_time);
 
 	kingsun->rx_urb = usb_alloc_urb(0, GFP_KERNEL);
 	if (!kingsun->rx_urb)
diff --git a/drivers/net/irda/mcs7780.c b/drivers/net/irda/mcs7780.c
index e4d678f..bca6a1e 100644
--- a/drivers/net/irda/mcs7780.c
+++ b/drivers/net/irda/mcs7780.c
@@ -722,7 +722,6 @@ static int mcs_net_open(struct net_device *netdev)
 
 	skb_reserve(mcs->rx_buff.skb, 1);
 	mcs->rx_buff.head = mcs->rx_buff.skb->data;
-	do_gettimeofday(&mcs->rx_time);
 
 	/*
 	 * Now that everything should be initialized properly,
@@ -799,7 +798,6 @@ static void mcs_receive_irq(struct urb *urb)
 			mcs_unwrap_fir(mcs, urb->transfer_buffer,
 				urb->actual_length);
 		}
-		do_gettimeofday(&mcs->rx_time);
 	}
 
 	ret = usb_submit_urb(urb, GFP_ATOMIC);
diff --git a/drivers/net/irda/mcs7780.h b/drivers/net/irda/mcs7780.h
index b10689b..a6e8f7d 100644
--- a/drivers/net/irda/mcs7780.h
+++ b/drivers/net/irda/mcs7780.h
@@ -116,7 +116,6 @@ struct mcs_cb {
 	__u8 *fifo_status;
 
 	iobuff_t rx_buff;	/* receive unwrap state machine */
-	struct timeval rx_time;
 	spinlock_t lock;
 	int receiving;
 
diff --git a/drivers/net/irda/via-ircc.h b/drivers/net/irda/via-ircc.h
index 7ce820e..ac15255 100644
--- a/drivers/net/irda/via-ircc.h
+++ b/drivers/net/irda/via-ircc.h
@@ -29,7 +29,6 @@ this program; if not, see <http://www.gnu.org/licenses/>.
  ********************************************************************/
 #ifndef via_IRCC_H
 #define via_IRCC_H
-#include <linux/time.h>
 #include <linux/spinlock.h>
 #include <linux/pm.h>
 #include <linux/types.h>
@@ -106,9 +105,6 @@ struct via_ircc_cb {
 
 	__u8 ier;		/* Interrupt enable register */
 
-	struct timeval stamp;
-	struct timeval now;
-
 	spinlock_t lock;	/* For serializing operations */
 
 	__u32 flags;		/* Interface flags */
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH 0/6] driver/net/irda: Use ktime_t instead of timeval
From: Chunyan Zhang @ 2015-01-07  3:39 UTC (permalink / raw)
  To: samuel; +Cc: arnd, zhang.lyra, netdev, linux-kernel
In-Reply-To: <driver-irda>

This patch-set removed all uses of timeval and used ktime_t instead if
needed, since 32-bit time types will break in the year 2038.

This patch-set also used the ktime_xxx functions accordingly.
e.g.
* Used ktime_get to get the current time instead of do_gettimeofday.
* And, used ktime_us_delta to get the elapsed time directly.

Chunyan Zhang (6):
  driver/net/irda: Removed all unused timeval variables
  driver/net/irda: Replace timeval with ktime_t in ali-ircc
  driver/net/irda: Replace timeval with ktime_t in irda-usb
  driver/net/irda: Replace timeval with ktime_t in nsc-ircc
  driver/net/irda: Replace timeval with ktime_t in stir4200
  driver/net/irda: Replace timeval with ktime_t in vlsi_ir

 drivers/net/irda/ali-ircc.c    |   12 ++++------
 drivers/net/irda/ali-ircc.h    |    6 ++---
 drivers/net/irda/au1k_ir.c     |    3 ---
 drivers/net/irda/irda-usb.c    |   11 +++------
 drivers/net/irda/irda-usb.h    |    6 ++---
 drivers/net/irda/kingsun-sir.c |    3 ---
 drivers/net/irda/ks959-sir.c   |    3 ---
 drivers/net/irda/mcs7780.c     |    2 --
 drivers/net/irda/mcs7780.h     |    1 -
 drivers/net/irda/nsc-ircc.c    |    8 +++----
 drivers/net/irda/nsc-ircc.h    |    6 ++---
 drivers/net/irda/stir4200.c    |   18 +++++++-------
 drivers/net/irda/via-ircc.h    |    4 ----
 drivers/net/irda/vlsi_ir.c     |   51 ++++++++++++++--------------------------
 drivers/net/irda/vlsi_ir.h     |    2 +-
 15 files changed, 46 insertions(+), 90 deletions(-)

-- 
1.7.9.5

^ permalink raw reply

* Re: [PATCH 2/6] vxlan: Group Policy extension
From: Alexei Starovoitov @ 2015-01-07  3:37 UTC (permalink / raw)
  To: Thomas Graf
  Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Stephen Hemminger,
	David S. Miller

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> +struct vxlan_gbp {
> +#ifdef __LITTLE_ENDIAN_BITFIELD
> +       __u8    reserved_flags1:3,
...
> +       __be16 policy_id;
> +} __packed;

are you sure that compiler will be smart enough
to do single short load when you pack
u8 + struct vxlan_gbp inside struct vxlanhdr ?
I suspect compiler will use two byte loads
with shifts and ors every time to access policy_id.
Even it works ok, I think this struct layout is ugly.
imo would be much easier to read if you replace
the whole vxlanhdr with vxlanhdr_gbp
or split vxlanhdr into two 32-bit structs.
then __packed hacks won't be needed.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply

* Re: [PATCH next] net: e1000: support txtd update delay via xmit_more
From: Jeff Kirsher @ 2015-01-07  3:34 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <1420557419-18869-1-git-send-email-fw@strlen.de>

On Tue, Jan 6, 2015 at 7:16 AM, Florian Westphal <fw@strlen.de> wrote:
> Don't update tx tail descriptor if we queue hasn't been stopped and
> we know at least one more skb will be sent right away.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)

Thanks Florian, I will add your patch to my queue.

^ permalink raw reply

* Re: [PATCH net-next v1 2/3] ARM: imx: add FEC sleep mode callback function
From: Shawn Guo @ 2015-01-07  3:17 UTC (permalink / raw)
  To: fugang.duan@freescale.com, davem@davemloft.net
  Cc: netdev@vger.kernel.org, bhutchings@solarflare.com,
	stephen@networkplumber.org
In-Reply-To: <BLUPR03MB37385464C06052A231ABF19F5460@BLUPR03MB373.namprd03.prod.outlook.com>

On Wed, Jan 07, 2015 at 02:40:33AM +0000, fugang.duan@freescale.com wrote:
> Yes, the platform code will become quite messy and tremendous.
> I agree with your thinking.
> 
> But David applied the patches, how can I do it now ?

David,

Can you please drop the last two arch/arm/ patches, or revert them if
your tree is non-rebasable?

Shawn

^ permalink raw reply

* [PATCH net-next 4/4] cxgb4: Add support for mps_tcam debugfs
From: Hariprasad Shenai @ 2015-01-07  3:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, nirranjan, anish, Hariprasad Shenai
In-Reply-To: <1420600683-23289-1-git-send-email-hariprasad@chelsio.com>

Debug log to get the MPS TCAM table

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  131 ++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h       |   60 +++++++++
 2 files changed, 191 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index a8b0223..e9f3489 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -433,6 +433,136 @@ static const struct file_operations devlog_fops = {
 	.release = seq_release_private
 };
 
+static inline void tcamxy2valmask(u64 x, u64 y, u8 *addr, u64 *mask)
+{
+	*mask = x | y;
+	y = (__force u64)cpu_to_be64(y);
+	memcpy(addr, (char *)&y + 2, ETH_ALEN);
+}
+
+static int mps_tcam_show(struct seq_file *seq, void *v)
+{
+	if (v == SEQ_START_TOKEN)
+		seq_puts(seq, "Idx  Ethernet address     Mask     Vld Ports PF"
+			 "  VF              Replication             "
+			 "P0 P1 P2 P3  ML\n");
+	else {
+		u64 mask;
+		u8 addr[ETH_ALEN];
+		struct adapter *adap = seq->private;
+		unsigned int idx = (uintptr_t)v - 2;
+		u64 tcamy = t4_read_reg64(adap, MPS_CLS_TCAM_Y_L(idx));
+		u64 tcamx = t4_read_reg64(adap, MPS_CLS_TCAM_X_L(idx));
+		u32 cls_lo = t4_read_reg(adap, MPS_CLS_SRAM_L(idx));
+		u32 cls_hi = t4_read_reg(adap, MPS_CLS_SRAM_H(idx));
+		u32 rplc[4] = {0, 0, 0, 0};
+
+		if (tcamx & tcamy) {
+			seq_printf(seq, "%3u         -\n", idx);
+			goto out;
+		}
+
+		if (cls_lo & REPLICATE_F) {
+			struct fw_ldst_cmd ldst_cmd;
+			int ret;
+
+			memset(&ldst_cmd, 0, sizeof(ldst_cmd));
+			ldst_cmd.op_to_addrspace =
+				htonl(FW_CMD_OP_V(FW_LDST_CMD) |
+				      FW_CMD_REQUEST_F |
+				      FW_CMD_READ_F |
+				      FW_LDST_CMD_ADDRSPACE_V(
+					      FW_LDST_ADDRSPC_MPS));
+			ldst_cmd.cycles_to_len16 = htonl(FW_LEN16(ldst_cmd));
+			ldst_cmd.u.mps.fid_ctl =
+				htons(FW_LDST_CMD_FID_V(FW_LDST_MPS_RPLC) |
+				      FW_LDST_CMD_CTL_V(idx));
+			ret = t4_wr_mbox(adap, adap->mbox, &ldst_cmd,
+					 sizeof(ldst_cmd), &ldst_cmd);
+			if (ret)
+				dev_warn(adap->pdev_dev, "Can't read MPS "
+					 "replication map for idx %d: %d\n",
+					 idx, -ret);
+			else {
+				rplc[0] = ntohl(ldst_cmd.u.mps.rplc31_0);
+				rplc[1] = ntohl(ldst_cmd.u.mps.rplc63_32);
+				rplc[2] = ntohl(ldst_cmd.u.mps.rplc95_64);
+				rplc[3] = ntohl(ldst_cmd.u.mps.rplc127_96);
+			}
+		}
+
+		tcamxy2valmask(tcamx, tcamy, addr, &mask);
+		seq_printf(seq, "%3u %02x:%02x:%02x:%02x:%02x:%02x %012llx"
+			   "%3c   %#x%4u%4d",
+			   idx, addr[0], addr[1], addr[2], addr[3], addr[4],
+			   addr[5], (unsigned long long)mask,
+			   (cls_lo & SRAM_VLD_F) ? 'Y' : 'N', PORTMAP_G(cls_hi),
+			   PF_G(cls_lo),
+			   (cls_lo & VF_VALID_F) ? VF_G(cls_lo) : -1);
+		if (cls_lo & REPLICATE_F)
+			seq_printf(seq, " %08x %08x %08x %08x",
+				   rplc[3], rplc[2], rplc[1], rplc[0]);
+		else
+			seq_printf(seq, "%36c", ' ');
+		seq_printf(seq, "%4u%3u%3u%3u %#x\n",
+			   SRAM_PRIO0_G(cls_lo), SRAM_PRIO1_G(cls_lo),
+			   SRAM_PRIO2_G(cls_lo), SRAM_PRIO3_G(cls_lo),
+			   (cls_lo >> MULTILISTEN0_S) & 0xf);
+	}
+out:	return 0;
+}
+
+static inline void *mps_tcam_get_idx(struct seq_file *seq, loff_t pos)
+{
+	struct adapter *adap = seq->private;
+	int max_mac_addr = is_t4(adap->params.chip) ?
+				NUM_MPS_CLS_SRAM_L_INSTANCES :
+				NUM_MPS_T5_CLS_SRAM_L_INSTANCES;
+	return ((pos <= max_mac_addr) ? (void *)(uintptr_t)(pos + 1) : NULL);
+}
+
+static void *mps_tcam_start(struct seq_file *seq, loff_t *pos)
+{
+	return *pos ? mps_tcam_get_idx(seq, *pos) : SEQ_START_TOKEN;
+}
+
+static void *mps_tcam_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	++*pos;
+	return mps_tcam_get_idx(seq, *pos);
+}
+
+static void mps_tcam_stop(struct seq_file *seq, void *v)
+{
+}
+
+static const struct seq_operations mps_tcam_seq_ops = {
+	.start = mps_tcam_start,
+	.next  = mps_tcam_next,
+	.stop  = mps_tcam_stop,
+	.show  = mps_tcam_show
+};
+
+static int mps_tcam_open(struct inode *inode, struct file *file)
+{
+	int res = seq_open(file, &mps_tcam_seq_ops);
+
+	if (!res) {
+		struct seq_file *seq = file->private_data;
+
+		seq->private = inode->i_private;
+	}
+	return res;
+}
+
+static const struct file_operations mps_tcam_debugfs_fops = {
+	.owner   = THIS_MODULE,
+	.open    = mps_tcam_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = seq_release,
+};
+
 static ssize_t mem_read(struct file *file, char __user *buf, size_t count,
 			loff_t *ppos)
 {
@@ -515,6 +645,7 @@ int t4_setup_debugfs(struct adapter *adap)
 		{ "cim_qcfg", &cim_qcfg_fops, S_IRUSR, 0 },
 		{ "devlog", &devlog_fops, S_IRUSR, 0 },
 		{ "l2t", &t4_l2t_fops, S_IRUSR, 0},
+		{ "mps_tcam", &mps_tcam_debugfs_fops, S_IRUSR, 0 },
 	};
 
 	add_debugfs_files(adap,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index a2cae0e..7ce55f9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -1708,6 +1708,66 @@
 
 #define MPS_RX_PERR_INT_CAUSE_A 0x11074
 
+#define MPS_CLS_TCAM_Y_L_A 0xf000
+#define MPS_CLS_TCAM_X_L_A 0xf008
+
+#define MPS_CLS_TCAM_Y_L(idx) (MPS_CLS_TCAM_Y_L_A + (idx) * 16)
+#define NUM_MPS_CLS_TCAM_Y_L_INSTANCES 512
+
+#define MPS_CLS_TCAM_X_L(idx) (MPS_CLS_TCAM_X_L_A + (idx) * 16)
+#define NUM_MPS_CLS_TCAM_X_L_INSTANCES 512
+
+#define MPS_CLS_SRAM_L_A 0xe000
+#define MPS_CLS_SRAM_H_A 0xe004
+
+#define MPS_CLS_SRAM_L(idx) (MPS_CLS_SRAM_L_A + (idx) * 8)
+#define NUM_MPS_CLS_SRAM_L_INSTANCES 336
+
+#define MPS_CLS_SRAM_H(idx) (MPS_CLS_SRAM_H_A + (idx) * 8)
+#define NUM_MPS_CLS_SRAM_H_INSTANCES 336
+
+#define MULTILISTEN0_S    25
+
+#define REPLICATE_S    11
+#define REPLICATE_V(x) ((x) << REPLICATE_S)
+#define REPLICATE_F    REPLICATE_V(1U)
+
+#define PF_S    8
+#define PF_M    0x7U
+#define PF_G(x) (((x) >> PF_S) & PF_M)
+
+#define VF_VALID_S    7
+#define VF_VALID_V(x) ((x) << VF_VALID_S)
+#define VF_VALID_F    VF_VALID_V(1U)
+
+#define VF_S    0
+#define VF_M    0x7fU
+#define VF_G(x) (((x) >> VF_S) & VF_M)
+
+#define SRAM_PRIO3_S    22
+#define SRAM_PRIO3_M    0x7U
+#define SRAM_PRIO3_G(x) (((x) >> SRAM_PRIO3_S) & SRAM_PRIO3_M)
+
+#define SRAM_PRIO2_S    19
+#define SRAM_PRIO2_M    0x7U
+#define SRAM_PRIO2_G(x) (((x) >> SRAM_PRIO2_S) & SRAM_PRIO2_M)
+
+#define SRAM_PRIO1_S    16
+#define SRAM_PRIO1_M    0x7U
+#define SRAM_PRIO1_G(x) (((x) >> SRAM_PRIO1_S) & SRAM_PRIO1_M)
+
+#define SRAM_PRIO0_S    13
+#define SRAM_PRIO0_M    0x7U
+#define SRAM_PRIO0_G(x) (((x) >> SRAM_PRIO0_S) & SRAM_PRIO0_M)
+
+#define SRAM_VLD_S    12
+#define SRAM_VLD_V(x) ((x) << SRAM_VLD_S)
+#define SRAM_VLD_F    SRAM_VLD_V(1U)
+
+#define PORTMAP_S    0
+#define PORTMAP_M    0xfU
+#define PORTMAP_G(x) (((x) >> PORTMAP_S) & PORTMAP_M)
+
 #define CPL_INTR_CAUSE_A 0x19054
 
 #define CIM_OP_MAP_PERR_S    5
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 3/4] cxgb4: Add support for cim_qcfg entry in debugfs
From: Hariprasad Shenai @ 2015-01-07  3:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, nirranjan, anish, Hariprasad Shenai
In-Reply-To: <1420600683-23289-1-git-send-email-hariprasad@chelsio.com>

Adds debug log to get cim queue config

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |    1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |   70 ++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c         |   35 ++++++++++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.h         |    3 +
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h       |   55 +++++++++++++++
 5 files changed, 164 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 46cd506..7c785b5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -1041,6 +1041,7 @@ int t4_cim_read(struct adapter *adap, unsigned int addr, unsigned int n,
 int t4_cim_write(struct adapter *adap, unsigned int addr, unsigned int n,
 		 const unsigned int *valp);
 int t4_cim_read_la(struct adapter *adap, u32 *la_buf, unsigned int *wrptr);
+void t4_read_cimq_cfg(struct adapter *adap, u16 *base, u16 *size, u16 *thres);
 const char *t4_get_port_type_description(enum fw_port_type port_type);
 void t4_get_port_stats(struct adapter *adap, int idx, struct port_stats *p);
 void t4_read_mtu_tbl(struct adapter *adap, u16 *mtus, u8 *mtu_log);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 0f7b23f..a8b0223 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -168,6 +168,75 @@ static const struct file_operations cim_la_fops = {
 	.release = seq_release_private
 };
 
+static int cim_qcfg_show(struct seq_file *seq, void *v)
+{
+	static const char * const qname[] = {
+		"TP0", "TP1", "ULP", "SGE0", "SGE1", "NC-SI",
+		"ULP0", "ULP1", "ULP2", "ULP3", "SGE", "NC-SI",
+		"SGE0-RX", "SGE1-RX"
+	};
+
+	int i;
+	struct adapter *adap = seq->private;
+	u16 base[CIM_NUM_IBQ + CIM_NUM_OBQ_T5];
+	u16 size[CIM_NUM_IBQ + CIM_NUM_OBQ_T5];
+	u32 stat[(4 * (CIM_NUM_IBQ + CIM_NUM_OBQ_T5))];
+	u16 thres[CIM_NUM_IBQ];
+	u32 obq_wr_t4[2 * CIM_NUM_OBQ], *wr;
+	u32 obq_wr_t5[2 * CIM_NUM_OBQ_T5];
+	u32 *p = stat;
+	int cim_num_obq = is_t4(adap->params.chip) ?
+				CIM_NUM_OBQ : CIM_NUM_OBQ_T5;
+
+	i = t4_cim_read(adap, is_t4(adap->params.chip) ? UP_IBQ_0_RDADDR_A :
+			UP_IBQ_0_SHADOW_RDADDR_A,
+			ARRAY_SIZE(stat), stat);
+	if (!i) {
+		if (is_t4(adap->params.chip)) {
+			i = t4_cim_read(adap, UP_OBQ_0_REALADDR_A,
+					ARRAY_SIZE(obq_wr_t4), obq_wr_t4);
+				wr = obq_wr_t4;
+		} else {
+			i = t4_cim_read(adap, UP_OBQ_0_SHADOW_REALADDR_A,
+					ARRAY_SIZE(obq_wr_t5), obq_wr_t5);
+				wr = obq_wr_t5;
+		}
+	}
+	if (i)
+		return i;
+
+	t4_read_cimq_cfg(adap, base, size, thres);
+
+	seq_printf(seq,
+		   "  Queue  Base  Size Thres  RdPtr WrPtr  SOP  EOP Avail\n");
+	for (i = 0; i < CIM_NUM_IBQ; i++, p += 4)
+		seq_printf(seq, "%7s %5x %5u %5u %6x  %4x %4u %4u %5u\n",
+			   qname[i], base[i], size[i], thres[i],
+			   IBQRDADDR_G(p[0]), IBQWRADDR_G(p[1]),
+			   QUESOPCNT_G(p[3]), QUEEOPCNT_G(p[3]),
+			   QUEREMFLITS_G(p[2]) * 16);
+	for ( ; i < CIM_NUM_IBQ + cim_num_obq; i++, p += 4, wr += 2)
+		seq_printf(seq, "%7s %5x %5u %12x  %4x %4u %4u %5u\n",
+			   qname[i], base[i], size[i],
+			   QUERDADDR_G(p[0]) & 0x3fff, wr[0] - base[i],
+			   QUESOPCNT_G(p[3]), QUEEOPCNT_G(p[3]),
+			   QUEREMFLITS_G(p[2]) * 16);
+	return 0;
+}
+
+static int cim_qcfg_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, cim_qcfg_show, inode->i_private);
+}
+
+static const struct file_operations cim_qcfg_fops = {
+	.owner   = THIS_MODULE,
+	.open    = cim_qcfg_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = single_release,
+};
+
 /* Firmware Device Log dump. */
 static const char * const devlog_level_strings[] = {
 	[FW_DEVLOG_LEVEL_EMERG]		= "EMERG",
@@ -443,6 +512,7 @@ int t4_setup_debugfs(struct adapter *adap)
 
 	static struct t4_debugfs_entry t4_debugfs_files[] = {
 		{ "cim_la", &cim_la_fops, S_IRUSR, 0 },
+		{ "cim_qcfg", &cim_qcfg_fops, S_IRUSR, 0 },
 		{ "devlog", &devlog_fops, S_IRUSR, 0 },
 		{ "l2t", &t4_l2t_fops, S_IRUSR, 0},
 	};
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 1e30554..734d33e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -4326,6 +4326,41 @@ int t4_port_init(struct adapter *adap, int mbox, int pf, int vf)
 }
 
 /**
+ *	t4_read_cimq_cfg - read CIM queue configuration
+ *	@adap: the adapter
+ *	@base: holds the queue base addresses in bytes
+ *	@size: holds the queue sizes in bytes
+ *	@thres: holds the queue full thresholds in bytes
+ *
+ *	Returns the current configuration of the CIM queues, starting with
+ *	the IBQs, then the OBQs.
+ */
+void t4_read_cimq_cfg(struct adapter *adap, u16 *base, u16 *size, u16 *thres)
+{
+	unsigned int i, v;
+	int cim_num_obq = is_t4(adap->params.chip) ?
+				CIM_NUM_OBQ : CIM_NUM_OBQ_T5;
+
+	for (i = 0; i < CIM_NUM_IBQ; i++) {
+		t4_write_reg(adap, CIM_QUEUE_CONFIG_REF_A, IBQSELECT_F |
+			     QUENUMSELECT_V(i));
+		v = t4_read_reg(adap, CIM_QUEUE_CONFIG_CTRL_A);
+		/* value is in 256-byte units */
+		*base++ = CIMQBASE_G(v) * 256;
+		*size++ = CIMQSIZE_G(v) * 256;
+		*thres++ = QUEFULLTHRSH_G(v) * 8; /* 8-byte unit */
+	}
+	for (i = 0; i < cim_num_obq; i++) {
+		t4_write_reg(adap, CIM_QUEUE_CONFIG_REF_A, OBQSELECT_F |
+			     QUENUMSELECT_V(i));
+		v = t4_read_reg(adap, CIM_QUEUE_CONFIG_CTRL_A);
+		/* value is in 256-byte units */
+		*base++ = CIMQBASE_G(v) * 256;
+		*size++ = CIMQSIZE_G(v) * 256;
+	}
+}
+
+/**
  *	t4_cim_read - read a block from CIM internal address space
  *	@adap: the adapter
  *	@addr: the start address within the CIM address space
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h
index bcc925b..f6b82da 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h
@@ -56,6 +56,9 @@ enum {
 };
 
 enum {
+	CIM_NUM_IBQ    = 6,     /* # of CIM IBQs */
+	CIM_NUM_OBQ    = 6,     /* # of CIM OBQs */
+	CIM_NUM_OBQ_T5 = 8,     /* # of CIM OBQs for T5 adapter */
 	CIMLA_SIZE     = 2048,  /* # of 32-bit words in CIM LA */
 };
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index 3fe6eeb..a2cae0e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -2096,4 +2096,59 @@
 #define UPDBGLACAPTPCONLY_V(x)	((x) << UPDBGLACAPTPCONLY_S)
 #define UPDBGLACAPTPCONLY_F	UPDBGLACAPTPCONLY_V(1U)
 
+#define CIM_QUEUE_CONFIG_REF_A 0x7b48
+#define CIM_QUEUE_CONFIG_CTRL_A 0x7b4c
+
+#define CIMQSIZE_S    24
+#define CIMQSIZE_M    0x3fU
+#define CIMQSIZE_G(x) (((x) >> CIMQSIZE_S) & CIMQSIZE_M)
+
+#define CIMQBASE_S    16
+#define CIMQBASE_M    0x3fU
+#define CIMQBASE_G(x) (((x) >> CIMQBASE_S) & CIMQBASE_M)
+
+#define QUEFULLTHRSH_S    0
+#define QUEFULLTHRSH_M    0x1ffU
+#define QUEFULLTHRSH_G(x) (((x) >> QUEFULLTHRSH_S) & QUEFULLTHRSH_M)
+
+#define UP_IBQ_0_RDADDR_A 0x10
+#define UP_IBQ_0_SHADOW_RDADDR_A 0x280
+#define UP_OBQ_0_REALADDR_A 0x104
+#define UP_OBQ_0_SHADOW_REALADDR_A 0x394
+
+#define IBQRDADDR_S    0
+#define IBQRDADDR_M    0x1fffU
+#define IBQRDADDR_G(x) (((x) >> IBQRDADDR_S) & IBQRDADDR_M)
+
+#define IBQWRADDR_S    0
+#define IBQWRADDR_M    0x1fffU
+#define IBQWRADDR_G(x) (((x) >> IBQWRADDR_S) & IBQWRADDR_M)
+
+#define QUERDADDR_S    0
+#define QUERDADDR_M    0x7fffU
+#define QUERDADDR_G(x) (((x) >> QUERDADDR_S) & QUERDADDR_M)
+
+#define QUEREMFLITS_S    0
+#define QUEREMFLITS_M    0x7ffU
+#define QUEREMFLITS_G(x) (((x) >> QUEREMFLITS_S) & QUEREMFLITS_M)
+
+#define QUEEOPCNT_S    16
+#define QUEEOPCNT_M    0xfffU
+#define QUEEOPCNT_G(x) (((x) >> QUEEOPCNT_S) & QUEEOPCNT_M)
+
+#define QUESOPCNT_S    0
+#define QUESOPCNT_M    0xfffU
+#define QUESOPCNT_G(x) (((x) >> QUESOPCNT_S) & QUESOPCNT_M)
+
+#define OBQSELECT_S    4
+#define OBQSELECT_V(x) ((x) << OBQSELECT_S)
+#define OBQSELECT_F    OBQSELECT_V(1U)
+
+#define IBQSELECT_S    3
+#define IBQSELECT_V(x) ((x) << IBQSELECT_S)
+#define IBQSELECT_F    IBQSELECT_V(1U)
+
+#define QUENUMSELECT_S    0
+#define QUENUMSELECT_V(x) ((x) << QUENUMSELECT_S)
+
 #endif /* __T4_REGS_H */
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 2/4] cxgb4: Add support for cim_la entry in debugfs
From: Hariprasad Shenai @ 2015-01-07  3:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, nirranjan, anish, Hariprasad Shenai
In-Reply-To: <1420600683-23289-1-git-send-email-hariprasad@chelsio.com>

The CIM LA captures the embedded processor’s internal state. Optionally, it can
also trace the flow of data in and out of the embedded processor. Therefore, the
CIM LA output contains detailed information of what code the embedded processor
executed prior to the CIM LA capture.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |    7 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  129 +++++++++++++++++++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h |   12 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c         |  120 ++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.h         |    4 +
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h       |   34 +++++
 6 files changed, 304 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 73b1f3a..46cd506 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -304,6 +304,8 @@ struct adapter_params {
 	struct devlog_params devlog;
 	enum pcie_memwin drv_memwin;
 
+	unsigned int cim_la_size;
+
 	unsigned int sf_size;             /* serial flash size in bytes */
 	unsigned int sf_nsec;             /* # of flash sectors */
 	unsigned int sf_fw_start;         /* start of FW image in flash */
@@ -1034,6 +1036,11 @@ int t4_mc_read(struct adapter *adap, int idx, u32 addr, __be32 *data,
 	       u64 *parity);
 int t4_edc_read(struct adapter *adap, int idx, u32 addr, __be32 *data,
 		u64 *parity);
+int t4_cim_read(struct adapter *adap, unsigned int addr, unsigned int n,
+		unsigned int *valp);
+int t4_cim_write(struct adapter *adap, unsigned int addr, unsigned int n,
+		 const unsigned int *valp);
+int t4_cim_read_la(struct adapter *adap, u32 *la_buf, unsigned int *wrptr);
 const char *t4_get_port_type_description(enum fw_port_type port_type);
 void t4_get_port_stats(struct adapter *adap, int idx, struct port_stats *p);
 void t4_read_mtu_tbl(struct adapter *adap, u16 *mtus, u8 *mtu_log);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 1d3d1c5..0f7b23f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -43,8 +43,132 @@
 #include "cxgb4_debugfs.h"
 #include "l2t.h"
 
-/* Firmware Device Log dump.
- */
+/* generic seq_file support for showing a table of size rows x width. */
+static void *seq_tab_get_idx(struct seq_tab *tb, loff_t pos)
+{
+	pos -= tb->skip_first;
+	return pos >= tb->rows ? NULL : &tb->data[pos * tb->width];
+}
+
+static void *seq_tab_start(struct seq_file *seq, loff_t *pos)
+{
+	struct seq_tab *tb = seq->private;
+
+	if (tb->skip_first && *pos == 0)
+		return SEQ_START_TOKEN;
+
+	return seq_tab_get_idx(tb, *pos);
+}
+
+static void *seq_tab_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	v = seq_tab_get_idx(seq->private, *pos + 1);
+	if (v)
+		++*pos;
+	return v;
+}
+
+static void seq_tab_stop(struct seq_file *seq, void *v)
+{
+}
+
+static int seq_tab_show(struct seq_file *seq, void *v)
+{
+	const struct seq_tab *tb = seq->private;
+
+	return tb->show(seq, v, ((char *)v - tb->data) / tb->width);
+}
+
+static const struct seq_operations seq_tab_ops = {
+	.start = seq_tab_start,
+	.next  = seq_tab_next,
+	.stop  = seq_tab_stop,
+	.show  = seq_tab_show
+};
+
+struct seq_tab *seq_open_tab(struct file *f, unsigned int rows,
+			     unsigned int width, unsigned int have_header,
+			     int (*show)(struct seq_file *seq, void *v, int i))
+{
+	struct seq_tab *p;
+
+	p = __seq_open_private(f, &seq_tab_ops, sizeof(*p) + rows * width);
+	if (p) {
+		p->show = show;
+		p->rows = rows;
+		p->width = width;
+		p->skip_first = have_header != 0;
+	}
+	return p;
+}
+
+static int cim_la_show(struct seq_file *seq, void *v, int idx)
+{
+	if (v == SEQ_START_TOKEN)
+		seq_puts(seq, "Status   Data      PC     LS0Stat  LS0Addr "
+			 "            LS0Data\n");
+	else {
+		const u32 *p = v;
+
+		seq_printf(seq,
+			   "  %02x  %x%07x %x%07x %08x %08x %08x%08x%08x%08x\n",
+			   (p[0] >> 4) & 0xff, p[0] & 0xf, p[1] >> 4,
+			   p[1] & 0xf, p[2] >> 4, p[2] & 0xf, p[3], p[4], p[5],
+			   p[6], p[7]);
+	}
+	return 0;
+}
+
+static int cim_la_show_3in1(struct seq_file *seq, void *v, int idx)
+{
+	if (v == SEQ_START_TOKEN) {
+		seq_puts(seq, "Status   Data      PC\n");
+	} else {
+		const u32 *p = v;
+
+		seq_printf(seq, "  %02x   %08x %08x\n", p[5] & 0xff, p[6],
+			   p[7]);
+		seq_printf(seq, "  %02x   %02x%06x %02x%06x\n",
+			   (p[3] >> 8) & 0xff, p[3] & 0xff, p[4] >> 8,
+			   p[4] & 0xff, p[5] >> 8);
+		seq_printf(seq, "  %02x   %x%07x %x%07x\n", (p[0] >> 4) & 0xff,
+			   p[0] & 0xf, p[1] >> 4, p[1] & 0xf, p[2] >> 4);
+	}
+	return 0;
+}
+
+static int cim_la_open(struct inode *inode, struct file *file)
+{
+	int ret;
+	unsigned int cfg;
+	struct seq_tab *p;
+	struct adapter *adap = inode->i_private;
+
+	ret = t4_cim_read(adap, UP_UP_DBG_LA_CFG_A, 1, &cfg);
+	if (ret)
+		return ret;
+
+	p = seq_open_tab(file, adap->params.cim_la_size / 8, 8 * sizeof(u32), 1,
+			 cfg & UPDBGLACAPTPCONLY_F ?
+			 cim_la_show_3in1 : cim_la_show);
+	if (!p)
+		return -ENOMEM;
+
+	ret = t4_cim_read_la(adap, (u32 *)p->data, NULL);
+	if (ret)
+		seq_release_private(inode, file);
+	return ret;
+}
+
+static const struct file_operations cim_la_fops = {
+	.owner   = THIS_MODULE,
+	.open    = cim_la_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = seq_release_private
+};
+
+/* Firmware Device Log dump. */
 static const char * const devlog_level_strings[] = {
 	[FW_DEVLOG_LEVEL_EMERG]		= "EMERG",
 	[FW_DEVLOG_LEVEL_CRIT]		= "CRIT",
@@ -318,6 +442,7 @@ int t4_setup_debugfs(struct adapter *adap)
 	u32 size;
 
 	static struct t4_debugfs_entry t4_debugfs_files[] = {
+		{ "cim_la", &cim_la_fops, S_IRUSR, 0 },
 		{ "devlog", &devlog_fops, S_IRUSR, 0 },
 		{ "l2t", &t4_l2t_fops, S_IRUSR, 0},
 	};
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h
index a3d8867..70fcbc9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h
@@ -44,6 +44,18 @@ struct t4_debugfs_entry {
 	unsigned char data;
 };
 
+struct seq_tab {
+	int (*show)(struct seq_file *seq, void *v, int idx);
+	unsigned int rows;        /* # of entries */
+	unsigned char width;      /* size in bytes of each entry */
+	unsigned char skip_first; /* whether the first line is a header */
+	char data[0];             /* the table data */
+};
+
+struct seq_tab *seq_open_tab(struct file *f, unsigned int rows,
+			     unsigned int width, unsigned int have_header,
+			     int (*show)(struct seq_file *seq, void *v, int i));
+
 int t4_setup_debugfs(struct adapter *adap);
 void add_debugfs_files(struct adapter *adap,
 		       struct t4_debugfs_entry *files,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 3776279..1e30554 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -4031,6 +4031,7 @@ int t4_prep_adapter(struct adapter *adapter)
 		return -EINVAL;
 	}
 
+	adapter->params.cim_la_size = CIMLA_SIZE;
 	init_cong_ctrl(adapter->params.a_wnd, adapter->params.b_wnd);
 
 	/*
@@ -4323,3 +4324,122 @@ int t4_port_init(struct adapter *adap, int mbox, int pf, int vf)
 	}
 	return 0;
 }
+
+/**
+ *	t4_cim_read - read a block from CIM internal address space
+ *	@adap: the adapter
+ *	@addr: the start address within the CIM address space
+ *	@n: number of words to read
+ *	@valp: where to store the result
+ *
+ *	Reads a block of 4-byte words from the CIM intenal address space.
+ */
+int t4_cim_read(struct adapter *adap, unsigned int addr, unsigned int n,
+		unsigned int *valp)
+{
+	int ret = 0;
+
+	if (t4_read_reg(adap, CIM_HOST_ACC_CTRL_A) & HOSTBUSY_F)
+		return -EBUSY;
+
+	for ( ; !ret && n--; addr += 4) {
+		t4_write_reg(adap, CIM_HOST_ACC_CTRL_A, addr);
+		ret = t4_wait_op_done(adap, CIM_HOST_ACC_CTRL_A, HOSTBUSY_F,
+				      0, 5, 2);
+		if (!ret)
+			*valp++ = t4_read_reg(adap, CIM_HOST_ACC_DATA_A);
+	}
+	return ret;
+}
+
+/**
+ *	t4_cim_write - write a block into CIM internal address space
+ *	@adap: the adapter
+ *	@addr: the start address within the CIM address space
+ *	@n: number of words to write
+ *	@valp: set of values to write
+ *
+ *	Writes a block of 4-byte words into the CIM intenal address space.
+ */
+int t4_cim_write(struct adapter *adap, unsigned int addr, unsigned int n,
+		 const unsigned int *valp)
+{
+	int ret = 0;
+
+	if (t4_read_reg(adap, CIM_HOST_ACC_CTRL_A) & HOSTBUSY_F)
+		return -EBUSY;
+
+	for ( ; !ret && n--; addr += 4) {
+		t4_write_reg(adap, CIM_HOST_ACC_DATA_A, *valp++);
+		t4_write_reg(adap, CIM_HOST_ACC_CTRL_A, addr | HOSTWRITE_F);
+		ret = t4_wait_op_done(adap, CIM_HOST_ACC_CTRL_A, HOSTBUSY_F,
+				      0, 5, 2);
+	}
+	return ret;
+}
+
+static int t4_cim_write1(struct adapter *adap, unsigned int addr,
+			 unsigned int val)
+{
+	return t4_cim_write(adap, addr, 1, &val);
+}
+
+/**
+ *	t4_cim_read_la - read CIM LA capture buffer
+ *	@adap: the adapter
+ *	@la_buf: where to store the LA data
+ *	@wrptr: the HW write pointer within the capture buffer
+ *
+ *	Reads the contents of the CIM LA buffer with the most recent entry at
+ *	the end	of the returned data and with the entry at @wrptr first.
+ *	We try to leave the LA in the running state we find it in.
+ */
+int t4_cim_read_la(struct adapter *adap, u32 *la_buf, unsigned int *wrptr)
+{
+	int i, ret;
+	unsigned int cfg, val, idx;
+
+	ret = t4_cim_read(adap, UP_UP_DBG_LA_CFG_A, 1, &cfg);
+	if (ret)
+		return ret;
+
+	if (cfg & UPDBGLAEN_F) {	/* LA is running, freeze it */
+		ret = t4_cim_write1(adap, UP_UP_DBG_LA_CFG_A, 0);
+		if (ret)
+			return ret;
+	}
+
+	ret = t4_cim_read(adap, UP_UP_DBG_LA_CFG_A, 1, &val);
+	if (ret)
+		goto restart;
+
+	idx = UPDBGLAWRPTR_G(val);
+	if (wrptr)
+		*wrptr = idx;
+
+	for (i = 0; i < adap->params.cim_la_size; i++) {
+		ret = t4_cim_write1(adap, UP_UP_DBG_LA_CFG_A,
+				    UPDBGLARDPTR_V(idx) | UPDBGLARDEN_F);
+		if (ret)
+			break;
+		ret = t4_cim_read(adap, UP_UP_DBG_LA_CFG_A, 1, &val);
+		if (ret)
+			break;
+		if (val & UPDBGLARDEN_F) {
+			ret = -ETIMEDOUT;
+			break;
+		}
+		ret = t4_cim_read(adap, UP_UP_DBG_LA_DATA_A, 1, &la_buf[i]);
+		if (ret)
+			break;
+		idx = (idx + 1) & UPDBGLARDPTR_M;
+	}
+restart:
+	if (cfg & UPDBGLAEN_F) {
+		int r = t4_cim_write1(adap, UP_UP_DBG_LA_CFG_A,
+				      cfg & ~UPDBGLARDEN_F);
+		if (!ret)
+			ret = r;
+	}
+	return ret;
+}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h
index 5e5eee6..bcc925b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h
@@ -56,6 +56,10 @@ enum {
 };
 
 enum {
+	CIMLA_SIZE     = 2048,  /* # of 32-bit words in CIM LA */
+};
+
+enum {
 	SF_PAGE_SIZE = 256,           /* serial flash page size */
 	SF_SEC_SIZE = 64 * 1024,      /* serial flash sector size */
 };
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index 4077227..3fe6eeb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -2062,4 +2062,38 @@
 #define PL_VF_WHOAMI_A 0x0
 #define PL_VF_REVISION_A 0x8
 
+/* registers for module CIM */
+#define CIM_HOST_ACC_CTRL_A	0x7b50
+#define CIM_HOST_ACC_DATA_A	0x7b54
+#define UP_UP_DBG_LA_CFG_A	0x140
+#define UP_UP_DBG_LA_DATA_A	0x144
+
+#define HOSTBUSY_S	17
+#define HOSTBUSY_V(x)	((x) << HOSTBUSY_S)
+#define HOSTBUSY_F	HOSTBUSY_V(1U)
+
+#define HOSTWRITE_S	16
+#define HOSTWRITE_V(x)	((x) << HOSTWRITE_S)
+#define HOSTWRITE_F	HOSTWRITE_V(1U)
+
+#define UPDBGLARDEN_S		1
+#define UPDBGLARDEN_V(x)	((x) << UPDBGLARDEN_S)
+#define UPDBGLARDEN_F		UPDBGLARDEN_V(1U)
+
+#define UPDBGLAEN_S	0
+#define UPDBGLAEN_V(x)	((x) << UPDBGLAEN_S)
+#define UPDBGLAEN_F	UPDBGLAEN_V(1U)
+
+#define UPDBGLARDPTR_S		2
+#define UPDBGLARDPTR_M		0xfffU
+#define UPDBGLARDPTR_V(x)	((x) << UPDBGLARDPTR_S)
+
+#define UPDBGLAWRPTR_S    16
+#define UPDBGLAWRPTR_M    0xfffU
+#define UPDBGLAWRPTR_G(x) (((x) >> UPDBGLAWRPTR_S) & UPDBGLAWRPTR_M)
+
+#define UPDBGLACAPTPCONLY_S	30
+#define UPDBGLACAPTPCONLY_V(x)	((x) << UPDBGLACAPTPCONLY_S)
+#define UPDBGLACAPTPCONLY_F	UPDBGLACAPTPCONLY_V(1U)
+
 #endif /* __T4_REGS_H */
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 0/4] Add support for few debugfs entries
From: Hariprasad Shenai @ 2015-01-07  3:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, nirranjan, anish, Hariprasad Shenai

Hi,

This patch series adds support for devlog, cim_la, cim_qcfg and mps_tcam
debugfs entries.

The patches series is created against 'net-next' tree.
And includes patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.

Thanks

Hariprasad Shenai (4):
  cxgb4: Add support for devlog
  cxgb4: Add support for cim_la entry in debugfs
  cxgb4: Add support for cim_qcfg entry in debugfs
  cxgb4: Add support for mps_tcam debugfs

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |   16 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  524 ++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.h |   12 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |   26 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c         |  155 ++++++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.h         |   19 +
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h       |  149 ++++++
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h      |   81 +++
 8 files changed, 982 insertions(+), 0 deletions(-)

^ permalink raw reply

* [PATCH net-next 1/4] cxgb4: Add support for devlog
From: Hariprasad Shenai @ 2015-01-07  3:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, nirranjan, anish, Hariprasad Shenai
In-Reply-To: <1420600683-23289-1-git-send-email-hariprasad@chelsio.com>

Add support for device log entry in debugfs

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h         |    8 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  198 ++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |   26 +++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.h         |   12 ++
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h      |   81 ++++++++
 5 files changed, 325 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 5ab5c31..73b1f3a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -290,11 +290,19 @@ enum chip_type {
 	T5_LAST_REV	= T5_A1,
 };
 
+struct devlog_params {
+	u32 memtype;                    /* which memory (EDC0, EDC1, MC) */
+	u32 start;                      /* start of log in firmware memory */
+	u32 size;                       /* size of log */
+};
+
 struct adapter_params {
 	struct sge_params sge;
 	struct tp_params  tp;
 	struct vpd_params vpd;
 	struct pci_params pci;
+	struct devlog_params devlog;
+	enum pcie_memwin drv_memwin;
 
 	unsigned int sf_size;             /* serial flash size in bytes */
 	unsigned int sf_nsec;             /* # of flash sectors */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index c98a350..1d3d1c5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -43,6 +43,203 @@
 #include "cxgb4_debugfs.h"
 #include "l2t.h"
 
+/* Firmware Device Log dump.
+ */
+static const char * const devlog_level_strings[] = {
+	[FW_DEVLOG_LEVEL_EMERG]		= "EMERG",
+	[FW_DEVLOG_LEVEL_CRIT]		= "CRIT",
+	[FW_DEVLOG_LEVEL_ERR]		= "ERR",
+	[FW_DEVLOG_LEVEL_NOTICE]	= "NOTICE",
+	[FW_DEVLOG_LEVEL_INFO]		= "INFO",
+	[FW_DEVLOG_LEVEL_DEBUG]		= "DEBUG"
+};
+
+static const char * const devlog_facility_strings[] = {
+	[FW_DEVLOG_FACILITY_CORE]	= "CORE",
+	[FW_DEVLOG_FACILITY_SCHED]	= "SCHED",
+	[FW_DEVLOG_FACILITY_TIMER]	= "TIMER",
+	[FW_DEVLOG_FACILITY_RES]	= "RES",
+	[FW_DEVLOG_FACILITY_HW]		= "HW",
+	[FW_DEVLOG_FACILITY_FLR]	= "FLR",
+	[FW_DEVLOG_FACILITY_DMAQ]	= "DMAQ",
+	[FW_DEVLOG_FACILITY_PHY]	= "PHY",
+	[FW_DEVLOG_FACILITY_MAC]	= "MAC",
+	[FW_DEVLOG_FACILITY_PORT]	= "PORT",
+	[FW_DEVLOG_FACILITY_VI]		= "VI",
+	[FW_DEVLOG_FACILITY_FILTER]	= "FILTER",
+	[FW_DEVLOG_FACILITY_ACL]	= "ACL",
+	[FW_DEVLOG_FACILITY_TM]		= "TM",
+	[FW_DEVLOG_FACILITY_QFC]	= "QFC",
+	[FW_DEVLOG_FACILITY_DCB]	= "DCB",
+	[FW_DEVLOG_FACILITY_ETH]	= "ETH",
+	[FW_DEVLOG_FACILITY_OFLD]	= "OFLD",
+	[FW_DEVLOG_FACILITY_RI]		= "RI",
+	[FW_DEVLOG_FACILITY_ISCSI]	= "ISCSI",
+	[FW_DEVLOG_FACILITY_FCOE]	= "FCOE",
+	[FW_DEVLOG_FACILITY_FOISCSI]	= "FOISCSI",
+	[FW_DEVLOG_FACILITY_FOFCOE]	= "FOFCOE"
+};
+
+/* Information gathered by Device Log Open routine for the display routine.
+ */
+struct devlog_info {
+	unsigned int nentries;		/* number of entries in log[] */
+	unsigned int first;		/* first [temporal] entry in log[] */
+	struct fw_devlog_e log[0];	/* Firmware Device Log */
+};
+
+/* Dump a Firmaware Device Log entry.
+ */
+static int devlog_show(struct seq_file *seq, void *v)
+{
+	if (v == SEQ_START_TOKEN)
+		seq_printf(seq, "%10s  %15s  %8s  %8s  %s\n",
+			   "Seq#", "Tstamp", "Level", "Facility", "Message");
+	else {
+		struct devlog_info *dinfo = seq->private;
+		int fidx = (uintptr_t)v - 2;
+		unsigned long index;
+		struct fw_devlog_e *e;
+
+		/* Get a pointer to the log entry to display.  Skip unused log
+		 * entries.
+		 */
+		index = dinfo->first + fidx;
+		if (index >= dinfo->nentries)
+			index -= dinfo->nentries;
+		e = &dinfo->log[index];
+		if (e->timestamp == 0)
+			return 0;
+
+		/* Print the message.  This depends on the firmware using
+		 * exactly the same formating strings as the kernel so we may
+		 * eventually have to put a format interpreter in here ...
+		 */
+		seq_printf(seq, "%10d  %15llu  %8s  %8s  ",
+			   e->seqno, e->timestamp,
+			   (e->level < ARRAY_SIZE(devlog_level_strings)
+			    ? devlog_level_strings[e->level]
+			    : "UNKNOWN"),
+			   (e->facility < ARRAY_SIZE(devlog_facility_strings)
+			    ? devlog_facility_strings[e->facility]
+			    : "UNKNOWN"));
+		seq_printf(seq, e->fmt, e->params[0], e->params[1],
+			   e->params[2], e->params[3], e->params[4],
+			   e->params[5], e->params[6], e->params[7]);
+	}
+	return 0;
+}
+
+/* Sequential File Operations for Device Log.
+ */
+static inline void *devlog_get_idx(struct devlog_info *dinfo, loff_t pos)
+{
+	if (pos > dinfo->nentries)
+		return NULL;
+
+	return (void *)(uintptr_t)(pos + 1);
+}
+
+static void *devlog_start(struct seq_file *seq, loff_t *pos)
+{
+	struct devlog_info *dinfo = seq->private;
+
+	return (*pos
+		? devlog_get_idx(dinfo, *pos)
+		: SEQ_START_TOKEN);
+}
+
+static void *devlog_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct devlog_info *dinfo = seq->private;
+
+	(*pos)++;
+	return devlog_get_idx(dinfo, *pos);
+}
+
+static void devlog_stop(struct seq_file *seq, void *v)
+{
+}
+
+static const struct seq_operations devlog_seq_ops = {
+	.start = devlog_start,
+	.next  = devlog_next,
+	.stop  = devlog_stop,
+	.show  = devlog_show
+};
+
+/* Set up for reading the firmware's device log.  We read the entire log here
+ * and then display it incrementally in devlog_show().
+ */
+static int devlog_open(struct inode *inode, struct file *file)
+{
+	struct adapter *adap = inode->i_private;
+	struct devlog_params *dparams = &adap->params.devlog;
+	struct devlog_info *dinfo;
+	unsigned int index;
+	u32 fseqno;
+	int ret;
+
+	/* If we don't know where the log is we can't do anything.
+	 */
+	if (dparams->start == 0)
+		return -ENXIO;
+
+	/* Allocate the space to read in the firmware's device log and set up
+	 * for the iterated call to our display function.
+	 */
+	dinfo = __seq_open_private(file, &devlog_seq_ops,
+				   sizeof(*dinfo) + dparams->size);
+	if (!dinfo)
+		return -ENOMEM;
+
+	/* Record the basic log buffer information and read in the raw log.
+	 */
+	dinfo->nentries = (dparams->size / sizeof(struct fw_devlog_e));
+	dinfo->first = 0;
+	spin_lock(&adap->win0_lock);
+	ret = t4_memory_rw(adap, adap->params.drv_memwin, dparams->memtype,
+			   dparams->start, dparams->size, (__be32 *)dinfo->log,
+			   T4_MEMORY_READ);
+	spin_unlock(&adap->win0_lock);
+	if (ret) {
+		seq_release_private(inode, file);
+		return ret;
+	}
+
+	/* Translate log multi-byte integral elements into host native format
+	 * and determine where the first entry in the log is.
+	 */
+	for (fseqno = ~((u32)0), index = 0; index < dinfo->nentries; index++) {
+		struct fw_devlog_e *e = &dinfo->log[index];
+		int i;
+		__u32 seqno;
+
+		if (e->timestamp == 0)
+			continue;
+
+		e->timestamp = (__force __be64)be64_to_cpu(e->timestamp);
+		seqno = be32_to_cpu(e->seqno);
+		for (i = 0; i < 8; i++)
+			e->params[i] =
+				(__force __be32)be32_to_cpu(e->params[i]);
+
+		if (seqno < fseqno) {
+			fseqno = seqno;
+			dinfo->first = index;
+		}
+	}
+	return 0;
+}
+
+static const struct file_operations devlog_fops = {
+	.owner   = THIS_MODULE,
+	.open    = devlog_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = seq_release_private
+};
+
 static ssize_t mem_read(struct file *file, char __user *buf, size_t count,
 			loff_t *ppos)
 {
@@ -121,6 +318,7 @@ int t4_setup_debugfs(struct adapter *adap)
 	u32 size;
 
 	static struct t4_debugfs_entry t4_debugfs_files[] = {
+		{ "devlog", &devlog_fops, S_IRUSR, 0 },
 		{ "l2t", &t4_l2t_fops, S_IRUSR, 0},
 	};
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 04e675b..6de41cc 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5541,6 +5541,8 @@ static int adap_init0(struct adapter *adap)
 	enum dev_state state;
 	u32 params[7], val[7];
 	struct fw_caps_config_cmd caps_cmd;
+	struct fw_devlog_cmd devlog_cmd;
+	u32 devlog_meminfo;
 	int reset = 1;
 
 	/* Contact FW, advertising Master capability */
@@ -5621,6 +5623,30 @@ static int adap_init0(struct adapter *adap)
 	if (ret < 0)
 		goto bye;
 
+	/* Read firmware device log parameters.  We really need to find a way
+	 * to get these parameters initialized with some default values (which
+	 * are likely to be correct) for the case where we either don't
+	 * attache to the firmware or it's crashed when we probe the adapter.
+	 * That way we'll still be able to perform early firmware startup
+	 * debugging ...  If the request to get the Firmware's Device Log
+	 * parameters fails, we'll live so we don't make that a fatal error.
+	 */
+	memset(&devlog_cmd, 0, sizeof(devlog_cmd));
+	devlog_cmd.op_to_write = htonl(FW_CMD_OP_V(FW_DEVLOG_CMD) |
+				       FW_CMD_REQUEST_F | FW_CMD_READ_F);
+	devlog_cmd.retval_len16 = htonl(FW_LEN16(devlog_cmd));
+	ret = t4_wr_mbox(adap, adap->mbox, &devlog_cmd, sizeof(devlog_cmd),
+			 &devlog_cmd);
+	if (ret == 0) {
+		devlog_meminfo =
+			ntohl(devlog_cmd.memtype_devlog_memaddr16_devlog);
+		adap->params.devlog.memtype =
+			FW_DEVLOG_CMD_MEMTYPE_DEVLOG_G(devlog_meminfo);
+		adap->params.devlog.start =
+			FW_DEVLOG_CMD_MEMADDR16_DEVLOG_G(devlog_meminfo) << 4;
+		adap->params.devlog.size = ntohl(devlog_cmd.memsize_devlog);
+	}
+
 	/*
 	 * Find out what ports are available to us.  Note that we need to do
 	 * this before calling adap_init0_no_config() since it needs nports
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h
index c19a90e..5e5eee6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.h
@@ -110,6 +110,18 @@ enum {
 	SGE_INGPADBOUNDARY_SHIFT = 5,/* ingress queue pad boundary */
 };
 
+/* PCI-e memory window access */
+enum pcie_memwin {
+	MEMWIN_NIC      = 0,
+	MEMWIN_RSVD1    = 1,
+	MEMWIN_RSVD2    = 2,
+	MEMWIN_RDMA     = 3,
+	MEMWIN_RSVD4    = 4,
+	MEMWIN_FOISCSI  = 5,
+	MEMWIN_CSIOSTOR = 6,
+	MEMWIN_RSVD7    = 7,
+};
+
 struct sge_qstat {                /* data written to SGE queue status entries */
 	__be32 qid;
 	__be16 cidx;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index 7c0aec8..de82833 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -673,6 +673,7 @@ enum fw_cmd_opcodes {
 	FW_RSS_IND_TBL_CMD             = 0x20,
 	FW_RSS_GLB_CONFIG_CMD          = 0x22,
 	FW_RSS_VI_CONFIG_CMD           = 0x23,
+	FW_DEVLOG_CMD                  = 0x25,
 	FW_CLIP_CMD                    = 0x28,
 	FW_LASTC2E_CMD                 = 0x40,
 	FW_ERROR_CMD                   = 0x80,
@@ -3038,4 +3039,84 @@ enum fw_hdr_flags {
 	FW_HDR_FLAGS_RESET_HALT = 0x00000001,
 };
 
+/* length of the formatting string  */
+#define FW_DEVLOG_FMT_LEN	192
+
+/* maximum number of the formatting string parameters */
+#define FW_DEVLOG_FMT_PARAMS_NUM 8
+
+/* priority levels */
+enum fw_devlog_level {
+	FW_DEVLOG_LEVEL_EMERG	= 0x0,
+	FW_DEVLOG_LEVEL_CRIT	= 0x1,
+	FW_DEVLOG_LEVEL_ERR	= 0x2,
+	FW_DEVLOG_LEVEL_NOTICE	= 0x3,
+	FW_DEVLOG_LEVEL_INFO	= 0x4,
+	FW_DEVLOG_LEVEL_DEBUG	= 0x5,
+	FW_DEVLOG_LEVEL_MAX	= 0x5,
+};
+
+/* facilities that may send a log message */
+enum fw_devlog_facility {
+	FW_DEVLOG_FACILITY_CORE		= 0x00,
+	FW_DEVLOG_FACILITY_CF		= 0x01,
+	FW_DEVLOG_FACILITY_SCHED	= 0x02,
+	FW_DEVLOG_FACILITY_TIMER	= 0x04,
+	FW_DEVLOG_FACILITY_RES		= 0x06,
+	FW_DEVLOG_FACILITY_HW		= 0x08,
+	FW_DEVLOG_FACILITY_FLR		= 0x10,
+	FW_DEVLOG_FACILITY_DMAQ		= 0x12,
+	FW_DEVLOG_FACILITY_PHY		= 0x14,
+	FW_DEVLOG_FACILITY_MAC		= 0x16,
+	FW_DEVLOG_FACILITY_PORT		= 0x18,
+	FW_DEVLOG_FACILITY_VI		= 0x1A,
+	FW_DEVLOG_FACILITY_FILTER	= 0x1C,
+	FW_DEVLOG_FACILITY_ACL		= 0x1E,
+	FW_DEVLOG_FACILITY_TM		= 0x20,
+	FW_DEVLOG_FACILITY_QFC		= 0x22,
+	FW_DEVLOG_FACILITY_DCB		= 0x24,
+	FW_DEVLOG_FACILITY_ETH		= 0x26,
+	FW_DEVLOG_FACILITY_OFLD		= 0x28,
+	FW_DEVLOG_FACILITY_RI		= 0x2A,
+	FW_DEVLOG_FACILITY_ISCSI	= 0x2C,
+	FW_DEVLOG_FACILITY_FCOE		= 0x2E,
+	FW_DEVLOG_FACILITY_FOISCSI	= 0x30,
+	FW_DEVLOG_FACILITY_FOFCOE	= 0x32,
+	FW_DEVLOG_FACILITY_MAX		= 0x32,
+};
+
+/* log message format */
+struct fw_devlog_e {
+	__be64	timestamp;
+	__be32	seqno;
+	__be16	reserved1;
+	__u8	level;
+	__u8	facility;
+	__u8	fmt[FW_DEVLOG_FMT_LEN];
+	__be32	params[FW_DEVLOG_FMT_PARAMS_NUM];
+	__be32	reserved3[4];
+};
+
+struct fw_devlog_cmd {
+	__be32 op_to_write;
+	__be32 retval_len16;
+	__u8   level;
+	__u8   r2[7];
+	__be32 memtype_devlog_memaddr16_devlog;
+	__be32 memsize_devlog;
+	__be32 r3[2];
+};
+
+#define FW_DEVLOG_CMD_MEMTYPE_DEVLOG_S		28
+#define FW_DEVLOG_CMD_MEMTYPE_DEVLOG_M		0xf
+#define FW_DEVLOG_CMD_MEMTYPE_DEVLOG_G(x)	\
+	(((x) >> FW_DEVLOG_CMD_MEMTYPE_DEVLOG_S) & \
+	 FW_DEVLOG_CMD_MEMTYPE_DEVLOG_M)
+
+#define FW_DEVLOG_CMD_MEMADDR16_DEVLOG_S	0
+#define FW_DEVLOG_CMD_MEMADDR16_DEVLOG_M	0xfffffff
+#define FW_DEVLOG_CMD_MEMADDR16_DEVLOG_G(x)	\
+	(((x) >> FW_DEVLOG_CMD_MEMADDR16_DEVLOG_S) & \
+	 FW_DEVLOG_CMD_MEMADDR16_DEVLOG_M)
+
 #endif /* _T4FW_INTERFACE_H_ */
-- 
1.7.1

^ permalink raw reply related

* RE: [PATCH net-next v1 2/3] ARM: imx: add FEC sleep mode callback function
From: fugang.duan @ 2015-01-07  2:40 UTC (permalink / raw)
  To: Shawn Guo
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	bhutchings@solarflare.com, stephen@networkplumber.org
In-Reply-To: <20150107023307.GA4928@dragon>

From: Shawn Guo <shawn.guo@linaro.org> Sent: Wednesday, January 07, 2015 10:33 AM
> To: Duan Fugang-B38611
> Cc: davem@davemloft.net; netdev@vger.kernel.org;
> bhutchings@solarflare.com; stephen@networkplumber.org
> Subject: Re: [PATCH net-next v1 2/3] ARM: imx: add FEC sleep mode
> callback function
> 
> On Wed, Jan 07, 2015 at 01:41:33AM +0000, fugang.duan@freescale.com wrote:
> > From: Shawn Guo <shawn.guo@linaro.org> Sent: Tuesday, January 06, 2015
> > 7:48 PM
> > > To: Duan Fugang-B38611
> > > Cc: davem@davemloft.net; netdev@vger.kernel.org;
> > > bhutchings@solarflare.com; stephen@networkplumber.org
> > > Subject: Re: [PATCH net-next v1 2/3] ARM: imx: add FEC sleep mode
> > > callback function
> > >
> > > On Wed, Dec 24, 2014 at 05:30:40PM +0800, Fugang Duan wrote:
> > > > i.MX6q/dl, i.MX6SX SOCs enet support sleep mode that magic packet
> > > > can wake up system in suspend status. For different SOCs, there
> > > > have some SOC specifical GPR register to set sleep on/off mode. So
> > > > add these to callback function for driver.
> > > >
> > > > Signed-off-by: Fugang Duan <B38611@freescale.com>
> > >
> > > I do not like this patch.  In the end, this is just a GRP register
> > > bit setup per FEC driver need.  Rather than messing up platform code
> > > for each SoC with the same pattern, I do not see why this can not be
> > > done by FEC driver itself.
> > >
> > > You can take a look at LDB driver (drivers/gpu/drm/imx/imx-ldb.c) to
> > > see how this can be done.
> > >
> > > Shawn
> > >
> >
> > Hi, Shawn,
> >
> > It is SOC related setting, not fec IP itself setting, and different SOC
> GPR setting is not different.
> > So I think it better to put it to platform code.
> 
> The GPR difference between SoCs can be encoded in device tree as well.
> It's pointless to repeat the same code pattern for every single platform,
> that need to set up GPR bits for enabling magic packet wake up, while the
> only difference is the register and bit offset.
> 
> The platform code will become quite messy and unmaintainable if every
> device driver dump their GPR register setup code into platform.
> 
> Sorry, but it's NACK from me.
> 
> Shawn

Yes, the platform code will become quite messy and tremendous.
I agree with your thinking.

But David applied the patches, how can I do it now ?

Thanks,
Andy

^ permalink raw reply

* Re: [PATCH net-next v1 2/3] ARM: imx: add FEC sleep mode callback function
From: Shawn Guo @ 2015-01-07  2:33 UTC (permalink / raw)
  To: fugang.duan@freescale.com
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	bhutchings@solarflare.com, stephen@networkplumber.org
In-Reply-To: <BLUPR03MB373F1A5D01A8A6D16DC3364F5460@BLUPR03MB373.namprd03.prod.outlook.com>

On Wed, Jan 07, 2015 at 01:41:33AM +0000, fugang.duan@freescale.com wrote:
> From: Shawn Guo <shawn.guo@linaro.org> Sent: Tuesday, January 06, 2015 7:48 PM
> > To: Duan Fugang-B38611
> > Cc: davem@davemloft.net; netdev@vger.kernel.org;
> > bhutchings@solarflare.com; stephen@networkplumber.org
> > Subject: Re: [PATCH net-next v1 2/3] ARM: imx: add FEC sleep mode
> > callback function
> > 
> > On Wed, Dec 24, 2014 at 05:30:40PM +0800, Fugang Duan wrote:
> > > i.MX6q/dl, i.MX6SX SOCs enet support sleep mode that magic packet can
> > > wake up system in suspend status. For different SOCs, there have some
> > > SOC specifical GPR register to set sleep on/off mode. So add these to
> > > callback function for driver.
> > >
> > > Signed-off-by: Fugang Duan <B38611@freescale.com>
> > 
> > I do not like this patch.  In the end, this is just a GRP register bit
> > setup per FEC driver need.  Rather than messing up platform code for each
> > SoC with the same pattern, I do not see why this can not be done by FEC
> > driver itself.
> > 
> > You can take a look at LDB driver (drivers/gpu/drm/imx/imx-ldb.c) to see
> > how this can be done.
> > 
> > Shawn
> > 
> 
> Hi, Shawn,
> 
> It is SOC related setting, not fec IP itself setting, and different SOC GPR setting is not different.
> So I think it better to put it to platform code.

The GPR difference between SoCs can be encoded in device tree as well.
It's pointless to repeat the same code pattern for every single
platform, that need to set up GPR bits for enabling magic packet wake
up, while the only difference is the register and bit offset.

The platform code will become quite messy and unmaintainable if every
device driver dump their GPR register setup code into platform.

Sorry, but it's NACK from me.

Shawn

^ permalink raw reply

* Re: [PATCH 1/1] update ip-sysctl.txt documentation
From: Ani Sinha @ 2015-01-07  2:08 UTC (permalink / raw)
  To: David Miller
  Cc: corbet, Eric Dumazet, linux-doc, linux-kernel, Pádraig Brady,
	netdev@vger.kernel.org, fruggeri
In-Reply-To: <20150106.174039.783537681001060022.davem@davemloft.net>

On Tue, Jan 6, 2015 at 2:40 PM, David Miller <davem@davemloft.net> wrote:
> From: Ani Sinha <ani@arista.com>
> Date: Tue,  6 Jan 2015 14:33:45 -0800
>
>> @@ -64,8 +64,10 @@ fwmark_reflect - BOOLEAN
>>       Default: 0
>>
>>  route/max_size - INTEGER
>> -     Maximum number of routes allowed in the kernel.  Increase
>> -     this when using large numbers of interfaces and/or routes.
>> +        Post linux kernel 3.6, this is depricated for ipv4 as route cache is no
>> +        longer used. For ipv6, this is used to limit the maximum number of ipv6
>> +        routes allowed in the kernel.  Increase this when using large numbers of
>> +        interfaces and/or routes.
>
> Please do not change the TABs into sequenes of space characters.

Fixed in the latest patch.

^ permalink raw reply

* RE: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
From: Shrijeet Mukherjee @ 2015-01-07  2:08 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Scott Feldman
  Cc: Netdev, Jiří Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu
In-Reply-To: <1420574353.15181.19.camel@stressinduktion.org>

>For the first idea, I'll try to make an example:
>
>Initial setup:
># ip rule ls
>0:	from all lookup local
>32766:	from all lookup main
>32767:	from all lookup default
>
># ip rule add pref 100 iif swdev0 table 5 # ip rule ls
>0:	from all lookup local
>100:	from all iif swdev0 [detached] lookup 5
>> maybe we can show which rules are being able to get offloaded here
>32766:	from all lookup main
>32767:	from all lookup default
>
>table 5 should be the table we can insert routes into which are offloaded
>to
>hardware.
>
>During table modifications we linearly scan the rules if we find selectors
>which
>cannot be represented by hardware.
>
>In case we have a iif selector, we simply can use this table and just
>synthesize it
>into the particular interface.
>
>A ip-rule-from would need all the hardware being capable of matching source
>addresses, otherwise we cannot offload all routing tables with higher
>preference,
>same for a to/tos rule. If we encounter a fwmark rule, we certainly cannot
>represent it in hardware, so skip it (here we can think about entangling
>those with
>ACLs, but it feels hard to do).
>
>If rules are inserted or changed we must again validate the complete list
>of rules
>and decide if we need to flush all the routes and install a slow path via
>kernel.
>
>What do you think? Does that make sense? I could try to come up with an API
>for
>that. ;)
>

This sounds really good, but I suspect the real problem is the case where
the rule evaluation is in the hardware path right. If it is purely IF based
there is no issue .. but any other policy like missed in table 1, then use
table 2 will not work with this model .. or did I miss something ?

^ permalink raw reply

* [PATCH 6/6] openvswitch: Support VXLAN Group Policy extension
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev
In-Reply-To: <cover.1420594925.git.tgraf@suug.ch>

Introduces support for the group policy extension to the VXLAN virtual
port. The extension is disabled by default and only enabled if the user
has provided the respective configuration.

  ovs-vsctl add-port br0 vxlan0 -- \
     set Interface vxlan0 type=vxlan options:exts=gbp

The configuration interface to enable the extension is based on a new
attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
which can carry additional extensions as needed in the future.

The group policy metadata is handled in the same way as Geneve options
and transported as binary blob in a new Netlink attribute
OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS which is mutually exclusive to the
existing OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 include/uapi/linux/openvswitch.h | 19 ++++++++++
 net/openvswitch/flow_netlink.c   | 78 +++++++++++++++++++++++++--------------
 net/openvswitch/vport-vxlan.c    | 80 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 148 insertions(+), 29 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 3a6dcaa..676a89e 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -248,11 +248,29 @@ enum ovs_vport_attr {
 
 #define OVS_VPORT_ATTR_MAX (__OVS_VPORT_ATTR_MAX - 1)
 
+/**
+ * struct ovs_vxlan_opts - VXLAN tunnel options
+ * @gbp: Group policy bits
+ */
+struct ovs_vxlan_opts {
+	__u32 gbp;
+};
+
+enum {
+	OVS_VXLAN_EXT_UNSPEC,
+	OVS_VXLAN_EXT_GBP,
+	__OVS_VXLAN_EXT_MAX,
+};
+
+#define OVS_VXLAN_EXT_MAX (__OVS_VXLAN_EXT_MAX - 1)
+
+
 /* OVS_VPORT_ATTR_OPTIONS attributes for tunnels.
  */
 enum {
 	OVS_TUNNEL_ATTR_UNSPEC,
 	OVS_TUNNEL_ATTR_DST_PORT, /* 16-bit UDP port, used by L4 tunnels. */
+	OVS_TUNNEL_ATTR_EXTENSION,
 	__OVS_TUNNEL_ATTR_MAX
 };
 
@@ -324,6 +342,7 @@ enum ovs_tunnel_key_attr {
 	OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS,        /* Array of Geneve options. */
 	OVS_TUNNEL_KEY_ATTR_TP_SRC,		/* be16 src Transport Port. */
 	OVS_TUNNEL_KEY_ATTR_TP_DST,		/* be16 dst Transport Port. */
+	OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS,		/* struct ovs_vxlan_opts. */
 	__OVS_TUNNEL_KEY_ATTR_MAX
 };
 
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index c60ae3f..1528709 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -446,6 +446,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 	int rem;
 	bool ttl = false;
 	__be16 tun_flags = 0;
+	int opts_type = 0;
 
 	nla_for_each_nested(a, attr, rem) {
 		int type = nla_type(a);
@@ -463,6 +464,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 			[OVS_TUNNEL_KEY_ATTR_TP_DST] = sizeof(u16),
 			[OVS_TUNNEL_KEY_ATTR_OAM] = 0,
 			[OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS] = -1,
+			[OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS] = -1,
 		};
 
 		if (type > OVS_TUNNEL_KEY_ATTR_MAX) {
@@ -519,11 +521,18 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 			tun_flags |= TUNNEL_OAM;
 			break;
 		case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS:
+		case OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS:
+			if (opts_type) {
+				OVS_NLERR(log, "Multiple metadata blocks provided");
+				return -EINVAL;
+			}
+
 			err = tun_md_opt_from_nlattr(a, match, is_mask, log);
 			if (err)
 				return err;
 
 			tun_flags |= TUNNEL_OPTIONS_PRESENT;
+			opts_type = type;
 			break;
 		default:
 			OVS_NLERR(log, "Unknown IPv4 tunnel attribute %d",
@@ -552,7 +561,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 		}
 	}
 
-	return 0;
+	return opts_type;
 }
 
 static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
@@ -1537,6 +1546,34 @@ void ovs_match_init(struct sw_flow_match *match,
 	}
 }
 
+static int validate_and_copy_geneve_opts(struct sw_flow_key *key)
+{
+	struct geneve_opt *option;
+	int opts_len = key->tun_opts_len;
+	bool crit_opt = false;
+
+	option = (struct geneve_opt *) TUN_METADATA_OPTS(key, key->tun_opts_len);
+	while (opts_len > 0) {
+		int len;
+
+		if (opts_len < sizeof(*option))
+			return -EINVAL;
+
+		len = sizeof(*option) + option->length * 4;
+		if (len > opts_len)
+			return -EINVAL;
+
+		crit_opt |= !!(option->type & GENEVE_CRIT_OPT_TYPE);
+
+		option = (struct geneve_opt *)((u8 *)option + len);
+		opts_len -= len;
+	};
+
+	key->tun_key.tun_flags |= crit_opt ? TUNNEL_CRIT_OPT : 0;
+
+	return 0;
+}
+
 static int validate_and_copy_set_tun(const struct nlattr *attr,
 				     struct sw_flow_actions **sfa, bool log)
 {
@@ -1544,36 +1581,23 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	struct sw_flow_key key;
 	struct ovs_tunnel_info *tun_info;
 	struct nlattr *a;
-	int err, start;
+	int err, start, opts_type;
 
 	ovs_match_init(&match, &key, NULL);
-	err = ipv4_tun_from_nlattr(nla_data(attr), &match, false, log);
-	if (err)
-		return err;
+	opts_type = ipv4_tun_from_nlattr(nla_data(attr), &match, false, log);
+	if (opts_type < 0)
+		return opts_type;
 
 	if (key.tun_opts_len) {
-		struct geneve_opt *option;
-		int opts_len = key.tun_opts_len;
-		bool crit_opt = false;
-
-		option = (struct geneve_opt *) TUN_METADATA_OPTS(&key, key.tun_opts_len);
-		while (opts_len > 0) {
-			int len;
-
-			if (opts_len < sizeof(*option))
-				return -EINVAL;
-
-			len = sizeof(*option) + option->length * 4;
-			if (len > opts_len)
-				return -EINVAL;
-
-			crit_opt |= !!(option->type & GENEVE_CRIT_OPT_TYPE);
-
-			option = (struct geneve_opt *)((u8 *)option + len);
-			opts_len -= len;
-		};
-
-		key.tun_key.tun_flags |= crit_opt ? TUNNEL_CRIT_OPT : 0;
+		switch (opts_type) {
+		case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS:
+			err = validate_and_copy_geneve_opts(&key);
+			if (err < 0)
+				return err;
+			break;
+		case OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS:
+			break;
+		}
 	};
 
 	start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET, log);
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 266c595..8ed7163 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -49,6 +49,7 @@
 struct vxlan_port {
 	struct vxlan_sock *vs;
 	char name[IFNAMSIZ];
+	u32 exts; /* VXLAN_EXT_* in <net/vxlan.h> */
 };
 
 static struct vport_ops ovs_vxlan_vport_ops;
@@ -63,16 +64,26 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
 		      struct vxlan_metadata *md)
 {
 	struct ovs_tunnel_info tun_info;
+	struct vxlan_port *vxlan_port;
 	struct vport *vport = vs->data;
 	struct iphdr *iph;
+	struct ovs_vxlan_opts opts = {
+		.gbp = md->gbp,
+	};
 	__be64 key;
+	__be16 flags;
+
+	flags = TUNNEL_KEY;
+	vxlan_port = vxlan_vport(vport);
+	if (vxlan_port->exts & VXLAN_EXT_GBP)
+		flags |= TUNNEL_OPTIONS_PRESENT;
 
 	/* Save outer tunnel values */
 	iph = ip_hdr(skb);
 	key = cpu_to_be64(ntohl(md->vni) >> 8);
 	ovs_flow_tun_info_init(&tun_info, iph,
 			       udp_hdr(skb)->source, udp_hdr(skb)->dest,
-			       key, TUNNEL_KEY, NULL, 0);
+			       key, flags, &opts, sizeof(opts));
 
 	ovs_vport_receive(vport, skb, &tun_info);
 }
@@ -84,6 +95,21 @@ static int vxlan_get_options(const struct vport *vport, struct sk_buff *skb)
 
 	if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, ntohs(dst_port)))
 		return -EMSGSIZE;
+
+	if (vxlan_port->exts) {
+		struct nlattr *exts;
+
+		exts = nla_nest_start(skb, OVS_TUNNEL_ATTR_EXTENSION);
+		if (!exts)
+			return -EMSGSIZE;
+
+		if (vxlan_port->exts & VXLAN_EXT_GBP &&
+		    nla_put_flag(skb, OVS_VXLAN_EXT_GBP))
+			return -EMSGSIZE;
+
+		nla_nest_end(skb, exts);
+	}
+
 	return 0;
 }
 
@@ -96,6 +122,31 @@ static void vxlan_tnl_destroy(struct vport *vport)
 	ovs_vport_deferred_free(vport);
 }
 
+static const struct nla_policy exts_policy[OVS_VXLAN_EXT_MAX+1] = {
+	[OVS_VXLAN_EXT_GBP]	= { .type = NLA_FLAG, },
+};
+
+static int vxlan_configure_exts(struct vport *vport, struct nlattr *attr)
+{
+	struct nlattr *exts[OVS_VXLAN_EXT_MAX+1];
+	struct vxlan_port *vxlan_port;
+	int err;
+
+	if (nla_len(attr) < sizeof(struct nlattr))
+		return -EINVAL;
+
+	err = nla_parse_nested(exts, OVS_VXLAN_EXT_MAX, attr, exts_policy);
+	if (err < 0)
+		return err;
+
+	vxlan_port = vxlan_vport(vport);
+
+	if (exts[OVS_VXLAN_EXT_GBP])
+		vxlan_port->exts |= VXLAN_EXT_GBP;
+
+	return 0;
+}
+
 static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
 {
 	struct net *net = ovs_dp_get_net(parms->dp);
@@ -128,7 +179,17 @@ static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
 	vxlan_port = vxlan_vport(vport);
 	strncpy(vxlan_port->name, parms->name, IFNAMSIZ);
 
-	vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, 0, 0);
+	a = nla_find_nested(options, OVS_TUNNEL_ATTR_EXTENSION);
+	if (a) {
+		err = vxlan_configure_exts(vport, a);
+		if (err) {
+			ovs_vport_free(vport);
+			goto error;
+		}
+	}
+
+	vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, 0,
+			    vxlan_port->exts);
 	if (IS_ERR(vs)) {
 		ovs_vport_free(vport);
 		return (void *)vs;
@@ -141,6 +202,20 @@ error:
 	return ERR_PTR(err);
 }
 
+static int vxlan_ext_gbp(struct sk_buff *skb)
+{
+	const struct ovs_tunnel_info *tun_info;
+	const struct ovs_vxlan_opts *opts;
+
+	tun_info = OVS_CB(skb)->egress_tun_info;
+	opts = tun_info->options;
+
+	if (tun_info->options_len >= sizeof(*opts))
+		return opts->gbp;
+	else
+		return 0;
+}
+
 static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 {
 	struct net *net = ovs_dp_get_net(vport->dp);
@@ -181,6 +256,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 
 	src_port = udp_flow_src_port(net, skb, 0, 0, true);
 	md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
+	md.gbp = vxlan_ext_gbp(skb);
 
 	err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
 			     fl.saddr, tun_key->ipv4_dst,
-- 
1.9.3

^ permalink raw reply related

* [PATCH 5/6] openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev
In-Reply-To: <cover.1420594925.git.tgraf@suug.ch>

A subsequent patch will introduce VXLAN options. Rename the existing
GENEVE_TUN_OPTS() to reflect its extended purpose of carrying generic
tunnel metadata options.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/openvswitch/flow.c         |  2 +-
 net/openvswitch/flow.h         | 14 +++++++-------
 net/openvswitch/flow_netlink.c | 37 +++++++++++++++++--------------------
 3 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 70bef2a..bfc74ac 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -690,7 +690,7 @@ int ovs_flow_key_extract(const struct ovs_tunnel_info *tun_info,
 			BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) *
 						   8)) - 1
 					> sizeof(key->tun_opts));
-			memcpy(GENEVE_OPTS(key, tun_info->options_len),
+			memcpy(TUN_METADATA_OPTS(key, tun_info->options_len),
 			       tun_info->options, tun_info->options_len);
 			key->tun_opts_len = tun_info->options_len;
 		} else {
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index a8b30f3..d3d0a40 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -53,7 +53,7 @@ struct ovs_key_ipv4_tunnel {
 
 struct ovs_tunnel_info {
 	struct ovs_key_ipv4_tunnel tunnel;
-	const struct geneve_opt *options;
+	const void *options;
 	u8 options_len;
 };
 
@@ -61,10 +61,10 @@ struct ovs_tunnel_info {
  * maximum size. This allows us to get the benefits of variable length
  * matching for small options.
  */
-#define GENEVE_OPTS(flow_key, opt_len)	\
-	((struct geneve_opt *)((flow_key)->tun_opts + \
-			       FIELD_SIZEOF(struct sw_flow_key, tun_opts) - \
-			       opt_len))
+#define TUN_METADATA_OFFSET(opt_len) \
+	(FIELD_SIZEOF(struct sw_flow_key, tun_opts) - opt_len)
+#define TUN_METADATA_OPTS(flow_key, opt_len) \
+	((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
 
 static inline void __ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
 					    __be32 saddr, __be32 daddr,
@@ -73,7 +73,7 @@ static inline void __ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
 					    __be16 tp_dst,
 					    __be64 tun_id,
 					    __be16 tun_flags,
-					    const struct geneve_opt *opts,
+					    const void *opts,
 					    u8 opts_len)
 {
 	tun_info->tunnel.tun_id = tun_id;
@@ -105,7 +105,7 @@ static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
 					  __be16 tp_dst,
 					  __be64 tun_id,
 					  __be16 tun_flags,
-					  const struct geneve_opt *opts,
+					  const void *opts,
 					  u8 opts_len)
 {
 	__ovs_flow_tun_info_init(tun_info, iph->saddr, iph->daddr,
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d1eecf7..c60ae3f 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -387,20 +387,20 @@ static int parse_flow_nlattrs(const struct nlattr *attr,
 	return __parse_flow_nlattrs(attr, a, attrsp, log, false);
 }
 
-static int genev_tun_opt_from_nlattr(const struct nlattr *a,
-				     struct sw_flow_match *match, bool is_mask,
-				     bool log)
+static int tun_md_opt_from_nlattr(const struct nlattr *a,
+				  struct sw_flow_match *match, bool is_mask,
+				  bool log)
 {
 	unsigned long opt_key_offset;
 
 	if (nla_len(a) > sizeof(match->key->tun_opts)) {
-		OVS_NLERR(log, "Geneve option length err (len %d, max %zu).",
+		OVS_NLERR(log, "Tunnel metadata option length err (len %d, max %zu).",
 			  nla_len(a), sizeof(match->key->tun_opts));
 		return -EINVAL;
 	}
 
 	if (nla_len(a) % 4 != 0) {
-		OVS_NLERR(log, "Geneve opt len %d is not a multiple of 4.",
+		OVS_NLERR(log, "Tunnel metadata opt len %d is not a multiple of 4.",
 			  nla_len(a));
 		return -EINVAL;
 	}
@@ -424,7 +424,7 @@ static int genev_tun_opt_from_nlattr(const struct nlattr *a,
 		 * information later.
 		 */
 		if (match->key->tun_opts_len != nla_len(a)) {
-			OVS_NLERR(log, "Geneve option len %d != mask len %d",
+			OVS_NLERR(log, "Tunnel metadata option len %d != mask len %d",
 				  match->key->tun_opts_len, nla_len(a));
 			return -EINVAL;
 		}
@@ -432,8 +432,7 @@ static int genev_tun_opt_from_nlattr(const struct nlattr *a,
 		SW_FLOW_KEY_PUT(match, tun_opts_len, 0xff, true);
 	}
 
-	opt_key_offset = (unsigned long)GENEVE_OPTS((struct sw_flow_key *)0,
-						    nla_len(a));
+	opt_key_offset = TUN_METADATA_OFFSET(nla_len(a));
 	SW_FLOW_KEY_MEMCPY_OFFSET(match, opt_key_offset, nla_data(a),
 				  nla_len(a), is_mask);
 	return 0;
@@ -520,7 +519,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 			tun_flags |= TUNNEL_OAM;
 			break;
 		case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS:
-			err = genev_tun_opt_from_nlattr(a, match, is_mask, log);
+			err = tun_md_opt_from_nlattr(a, match, is_mask, log);
 			if (err)
 				return err;
 
@@ -558,8 +557,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
 
 static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
 				const struct ovs_key_ipv4_tunnel *output,
-				const struct geneve_opt *tun_opts,
-				int swkey_tun_opts_len)
+				const void *tun_opts, int swkey_tun_opts_len)
 {
 	if (output->tun_flags & TUNNEL_KEY &&
 	    nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, output->tun_id))
@@ -600,8 +598,7 @@ static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
 
 static int ipv4_tun_to_nlattr(struct sk_buff *skb,
 			      const struct ovs_key_ipv4_tunnel *output,
-			      const struct geneve_opt *tun_opts,
-			      int swkey_tun_opts_len)
+			      const void *tun_opts, int swkey_tun_opts_len)
 {
 	struct nlattr *nla;
 	int err;
@@ -1148,10 +1145,10 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
 		goto nla_put_failure;
 
 	if ((swkey->tun_key.ipv4_dst || is_mask)) {
-		const struct geneve_opt *opts = NULL;
+		const void *opts = NULL;
 
 		if (output->tun_key.tun_flags & TUNNEL_OPTIONS_PRESENT)
-			opts = GENEVE_OPTS(output, swkey->tun_opts_len);
+			opts = TUN_METADATA_OPTS(output, swkey->tun_opts_len);
 
 		if (ipv4_tun_to_nlattr(skb, &output->tun_key, opts,
 				       swkey->tun_opts_len))
@@ -1555,11 +1552,11 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 		return err;
 
 	if (key.tun_opts_len) {
-		struct geneve_opt *option = GENEVE_OPTS(&key,
-							key.tun_opts_len);
+		struct geneve_opt *option;
 		int opts_len = key.tun_opts_len;
 		bool crit_opt = false;
 
+		option = (struct geneve_opt *) TUN_METADATA_OPTS(&key, key.tun_opts_len);
 		while (opts_len > 0) {
 			int len;
 
@@ -1597,9 +1594,9 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 		 * everything else will go away after flow setup. We can append
 		 * it to tun_info and then point there.
 		 */
-		memcpy((tun_info + 1), GENEVE_OPTS(&key, key.tun_opts_len),
-		       key.tun_opts_len);
-		tun_info->options = (struct geneve_opt *)(tun_info + 1);
+		memcpy((tun_info + 1), TUN_METADATA_OPTS(&key,
+		       key.tun_opts_len), key.tun_opts_len);
+		tun_info->options = (tun_info + 1);
 	} else {
 		tun_info->options = NULL;
 	}
-- 
1.9.3

^ permalink raw reply related

* [PATCH 4/6] vxlan: Fail build if VXLAN header is misdefined
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev
In-Reply-To: <cover.1420594925.git.tgraf@suug.ch>

Due to the complexity of struct vxlanhdr, protect against unwanted
and undesired changes by failing the build if the size of the struct
changes.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 drivers/net/vxlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 2b75c62..293d524 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2842,6 +2842,8 @@ static int __init vxlan_init_module(void)
 {
 	int rc;
 
+	BUILD_BUG_ON(sizeof(struct vxlanhdr) != 8);
+
 	vxlan_wq = alloc_workqueue("vxlan", 0, 0);
 	if (!vxlan_wq)
 		return -ENOMEM;
-- 
1.9.3

^ permalink raw reply related

* [PATCH 3/6] vxlan: Only bind to sockets with correct extensions enabled
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev
In-Reply-To: <cover.1420594925.git.tgraf@suug.ch>

A VXLAN net_device looking for an appropriate socket may only
consider a socket which has the exact set of extensions enabled.
If none can be found, a new socket must be created.

The OVS VXLAN port is kept unaware of extensions at this point.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 drivers/net/vxlan.c           | 35 +++++++++++++++++++++--------------
 include/net/vxlan.h           |  2 +-
 net/openvswitch/vport-vxlan.c |  2 +-
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 30b7b59..2b75c62 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -271,14 +271,15 @@ static inline struct vxlan_rdst *first_remote_rtnl(struct vxlan_fdb *fdb)
 }
 
 /* Find VXLAN socket based on network namespace, address family and UDP port */
-static struct vxlan_sock *vxlan_find_sock(struct net *net,
-					  sa_family_t family, __be16 port)
+static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
+					  __be16 port, u32 exts)
 {
 	struct vxlan_sock *vs;
 
 	hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) {
 		if (inet_sk(vs->sock->sk)->inet_sport == port &&
-		    inet_sk(vs->sock->sk)->sk.sk_family == family)
+		    inet_sk(vs->sock->sk)->sk.sk_family == family &&
+		    vs->exts == exts)
 			return vs;
 	}
 	return NULL;
@@ -298,11 +299,12 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs, u32 id)
 
 /* Look up VNI in a per net namespace table */
 static struct vxlan_dev *vxlan_find_vni(struct net *net, u32 id,
-					sa_family_t family, __be16 port)
+					sa_family_t family, __be16 port,
+					u32 exts)
 {
 	struct vxlan_sock *vs;
 
-	vs = vxlan_find_sock(net, family, port);
+	vs = vxlan_find_sock(net, family, port, exts);
 	if (!vs)
 		return NULL;
 
@@ -1770,7 +1772,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 			ip_rt_put(rt);
 			dst_vxlan = vxlan_find_vni(vxlan->net, vni,
-						   dst->sa.sa_family, dst_port);
+						   dst->sa.sa_family, dst_port,
+						   vxlan->exts);
 			if (!dst_vxlan)
 				goto tx_error;
 			vxlan_encap_bypass(skb, vxlan, dst_vxlan);
@@ -1829,7 +1832,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 			dst_release(ndst);
 			dst_vxlan = vxlan_find_vni(vxlan->net, vni,
-						   dst->sa.sa_family, dst_port);
+						   dst->sa.sa_family, dst_port,
+						   vxlan->exts);
 			if (!dst_vxlan)
 				goto tx_error;
 			vxlan_encap_bypass(skb, vxlan, dst_vxlan);
@@ -1999,7 +2003,7 @@ static int vxlan_init(struct net_device *dev)
 
 	spin_lock(&vn->sock_lock);
 	vs = vxlan_find_sock(vxlan->net, ipv6 ? AF_INET6 : AF_INET,
-			     vxlan->dst_port);
+			     vxlan->dst_port, vxlan->exts);
 	if (vs && atomic_add_unless(&vs->refcnt, 1, 0)) {
 		/* If we have a socket with same port already, reuse it */
 		vxlan_vs_add_dev(vs, vxlan);
@@ -2353,7 +2357,7 @@ static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
 /* Create new listen socket if needed */
 static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 					      vxlan_rcv_t *rcv, void *data,
-					      u32 flags)
+					      u32 flags, u32 exts)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
 	struct vxlan_sock *vs;
@@ -2381,6 +2385,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	atomic_set(&vs->refcnt, 1);
 	vs->rcv = rcv;
 	vs->data = data;
+	vs->exts = exts;
 
 	/* Initialize the vxlan udp offloads structure */
 	vs->udp_offloads.port = port;
@@ -2405,13 +2410,14 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 
 struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 				  vxlan_rcv_t *rcv, void *data,
-				  bool no_share, u32 flags)
+				  bool no_share, u32 flags,
+				  u32 exts)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
 	struct vxlan_sock *vs;
 	bool ipv6 = flags & VXLAN_F_IPV6;
 
-	vs = vxlan_socket_create(net, port, rcv, data, flags);
+	vs = vxlan_socket_create(net, port, rcv, data, flags, exts);
 	if (!IS_ERR(vs))
 		return vs;
 
@@ -2419,7 +2425,7 @@ struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 		return vs;
 
 	spin_lock(&vn->sock_lock);
-	vs = vxlan_find_sock(net, ipv6 ? AF_INET6 : AF_INET, port);
+	vs = vxlan_find_sock(net, ipv6 ? AF_INET6 : AF_INET, port, exts);
 	if (vs && ((vs->rcv != rcv) ||
 		   !atomic_add_unless(&vs->refcnt, 1, 0)))
 			vs = ERR_PTR(-EBUSY);
@@ -2441,7 +2447,8 @@ static void vxlan_sock_work(struct work_struct *work)
 	__be16 port = vxlan->dst_port;
 	struct vxlan_sock *nvs;
 
-	nvs = vxlan_sock_add(net, port, vxlan_rcv, NULL, false, vxlan->flags);
+	nvs = vxlan_sock_add(net, port, vxlan_rcv, NULL, false, vxlan->flags,
+			     vxlan->exts);
 	spin_lock(&vn->sock_lock);
 	if (!IS_ERR(nvs))
 		vxlan_vs_add_dev(nvs, vxlan);
@@ -2591,7 +2598,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 		configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
 
 	if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
-			   vxlan->dst_port)) {
+			   vxlan->dst_port, vxlan->exts)) {
 		pr_info("duplicate VNI %u\n", vni);
 		return -EEXIST;
 	}
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 66000d0..da257a7 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -136,7 +136,7 @@ struct vxlan_sock {
 
 struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 				  vxlan_rcv_t *rcv, void *data,
-				  bool no_share, u32 flags);
+				  bool no_share, u32 flags, u32 exts);
 
 void vxlan_sock_release(struct vxlan_sock *vs);
 
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index dd68c97..266c595 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -128,7 +128,7 @@ static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
 	vxlan_port = vxlan_vport(vport);
 	strncpy(vxlan_port->name, parms->name, IFNAMSIZ);
 
-	vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, 0);
+	vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, 0, 0);
 	if (IS_ERR(vs)) {
 		ovs_vport_free(vport);
 		return (void *)vs;
-- 
1.9.3

^ permalink raw reply related

* [PATCH 2/6] vxlan: Group Policy extension
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev
In-Reply-To: <cover.1420594925.git.tgraf@suug.ch>

Implements supports for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata is mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.

The group membership is defined by the lower 16 bits of skb->mark, the
upper 16 bits are used for flags.

SELinux allows to manage label to secure local resources. However,
distributed applications require ACLs to implemented across hosts. This
is typically achieved by matching on L2-L4 fields to identify the
original sending host and process on the receiver. On top of that,
netlabel and specifically CIPSO [1] allow to map security contexts to
universal labels.  However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.

           Host 1:                       Host 2:

      Group A        Group B        Group B     Group A
      +-----+   +-------------+    +-------+   +-----+
      | lxc |   | SELinux CTX |    | httpd |   | VM  |
      +--+--+   +--+----------+    +---+---+   +--+--+
	  \---+---/                     \----+---/
	      |                              |
	  +---+---+                      +---+---+
	  | vxlan |                      | vxlan |
	  +---+---+                      +---+---+
	      +------------------------------+

Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A Linux VXLAN socket will
drop VXLAN-GBP  frames. The extension is therefore disabled by default
and needs to be specifically enabled:

   ip link add [...] type vxlan [...] gbp

In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.

Examples:
  iptables:
  $ iptables -I OUTPUT -p icmp -j MARK --set-mark 0x200
  $ iptables -I INPUT -i br0 -m mark --mark 0x200 -j ACCEPT

  OVS (patches provided separately):
  in_port=1, actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 drivers/net/vxlan.c           | 155 ++++++++++++++++++++++++++++++------------
 include/net/vxlan.h           |  80 ++++++++++++++++++++--
 include/uapi/linux/if_link.h  |   8 +++
 net/openvswitch/vport-vxlan.c |   9 ++-
 4 files changed, 197 insertions(+), 55 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 4d52aa9..30b7b59 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -132,6 +132,7 @@ struct vxlan_dev {
 	__u8		  tos;		/* TOS override */
 	__u8		  ttl;
 	u32		  flags;	/* VXLAN_F_* in vxlan.h */
+	u32		  exts;		/* Enabled extensions */
 
 	struct work_struct sock_work;
 	struct work_struct igmp_join;
@@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
 			continue;
 
 		vh2 = (struct vxlanhdr *)(p->data + off_vx);
-		if (vh->vx_vni != vh2->vx_vni) {
+		if (vh->vx_flags != vh2->vx_flags ||
+		    vh->vx_vni != vh2->vx_vni) {
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
 		}
@@ -1095,6 +1097,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct vxlan_sock *vs;
 	struct vxlanhdr *vxh;
+	struct vxlan_metadata md = {0};
 
 	/* Need Vxlan and inner Ethernet header to be present */
 	if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1113,6 +1116,19 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (vs->exts) {
 		if (!vxh->vni_present)
 			goto error_invalid_header;
+
+		if (vxh->gbp_present) {
+			if (!(vs->exts & VXLAN_EXT_GBP))
+				goto error_invalid_header;
+
+			md.gbp = ntohs(vxh->gbp.policy_id);
+
+			if (vxh->gbp.dont_learn)
+				md.gbp |= VXLAN_GBP_DONT_LEARN;
+
+			if (vxh->gbp.policy_applied)
+				md.gbp |= VXLAN_GBP_POLICY_APPLIED;
+		}
 	} else {
 		if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
 		    (vxh->vx_vni & htonl(0xff)))
@@ -1122,7 +1138,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
 		goto drop;
 
-	vs->rcv(vs, skb, vxh->vx_vni);
+	md.vni = vxh->vx_vni;
+	vs->rcv(vs, skb, &md);
 	return 0;
 
 drop:
@@ -1138,8 +1155,8 @@ error:
 	return 1;
 }
 
-static void vxlan_rcv(struct vxlan_sock *vs,
-		      struct sk_buff *skb, __be32 vx_vni)
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md)
 {
 	struct iphdr *oip = NULL;
 	struct ipv6hdr *oip6 = NULL;
@@ -1150,7 +1167,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 	int err = 0;
 	union vxlan_addr *remote_ip;
 
-	vni = ntohl(vx_vni) >> 8;
+	vni = ntohl(md->vni) >> 8;
 	/* Is this VNI defined? */
 	vxlan = vxlan_vs_find_vni(vs, vni);
 	if (!vxlan)
@@ -1184,6 +1201,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 		goto drop;
 
 	skb_reset_network_header(skb);
+	skb->mark = md->gbp;
 
 	if (oip6)
 		err = IP6_ECN_decapsulate(oip6, skb);
@@ -1533,15 +1551,54 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	return false;
 }
 
+static int vxlan_build_hdr(struct sk_buff *skb, struct vxlan_sock *vs,
+			   int min_headroom, struct vxlan_metadata *md)
+{
+	struct vxlanhdr *vxh;
+	int err;
+
+	/* Need space for new headers (invalidates iph ptr) */
+	err = skb_cow_head(skb, min_headroom);
+	if (unlikely(err)) {
+		kfree_skb(skb);
+		return err;
+	}
+
+	skb = vlan_hwaccel_push_inside(skb);
+	if (WARN_ON(!skb))
+		return -ENOMEM;
+
+	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
+	vxh->vx_flags = htonl(VXLAN_FLAGS);
+	vxh->vx_vni = md->vni;
+
+	if (vs->exts)  {
+		if (vs->exts & VXLAN_EXT_GBP) {
+			vxh->gbp_present = 1;
+
+			if (md->gbp & VXLAN_GBP_DONT_LEARN)
+				vxh->gbp.dont_learn = 1;
+
+			if (md->gbp & VXLAN_GBP_POLICY_APPLIED)
+				vxh->gbp.policy_applied = 1;
+
+			vxh->gbp.policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
+		}
+	}
+
+	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
+
+	return 0;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 			   struct dst_entry *dst, struct sk_buff *skb,
 			   struct net_device *dev, struct in6_addr *saddr,
 			   struct in6_addr *daddr, __u8 prio, __u8 ttl,
-			   __be16 src_port, __be16 dst_port, __be32 vni,
-			   bool xnet)
+			   __be16 src_port, __be16 dst_port,
+			   struct vxlan_metadata *md, bool xnet)
 {
-	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
 	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
@@ -1558,24 +1615,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 			+ VXLAN_HLEN + sizeof(struct ipv6hdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
-	/* Need space for new headers (invalidates iph ptr) */
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
-		goto err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb)) {
-		err = -ENOMEM;
+	err = vxlan_build_hdr(skb, vs, min_headroom, md);
+	if (err)
 		goto err;
-	}
-
-	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
-	vxh->vx_vni = vni;
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
 			     ttl, src_port, dst_port);
@@ -1589,9 +1631,9 @@ err:
 int vxlan_xmit_skb(struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet)
+		   __be16 src_port, __be16 dst_port,
+		   struct vxlan_metadata *md, bool xnet)
 {
-	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
 	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
@@ -1604,22 +1646,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 			+ VXLAN_HLEN + sizeof(struct iphdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
-	/* Need space for new headers (invalidates iph ptr) */
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
+	err = vxlan_build_hdr(skb, vs, min_headroom, md);
+	if (err)
 		return err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb))
-		return -ENOMEM;
-
-	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
-	vxh->vx_vni = vni;
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
 				   ttl, df, src_port, dst_port, xnet);
@@ -1679,6 +1708,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	const struct iphdr *old_iph;
 	struct flowi4 fl4;
 	union vxlan_addr *dst;
+	struct vxlan_metadata md;
 	__be16 src_port = 0, dst_port;
 	u32 vni;
 	__be16 df = 0;
@@ -1749,11 +1779,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
 		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
+		md.vni = htonl(vni << 8);
+		md.gbp = skb->mark;
 
 		err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
 				     fl4.saddr, dst->sin.sin_addr.s_addr,
-				     tos, ttl, df, src_port, dst_port,
-				     htonl(vni << 8),
+				     tos, ttl, df, src_port, dst_port, &md,
 				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
 		if (err < 0) {
 			/* skb is already freed. */
@@ -1806,10 +1837,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		ttl = ttl ? : ip6_dst_hoplimit(ndst);
+		md.vni = htonl(vni << 8);
+		md.gbp = skb->mark;
 
 		err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
 				      dev, &fl6.saddr, &fl6.daddr, 0, ttl,
-				      src_port, dst_port, htonl(vni << 8),
+				      src_port, dst_port, &md,
 				      !net_eq(vxlan->net, dev_net(vxlan->dev)));
 #endif
 	}
@@ -2210,6 +2243,11 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_UDP_CSUM]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_EXTENSION]	= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy vxlan_ext_policy[IFLA_VXLAN_EXT_MAX + 1] = {
+	[IFLA_VXLAN_EXT_GBP]	= { .type = NLA_FLAG, },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -2246,6 +2284,18 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
 		}
 	}
 
+	if (data[IFLA_VXLAN_EXTENSION]) {
+		int err;
+
+		err = nla_validate_nested(data[IFLA_VXLAN_EXTENSION],
+					  IFLA_VXLAN_EXT_MAX, vxlan_ext_policy);
+		if (err < 0) {
+			pr_debug("invalid VXLAN extension configuration: %d\n",
+				 err);
+			return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
@@ -2400,6 +2450,18 @@ static void vxlan_sock_work(struct work_struct *work)
 	dev_put(vxlan->dev);
 }
 
+static void configure_vxlan_exts(struct vxlan_dev *vxlan, struct nlattr *attr)
+{
+	struct nlattr *exts[IFLA_VXLAN_EXT_MAX+1];
+
+	/* Validated in vxlan_validate() */
+	if (nla_parse_nested(exts, IFLA_VXLAN_EXT_MAX, attr, NULL) < 0)
+		BUG();
+
+	if (exts[IFLA_VXLAN_EXT_GBP])
+		vxlan->exts |= VXLAN_EXT_GBP;
+}
+
 static int vxlan_newlink(struct net *net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[])
 {
@@ -2525,6 +2587,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	    nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
 		vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
 
+	if (data[IFLA_VXLAN_EXTENSION])
+		configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
+
 	if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
 			   vxlan->dst_port)) {
 		pr_info("duplicate VNI %u\n", vni);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 3e98d31..66000d0 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -11,13 +11,60 @@
 #define VNI_HASH_BITS	10
 #define VNI_HASH_SIZE	(1<<VNI_HASH_BITS)
 
+/*
+ * VXLAN Group Based Policy Extension:
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |1|-|-|-|1|-|-|-|R|D|R|R|A|R|R|R|        Group Policy ID        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                VXLAN Network Identifier (VNI) |   Reserved    |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * D = Don't Learn bit. When set, this bit indicates that the egress
+ *     VTEP MUST NOT learn the source address of the encapsulated frame.
+ *
+ * A = Indicates that the group policy has already been applied to
+ *     this packet. Policies MUST NOT be applied by devices when the
+ *     A bit is set.
+ *
+ * [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
+ */
+struct vxlan_gbp {
+#ifdef __LITTLE_ENDIAN_BITFIELD
+	__u8	reserved_flags1:3,
+		policy_applied:1,
+		reserved_flags2:2,
+		dont_learn:1,
+		reserved_flags3:1;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+	__u8	reserved_flags1:1,
+		dont_learn:1,
+		reserved_flags2:2,
+		policy_applied:1,
+		reserved_flags3:3;
+#else
+#error	"Please fix <asm/byteorder.h>"
+#endif
+	__be16 policy_id;
+} __packed;
+
+/* skb->mark mapping
+ *
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |R|R|R|R|R|R|R|R|R|D|R|R|A|R|R|R|        Group Policy ID        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+#define VXLAN_GBP_DONT_LEARN		(BIT(6) << 16)
+#define VXLAN_GBP_POLICY_APPLIED	(BIT(3) << 16)
+#define VXLAN_GBP_ID_MASK		(0xFFFF)
+
 /* VXLAN protocol header:
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |R|R|R|R|I|R|R|R|               Reserved                        |
+ * |G|R|R|R|I|R|R|R|               Reserved                        |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * |                VXLAN Network Identifier (VNI) |   Reserved    |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  *
+ * G = 1	Group Policy (VXLAN-GBP)
  * I = 1	VXLAN Network Identifier (VNI) present
  */
 struct vxlanhdr {
@@ -26,24 +73,42 @@ struct vxlanhdr {
 #ifdef __LITTLE_ENDIAN_BITFIELD
 			__u8	reserved_flags1:3,
 				vni_present:1,
-				reserved_flags2:4;
+				reserved_flags2:3,
+				gbp_present:1;
 #elif defined(__BIG_ENDIAN_BITFIELD)
-			__u8	reserved_flags2:4,
+			__u8	gbp_present:1,
+				reserved_flags2:3,
 				vni_present:1,
 				reserved_flags1:3;
 #else
 #error	"Please fix <asm/byteorder.h>"
 #endif
-			__u8	vx_reserved1;
-			__be16	vx_reserved2;
+			union {
+				/* NOTE: Offset 0 will be 1 byte aligned, so
+				 * all member structs must be marked packed.
+				 */
+				struct vxlan_gbp gbp;
+				struct {
+					__u8	vx_reserved1;
+					__be16	vx_reserved2;
+				} __packed;
+			};
 		};
 		__be32 vx_flags;
 	};
 	__be32	vx_vni;
 };
 
+struct vxlan_metadata {
+	__be32		vni;
+	u32		gbp;
+};
+
 struct vxlan_sock;
-typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb, __be32 key);
+typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb,
+			   struct vxlan_metadata *md);
+
+#define VXLAN_EXT_GBP			BIT(0)
 
 /* per UDP socket information */
 struct vxlan_sock {
@@ -78,7 +143,8 @@ void vxlan_sock_release(struct vxlan_sock *vs);
 int vxlan_xmit_skb(struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet);
+		   __be16 src_port, __be16 dst_port, struct vxlan_metadata *md,
+		   bool xnet);
 
 static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 						     netdev_features_t features)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index f7d0d2d..9f07bf5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -370,10 +370,18 @@ enum {
 	IFLA_VXLAN_UDP_CSUM,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
+	IFLA_VXLAN_EXTENSION,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
 
+enum {
+	IFLA_VXLAN_EXT_UNSPEC,
+	IFLA_VXLAN_EXT_GBP,
+	__IFLA_VXLAN_EXT_MAX,
+};
+#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
+
 struct ifla_vxlan_port_range {
 	__be16	low;
 	__be16	high;
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index d7c46b3..dd68c97 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -59,7 +59,8 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
 }
 
 /* Called with rcu_read_lock and BH disabled. */
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md)
 {
 	struct ovs_tunnel_info tun_info;
 	struct vport *vport = vs->data;
@@ -68,7 +69,7 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
 
 	/* Save outer tunnel values */
 	iph = ip_hdr(skb);
-	key = cpu_to_be64(ntohl(vx_vni) >> 8);
+	key = cpu_to_be64(ntohl(md->vni) >> 8);
 	ovs_flow_tun_info_init(&tun_info, iph,
 			       udp_hdr(skb)->source, udp_hdr(skb)->dest,
 			       key, TUNNEL_KEY, NULL, 0);
@@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	struct vxlan_port *vxlan_port = vxlan_vport(vport);
 	__be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
 	struct ovs_key_ipv4_tunnel *tun_key;
+	struct vxlan_metadata md;
 	struct rtable *rt;
 	struct flowi4 fl;
 	__be16 src_port;
@@ -178,12 +180,13 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	skb->ignore_df = 1;
 
 	src_port = udp_flow_src_port(net, skb, 0, 0, true);
+	md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
 
 	err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
 			     fl.saddr, tun_key->ipv4_dst,
 			     tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
 			     src_port, dst_port,
-			     htonl(be64_to_cpu(tun_key->tun_id) << 8),
+			     &md,
 			     false);
 	if (err < 0)
 		ip_rt_put(rt);
-- 
1.9.3

^ permalink raw reply related

* [PATCH 1/6] vxlan: Allow for VXLAN extensions to be implemented
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev
In-Reply-To: <cover.1420594925.git.tgraf@suug.ch>

The VXLAN receive code is currently conservative in what it accepts and
will reject any frame that uses any of the reserved VXLAN protocol fields.
The VXLAN draft specifies that "reserved fields MUST be set to zero on
transmit and ignored on receive.".

Retain the current conservative parsing behaviour by default but allows
these fields to be used by VXLAN extensions which are explicitly enabled
on the VXLAN socket respectively VXLAN net_device.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 drivers/net/vxlan.c | 29 +++++++++++++++++++----------
 include/net/vxlan.h | 32 +++++++++++++++++++++++++++++---
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 2ab0922..4d52aa9 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -65,7 +65,7 @@
 #define VXLAN_VID_MASK	(VXLAN_N_VID - 1)
 #define VXLAN_HLEN (sizeof(struct udphdr) + sizeof(struct vxlanhdr))
 
-#define VXLAN_FLAGS 0x08000000	/* struct vxlanhdr.vx_flags required value. */
+#define VXLAN_FLAGS 0x08000000 /* struct vxlanhdr.vx_flags default value. */
 
 /* UDP port for VXLAN traffic.
  * The IANA assigned port is 4789, but the Linux default is 8472
@@ -1100,22 +1100,28 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (!pskb_may_pull(skb, VXLAN_HLEN))
 		goto error;
 
+	vs = rcu_dereference_sk_user_data(sk);
+	if (!vs)
+		goto drop;
+
 	/* Return packets with reserved bits set */
 	vxh = (struct vxlanhdr *)(udp_hdr(skb) + 1);
-	if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
-	    (vxh->vx_vni & htonl(0xff))) {
-		netdev_dbg(skb->dev, "invalid vxlan flags=%#x vni=%#x\n",
-			   ntohl(vxh->vx_flags), ntohl(vxh->vx_vni));
-		goto error;
+
+	/* For backwards compatibility, only allow reserved fields to be
+	 * used by VXLAN extensions if explicitly requested.
+	 */
+	if (vs->exts) {
+		if (!vxh->vni_present)
+			goto error_invalid_header;
+	} else {
+		if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
+		    (vxh->vx_vni & htonl(0xff)))
+			goto error_invalid_header;
 	}
 
 	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
 		goto drop;
 
-	vs = rcu_dereference_sk_user_data(sk);
-	if (!vs)
-		goto drop;
-
 	vs->rcv(vs, skb, vxh->vx_vni);
 	return 0;
 
@@ -1124,6 +1130,9 @@ drop:
 	kfree_skb(skb);
 	return 0;
 
+error_invalid_header:
+	netdev_dbg(skb->dev, "invalid vxlan flags=%#x vni=%#x\n",
+		   ntohl(vxh->vx_flags), ntohl(vxh->vx_vni));
 error:
 	/* Return non vxlan pkt */
 	return 1;
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 903461a..3e98d31 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -11,10 +11,35 @@
 #define VNI_HASH_BITS	10
 #define VNI_HASH_SIZE	(1<<VNI_HASH_BITS)
 
-/* VXLAN protocol header */
+/* VXLAN protocol header:
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |R|R|R|R|I|R|R|R|               Reserved                        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                VXLAN Network Identifier (VNI) |   Reserved    |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * I = 1	VXLAN Network Identifier (VNI) present
+ */
 struct vxlanhdr {
-	__be32 vx_flags;
-	__be32 vx_vni;
+	union {
+		struct {
+#ifdef __LITTLE_ENDIAN_BITFIELD
+			__u8	reserved_flags1:3,
+				vni_present:1,
+				reserved_flags2:4;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+			__u8	reserved_flags2:4,
+				vni_present:1,
+				reserved_flags1:3;
+#else
+#error	"Please fix <asm/byteorder.h>"
+#endif
+			__u8	vx_reserved1;
+			__be16	vx_reserved2;
+		};
+		__be32 vx_flags;
+	};
+	__be32	vx_vni;
 };
 
 struct vxlan_sock;
@@ -25,6 +50,7 @@ struct vxlan_sock {
 	struct hlist_node hlist;
 	vxlan_rcv_t	 *rcv;
 	void		 *data;
+	u32		  exts;
 	struct work_struct del_work;
 	struct socket	 *sock;
 	struct rcu_head	  rcu;
-- 
1.9.3

^ permalink raw reply related

* [PATCH 0/6 net-next] VXLAN Group Policy Extension
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev

Implements supports for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata is mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.

The extension is disabled by default and should be run on a distinct
port in mixed Linux VXLAN VTEP environments. Liberal VXLAN VTEPs
which ignore unknown reserved bits will be able to receive VXLAN-GBP
frames.

Simple usage example:

10.1.1.1:
   # ip link add vxlan0 type vxlan id 10 remote 10.1.1.2 gbp
   # iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200

10.1.1.2:
   # ip link add vxlan0 type vxlan id 10 remote 10.1.1.1 gbp
   # iptables -I INPUT -m mark --mark 0x200 -j DROP

iproute2 [1] and OVS [2] support will be provided in separate patches.

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] https://github.com/tgraf/iproute2/tree/vxlan-gbp
[2] https://github.com/tgraf/ovs/tree/vxlan-gbp

Thomas Graf (6):
  vxlan: Allow for VXLAN extensions to be implemented
  vxlan: Group Policy extension
  vxlan: Only bind to sockets with correct extensions enabled
  vxlan: Fail build if VXLAN header is misdefined
  openvswitch: Rename GENEVE_TUN_OPTS() to TUN_METADATA_OPTS()
  openvswitch: Support VXLAN Group Policy extension

 drivers/net/vxlan.c              | 221 +++++++++++++++++++++++++++------------
 include/net/vxlan.h              | 104 ++++++++++++++++--
 include/uapi/linux/if_link.h     |   8 ++
 include/uapi/linux/openvswitch.h |  19 ++++
 net/openvswitch/flow.c           |   2 +-
 net/openvswitch/flow.h           |  14 +--
 net/openvswitch/flow_netlink.c   | 111 ++++++++++++--------
 net/openvswitch/vport-vxlan.c    |  89 +++++++++++++++-
 8 files changed, 435 insertions(+), 133 deletions(-)

-- 
1.9.3

^ permalink raw reply

* Re: route/max_size sysctl in ipv4
From: Ani Sinha @ 2015-01-07  1:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <20150105.195128.794605376092864881.davem@davemloft.net>

On Mon, Jan 5, 2015 at 4:51 PM, David Miller <davem@davemloft.net> wrote:

> The sysctl is kept so that scripts reading it don't suddenly stop
> working.  We can't just remove sysctl values.

Interestingly, one of our scripts did break. It broke because now this
sysctl is only available in the global net namespace and not in the
child namespaces. If not breaking scripts is the fundamental logic in
keeping in sysctl intact, would you guys be open to accepting a patch
where we make this sysctl available for all net namespaces?

^ permalink raw reply

* RE: [PATCH net-next v1 2/3] ARM: imx: add FEC sleep mode callback function
From: fugang.duan @ 2015-01-07  1:41 UTC (permalink / raw)
  To: Shawn Guo
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	bhutchings@solarflare.com, stephen@networkplumber.org
In-Reply-To: <20150106114807.GL24511@dragon>

From: Shawn Guo <shawn.guo@linaro.org> Sent: Tuesday, January 06, 2015 7:48 PM
> To: Duan Fugang-B38611
> Cc: davem@davemloft.net; netdev@vger.kernel.org;
> bhutchings@solarflare.com; stephen@networkplumber.org
> Subject: Re: [PATCH net-next v1 2/3] ARM: imx: add FEC sleep mode
> callback function
> 
> On Wed, Dec 24, 2014 at 05:30:40PM +0800, Fugang Duan wrote:
> > i.MX6q/dl, i.MX6SX SOCs enet support sleep mode that magic packet can
> > wake up system in suspend status. For different SOCs, there have some
> > SOC specifical GPR register to set sleep on/off mode. So add these to
> > callback function for driver.
> >
> > Signed-off-by: Fugang Duan <B38611@freescale.com>
> 
> I do not like this patch.  In the end, this is just a GRP register bit
> setup per FEC driver need.  Rather than messing up platform code for each
> SoC with the same pattern, I do not see why this can not be done by FEC
> driver itself.
> 
> You can take a look at LDB driver (drivers/gpu/drm/imx/imx-ldb.c) to see
> how this can be done.
> 
> Shawn
> 

Hi, Shawn,

It is SOC related setting, not fec IP itself setting, and different SOC GPR setting is not different.
So I think it better to put it to platform code.

Regards,
Andy

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox