Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 1/2] sh_eth: fix handling of no LINK signal
From: Sergei Shtylyov @ 2013-03-31 19:50 UTC (permalink / raw)
  To: netdev; +Cc: nobuhiro.iwamatsu.yj, linux-sh

The code handling the absent LINK signal (or the absent PSR register -- which
reflects the state of this signal) is quite naive and has probably never really
worked.  It's probably enough to say that this code is executed only on the LINK
change interrupt (sic!) but even if we actually have the signal and choose to
ignore it (it might be connected to PHY's link/activity LED output as on the
Renesas BOCK-W board), sh_eth_adjust_link() on which this code relies to update
'mdp->link' gets executed later than the LINK change interrupt where it is
checked, and so RX/TX never get enabled via ECMR register.

So, ignore the LINK changed interrupt iff LINK signal is absent (or just chosen
not to be used) or PSR register is absent, and enable/disable RX/TX directly in
sh_eth_adjust_link() in this case. 

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
The patch is against the David Miller's 'net.git' repo.

 drivers/net/ethernet/renesas/sh_eth.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: net/drivers/net/ethernet/renesas/sh_eth.c
===================================================================
--- net.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net/drivers/net/ethernet/renesas/sh_eth.c
@@ -1216,10 +1216,7 @@ static void sh_eth_error(struct net_devi
 		if (felic_stat & ECSR_LCHNG) {
 			/* Link Changed */
 			if (mdp->cd->no_psr || mdp->no_ether_link) {
-				if (mdp->link == PHY_DOWN)
-					link_stat = 0;
-				else
-					link_stat = PHY_ST_LINK;
+				goto ignore_link;
 			} else {
 				link_stat = (sh_eth_read(ndev, PSR));
 				if (mdp->ether_link_active_low)
@@ -1242,6 +1239,7 @@ static void sh_eth_error(struct net_devi
 		}
 	}

+ignore_link:
 	if (intr_status & EESR_TWB) {
 		/* Write buck end. unused write back interrupt */
 		if (intr_status & EESR_TABT)	/* Transmit Abort int */
@@ -1392,12 +1390,16 @@ static void sh_eth_adjust_link(struct ne
 				(sh_eth_read(ndev, ECMR) & ~ECMR_TXF), ECMR);
 			new_state = 1;
 			mdp->link = phydev->link;
+			if (mdp->cd->no_psr || mdp->no_ether_link)
+				sh_eth_rcv_snd_enable(ndev);
 		}
 	} else if (mdp->link) {
 		new_state = 1;
 		mdp->link = PHY_DOWN;
 		mdp->speed = 0;
 		mdp->duplex = -1;
+		if (mdp->cd->no_psr || mdp->no_ether_link)
+			sh_eth_rcv_snd_disable(ndev);
 	}

 	if (new_state && netif_msg_link(mdp))

^ permalink raw reply

* Re: [PATCH V4 2/4] PHYLIB: queue work on unbound wq
From: David Miller @ 2013-03-31 16:45 UTC (permalink / raw)
  To: viresh.kumar
  Cc: tj, linaro-kernel, patches, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, arvind.chauhan, airlied, axboe,
	tglx, peterz, mingo, rostedt, linux-rt-users, linux-kernel,
	netdev
In-Reply-To: <6be4e9ad2048a2a9d60143f58931c3fe94770175.1364740180.git.viresh.kumar@linaro.org>

From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Sun, 31 Mar 2013 20:01:45 +0530

> Phylib uses workqueues for multiple purposes. There is no real dependency of
> scheduling these on the cpu which scheduled them.
> 
> On a idle system, it is observed that and idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which the
> scheduler believes to be the most appropriate one.
> 
> This patch replaces system_wq with system_unbound_wq for PHYLIB.
> 
> Cc: David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [PATCH net-next] bnx2x: handle spurious interrupts
From: Ben Hutchings @ 2013-03-31 16:31 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: davem, netdev, ariele, Eilon Greenstein
In-Reply-To: <1364729701-18855-1-git-send-email-yuvalmin@broadcom.com>

On Sun, 2013-03-31 at 14:35 +0300, Yuval Mintz wrote:
> In scenarios in which a previous driver was removed without proper cleanup
> (e.g., kdump), it is possible for the chip to generate an interrupt without
> any apparent reason once interrupts are requested.
> 
> This patch introduces 2 changes:
> 
>   - It changes bnx2x's interrupt request scheme so that bnx2x would only
>     request interrupts from the kernel after it has finished initializing
>     all the inner structs required for those interrupts' handling.
> 
>   - The driver ignores any interrupt which arrives in an inappropriate time,
>     i.e., when the HW generates interrupts without being configured to do so.
>     Acking the HW for such an interrupt will prevent HW from generating further
>     interrupts, thus ignoring them is necessary.
>    
> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
[...]

The flag changes can still race with the interrupt handler, so I think
you need to add:

> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -866,6 +866,7 @@ static void bnx2x_int_disable(struct bnx2x *bp)
>  		bnx2x_hc_int_disable(bp);
>  	else
>  		bnx2x_igu_int_disable(bp);

for (each interrupt vector)
	synchronize_irq(irq);

> +	bp->flags &= ~INTERRUPTS_ENABLED_FLAG;
>  }
>  
>  void bnx2x_panic_dump(struct bnx2x *bp, bool disable_int)
> @@ -1605,6 +1606,8 @@ static void bnx2x_igu_int_enable(struct bnx2x *bp)
>  
>  void bnx2x_int_enable(struct bnx2x *bp)
>  {
> +	bp->flags |= INTERRUPTS_ENABLED_FLAG;
> +

wmb();

>  	if (bp->common.int_block == INT_BLOCK_HC)
>  		bnx2x_hc_int_enable(bp);
>  	else
[...]

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH V4 2/4] PHYLIB: queue work on unbound wq
From: Viresh Kumar @ 2013-03-31 14:31 UTC (permalink / raw)
  To: tj
  Cc: linaro-kernel, patches, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, arvind.chauhan, davem, airlied,
	axboe, tglx, peterz, mingo, rostedt, linux-rt-users, linux-kernel,
	Viresh Kumar, netdev
In-Reply-To: <cover.1364740180.git.viresh.kumar@linaro.org>

Phylib uses workqueues for multiple purposes. There is no real dependency of
scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.

This patch replaces system_wq with system_unbound_wq for PHYLIB.

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/net/phy/phy.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index c14f147..b2fe180 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -439,7 +439,7 @@ void phy_start_machine(struct phy_device *phydev,
 {
 	phydev->adjust_state = handler;
 
-	schedule_delayed_work(&phydev->state_queue, HZ);
+	queue_delayed_work(system_unbound_wq, &phydev->state_queue, HZ);
 }
 
 /**
@@ -500,7 +500,7 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 	disable_irq_nosync(irq);
 	atomic_inc(&phydev->irq_disable);
 
-	schedule_work(&phydev->phy_queue);
+	queue_work(system_unbound_wq, &phydev->phy_queue);
 
 	return IRQ_HANDLED;
 }
@@ -655,7 +655,7 @@ static void phy_change(struct work_struct *work)
 
 	/* reschedule state queue work to run as soon as possible */
 	cancel_delayed_work_sync(&phydev->state_queue);
-	schedule_delayed_work(&phydev->state_queue, 0);
+	queue_delayed_work(system_unbound_wq, &phydev->state_queue, 0);
 
 	return;
 
@@ -918,7 +918,8 @@ void phy_state_machine(struct work_struct *work)
 	if (err < 0)
 		phy_error(phydev);
 
-	schedule_delayed_work(&phydev->state_queue, PHY_STATE_TIME * HZ);
+	queue_delayed_work(system_unbound_wq, &phydev->state_queue,
+			PHY_STATE_TIME * HZ);
 }
 
 static inline void mmd_phy_indirect(struct mii_bus *bus, int prtad, int devad,
-- 
1.7.12.rc2.18.g61b472e

^ permalink raw reply related

* [PATCH V4 2/4] PHYLIB: queue work on unbound wq
From: Viresh Kumar @ 2013-03-31 14:27 UTC (permalink / raw)
  To: tj
  Cc: linaro-kernel, patches, robin.randhawa, Steve.Bannister,
	Liviu.Dudau, charles.garcia-tobin, arvind.chauhan, davem, airlied,
	axboe, Viresh Kumar, netdev
In-Reply-To: <cover.1364739015.git.viresh.kumar@linaro.org>

Phylib uses workqueues for multiple purposes. There is no real dependency of
scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.

This patch replaces system_wq with system_unbound_wq for PHYLIB.

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/net/phy/phy.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index c14f147..b2fe180 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -439,7 +439,7 @@ void phy_start_machine(struct phy_device *phydev,
 {
 	phydev->adjust_state = handler;
 
-	schedule_delayed_work(&phydev->state_queue, HZ);
+	queue_delayed_work(system_unbound_wq, &phydev->state_queue, HZ);
 }
 
 /**
@@ -500,7 +500,7 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 	disable_irq_nosync(irq);
 	atomic_inc(&phydev->irq_disable);
 
-	schedule_work(&phydev->phy_queue);
+	queue_work(system_unbound_wq, &phydev->phy_queue);
 
 	return IRQ_HANDLED;
 }
@@ -655,7 +655,7 @@ static void phy_change(struct work_struct *work)
 
 	/* reschedule state queue work to run as soon as possible */
 	cancel_delayed_work_sync(&phydev->state_queue);
-	schedule_delayed_work(&phydev->state_queue, 0);
+	queue_delayed_work(system_unbound_wq, &phydev->state_queue, 0);
 
 	return;
 
@@ -918,7 +918,8 @@ void phy_state_machine(struct work_struct *work)
 	if (err < 0)
 		phy_error(phydev);
 
-	schedule_delayed_work(&phydev->state_queue, PHY_STATE_TIME * HZ);
+	queue_delayed_work(system_unbound_wq, &phydev->state_queue,
+			PHY_STATE_TIME * HZ);
 }
 
 static inline void mmd_phy_indirect(struct mii_bus *bus, int prtad, int devad,
-- 
1.7.12.rc2.18.g61b472e

^ permalink raw reply related

* [PATCH net-next] bnx2x: handle spurious interrupts
From: Yuval Mintz @ 2013-03-31 11:35 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, Yuval Mintz, Eilon Greenstein

In scenarios in which a previous driver was removed without proper cleanup
(e.g., kdump), it is possible for the chip to generate an interrupt without
any apparent reason once interrupts are requested.

This patch introduces 2 changes:

  - It changes bnx2x's interrupt request scheme so that bnx2x would only
    request interrupts from the kernel after it has finished initializing
    all the inner structs required for those interrupts' handling.

  - The driver ignores any interrupt which arrives in an inappropriate time,
    i.e., when the HW generates interrupts without being configured to do so.
    Acking the HW for such an interrupt will prevent HW from generating further
    interrupts, thus ignoring them is necessary.
   
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
Hi Dave,

This is the revised patch (didn't know if v2 was appropriate, as most
changes are newly introduced).

Please apply this patch to `net-next'.

Thanks,
Yuval

---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h      |  1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  | 26 ++++++++++++++----------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |  9 +++++++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 21 ++++++++++++++++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c |  3 ++-
 5 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index c630342..5f181ee 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1391,6 +1391,7 @@ struct bnx2x {
 #define USING_SINGLE_MSIX_FLAG		(1 << 20)
 #define BC_SUPPORTS_DCBX_MSG_NON_PMF	(1 << 21)
 #define IS_VF_FLAG			(1 << 22)
+#define INTERRUPTS_ENABLED_FLAG	(1 << 23)
 
 #define BP_NOMCP(bp)			((bp)->flags & NO_MCP_FLAG)
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 352e58e..d256ffd 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -1038,6 +1038,17 @@ static irqreturn_t bnx2x_msix_fp_int(int irq, void *fp_cookie)
 	DP(NETIF_MSG_INTR,
 	   "got an MSI-X interrupt on IDX:SB [fp %d fw_sd %d igusb %d]\n",
 	   fp->index, fp->fw_sb_id, fp->igu_sb_id);
+
+	/* It's possible for a spurious interrupt to be received, if the
+	 * driver was loaded above a previous configured function (e.g., kdump).
+	 * We simply ignore such interrupts.
+	 */
+	if (unlikely(!(bp->flags & INTERRUPTS_ENABLED_FLAG))) {
+		WARN_ONCE(1, "%s: Got Spurious interrupts",
+			  bp->dev ? bp->dev->name : "?");
+		return IRQ_HANDLED;
+	}
+
 	bnx2x_ack_sb(bp, fp->igu_sb_id, USTORM_ID, 0, IGU_INT_DISABLE, 0);
 
 #ifdef BNX2X_STOP_ON_ERROR
@@ -1719,7 +1730,7 @@ static int bnx2x_req_irq(struct bnx2x *bp)
 	return request_irq(irq, bnx2x_interrupt, flags, bp->dev->name, bp->dev);
 }
 
-static int bnx2x_setup_irqs(struct bnx2x *bp)
+int bnx2x_setup_irqs(struct bnx2x *bp)
 {
 	int rc = 0;
 	if (bp->flags & USING_MSIX_FLAG &&
@@ -2575,17 +2586,10 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
 		}
 	}
 
-	/* Connect to IRQs */
-	rc = bnx2x_setup_irqs(bp);
-	if (rc) {
-		BNX2X_ERR("setup irqs failed\n");
-		if (IS_PF(bp))
-			bnx2x_fw_command(bp, DRV_MSG_CODE_LOAD_DONE, 0);
-		LOAD_ERROR_EXIT(bp, load_error2);
-	}
-
 	/* Setup NIC internals and enable interrupts */
-	bnx2x_nic_init(bp, load_code);
+	rc = bnx2x_nic_init(bp, load_code);
+	if (rc)
+		LOAD_ERROR_EXIT(bp, load_error2);
 
 	/* Init per-function objects */
 	if (IS_PF(bp)) {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 54e1b14..84ae16c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -304,7 +304,7 @@ void bnx2x_nic_init_cnic(struct bnx2x *bp);
  *  - status blocks
  *  - etc.
  */
-void bnx2x_nic_init(struct bnx2x *bp, u32 load_code);
+int bnx2x_nic_init(struct bnx2x *bp, u32 load_code);
 /**
  * bnx2x_alloc_mem_cnic - allocate driver's memory for cnic.
  *
@@ -1406,4 +1406,11 @@ void bnx2x_fill_fw_str(struct bnx2x *bp, char *buf, size_t buf_len);
 int bnx2x_drain_tx_queues(struct bnx2x *bp);
 void bnx2x_squeeze_objects(struct bnx2x *bp);
 
+/**
+ * bnx2x_setup_irqs - Request MSI-X/MSI/INTa interrupt from kernel
+ *
+ * @bp - driver handle
+ */
+int bnx2x_setup_irqs(struct bnx2x *bp);
+
 #endif /* BNX2X_CMN_H */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index fdfe33b..960b453 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -866,6 +866,7 @@ static void bnx2x_int_disable(struct bnx2x *bp)
 		bnx2x_hc_int_disable(bp);
 	else
 		bnx2x_igu_int_disable(bp);
+	bp->flags &= ~INTERRUPTS_ENABLED_FLAG;
 }
 
 void bnx2x_panic_dump(struct bnx2x *bp, bool disable_int)
@@ -1605,6 +1606,8 @@ static void bnx2x_igu_int_enable(struct bnx2x *bp)
 
 void bnx2x_int_enable(struct bnx2x *bp)
 {
+	bp->flags |= INTERRUPTS_ENABLED_FLAG;
+
 	if (bp->common.int_block == INT_BLOCK_HC)
 		bnx2x_hc_int_enable(bp);
 	else
@@ -6030,9 +6033,9 @@ void bnx2x_nic_init_cnic(struct bnx2x *bp)
 	mmiowb();
 }
 
-void bnx2x_nic_init(struct bnx2x *bp, u32 load_code)
+int bnx2x_nic_init(struct bnx2x *bp, u32 load_code)
 {
-	int i;
+	int i, rc;
 
 	for_each_eth_queue(bp, i)
 		bnx2x_init_eth_fp(bp, i);
@@ -6043,7 +6046,7 @@ void bnx2x_nic_init(struct bnx2x *bp, u32 load_code)
 	bnx2x_init_tx_rings(bp);
 	if (IS_VF(bp)) {
 		bnx2x_memset_stats(bp);
-		return;
+		return 0;
 	}
 
 	/* Initialize MOD_ABS interrupts */
@@ -6054,6 +6057,16 @@ void bnx2x_nic_init(struct bnx2x *bp, u32 load_code)
 	bnx2x_init_def_sb(bp);
 	bnx2x_update_dsb_idx(bp);
 	bnx2x_init_sp_ring(bp);
+
+	/* Connect to IRQs */
+	rc = bnx2x_setup_irqs(bp);
+	if (rc) {
+		BNX2X_ERR("setup irqs failed\n");
+		if (IS_PF(bp))
+			bnx2x_fw_command(bp, DRV_MSG_CODE_LOAD_DONE, 0);
+		return rc;
+	}
+
 	bnx2x_init_eq_ring(bp);
 	bnx2x_init_internal(bp, load_code);
 	bnx2x_pf_init(bp);
@@ -6069,6 +6082,8 @@ void bnx2x_nic_init(struct bnx2x *bp, u32 load_code)
 	bnx2x_attn_int_deasserted0(bp,
 		REG_RD(bp, MISC_REG_AEU_AFTER_INVERT_1_FUNC_0 + BP_PORT(bp)*4) &
 				   AEU_INPUTS_ATTN_BITS_SPIO5);
+
+	return 0;
 }
 
 /* end of nic init */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index 90fbf9c..2017901 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -282,7 +282,8 @@ int bnx2x_vfpf_acquire(struct bnx2x *bp, u8 tx_count, u8 rx_count)
 	bp->mf_mode = 0;
 	bp->common.flash_size = 0;
 	bp->flags |=
-		NO_WOL_FLAG | NO_ISCSI_OOO_FLAG | NO_ISCSI_FLAG | NO_FCOE_FLAG;
+		NO_WOL_FLAG | NO_ISCSI_OOO_FLAG | NO_ISCSI_FLAG | NO_FCOE_FLAG |
+		INTERRUPTS_ENABLED_FLAG;
 	bp->igu_sb_cnt = 1;
 	bp->igu_base_sb = bp->acquire_resp.resc.hw_sbs[0].hw_sb_id;
 	strlcpy(bp->fw_ver, bp->acquire_resp.pfdev_info.fw_ver,
-- 
1.8.1.227.g44fe835

^ permalink raw reply related

* Re: [PATCH v2 6/6] sctp: convert sctp_assoc_set_id to use idr_alloc_cyclic
From: Jeff Layton @ 2013-03-31 10:23 UTC (permalink / raw)
  To: Neil Horman
  Cc: akpm, linux-kernel, tj, Vlad Yasevich, Sridhar Samudrala,
	David S. Miller, linux-sctp, netdev
In-Reply-To: <20130328140406.GB14489@neilslaptop.think-freely.org>

On Thu, 28 Mar 2013 10:04:06 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:

> On Thu, Mar 28, 2013 at 09:53:08AM -0400, Neil Horman wrote:
> > On Wed, Mar 27, 2013 at 03:29:38PM -0400, Jeff Layton wrote:
> > > (Note: compile-tested only)
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > Cc: Vlad Yasevich <vyasevich@gmail.com>
> > > Cc: Sridhar Samudrala <sri@us.ibm.com>
> > > Cc: Neil Horman <nhorman@tuxdriver.com>
> > > Cc: "David S. Miller" <davem@davemloft.net>
> > > Cc: linux-sctp@vger.kernel.org
> > > Cc: netdev@vger.kernel.org
> > > ---
> > >  net/sctp/associola.c | 16 ++--------------
> > >  1 file changed, 2 insertions(+), 14 deletions(-)
> > > 
> > I don't see anything wrong with this patch per-se, but the idr_alloc_cyclic call
> > isn't integrated with net/net-next or Linus' tree yet.  If we don't gate this
> > patch on that integration, we'll break the build.
> > Neil
> > 
> Actually, I just noticed that you only sent us 6/6 here, I'm assuming a prior
> patch in the series adds the idr_alloc_cyclic code?  if so, I've seen the prior
> version


Yes, idr_alloc_cyclic is added in patch #1. v2 has some slight changes
from v1 but it's essentially the same concept.

> 
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> 

Thanks!
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply

* Registration No. 0201GFRM-7
From: BlackBerry® Easter Promotion @ 2013-03-31  7:29 UTC (permalink / raw)
  To: info

You Have Won !!!

You Have Won!!!

^ permalink raw reply

* Re: [PATCH v2] net IPv6 : Fix broken IPv6 routing table after loopback down-up
From: Balakumaran Kannan @ 2013-03-31  6:38 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, yoshfuji, kaber, kuznet, jmorris,
	Balakumaran.Kannan, maruthi.thotad, netdev, jamshed.a,
	amit.agarwal, takuzo.ohara, aaditya.kumar
In-Reply-To: <20130329.151759.1699178800809051958.davem@davemloft.net>

>>  static void addrconf_add_linklocal(struct inet6_dev *idev, const
>> struct in6_addr *addr)
>
> Your email client corrupted this patch, please resubmit but only after
> you have emailed the patch to yourself and successfully applied the
> patch you receive in that test email.
>
> Thanks.

I'll send working patch after verifying it.

Thank you.
K.Balakumaran

^ permalink raw reply

* [PATCH iproute2] vxlan: add ipv6 support
From: Cong Wang @ 2013-03-31  6:17 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Cong Wang
In-Reply-To: <1364708625-29063-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/linux/if_link.h |    2 ++
 ip/iplink_vxlan.c       |   45 ++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 40167af..f74b8cc 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -306,6 +306,8 @@ enum {
 	IFLA_VXLAN_RSC,
 	IFLA_VXLAN_L2MISS,
 	IFLA_VXLAN_L3MISS,
+	IFLA_VXLAN_GROUP6,
+	IFLA_VXLAN_LOCAL6,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index 1025326..c10ec0f 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -42,6 +42,8 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	int vni_set = 0;
 	__u32 saddr = 0;
 	__u32 gaddr = 0;
+	struct in6_addr saddr6 = IN6ADDR_ANY_INIT;
+	struct in6_addr gaddr6 = IN6ADDR_ANY_INIT;
 	unsigned link = 0;
 	__u8 tos = 0;
 	__u8 ttl = 0;
@@ -65,15 +67,26 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 			vni_set = 1;
 		} else if (!matches(*argv, "group")) {
 			NEXT_ARG();
-			gaddr = get_addr32(*argv);
-
-			if (!IN_MULTICAST(ntohl(gaddr)))
-				invarg("invald group address", *argv);
+			if (!inet_pton(AF_INET, *argv, &gaddr)) {
+				if (!inet_pton(AF_INET6, *argv, &gaddr6)) {
+					fprintf(stderr, "Invalid address \"%s\"\n", *argv);
+					return -1;
+				} else if (!IN6_IS_ADDR_MULTICAST(&gaddr6))
+					invarg("invald group address", *argv);
+			} else if (!IN_MULTICAST(ntohl(gaddr)))
+					invarg("invald group address", *argv);
 		} else if (!matches(*argv, "local")) {
 			NEXT_ARG();
-			if (strcmp(*argv, "any"))
-				saddr = get_addr32(*argv);
-			if (IN_MULTICAST(ntohl(saddr)))
+			if (strcmp(*argv, "any")) {
+				if (!inet_pton(AF_INET, *argv, &saddr)) {
+					if (!inet_pton(AF_INET6, *argv, &saddr6)) {
+						fprintf(stderr, "Invalid address \"%s\"\n", *argv);
+						return -1;
+					}
+				}
+			}
+
+			if (IN_MULTICAST(ntohl(saddr)) || IN6_IS_ADDR_MULTICAST(&saddr6))
 				invarg("invalid local address", *argv);
 		} else if (!matches(*argv, "dev")) {
 			NEXT_ARG();
@@ -163,8 +176,14 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	addattr32(n, 1024, IFLA_VXLAN_ID, vni);
 	if (gaddr)
 		addattr_l(n, 1024, IFLA_VXLAN_GROUP, &gaddr, 4);
+	else if (memcmp(&gaddr6, &in6addr_any, sizeof(gaddr6)) != 0)
+		addattr_l(n, 1024, IFLA_VXLAN_GROUP6, &gaddr6, sizeof(struct in6_addr));
+
 	if (saddr)
 		addattr_l(n, 1024, IFLA_VXLAN_LOCAL, &saddr, 4);
+	else if (memcmp(&saddr6, &in6addr_any, sizeof(saddr6)) != 0)
+		addattr_l(n, 1024, IFLA_VXLAN_LOCAL6, &saddr6, sizeof(struct in6_addr));
+
 	if (link)
 		addattr32(n, 1024, IFLA_VXLAN_LINK, link);
 	addattr8(n, 1024, IFLA_VXLAN_TTL, ttl);
@@ -211,6 +230,12 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 		if (addr)
 			fprintf(f, "group %s ",
 				format_host(AF_INET, 4, &addr, s1, sizeof(s1)));
+	} else if (tb[IFLA_VXLAN_GROUP6]) {
+		struct in6_addr addr;
+		memcpy(&addr, RTA_DATA(tb[IFLA_VXLAN_GROUP6]), sizeof(struct in6_addr));
+		if (memcmp(&addr, &in6addr_any, sizeof(addr)) != 0)
+			fprintf(f, "group %s ",
+				format_host(AF_INET6, sizeof(struct in6_addr), &addr, s1, sizeof(s1)));
 	}
 
 	if (tb[IFLA_VXLAN_LOCAL]) {
@@ -218,6 +243,12 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 		if (addr)
 			fprintf(f, "local %s ",
 				format_host(AF_INET, 4, &addr, s1, sizeof(s1)));
+	} else if (tb[IFLA_VXLAN_LOCAL6]) {
+		struct in6_addr addr;
+		memcpy(&addr, RTA_DATA(tb[IFLA_VXLAN_LOCAL6]), sizeof(struct in6_addr));
+		if (memcmp(&addr, &in6addr_any, sizeof(addr)) != 0)
+			fprintf(f, "local %s ",
+				format_host(AF_INET6, sizeof(struct in6_addr), &addr, s1, sizeof(s1)));
 	}
 
 	if (tb[IFLA_VXLAN_LINK] &&
-- 
1.7.7.6

^ permalink raw reply related

* Re: RX/dropped counter values for tagged packets
From: Eric Dumazet @ 2013-03-31  5:44 UTC (permalink / raw)
  To: Ani Sinha; +Cc: netdev, linuxdev
In-Reply-To: <CAOxq_8OBcHktjmkwiz2j=DXDAdNx3t7D_nTmDtuJudfs49VNrg@mail.gmail.com>

On Sat, 2013-03-30 at 20:30 -0700, Ani Sinha wrote:

> On Sun, Mar 24, 2013 at 9:43 PM, Ani Sinha <ani@aristanetworks.com> wrote:
> > hello everyone :
> >
> > Please bear with me if you think I said something wrong.
> >
> > I was looking at the netif_receive_skb() code and it seems to me that
> > for vlan tagged packets, the counters are manipulated at two different
> > places. One is from vlan_do_receive() where the receive counters are
> > incremented. The other counter (the dropped counter) is incremented at
> > the very end of netif_receive_skb() when there are no takers for this
> > packet. The dropped counts are incremented for skb->dev->rx_dropped
> > whereas the receive counts are incremented for the vlan device
> > specific private stat counter values. When the numbers are reported on
> > /proc/net/dev (from where ifconfig gets the numbers), these two stat
> > counters are combined together (see dev_get_stats() where rx_dropped
> > is added to the device private stat numbers). So if no one is
> > receiving the tagged packets, it will be reported both as received and
> > dropped! I wonder whether I am analyzing it wrong or whether no one
> > has noticed this before or whether this is intentional.
> >
> > Any  clarification will be greatly appreciated. Again, I could be
> > wrong looking at the code, so please bear with me.
> >

I don't know, I am not reading this code like you, I see no problem.

If vlan_do_receive() return true, we "goto another_round" to deliver the
packt to vlan device.

vlan_do_receive() doesn't deliver the packet at all. It only prepares
the "another_round"

So if packet is delivered to vlan, we don't increment rx_dropped.

Just try by yourself, it is fine.

^ permalink raw reply

* [Patch net-next v1 4/4] ipv6: Add generic UDP Tunnel segmentation
From: Cong Wang @ 2013-03-31  5:43 UTC (permalink / raw)
  To: netdev
  Cc: Jesse Gross, Pravin B Shelar, Stephen Hemminger, David S. Miller,
	Cong Wang
In-Reply-To: <1364708625-29063-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>

Similar to commit 731362674580cb0c696cd1b1a03d8461a10cf90a
(tunneling: Add generic Tunnel segmentation)

This patch adds generic tunneling offloading support for IPv6-UDP based
tunnels.

This can be used by tunneling protocols like VXLAN.

Cc: Jesse Gross <jesse@nicira.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/ip6_offload.c |    4 +-
 net/ipv6/udp_offload.c |  155 +++++++++++++++++++++++++++++++++---------------
 2 files changed, 110 insertions(+), 49 deletions(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 71b766e..f031ccf 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -91,6 +91,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	unsigned int unfrag_ip6hlen;
 	u8 *prevhdr;
 	int offset = 0;
+	bool tunnel;
 
 	if (unlikely(skb_shinfo(skb)->gso_type &
 		     ~(SKB_GSO_UDP |
@@ -105,6 +106,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
 		goto out;
 
+	tunnel = !!skb->encapsulation;
 	ipv6h = ipv6_hdr(skb);
 	__skb_pull(skb, sizeof(*ipv6h));
 	segs = ERR_PTR(-EPROTONOSUPPORT);
@@ -125,7 +127,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 		ipv6h = ipv6_hdr(skb);
 		ipv6h->payload_len = htons(skb->len - skb->mac_len -
 					   sizeof(*ipv6h));
-		if (proto == IPPROTO_UDP) {
+		if (!tunnel && proto == IPPROTO_UDP) {
 			unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
 			fptr = (struct frag_hdr *)(skb_network_header(skb) +
 				unfrag_ip6hlen);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 3bb3a89..bbde7ba 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -21,26 +21,81 @@ static int udp6_ufo_send_check(struct sk_buff *skb)
 	const struct ipv6hdr *ipv6h;
 	struct udphdr *uh;
 
-	/* UDP Tunnel offload on ipv6 is not yet supported. */
-	if (skb->encapsulation)
-		return -EINVAL;
-
 	if (!pskb_may_pull(skb, sizeof(*uh)))
 		return -EINVAL;
 
-	ipv6h = ipv6_hdr(skb);
-	uh = udp_hdr(skb);
+	if (likely(!skb->encapsulation)) {
+		ipv6h = ipv6_hdr(skb);
+		uh = udp_hdr(skb);
+
+		uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
+					     IPPROTO_UDP, 0);
+		skb->csum_start = skb_transport_header(skb) - skb->head;
+		skb->csum_offset = offsetof(struct udphdr, check);
+		skb->ip_summed = CHECKSUM_PARTIAL;
+	}
 
-	uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
-				     IPPROTO_UDP, 0);
-	skb->csum_start = skb_transport_header(skb) - skb->head;
-	skb->csum_offset = offsetof(struct udphdr, check);
-	skb->ip_summed = CHECKSUM_PARTIAL;
 	return 0;
 }
 
+static struct sk_buff *skb_udp6_tunnel_segment(struct sk_buff *skb,
+					       netdev_features_t features)
+{
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	int mac_len = skb->mac_len;
+	int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
+	int outer_hlen;
+	netdev_features_t enc_features;
+
+	if (unlikely(!pskb_may_pull(skb, tnl_hlen)))
+		goto out;
+
+	skb->encapsulation = 0;
+	__skb_pull(skb, tnl_hlen);
+	skb_reset_mac_header(skb);
+	skb_set_network_header(skb, skb_inner_network_offset(skb));
+	skb->mac_len = skb_inner_network_offset(skb);
+
+	/* segment inner packet. */
+	enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
+	segs = skb_mac_gso_segment(skb, enc_features);
+	if (!segs || IS_ERR(segs))
+		goto out;
+
+	outer_hlen = skb_tnl_header_len(skb);
+	skb = segs;
+	do {
+		struct udphdr *uh;
+		int udp_offset = outer_hlen - tnl_hlen;
+
+		skb->mac_len = mac_len;
+
+		skb_push(skb, outer_hlen);
+		skb_reset_mac_header(skb);
+		skb_set_network_header(skb, mac_len);
+		skb_set_transport_header(skb, udp_offset);
+		uh = udp_hdr(skb);
+		uh->len = htons(skb->len - udp_offset);
+
+		/* csum segment if tunnel sets skb with csum. */
+		if (unlikely(uh->check)) {
+			struct ipv6hdr *iph = ipv6_hdr(skb);
+
+			uh->check = csum_ipv6_magic(&iph->saddr, &iph->daddr,
+						       skb->len - udp_offset,
+						       IPPROTO_UDP, 0);
+			if (uh->check == 0)
+				uh->check = CSUM_MANGLED_0;
+
+		}
+		skb->ip_summed = CHECKSUM_NONE;
+	} while ((skb = skb->next));
+out:
+	return segs;
+}
+
 static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
-	netdev_features_t features)
+					 netdev_features_t features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	unsigned int mss;
@@ -73,43 +128,47 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 		goto out;
 	}
 
-	/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
-	 * do checksum of UDP packets sent as multiple IP fragments.
-	 */
-	offset = skb_checksum_start_offset(skb);
-	csum = skb_checksum(skb, offset, skb->len - offset, 0);
-	offset += skb->csum_offset;
-	*(__sum16 *)(skb->data + offset) = csum_fold(csum);
-	skb->ip_summed = CHECKSUM_NONE;
-
-	/* Check if there is enough headroom to insert fragment header. */
-	if ((skb_mac_header(skb) < skb->head + frag_hdr_sz) &&
-	    pskb_expand_head(skb, frag_hdr_sz, 0, GFP_ATOMIC))
-		goto out;
+	if (skb->encapsulation && skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL)
+		segs = skb_udp6_tunnel_segment(skb, features);
+	else {
+		/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
+		 * do checksum of UDP packets sent as multiple IP fragments.
+		 */
+		offset = skb_checksum_start_offset(skb);
+		csum = skb_checksum(skb, offset, skb->len - offset, 0);
+		offset += skb->csum_offset;
+		*(__sum16 *)(skb->data + offset) = csum_fold(csum);
+		skb->ip_summed = CHECKSUM_NONE;
+
+		/* Check if there is enough headroom to insert fragment header. */
+		if ((skb_mac_header(skb) < skb->head + frag_hdr_sz) &&
+		    pskb_expand_head(skb, frag_hdr_sz, 0, GFP_ATOMIC))
+			goto out;
 
-	/* Find the unfragmentable header and shift it left by frag_hdr_sz
-	 * bytes to insert fragment header.
-	 */
-	unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
-	nexthdr = *prevhdr;
-	*prevhdr = NEXTHDR_FRAGMENT;
-	unfrag_len = skb_network_header(skb) - skb_mac_header(skb) +
-		     unfrag_ip6hlen;
-	mac_start = skb_mac_header(skb);
-	memmove(mac_start-frag_hdr_sz, mac_start, unfrag_len);
-
-	skb->mac_header -= frag_hdr_sz;
-	skb->network_header -= frag_hdr_sz;
-
-	fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
-	fptr->nexthdr = nexthdr;
-	fptr->reserved = 0;
-	ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
-
-	/* Fragment the skb. ipv6 header and the remaining fields of the
-	 * fragment header are updated in ipv6_gso_segment()
-	 */
-	segs = skb_segment(skb, features);
+		/* Find the unfragmentable header and shift it left by frag_hdr_sz
+		 * bytes to insert fragment header.
+		 */
+		unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
+		nexthdr = *prevhdr;
+		*prevhdr = NEXTHDR_FRAGMENT;
+		unfrag_len = skb_network_header(skb) - skb_mac_header(skb) +
+			     unfrag_ip6hlen;
+		mac_start = skb_mac_header(skb);
+		memmove(mac_start-frag_hdr_sz, mac_start, unfrag_len);
+
+		skb->mac_header -= frag_hdr_sz;
+		skb->network_header -= frag_hdr_sz;
+
+		fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
+		fptr->nexthdr = nexthdr;
+		fptr->reserved = 0;
+		ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
+
+		/* Fragment the skb. ipv6 header and the remaining fields of the
+		 * fragment header are updated in ipv6_gso_segment()
+		 */
+		segs = skb_segment(skb, features);
+	}
 
 out:
 	return segs;
-- 
1.7.7.6

^ permalink raw reply related

* [Patch net-next v1 3/4] vxlan: add ipv6 support
From: Cong Wang @ 2013-03-31  5:43 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David S. Miller, Cong Wang
In-Reply-To: <1364708625-29063-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>

This patch adds IPv6 support to vxlan device, as the new version
RFC already mentions it:

   http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03

(I am not sure if I do checksum correctly, at least it works)

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c          |  553 ++++++++++++++++++++++++++++++++---------
 include/uapi/linux/if_link.h |    2 +
 2 files changed, 433 insertions(+), 122 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index cac4e4f..8923593 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -9,7 +9,6 @@
  *
  * TODO
  *  - use IANA UDP port number (when defined)
- *  - IPv6 (not in RFC)
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -42,6 +41,11 @@
 #include <net/inet_ecn.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <net/addrconf.h>
+#include <net/ip6_route.h>
+#include <net/ip6_tunnel.h>
+#endif
 
 #define VXLAN_VERSION	"0.1"
 
@@ -56,6 +60,7 @@
 #define VXLAN_VID_MASK	(VXLAN_N_VID - 1)
 /* IP header + UDP + VXLAN + Ethernet header */
 #define VXLAN_HEADROOM (20 + 8 + 8 + 14)
+#define VXLAN6_HEADROOM (40 + 8 + 8 + 14)
 
 #define VXLAN_FLAGS 0x08000000	/* struct vxlanhdr.vx_flags required value. */
 
@@ -81,9 +86,19 @@ struct vxlan_net {
 	struct hlist_head vni_list[VNI_HASH_SIZE];
 };
 
+struct vxlan_ip {
+	union {
+		__be32  ip4;
+#if IS_ENABLED(CONFIG_IPV6)
+		struct in6_addr ip6;
+#endif
+	};
+	__be16          proto;
+};
+
 struct vxlan_rdst {
 	struct rcu_head		 rcu;
-	__be32			 remote_ip;
+	struct vxlan_ip		 remote_ip;
 	__be16			 remote_port;
 	u32			 remote_vni;
 	u32			 remote_ifindex;
@@ -106,8 +121,8 @@ struct vxlan_dev {
 	struct hlist_node hlist;
 	struct net_device *dev;
 	__u32		  vni;		/* virtual network id */
-	__be32	          gaddr;	/* multicast group */
-	__be32		  saddr;	/* source address */
+	struct vxlan_ip	  gaddr;	/* multicast group */
+	struct vxlan_ip	  saddr;	/* source address */
 	unsigned int      link;		/* link to multicast over */
 	__u16		  port_min;	/* source port range */
 	__u16		  port_max;
@@ -130,6 +145,79 @@ struct vxlan_dev {
 #define VXLAN_F_L2MISS	0x08
 #define VXLAN_F_L3MISS	0x10
 
+#if IS_ENABLED(CONFIG_IPV6)
+static inline bool vxlan_ip_equal(const struct vxlan_ip *a, const struct vxlan_ip *b)
+{
+	if (a->proto != b->proto)
+		return false;
+	switch (a->proto) {
+	case htons(ETH_P_IP):
+		return a->ip4 == b->ip4;
+	case htons(ETH_P_IPV6):
+		return ipv6_addr_equal(&a->ip6, &b->ip6);
+	}
+	return false;
+}
+
+static inline bool vxlan_ip_any(const struct vxlan_ip *ipa)
+{
+	if (ipa->proto == htons(ETH_P_IP))
+		return ipa->ip4 == htonl(INADDR_ANY);
+	else
+		return ipv6_addr_any(&ipa->ip6);
+}
+
+static int vxlan_nla_get_addr(struct vxlan_ip *ip, struct nlattr *nla)
+{
+	if (nla_len(nla) == sizeof(__be32)) {
+		ip->ip4 = nla_get_be32(nla);
+		ip->proto = htons(ETH_P_IP);
+	} else if (nla_len(nla) == sizeof(struct in6_addr)) {
+		nla_memcpy(&ip->ip6, nla, sizeof(struct in6_addr));
+		ip->proto = htons(ETH_P_IPV6);
+	} else
+		return -EAFNOSUPPORT;
+	return 0;
+}
+
+static int vxlan_nla_put_addr(struct sk_buff *skb, int attr, const struct vxlan_ip *ip)
+{
+	if (ip->proto == htons(ETH_P_IP))
+		return nla_put_be32(skb, attr, ip->ip4);
+	else if (ip->proto == htons(ETH_P_IPV6))
+		return nla_put(skb, attr, sizeof(struct in6_addr), &ip->ip6);
+	else
+		return -EAFNOSUPPORT;
+}
+#else
+static inline bool vxlan_ip_equal(const struct vxlan_ip *a, const struct vxlan_ip *b)
+{
+	return a->ip4 == b->ip4;
+}
+
+static inline bool vxlan_ip_any(const struct vxlan_ip *ipa)
+{
+	return ipa->ip4 == htonl(INADDR_ANY);
+}
+
+static int vxlan_nla_get_addr(struct vxlan_ip *ip, struct nlattr *nla)
+{
+	if (nla_len(nla) == sizeof(__be32)) {
+		ip->ip4 = nla_get_be32(nla);
+		ip->proto = htons(ETH_P_IP);
+	} else
+		return -EAFNOSUPPORT;
+}
+
+static int vxlan_nla_put_addr(struct sk_buff *skb, int attr, const struct vxlan_ip *ip)
+{
+	if (ip->proto == htons(ETH_P_IP))
+		return nla_put_be32(skb, attr, ip->ip4);
+	else
+		return -EAFNOSUPPORT;
+}
+#endif
+
 /* salt for hash table */
 static u32 vxlan_salt __read_mostly;
 
@@ -176,7 +264,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 
 	if (type == RTM_GETNEIGH) {
 		ndm->ndm_family	= AF_INET;
-		send_ip = rdst->remote_ip != htonl(INADDR_ANY);
+		send_ip = !vxlan_ip_any(&rdst->remote_ip);
 		send_eth = !is_zero_ether_addr(fdb->eth_addr);
 	} else
 		ndm->ndm_family	= AF_BRIDGE;
@@ -188,7 +276,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 	if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
 		goto nla_put_failure;
 
-	if (send_ip && nla_put_be32(skb, NDA_DST, rdst->remote_ip))
+	if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
 		goto nla_put_failure;
 
 	if (rdst->remote_port && rdst->remote_port != vxlan_port &&
@@ -220,7 +308,7 @@ static inline size_t vxlan_nlmsg_size(void)
 {
 	return NLMSG_ALIGN(sizeof(struct ndmsg))
 		+ nla_total_size(ETH_ALEN) /* NDA_LLADDR */
-		+ nla_total_size(sizeof(__be32)) /* NDA_DST */
+		+ nla_total_size(sizeof(struct in6_addr)) /* NDA_DST */
 		+ nla_total_size(sizeof(__be32)) /* NDA_PORT */
 		+ nla_total_size(sizeof(__be32)) /* NDA_VNI */
 		+ nla_total_size(sizeof(__u32)) /* NDA_IFINDEX */
@@ -253,14 +341,14 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
 }
 
-static void vxlan_ip_miss(struct net_device *dev, __be32 ipa)
+static void vxlan_ip_miss(struct net_device *dev, struct vxlan_ip *ipa)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_fdb f;
 
 	memset(&f, 0, sizeof f);
 	f.state = NUD_STALE;
-	f.remote.remote_ip = ipa; /* goes to NDA_DST */
+	f.remote.remote_ip = *ipa; /* goes to NDA_DST */
 	f.remote.remote_vni = VXLAN_N_VID;
 
 	vxlan_fdb_notify(vxlan, &f, RTM_GETNEIGH);
@@ -316,13 +404,13 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
 
 /* Add/update destinations for multicast */
 static int vxlan_fdb_append(struct vxlan_fdb *f,
-			    __be32 ip, __u32 port, __u32 vni, __u32 ifindex)
+			    struct vxlan_ip *ip, __u32 port, __u32 vni, __u32 ifindex)
 {
 	struct vxlan_rdst *rd_prev, *rd;
 
 	rd_prev = NULL;
 	for (rd = &f->remote; rd; rd = rd->remote_next) {
-		if (rd->remote_ip == ip &&
+		if (vxlan_ip_equal(&rd->remote_ip, ip) &&
 		    rd->remote_port == port &&
 		    rd->remote_vni == vni &&
 		    rd->remote_ifindex == ifindex)
@@ -332,7 +420,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 	rd = kmalloc(sizeof(*rd), GFP_ATOMIC);
 	if (rd == NULL)
 		return -ENOBUFS;
-	rd->remote_ip = ip;
+	rd->remote_ip = *ip;
 	rd->remote_port = port;
 	rd->remote_vni = vni;
 	rd->remote_ifindex = ifindex;
@@ -343,7 +431,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 
 /* Add new entry to forwarding table -- assumes lock held */
 static int vxlan_fdb_create(struct vxlan_dev *vxlan,
-			    const u8 *mac, __be32 ip,
+			    const u8 *mac, struct vxlan_ip *ip,
 			    __u16 state, __u16 flags,
 			    __u32 port, __u32 vni, __u32 ifindex)
 {
@@ -383,7 +471,7 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 			return -ENOMEM;
 
 		notify = 1;
-		f->remote.remote_ip = ip;
+		f->remote.remote_ip = *ip;
 		f->remote.remote_port = port;
 		f->remote.remote_vni = vni;
 		f->remote.remote_ifindex = ifindex;
@@ -435,7 +523,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct net *net = dev_net(vxlan->dev);
-	__be32 ip;
+	struct vxlan_ip ip;
 	u32 port, vni, ifindex;
 	int err;
 
@@ -448,10 +536,9 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 	if (tb[NDA_DST] == NULL)
 		return -EINVAL;
 
-	if (nla_len(tb[NDA_DST]) != sizeof(__be32))
-		return -EAFNOSUPPORT;
-
-	ip = nla_get_be32(tb[NDA_DST]);
+	err = vxlan_nla_get_addr(&ip, tb[NDA_DST]);
+	if (err)
+		return err;
 
 	if (tb[NDA_PORT]) {
 		if (nla_len(tb[NDA_PORT]) != sizeof(u32))
@@ -481,7 +568,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 		ifindex = 0;
 
 	spin_lock_bh(&vxlan->hash_lock);
-	err = vxlan_fdb_create(vxlan, addr, ip, ndm->ndm_state, flags, port,
+	err = vxlan_fdb_create(vxlan, addr, &ip, ndm->ndm_state, flags, port,
 		vni, ifindex);
 	spin_unlock_bh(&vxlan->hash_lock);
 
@@ -545,7 +632,7 @@ skip:
  * and Tunnel endpoint.
  */
 static void vxlan_snoop(struct net_device *dev,
-			__be32 src_ip, const u8 *src_mac)
+			struct vxlan_ip *src_ip, const u8 *src_mac)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_fdb *f;
@@ -554,7 +641,7 @@ static void vxlan_snoop(struct net_device *dev,
 	f = vxlan_find_mac(vxlan, src_mac);
 	if (likely(f)) {
 		f->used = jiffies;
-		if (likely(f->remote.remote_ip == src_ip))
+		if (likely(vxlan_ip_equal(&f->remote.remote_ip, src_ip)))
 			return;
 
 		if (net_ratelimit())
@@ -562,7 +649,7 @@ static void vxlan_snoop(struct net_device *dev,
 				    "%pM migrated from %pI4 to %pI4\n",
 				    src_mac, &f->remote.remote_ip, &src_ip);
 
-		f->remote.remote_ip = src_ip;
+		f->remote.remote_ip = *src_ip;
 		f->updated = jiffies;
 	} else {
 		/* learned new entry */
@@ -591,7 +678,7 @@ static bool vxlan_group_used(struct vxlan_net *vn,
 			if (!netif_running(vxlan->dev))
 				continue;
 
-			if (vxlan->gaddr == this->gaddr)
+			if (vxlan_ip_equal(&vxlan->gaddr, &this->gaddr))
 				return true;
 		}
 
@@ -605,7 +692,7 @@ static int vxlan_join_group(struct net_device *dev)
 	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 	struct sock *sk = vn->sock->sk;
 	struct ip_mreqn mreq = {
-		.imr_multiaddr.s_addr	= vxlan->gaddr,
+		.imr_multiaddr.s_addr	= vxlan->gaddr.ip4,
 		.imr_ifindex		= vxlan->link,
 	};
 	int err;
@@ -616,8 +703,15 @@ static int vxlan_join_group(struct net_device *dev)
 
 	/* Need to drop RTNL to call multicast join */
 	rtnl_unlock();
-	lock_sock(sk);
-	err = ip_mc_join_group(sk, &mreq);
+	if (vxlan->gaddr.proto == htons(ETH_P_IP)) {
+		lock_sock(sk);
+		err = ip_mc_join_group(sk, &mreq);
+	} else {
+#if IS_ENABLED(CONFIG_IPV6)
+		lock_sock(sk);
+		err = ipv6_sock_mc_join(sk, vxlan->link, &vxlan->gaddr.ip6);
+#endif
+	}
 	release_sock(sk);
 	rtnl_lock();
 
@@ -633,7 +727,7 @@ static int vxlan_leave_group(struct net_device *dev)
 	int err = 0;
 	struct sock *sk = vn->sock->sk;
 	struct ip_mreqn mreq = {
-		.imr_multiaddr.s_addr	= vxlan->gaddr,
+		.imr_multiaddr.s_addr	= vxlan->gaddr.ip4,
 		.imr_ifindex		= vxlan->link,
 	};
 
@@ -643,8 +737,15 @@ static int vxlan_leave_group(struct net_device *dev)
 
 	/* Need to drop RTNL to call multicast leave */
 	rtnl_unlock();
-	lock_sock(sk);
-	err = ip_mc_leave_group(sk, &mreq);
+	if (vxlan->gaddr.proto == htons(ETH_P_IP)) {
+		lock_sock(sk);
+		err = ip_mc_leave_group(sk, &mreq);
+	} else {
+#if IS_ENABLED(CONFIG_IPV6)
+		lock_sock(sk);
+		err = ipv6_sock_mc_drop(sk, vxlan->link, &vxlan->gaddr.ip6);
+#endif
+	}
 	release_sock(sk);
 	rtnl_lock();
 
@@ -654,10 +755,12 @@ static int vxlan_leave_group(struct net_device *dev)
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
-	struct iphdr *oip;
+	struct iphdr *oip = NULL;
+	struct ipv6hdr *oip6 = NULL;
 	struct vxlanhdr *vxh;
 	struct vxlan_dev *vxlan;
 	struct pcpu_tstats *stats;
+	struct vxlan_ip src_ip;
 	__u32 vni;
 	int err;
 
@@ -696,7 +799,13 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	skb_reset_mac_header(skb);
 
 	/* Re-examine inner Ethernet packet */
-	oip = ip_hdr(skb);
+	if (skb->protocol == htons(ETH_P_IP))
+		oip = ip_hdr(skb);
+#if IS_ENABLED(CONFIG_IPV6)
+	else
+		oip6 = ipv6_hdr(skb);
+#endif
+
 	skb->protocol = eth_type_trans(skb, vxlan->dev);
 
 	/* Ignore packet loops (and multicast echo) */
@@ -704,8 +813,19 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 			       vxlan->dev->dev_addr) == 0)
 		goto drop;
 
-	if (vxlan->flags & VXLAN_F_LEARN)
-		vxlan_snoop(skb->dev, oip->saddr, eth_hdr(skb)->h_source);
+	if (vxlan->flags & VXLAN_F_LEARN) {
+		if (oip) {
+			src_ip.ip4 = oip->saddr;
+			src_ip.proto = htons(ETH_P_IP);
+		}
+#if IS_ENABLED(CONFIG_IPV6)
+		if (oip6) {
+			src_ip.ip6 = oip6->saddr;
+			src_ip.proto = ETH_P_IPV6;
+		}
+#endif
+		vxlan_snoop(skb->dev, &src_ip, eth_hdr(skb)->h_source);
+	}
 
 	__skb_tunnel_rx(skb, vxlan->dev);
 	skb_reset_network_header(skb);
@@ -721,15 +841,32 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 
 	skb->encapsulation = 0;
 
-	err = IP_ECN_decapsulate(oip, skb);
-	if (unlikely(err)) {
-		if (log_ecn_error)
-			net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
-					     &oip->saddr, oip->tos);
-		if (err > 1) {
-			++vxlan->dev->stats.rx_frame_errors;
-			++vxlan->dev->stats.rx_errors;
-			goto drop;
+#if IS_ENABLED(CONFIG_IPV6)
+	if (oip6) {
+		err = IP6_ECN_decapsulate(oip6, skb);
+		if (unlikely(err)) {
+			if (log_ecn_error)
+				net_info_ratelimited("non-ECT from %pI4\n",
+						     &oip6->saddr);
+			if (err > 1) {
+				++vxlan->dev->stats.rx_frame_errors;
+				++vxlan->dev->stats.rx_errors;
+				goto drop;
+			}
+		}
+	}
+#endif
+	if (oip) {
+		err = IP_ECN_decapsulate(oip, skb);
+		if (unlikely(err)) {
+			if (log_ecn_error)
+				net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
+						     &oip->saddr, oip->tos);
+			if (err > 1) {
+				++vxlan->dev->stats.rx_frame_errors;
+				++vxlan->dev->stats.rx_errors;
+				goto drop;
+			}
 		}
 	}
 
@@ -760,6 +897,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
 	u8 *arpptr, *sha;
 	__be32 sip, tip;
 	struct neighbour *n;
+	struct vxlan_ip ipa;
 
 	if (dev->flags & IFF_NOARP)
 		goto out;
@@ -801,7 +939,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
 		}
 
 		f = vxlan_find_mac(vxlan, n->ha);
-		if (f && f->remote.remote_ip == htonl(INADDR_ANY)) {
+		if (f && vxlan_ip_any(&f->remote.remote_ip)) {
 			/* bridge-local neighbor */
 			neigh_release(n);
 			goto out;
@@ -819,8 +957,11 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
 
 		if (netif_rx_ni(reply) == NET_RX_DROP)
 			dev->stats.rx_dropped++;
-	} else if (vxlan->flags & VXLAN_F_L3MISS)
-		vxlan_ip_miss(dev, tip);
+	} else if (vxlan->flags & VXLAN_F_L3MISS) {
+		ipa.ip4 = tip;
+		ipa.proto = htons(ETH_P_IP);
+		vxlan_ip_miss(dev, &ipa);
+	}
 out:
 	consume_skb(skb);
 	return NETDEV_TX_OK;
@@ -842,6 +983,14 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 			return false;
 		pip = ip_hdr(skb);
 		n = neigh_lookup(&arp_tbl, &pip->daddr, dev);
+		if (!n && vxlan->flags & VXLAN_F_L3MISS) {
+			struct vxlan_ip ipa;
+			ipa.ip4 = pip->daddr;
+			ipa.proto = htons(ETH_P_IP);
+			vxlan_ip_miss(dev, &ipa);
+			return false;
+		}
+
 		break;
 	default:
 		return false;
@@ -858,8 +1007,8 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 		}
 		neigh_release(n);
 		return diff;
-	} else if (vxlan->flags & VXLAN_F_L3MISS)
-		vxlan_ip_miss(dev, pip->daddr);
+	}
+
 	return false;
 }
 
@@ -869,7 +1018,8 @@ static void vxlan_sock_free(struct sk_buff *skb)
 }
 
 /* On transmit, associate with the tunnel socket */
-static void vxlan_set_owner(struct net_device *dev, struct sk_buff *skb)
+static inline void vxlan_set_owner(struct net_device *dev,
+				   struct sk_buff *skb)
 {
 	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 	struct sock *sk = vn->sock->sk;
@@ -917,13 +1067,20 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct rtable *rt;
-	const struct iphdr *old_iph;
+	const struct iphdr *old_iph = NULL;
 	struct iphdr *iph;
+	struct ipv6hdr *ip6h;
 	struct vxlanhdr *vxh;
 	struct udphdr *uh;
 	struct flowi4 fl4;
+#if IS_ENABLED(CONFIG_IPV6)
+	struct flowi6 fl6;
+	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
+	struct sock *sk = vn->sock->sk;
+#endif
 	unsigned int pkt_len = skb->len;
-	__be32 dst;
+	const struct vxlan_ip *dst;
+	struct dst_entry *ndst;
 	__u16 src_port, dst_port;
         u32 vni;
 	__be16 df = 0;
@@ -931,9 +1088,9 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 	dst_port = rdst->remote_port ? rdst->remote_port : vxlan_port;
 	vni = rdst->remote_vni;
-	dst = rdst->remote_ip;
+	dst = &rdst->remote_ip;
 
-	if (!dst) {
+	if (vxlan_ip_any(dst)) {
 		if (did_rsc) {
 			__skb_pull(skb, skb_network_offset(skb));
 			skb->ip_summed = CHECKSUM_NONE;
@@ -961,47 +1118,88 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		skb->encapsulation = 1;
 	}
 
-	/* Need space for new headers (invalidates iph ptr) */
-	if (skb_cow_head(skb, VXLAN_HEADROOM))
-		goto drop;
+	if (dst->proto == htons(ETH_P_IP)) {
+		/* Need space for new headers (invalidates iph ptr) */
+		if (skb_cow_head(skb, VXLAN_HEADROOM))
+			goto drop;
 
-	old_iph = ip_hdr(skb);
+		old_iph = ip_hdr(skb);
+		ttl = vxlan->ttl;
+		if (!ttl && IN_MULTICAST(ntohl(dst->ip4)))
+			ttl = 1;
 
-	ttl = vxlan->ttl;
-	if (!ttl && IN_MULTICAST(ntohl(dst)))
-		ttl = 1;
+		tos = vxlan->tos;
+		if (tos == 1)
+			tos = ip_tunnel_get_dsfield(old_iph, skb);
 
-	tos = vxlan->tos;
-	if (tos == 1)
-		tos = ip_tunnel_get_dsfield(old_iph, skb);
+		src_port = vxlan_src_port(vxlan, skb);
 
-	src_port = vxlan_src_port(vxlan, skb);
+		memset(&fl4, 0, sizeof(fl4));
+		fl4.flowi4_oif = rdst->remote_ifindex;
+		fl4.flowi4_tos = RT_TOS(tos);
+		fl4.daddr = dst->ip4;
+		fl4.saddr = vxlan->saddr.ip4;
 
-	memset(&fl4, 0, sizeof(fl4));
-	fl4.flowi4_oif = rdst->remote_ifindex;
-	fl4.flowi4_tos = RT_TOS(tos);
-	fl4.daddr = dst;
-	fl4.saddr = vxlan->saddr;
+		rt = ip_route_output_key(dev_net(dev), &fl4);
+		if (IS_ERR(rt)) {
+			netdev_dbg(dev, "no route to %pI4\n", &dst->ip4);
+			dev->stats.tx_carrier_errors++;
+			goto tx_error;
+		}
 
-	rt = ip_route_output_key(dev_net(dev), &fl4);
-	if (IS_ERR(rt)) {
-		netdev_dbg(dev, "no route to %pI4\n", &dst);
-		dev->stats.tx_carrier_errors++;
-		goto tx_error;
-	}
+		if (rt->dst.dev == dev) {
+			netdev_dbg(dev, "circular route to %pI4\n", &dst->ip4);
+			ip_rt_put(rt);
+			dev->stats.collisions++;
+			goto tx_error;
+		}
+		ndst = &rt->dst;
+		memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
+	} else {
+#if IS_ENABLED(CONFIG_IPV6)
+		const struct ipv6hdr *old_iph6;
+
+		/* Need space for new headers (invalidates iph ptr) */
+		if (skb_cow_head(skb, VXLAN6_HEADROOM))
+			goto drop;
 
-	if (rt->dst.dev == dev) {
-		netdev_dbg(dev, "circular route to %pI4\n", &dst);
-		ip_rt_put(rt);
-		dev->stats.collisions++;
-		goto tx_error;
+		old_iph6 = ipv6_hdr(skb);
+		ttl = vxlan->ttl;
+		if (!ttl && ipv6_addr_is_multicast(&dst->ip6))
+			ttl = 1;
+
+		tos = vxlan->tos;
+		if (tos == 1)
+			tos = ipv6_get_dsfield(old_iph6);
+
+		src_port = vxlan_src_port(vxlan, skb);
+
+		memset(&fl6, 0, sizeof(fl6));
+		fl6.flowi6_oif = vxlan->link;
+		fl6.flowi6_tos = RT_TOS(tos);
+		fl6.daddr = dst->ip6;
+		fl6.saddr = vxlan->saddr.ip6;
+		fl6.flowi6_proto = skb->protocol;
+
+		if (ip6_dst_lookup(sk, &ndst, &fl6)) {
+			netdev_dbg(dev, "no route to %pI6\n", &dst->ip6);
+			dev->stats.tx_carrier_errors++;
+			goto tx_error;
+		}
+
+		if (ndst->dev == dev) {
+			netdev_dbg(dev, "circular route to %pI6\n", &dst->ip6);
+			dst_release(ndst);
+			dev->stats.collisions++;
+			goto tx_error;
+		}
+#endif
 	}
 
-	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 	IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
 			      IPSKB_REROUTED);
 	skb_dst_drop(skb);
-	skb_dst_set(skb, &rt->dst);
+	skb_dst_set(skb, ndst);
 
 	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
 	vxh->vx_flags = htonl(VXLAN_FLAGS);
@@ -1017,27 +1215,55 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	uh->len = htons(skb->len);
 	uh->check = 0;
 
-	__skb_push(skb, sizeof(*iph));
-	skb_reset_network_header(skb);
-	iph		= ip_hdr(skb);
-	iph->version	= 4;
-	iph->ihl	= sizeof(struct iphdr) >> 2;
-	iph->frag_off	= df;
-	iph->protocol	= IPPROTO_UDP;
-	iph->tos	= ip_tunnel_ecn_encap(tos, old_iph, skb);
-	iph->daddr	= dst;
-	iph->saddr	= fl4.saddr;
-	iph->ttl	= ttl ? : ip4_dst_hoplimit(&rt->dst);
-	tunnel_ip_select_ident(skb, old_iph, &rt->dst);
-
-	nf_reset(skb);
+	if (dst->proto == htons(ETH_P_IP)) {
+		__skb_push(skb, sizeof(*iph));
+		skb_reset_network_header(skb);
+		iph		= ip_hdr(skb);
+		iph->version	= 4;
+		iph->ihl	= sizeof(struct iphdr) >> 2;
+		iph->frag_off	= df;
+		iph->protocol	= IPPROTO_UDP;
+		iph->tos	= ip_tunnel_ecn_encap(tos, old_iph, skb);
+		iph->daddr	= dst->ip4;
+		iph->saddr	= fl4.saddr;
+		iph->ttl	= ttl ? : ip4_dst_hoplimit(ndst);
+		tunnel_ip_select_ident(skb, old_iph, ndst);
+	} else {
+#if IS_ENABLED(CONFIG_IPV6)
+		if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			skb->csum_start = skb_transport_header(skb) - skb->head;
+			skb->csum_offset = offsetof(struct udphdr, check);
+		} else
+			uh->check = csum_ipv6_magic(&fl6.saddr, &fl6.daddr,
+						    skb->len, IPPROTO_UDP,
+						    csum_partial(uh, skb->len, 0));
+		__skb_push(skb, sizeof(*ip6h));
+		skb_reset_network_header(skb);
+		ip6h		  = ipv6_hdr(skb);
+		ip6h->version	  = 6;
+		ip6h->priority	  = 0;
+		ip6h->flow_lbl[0] = 0;
+		ip6h->flow_lbl[1] = 0;
+		ip6h->flow_lbl[2] = 0;
+		ip6h->payload_len = htons(skb->len);
+		ip6h->nexthdr     = IPPROTO_UDP;
+		ip6h->hop_limit   = ttl ? : ip6_dst_hoplimit(ndst);
+		ip6h->daddr	  = fl6.daddr;
+		ip6h->saddr	  = fl6.saddr;
+#endif
+	}
 
 	vxlan_set_owner(dev, skb);
 
 	if (handle_offloads(skb))
 		goto drop;
 
-	iptunnel_xmit(skb, dev);
+	if (dst->proto == htons(ETH_P_IP))
+		iptunnel_xmit(skb, dev);
+#if IS_ENABLED(CONFIG_IPV6)
+	else
+		ip6tunnel_xmit(skb, dev);
+#endif
 	return NETDEV_TX_OK;
 
 drop:
@@ -1084,7 +1310,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 		group.remote_next = 0;
 		rdst0 = &group;
 
-		if (group.remote_ip == htonl(INADDR_ANY) &&
+		if (vxlan_ip_any(&group.remote_ip) &&
 		    (vxlan->flags & VXLAN_F_L2MISS) &&
 		    !is_multicast_ether_addr(eth->h_dest))
 			vxlan_fdb_miss(vxlan, eth->h_dest);
@@ -1162,7 +1388,7 @@ static int vxlan_open(struct net_device *dev)
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	int err;
 
-	if (vxlan->gaddr) {
+	if (!vxlan_ip_any(&vxlan->gaddr)) {
 		err = vxlan_join_group(dev);
 		if (err)
 			return err;
@@ -1196,7 +1422,7 @@ static int vxlan_stop(struct net_device *dev)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 
-	if (vxlan->gaddr)
+	if (!vxlan_ip_any(&vxlan->gaddr))
 		vxlan_leave_group(dev);
 
 	del_timer_sync(&vxlan->age_timer);
@@ -1246,7 +1472,10 @@ static void vxlan_setup(struct net_device *dev)
 
 	eth_hw_addr_random(dev);
 	ether_setup(dev);
-	dev->hard_header_len = ETH_HLEN + VXLAN_HEADROOM;
+	if (vxlan->gaddr.proto == htons(ETH_P_IP))
+		dev->hard_header_len = ETH_HLEN + VXLAN_HEADROOM;
+	else
+		dev->hard_header_len = ETH_HLEN + VXLAN6_HEADROOM;
 
 	dev->netdev_ops = &vxlan_netdev_ops;
 	dev->destructor = vxlan_free;
@@ -1283,8 +1512,10 @@ static void vxlan_setup(struct net_device *dev)
 static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_ID]		= { .type = NLA_U32 },
 	[IFLA_VXLAN_GROUP]	= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
+	[IFLA_VXLAN_GROUP6]	= { .len = sizeof(struct in6_addr) },
 	[IFLA_VXLAN_LINK]	= { .type = NLA_U32 },
 	[IFLA_VXLAN_LOCAL]	= { .len = FIELD_SIZEOF(struct iphdr, saddr) },
+	[IFLA_VXLAN_LOCAL6]	= { .len = sizeof(struct in6_addr) },
 	[IFLA_VXLAN_TOS]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_TTL]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_LEARNING]	= { .type = NLA_U8 },
@@ -1326,6 +1557,13 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
 			pr_debug("group address is not IPv4 multicast\n");
 			return -EADDRNOTAVAIL;
 		}
+	} else if (data[IFLA_VXLAN_GROUP6]) {
+		struct in6_addr gaddr;
+		nla_memcpy(&gaddr, data[IFLA_VXLAN_GROUP6], sizeof(struct in6_addr));
+		if (!ipv6_addr_is_multicast(&gaddr)) {
+			pr_debug("group address is not IPv6 multicast\n");
+			return -EADDRNOTAVAIL;
+		}
 	}
 
 	if (data[IFLA_VXLAN_PORT_RANGE]) {
@@ -1371,11 +1609,21 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	}
 	vxlan->vni = vni;
 
-	if (data[IFLA_VXLAN_GROUP])
-		vxlan->gaddr = nla_get_be32(data[IFLA_VXLAN_GROUP]);
+	if (data[IFLA_VXLAN_GROUP]) {
+		vxlan->gaddr.ip4 = nla_get_be32(data[IFLA_VXLAN_GROUP]);
+		vxlan->gaddr.proto = htons(ETH_P_IP);
+	} else if (data[IFLA_VXLAN_GROUP6]) {
+		nla_memcpy(&vxlan->gaddr.ip6, data[IFLA_VXLAN_GROUP6], sizeof(struct in6_addr));
+		vxlan->gaddr.proto = htons(ETH_P_IPV6);
+	}
 
-	if (data[IFLA_VXLAN_LOCAL])
-		vxlan->saddr = nla_get_be32(data[IFLA_VXLAN_LOCAL]);
+	if (data[IFLA_VXLAN_LOCAL]) {
+		vxlan->saddr.ip4 = nla_get_be32(data[IFLA_VXLAN_GROUP]);
+		vxlan->saddr.proto = htons(ETH_P_IP);
+	} else if (data[IFLA_VXLAN_LOCAL6]) {
+		nla_memcpy(&vxlan->saddr.ip6, data[IFLA_VXLAN_GROUP6], sizeof(struct in6_addr));
+		vxlan->saddr.proto = htons(ETH_P_IPV6);
+	}
 
 	if (data[IFLA_VXLAN_LINK] &&
 	    (vxlan->link = nla_get_u32(data[IFLA_VXLAN_LINK]))) {
@@ -1453,9 +1701,9 @@ static size_t vxlan_get_size(const struct net_device *dev)
 {
 
 	return nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_ID */
-		nla_total_size(sizeof(__be32)) +/* IFLA_VXLAN_GROUP */
+		nla_total_size(sizeof(struct in6_addr)) + /* IFLA_VXLAN_GROUP{6} */
 		nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_LINK */
-		nla_total_size(sizeof(__be32))+	/* IFLA_VXLAN_LOCAL */
+		nla_total_size(sizeof(struct in6_addr)) + /* IFLA_VXLAN_LOCAL{6} */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_TTL */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_TOS */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_LEARNING */
@@ -1480,14 +1728,28 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	if (nla_put_u32(skb, IFLA_VXLAN_ID, vxlan->vni))
 		goto nla_put_failure;
 
-	if (vxlan->gaddr && nla_put_be32(skb, IFLA_VXLAN_GROUP, vxlan->gaddr))
-		goto nla_put_failure;
+	if (!vxlan_ip_any(&vxlan->gaddr)) {
+		if (vxlan->gaddr.proto == htons(ETH_P_IP)) {
+			if (nla_put_be32(skb, IFLA_VXLAN_GROUP, vxlan->gaddr.ip4))
+				goto nla_put_failure;
+		} else {
+			if (nla_put(skb, IFLA_VXLAN_GROUP6, sizeof(struct in6_addr), &vxlan->gaddr.ip6))
+				goto nla_put_failure;
+		}
+	}
 
 	if (vxlan->link && nla_put_u32(skb, IFLA_VXLAN_LINK, vxlan->link))
 		goto nla_put_failure;
 
-	if (vxlan->saddr && nla_put_be32(skb, IFLA_VXLAN_LOCAL, vxlan->saddr))
-		goto nla_put_failure;
+	if (!vxlan_ip_any(&vxlan->saddr)) {
+		if (vxlan->saddr.proto == htons(ETH_P_IP)) {
+			if (nla_put_be32(skb, IFLA_VXLAN_LOCAL, vxlan->saddr.ip4))
+				goto nla_put_failure;
+		} else {
+			if (nla_put(skb, IFLA_VXLAN_LOCAL6, sizeof(struct in6_addr), &vxlan->saddr.ip6))
+				goto nla_put_failure;
+		}
+	}
 
 	if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->ttl) ||
 	    nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->tos) ||
@@ -1526,38 +1788,82 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
 	.fill_info	= vxlan_fill_info,
 };
 
-static __net_init int vxlan_init_net(struct net *net)
+/* Create UDP socket for encapsulation receive. AF_INET6 sockets
+ * could be used for both IPv4 and IPv6 communications.
+ */
+#if IS_ENABLED(CONFIG_IPV6)
+static __net_init int create_sock(struct net *net, struct sock **sk)
+{
+	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
+	struct sockaddr_in6 vxlan_addr = {
+		.sin6_family = AF_INET6,
+		.sin6_port = htons(vxlan_port),
+	};
+	int rc;
+
+	rc = sock_create_kern(AF_INET6, SOCK_DGRAM, IPPROTO_UDP, &vn->sock);
+	if (rc < 0) {
+		pr_debug("UDP socket create failed\n");
+		return rc;
+	}
+	/* Put in proper namespace */
+	*sk = vn->sock->sk;
+	sk_change_net(*sk, net);
+
+	rc = kernel_bind(vn->sock, (struct sockaddr *)&vxlan_addr,
+			 sizeof(struct sockaddr_in6));
+	if (rc < 0) {
+		pr_debug("bind for UDP socket %pI6:%u (%d)\n",
+			 &vxlan_addr.sin6_addr, ntohs(vxlan_addr.sin6_port), rc);
+		sk_release_kernel(*sk);
+		vn->sock = NULL;
+		return rc;
+	}
+	return 0;
+}
+#else
+static __net_init int create_sock(struct net *net, struct sock **sk)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
-	struct sock *sk;
 	struct sockaddr_in vxlan_addr = {
 		.sin_family = AF_INET,
+		.sin_port = htons(vxlan_port),
 		.sin_addr.s_addr = htonl(INADDR_ANY),
 	};
 	int rc;
-	unsigned h;
 
-	/* Create UDP socket for encapsulation receive. */
 	rc = sock_create_kern(AF_INET, SOCK_DGRAM, IPPROTO_UDP, &vn->sock);
 	if (rc < 0) {
 		pr_debug("UDP socket create failed\n");
 		return rc;
 	}
 	/* Put in proper namespace */
-	sk = vn->sock->sk;
-	sk_change_net(sk, net);
+	*sk = vn->sock->sk;
+	sk_change_net(*sk, net);
 
-	vxlan_addr.sin_port = htons(vxlan_port);
-
-	rc = kernel_bind(vn->sock, (struct sockaddr *) &vxlan_addr,
-			 sizeof(vxlan_addr));
+	rc = kernel_bind(vn->sock, (struct sockaddr *)&vxlan_addr,
+			 sizeof(struct sockaddr_in));
 	if (rc < 0) {
 		pr_debug("bind for UDP socket %pI4:%u (%d)\n",
 			 &vxlan_addr.sin_addr, ntohs(vxlan_addr.sin_port), rc);
-		sk_release_kernel(sk);
+		sk_release_kernel(*sk);
 		vn->sock = NULL;
 		return rc;
 	}
+	return 0;
+}
+#endif
+
+static __net_init int vxlan_init_net(struct net *net)
+{
+	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
+	struct sock *sk;
+	int rc;
+	unsigned h;
+
+	rc = create_sock(net, &sk);
+	if (rc < 0)
+		return rc;
 
 	/* Disable multicast loopback */
 	inet_sk(sk)->mc_loop = 0;
@@ -1566,6 +1872,9 @@ static __net_init int vxlan_init_net(struct net *net)
 	udp_sk(sk)->encap_type = 1;
 	udp_sk(sk)->encap_rcv = vxlan_udp_encap_recv;
 	udp_encap_enable();
+#if IS_ENABLED(CONFIG_IPV6)
+	udpv6_encap_enable();
+#endif
 
 	for (h = 0; h < VNI_HASH_SIZE; ++h)
 		INIT_HLIST_HEAD(&vn->vni_list[h]);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index c4edfe1..0eee00f 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -308,6 +308,8 @@ enum {
 	IFLA_VXLAN_RSC,
 	IFLA_VXLAN_L2MISS,
 	IFLA_VXLAN_L3MISS,
+	IFLA_VXLAN_GROUP6,
+	IFLA_VXLAN_LOCAL6,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
-- 
1.7.7.6

^ permalink raw reply related

* [Patch net-next v1 2/4] ipv6: export ipv6_sock_mc_join and ipv6_sock_mc_drop
From: Cong Wang @ 2013-03-31  5:43 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David S. Miller, Cong Wang
In-Reply-To: <1364708625-29063-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>

They will be used by vxlan module.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/mcast.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index bfa6cc3..d03426d 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -200,6 +200,7 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 
 	return 0;
 }
+EXPORT_SYMBOL(ipv6_sock_mc_join);
 
 /*
  *	socket leave on multicast group
@@ -246,6 +247,7 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
 
 	return -EADDRNOTAVAIL;
 }
+EXPORT_SYMBOL(ipv6_sock_mc_drop);
 
 /* called with rcu_read_lock() */
 static struct inet6_dev *ip6_mc_find_dev_rcu(struct net *net,
-- 
1.7.7.6

^ permalink raw reply related

* [Patch net-next v1 1/4] vxlan: defer vxlan init as late as possible
From: Cong Wang @ 2013-03-31  5:43 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

When vxlan is compiled as builtin, its init code
runs before IPv6 init, this could cause problems
if we create IPv6 socket in the latter patch.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 62a4438..cac4e4f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1619,7 +1619,7 @@ out2:
 out1:
 	return rc;
 }
-module_init(vxlan_init_module);
+late_initcall(vxlan_init_module);

 static void __exit vxlan_cleanup_module(void)
 {
-- 
1.7.7.6

^ permalink raw reply related

* Re: RX/dropped counter values for tagged packets
From: Ani Sinha @ 2013-03-31  3:30 UTC (permalink / raw)
  To: netdev, linuxdev
In-Reply-To: <CAOxq_8OrE6=XtJ9TDbL7NH9313-WutZPZP8=cRBH7pVUdBDvCg@mail.gmail.com>

Hi guys,

any feedback on this will be highly appreciated.

thanks,

ani


On Sun, Mar 24, 2013 at 9:43 PM, Ani Sinha <ani@aristanetworks.com> wrote:
> hello everyone :
>
> Please bear with me if you think I said something wrong.
>
> I was looking at the netif_receive_skb() code and it seems to me that
> for vlan tagged packets, the counters are manipulated at two different
> places. One is from vlan_do_receive() where the receive counters are
> incremented. The other counter (the dropped counter) is incremented at
> the very end of netif_receive_skb() when there are no takers for this
> packet. The dropped counts are incremented for skb->dev->rx_dropped
> whereas the receive counts are incremented for the vlan device
> specific private stat counter values. When the numbers are reported on
> /proc/net/dev (from where ifconfig gets the numbers), these two stat
> counters are combined together (see dev_get_stats() where rx_dropped
> is added to the device private stat numbers). So if no one is
> receiving the tagged packets, it will be reported both as received and
> dropped! I wonder whether I am analyzing it wrong or whether no one
> has noticed this before or whether this is intentional.
>
> Any  clarification will be greatly appreciated. Again, I could be
> wrong looking at the code, so please bear with me.
>
> Thanks
>
> ani

^ permalink raw reply

* Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Ben Hutchings @ 2013-03-31  2:10 UTC (permalink / raw)
  To: Huang, Xiong
  Cc: Anders Boström, netdev@vger.kernel.org,
	565404@bugs.debian.org,
	Hannes Frederic Sowa (hannes@stressinduktion.org)
In-Reply-To: <157393863283F442885425D2C45428564F202477@nasanexd02f.na.qualcomm.com>

[-- Attachment #1: Type: text/plain, Size: 1351 bytes --]

On Sun, 2013-03-31 at 02:18 +0000, Huang, Xiong wrote:
> > > >
> > > > I've tested with
> > > > Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2, Debian
> > > > linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and
> > > > kernel.org
> > > > 2.6.30.10 amd64 with ethtool patch for setting of tso. Same result.
> > >
> > > Does booting with the kernel parameter 'pci=nomsi' avoid the problem?
> > >
> > 
> > Hannes has found DMA-write (for rx-packet)  is abnormal due to msi function.
> > But TSO is for rx-packet, an opposite direction. I'm not sure :(, If someone
> > has this issue,  he/she could have a try.
> > 
> 
> I checked windows driver, it does limit  the max packet length for TSO
> windows XP : 32*1024 bytes (include MAC header and all MAC payload). No support IP/TCP option.
> Windows 7:  15, 000 bytes, support IP/TCP option.

If TSO on these devices don't work properly with TCP options then you're
just going to have to disable it - Linux requires it to support at least
the timestamp option.  I'm not sure about IP options (this really ought
to be documented).

If there's a length limit lower than 64K, you'll need to set the limit
using netif_set_gso_max_size() before registering the net device.

Ben.

-- 
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* RE: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Huang, Xiong @ 2013-03-31  1:18 UTC (permalink / raw)
  To: Ben Hutchings, Anders Boström
  Cc: netdev@vger.kernel.org, 565404@bugs.debian.org,
	Hannes Frederic Sowa (hannes@stressinduktion.org)
In-Reply-To: <1364689558.3557.22.camel@deadeye.wl.decadent.org.uk>

> > >
> > > I've tested with
> > > Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2, Debian
> > > linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and
> > > kernel.org
> > > 2.6.30.10 amd64 with ethtool patch for setting of tso. Same result.
> >
> > Does booting with the kernel parameter 'pci=nomsi' avoid the problem?
> >
> 
> Hannes has found DMA-write (for rx-packet)  is abnormal due to msi function.
> But TSO is for rx-packet, an opposite direction. I'm not sure :(, If someone
> has this issue,  he/she could have a try.
> 

I checked windows driver, it does limit  the max packet length for TSO
windows XP : 32*1024 bytes (include MAC header and all MAC payload). No support IP/TCP option.
Windows 7:  15, 000 bytes, support IP/TCP option.

Thanks
Xiong


^ permalink raw reply

* RE: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Huang, Xiong @ 2013-03-31  0:43 UTC (permalink / raw)
  To: Ben Hutchings, Anders Boström
  Cc: netdev@vger.kernel.org, 565404@bugs.debian.org,
	Hannes Frederic Sowa (hannes@stressinduktion.org)
In-Reply-To: <1364689558.3557.22.camel@deadeye.wl.decadent.org.uk>

> >
> > I've tested with
> > Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2, Debian
> > linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and kernel.org
> > 2.6.30.10 amd64 with ethtool patch for setting of tso. Same result.
> 
> Does booting with the kernel parameter 'pci=nomsi' avoid the problem?
> 

Hannes has found DMA-write (for rx-packet)  is abnormal due to msi function. 
But TSO is for rx-packet, an opposite direction. I'm not sure :(,
If someone has this issue,  he/she could have a try.

Thanks
Xiong

^ permalink raw reply

* Re: Bug#565404: linux-image-2.6.26-2-amd64: atl1e: TSO is broken
From: Ben Hutchings @ 2013-03-31  0:25 UTC (permalink / raw)
  To: Anders Boström; +Cc: Jie.Yang, netdev, 565404, Xiong.Huang
In-Reply-To: <20100126.093439.651115405791675606.anders@netinsight.net>

[-- Attachment #1: Type: text/plain, Size: 1583 bytes --]

On Tue, 2010-01-26 at 09:34 +0100, Anders Boström wrote:
> >>>>> "JY" == Jie Yang <Jie.Yang@Atheros.com> writes:
> 
>  JY> Anders Boström <anders@netinsight.net> wrote:
> 
>  JY> following is my test cese,
>  >> 
>  JY> a nfs server server with ar8131chip, device id 1063.
>  >> export /tmp/ dir as the nfs share directory,  JY> the client,
>  >> mount the server_ip:/tmp to local dir /mnt/nfs, ust a python
>  >> script to write and read data on the  JY>
>  >> /mnt/nfs/testnfs.log. it works fine.
>  >> 
>  >> OK, the device-ID in our NFS-server is 1026, rev. b0. So it
>  >> is possible that the problem is specific to that chip/version.
>  JY> oops, its my mistake in writing, my case is 1026 device ID
> 
>  >> 
>  JY> Can you give me some advice on how to reproduce this bug??
>  >> 
>  >> The only suggestion I have is to try to find a board with a
>  >> 1026-chip on it.
>  >> 
>  >> My test-case is just copy of a 1 Gbyte file from the
>  >> NFS-server to /dev/null , after making sure that the file
>  >> isn't cached on the client by reading huge amounts of other data.
>  >> 
>  JY> just to check, if the kernel version is 2.6.26-2 ??
> 
> I've tested with
> Debian linux-image-2.6.26-2-amd64 version 2.6.26-19lenny2,
> Debian linux-image-2.6.30-bpo.2-amd64 version 2.6.30-8~bpo50+2 and
> kernel.org 2.6.30.10 amd64 with ethtool patch for setting of tso. Same
> result.

Does booting with the kernel parameter 'pci=nomsi' avoid the problem?

Ben.

-- 
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Fw: [Bug 55951] New: usbatm BUG when killing pppd or br2684ctl
From: Stephen Hemminger @ 2013-03-31  0:06 UTC (permalink / raw)
  To: Chas Williams; +Cc: linux-atm-general, netdev



Begin forwarded message:

Date: Fri, 29 Mar 2013 17:45:31 -0700
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 55951] New: usbatm BUG when killing pppd or br2684ctl


https://bugzilla.kernel.org/show_bug.cgi?id=55951

           Summary: usbatm BUG when killing pppd or br2684ctl
           Product: Networking
           Version: 2.5
    Kernel Version: 3.8.2
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
        AssignedTo: shemminger@linux-foundation.org
        ReportedBy: jd.girard@sysnux.pf
        Regression: Yes


Created an attachment (id=96631)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=96631)
Kernel BUG

The system is an alix2d2 acting as my Internet router through an USB ADSL modem
(DLink DSL200); xusbatm is driving the modem, but it needs a patched version of
usbatm (patch was written by Roman Kagan). I've been using this setup since
November 2011 (kernel 2.6.36 or 2.6.37), regularly updating the kernel without
trouble, so I think this is a regression.

The problem only occurs when killing pppd or br2684ctl (done in a script when
connection is lost), otherwise the system works fine:
[xxxxx@ix100 ~]$ uptime 
 14:34:36 up 9 days,  8:48,  1 user,  load average: 0.20, 0.13, 0.14

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

^ permalink raw reply

* Re: [PATCH net-next] doc: packet: add minimal TPACKET_V3 example code
From: David Miller @ 2013-03-30 21:40 UTC (permalink / raw)
  To: dborkman; +Cc: eric.dumazet, netdev, eric, victor
In-Reply-To: <5155B82A.2050701@redhat.com>

From: Daniel Borkmann <dborkman@redhat.com>
Date: Fri, 29 Mar 2013 16:50:02 +0100

> On 03/29/2013 04:42 PM, Eric Dumazet wrote:
>> On Fri, 2013-03-29 at 16:36 +0100, Daniel Borkmann wrote:
>>> Lost in space for a long time, but it finally came back to us from
>>> some ancient code tombs. This patch adds a minimal runnable example
>>> of Linux' packet mmap(2) from Chetan Loke's TPACKET_V3. Special
>>> thanks to David S. Miller, and also Eric Leblond and Victor Julien!
>>>
>>> Cc: Eric Leblond <eric@regit.org>
>>> Cc: Victor Julien <victor@inliniac.net>
>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>> ---
>>>   Documentation/networking/packet_mmap.txt | 327
>>>   +++++++++++++++++++++++++++++++
>>>   1 file changed, 327 insertions(+)
>>
>> tools/testing/selftests/net  seems a better place to get another
>> program ?
> 
> Hm, well. Actually, it should serve as nothing more than a code
> example that does nothing useful except shows how to use it. Just
> like the packet fanout example in the same file. It should just
> provide a starting point for people to grab it and build something
> bigger out of it.

Agreed, and applied, thanks Daniel.

^ permalink raw reply

* Re: [PATCH net-next] macvlan: use the right RCU api
From: David Miller @ 2013-03-30 21:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1364674124.5113.132.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 30 Mar 2013 13:08:44 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Make sure we use proper API to fetch dev->rx_handler_data,
> instead of ugly casts.
> 
> Rename macvlan_port_get() to macvlan_port_get_rtnl() to document fact
> that we hold RTNL when needed, with lockdep support.
> 
> This removes sparse warnings as well (CONFIG_SPARSE_RCU_POINTER=y)
> 
> CHECK   drivers/net/macvlan.c
> drivers/net/macvlan.c:706:37: warning: cast removes address space of expression
> drivers/net/macvlan.c:775:16: warning: cast removes address space of expression
> drivers/net/macvlan.c:924:16: warning: cast removes address space of expression
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: reorder some fields of net_device
From: David Miller @ 2013-03-30 21:31 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1364661063.5113.116.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 30 Mar 2013 09:31:03 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> As time passed, some fields were added in net_device, and not
> at sensible offsets.
> 
> Lets reorder some fields to reduce number of cache lines in RX path.
> Fields not used in data path should be moved out of this critical cache
> line.
> 
> In particular, move broadcast[] to the end of the rx section,
> as it is less used, and ethernet uses only the beginning of the 32bytes
> field.
 ...
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] cdc_ncm: return -ENOMEM if kzalloc fails
From: David Miller @ 2013-03-30 21:31 UTC (permalink / raw)
  To: devendra.aaru; +Cc: oliver, linux-usb, netdev
In-Reply-To: <1364634202-10341-1-git-send-email-devendra.aaru@gmail.com>

From: Devendra Naga <devendra.aaru@gmail.com>
Date: Sat, 30 Mar 2013 14:33:22 +0530

> return -ENOMEM instead if kzalloc of cdc_ncm_ctx structure is failed.
> 
> and also remove the comparision of ctx structure with NULL and make
> it as !ctx.
> 
> Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>

Applied.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox