Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 0/5] stmmac: pci: various cleanups and fixes
From: Andy Shevchenko @ 2014-10-30  9:41 UTC (permalink / raw)
  To: Giuseppe CAVALLARO
  Cc: netdev, Kweh Hock Leong, David S. Miller, Vince Bridgers
In-Reply-To: <5451F32C.8060403@st.com>

On Thu, 2014-10-30 at 09:13 +0100, Giuseppe CAVALLARO wrote:
> On 10/21/2014 6:35 PM, Andy Shevchenko wrote:
> > There are few cleanups and fixes regarding to stmmac PCI driver.
> > This has been tested on Intel Galileo board with 3.18-rc1 kernel.
> 
> Hello Andy,
> 
> for your next version I think that the patches should be for net-next
> tree.
> Maybe the following for net.git: "stmmac: pci: set default filter bins"
> 
> I kindly ask you to detail fix and cleanup.

I just send the fix as a separate patch.

I'm going to reshuffle the others and maybe introduce few more. I would
like to get this soon since it would be a good base to go with Quark
support further.

> 
> peppe
> 
> >
> > Andy Shevchenko (5):
> >    stmmac: pci: convert to use dev_pm_ops
> >    stmmac: pci: use managed resources
> >    stmmac: pci: convert to use dev_* macros
> >    stmmac: pci: set default filter bins
> >    stmmac: pci: remove FSF address
> >
> >   drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 92 ++++++++----------------
> >   1 file changed, 30 insertions(+), 62 deletions(-)
> >
> 

-- 
Andy Shevchenko <andriy.shevchenko@intel.com>
Intel Finland Oy

^ permalink raw reply

* Re: [PATCH v1 1/2] dtb: xgene: fix: Disable 10GbE and SGMII based 1GbE by default
From: Arnd Bergmann @ 2014-10-30 10:13 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Iyappan Subramanian, davem, netdev, devicetree, kchudgar, patches
In-Reply-To: <1414630580-24640-2-git-send-email-isubramanian@apm.com>

On Wednesday 29 October 2014 17:56:19 Iyappan Subramanian wrote:
> @@ -621,7 +621,7 @@
>                         };
>                 };
>  
> -               sgenet0: ethernet@1f210000 {
> +               sgenet0: sgenet@1f210000 {
>                         compatible = "apm,xgene-enet";
>                         status = "disabled";
>                         reg = <0x0 0x1f210000 0x0 0x10000>,
> 

This looks like you accidentally reverted a bug fix made earlier.
Network devices should always have the name 'ethernet@...'.

	Arnd

^ permalink raw reply

* Re: [net-next 1/2] sctp: add transport state in /proc/net/sctp/remaddr
From: Neil Horman @ 2014-10-30 10:24 UTC (permalink / raw)
  To: Michele Baldessari; +Cc: Vlad Yasevich, linux-sctp, netdev, David S. Miller
In-Reply-To: <1414661356-17255-1-git-send-email-michele@acksyn.org>

On Thu, Oct 30, 2014 at 10:29:15AM +0100, Michele Baldessari wrote:
> It is often quite helpful to be able to know the state of a transport
> outside of the application itself (for troubleshooting purposes or for
> monitoring purposes). Add it under /proc/net/sctp/remaddr.
> 
> Signed-off-by: Michele Baldessari <michele@acksyn.org>
> ---
>  net/sctp/proc.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> index 34229ee7f379..bfb242af06ab 100644
> --- a/net/sctp/proc.c
> +++ b/net/sctp/proc.c
> @@ -417,7 +417,7 @@ static void *sctp_remaddr_seq_start(struct seq_file *seq, loff_t *pos)
>  
>  	if (*pos == 0)
>  		seq_printf(seq, "ADDR ASSOC_ID HB_ACT RTO MAX_PATH_RTX "
> -				"REM_ADDR_RTX  START\n");
> +				"REM_ADDR_RTX START STATE\n");
>  
>  	return (void *)pos;
>  }
> @@ -497,7 +497,13 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
>  			 * currently implemented, but we can record it with a
>  			 * jiffies marker in a subsequent patch
>  			 */
> -			seq_printf(seq, "0");
> +			seq_printf(seq, "0 ");
> +
> +			/*
> +			 * The current state of this destination. I.e.
> +			 * SCTP_ACTIVE, SCTP_INACTIVE, ...
> +			 */
> +			seq_printf(seq, "%d", tsp->state);
>  
>  			seq_printf(seq, "\n");
>  		}
> -- 
> 2.1.0
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* [PATCH] net: gianfar: fix dma check map error when DMA_API_DEBUG is enabled
From: Kevin Hao @ 2014-10-30 10:25 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Claudiu Manoil

We need to use dma_mapping_error() to check the dma address returned
by dma_map_single/page(). Otherwise we would get warning like this:
  WARNING: at lib/dma-debug.c:1140
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc2-next-20141029 #196
  task: c0834300 ti: effe6000 task.ti: c0874000
  NIP: c02b2c98 LR: c02b2c98 CTR: c030abc4
  REGS: effe7d70 TRAP: 0700   Not tainted  (3.18.0-rc2-next-20141029)
  MSR: 00021000 <CE,ME>  CR: 22044022  XER: 20000000

  GPR00: c02b2c98 effe7e20 c0834300 00000098 00021000 00000000 c030b898 00000003
  GPR08: 00000001 00000000 00000001 749eec9d 22044022 1001abe0 00000020 ef278678
  GPR16: ef278670 ef278668 ef278660 070a8040 c087f99c c08cdc60 00029000 c0840d44
  GPR24: c08be6e8 c0840000 effe7e78 ef041340 00000600 ef114e10 00000000 c08be6e0
  NIP [c02b2c98] check_unmap+0x51c/0x9e4
  LR [c02b2c98] check_unmap+0x51c/0x9e4
  Call Trace:
  [effe7e20] [c02b2c98] check_unmap+0x51c/0x9e4 (unreliable)
  [effe7e70] [c02b31d8] debug_dma_unmap_page+0x78/0x8c
  [effe7ed0] [c03d1640] gfar_clean_rx_ring+0x208/0x488
  [effe7f40] [c03d1a9c] gfar_poll_rx_sq+0x3c/0xa8
  [effe7f60] [c04f8714] net_rx_action+0xc0/0x178
  [effe7f90] [c00435a0] __do_softirq+0x100/0x1fc
  [effe7fe0] [c0043958] irq_exit+0xa4/0xc8
  [effe7ff0] [c000d14c] call_do_irq+0x24/0x3c
  [c0875e90] [c00048a0] do_IRQ+0x8c/0xf8
  [c0875eb0] [c000ed10] ret_from_except+0x0/0x18

For TX, we need to unmap the pages which has already been mapped and
free the skb before return. For RX, just let the rxbdp as unempty.
We can retry to initialize it to empty in next round.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
 drivers/net/ethernet/freescale/gianfar.c | 58 ++++++++++++++++++++++++++------
 1 file changed, 47 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 4fdf0aa16978..04b647c4cef6 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -117,8 +117,8 @@ static void gfar_reset_task(struct work_struct *work);
 static void gfar_timeout(struct net_device *dev);
 static int gfar_close(struct net_device *dev);
 struct sk_buff *gfar_new_skb(struct net_device *dev);
-static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
-			   struct sk_buff *skb);
+static int gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
+			  struct sk_buff *skb);
 static int gfar_set_mac_address(struct net_device *dev);
 static int gfar_change_mtu(struct net_device *dev, int new_mtu);
 static irqreturn_t gfar_error(int irq, void *dev_id);
@@ -219,9 +219,13 @@ static int gfar_init_bds(struct net_device *ndev)
 					netdev_err(ndev, "Can't allocate RX buffers\n");
 					return -ENOMEM;
 				}
-				rx_queue->rx_skbuff[j] = skb;
 
-				gfar_new_rxbdp(rx_queue, rxbdp, skb);
+				if (gfar_new_rxbdp(rx_queue, rxbdp, skb)) {
+					dev_kfree_skb_any(skb);
+					skb = NULL;
+				}
+
+				rx_queue->rx_skbuff[j] = skb;
 			}
 
 			rxbdp++;
@@ -2290,6 +2294,8 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 						   0,
 						   frag_len,
 						   DMA_TO_DEVICE);
+			if (unlikely(dma_mapping_error(priv->dev, bufaddr)))
+				goto dma_map_err;
 
 			/* set the TxBD length and buffer pointer */
 			txbdp->bufPtr = bufaddr;
@@ -2339,8 +2345,12 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		fcb->ptp = 1;
 	}
 
-	txbdp_start->bufPtr = dma_map_single(priv->dev, skb->data,
-					     skb_headlen(skb), DMA_TO_DEVICE);
+	bufaddr = dma_map_single(priv->dev, skb->data, skb_headlen(skb),
+				 DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(priv->dev, bufaddr)))
+		goto dma_map_err;
+
+	txbdp_start->bufPtr = bufaddr;
 
 	/* If time stamping is requested one additional TxBD must be set up. The
 	 * first TxBD points to the FCB and must have a data length of
@@ -2406,6 +2416,25 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	spin_unlock_irqrestore(&tx_queue->txlock, flags);
 
 	return NETDEV_TX_OK;
+
+dma_map_err:
+	txbdp = next_txbd(txbdp_start, base, tx_queue->tx_ring_size);
+	if (do_tstamp)
+		txbdp = next_txbd(txbdp, base, tx_queue->tx_ring_size);
+	for (i = 0; i < nr_frags; i++) {
+		lstatus = txbdp->lstatus;
+		if (!(lstatus & BD_LFLAG(TXBD_READY)))
+			break;
+
+		txbdp->lstatus = lstatus & ~BD_LFLAG(TXBD_READY);
+		bufaddr = txbdp->bufPtr;
+		dma_unmap_page(priv->dev, bufaddr, txbdp->length,
+			       DMA_TO_DEVICE);
+		txbdp = next_txbd(txbdp, base, tx_queue->tx_ring_size);
+	}
+	gfar_wmb();
+	dev_kfree_skb_any(skb);
+	return NETDEV_TX_OK;
 }
 
 /* Stops the kernel queue, and halts the controller */
@@ -2606,8 +2635,8 @@ static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 	netdev_tx_completed_queue(txq, howmany, bytes_sent);
 }
 
-static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
-			   struct sk_buff *skb)
+static int gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
+			  struct sk_buff *skb)
 {
 	struct net_device *dev = rx_queue->dev;
 	struct gfar_private *priv = netdev_priv(dev);
@@ -2615,7 +2644,11 @@ static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 
 	buf = dma_map_single(priv->dev, skb->data,
 			     priv->rx_buffer_size, DMA_FROM_DEVICE);
+	if (dma_mapping_error(priv->dev, buf))
+		return -1;
+
 	gfar_init_rxbdp(rx_queue, bdp, buf);
+	return 0;
 }
 
 static struct sk_buff *gfar_alloc_skb(struct net_device *dev)
@@ -2851,10 +2884,13 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
 
 		}
 
-		rx_queue->rx_skbuff[rx_queue->skb_currx] = newskb;
-
 		/* Setup the new bdp */
-		gfar_new_rxbdp(rx_queue, bdp, newskb);
+		if (gfar_new_rxbdp(rx_queue, bdp, newskb)) {
+			dev_kfree_skb_any(newskb);
+			newskb = NULL;
+		}
+
+		rx_queue->rx_skbuff[rx_queue->skb_currx] = newskb;
 
 		/* Update to the next pointer */
 		bdp = next_bd(bdp, base, rx_queue->rx_ring_size);
-- 
1.9.3

^ permalink raw reply related

* Re: [net-next 2/2] sctp: replace seq_printf with seq_puts
From: Neil Horman @ 2014-10-30 10:26 UTC (permalink / raw)
  To: Michele Baldessari; +Cc: Vlad Yasevich, linux-sctp, netdev, David S. Miller
In-Reply-To: <1414661356-17255-2-git-send-email-michele@acksyn.org>

On Thu, Oct 30, 2014 at 10:29:16AM +0100, Michele Baldessari wrote:
> Fixes checkpatch warning:
> "WARNING: Prefer seq_puts to seq_printf"
> 
> Signed-off-by: Michele Baldessari <michele@acksyn.org>
> ---
>  net/sctp/proc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> index bfb242af06ab..0697eda5aed8 100644
> --- a/net/sctp/proc.c
> +++ b/net/sctp/proc.c
> @@ -490,14 +490,14 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
>  			 * Note: We don't have a way to tally this at the moment
>  			 * so lets just leave it as zero for the moment
>  			 */
> -			seq_printf(seq, "0 ");
> +			seq_puts(seq, "0 ");
>  
>  			/*
>  			 * remote address start time (START).  This is also not
>  			 * currently implemented, but we can record it with a
>  			 * jiffies marker in a subsequent patch
>  			 */
> -			seq_printf(seq, "0 ");
> +			seq_puts(seq, "0 ");
>  
>  			/*
>  			 * The current state of this destination. I.e.
> -- 
> 2.1.0
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* Re: [PATCH 0/6 3.18] Fixes for iwlwifi drivers
From: Luca Coelho @ 2014-10-30 11:08 UTC (permalink / raw)
  To: Larry Finger
  Cc: linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Murilo Opsfelder Araujo
In-Reply-To: <1414642633-3700-1-git-send-email-Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>

The cover-letter subject is wrong. :) I guess you meant
s/iwlwifi/rtlwifi/ ;)

--
Luca.

On Wed, 2014-10-29 at 23:17 -0500, Larry Finger wrote:
> Some late changes to rtlwifi made some of the older drivers not start correctly.
> These patches should be applied to 3.18.
> 
> Signed-off-by: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
> Cc: Murilo Opsfelder Araujo <mopsfelder-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> Larry Finger (6):
>   rtlwifi: rtl8192ce: rtl8192de: rtl8192se: Fix handling for missing
>     get_btc_status
>   rtlwifi: rtl8192se: Fix duplicate calls to ieee80211_register_hw()
>   rtlwifi: rtl8192se: Add missing section to read descriptor setting
>   rtlwifi: rtl8192ce: Add missing section to read descriptor setting
>   rtlwifi: rtl8821ae: Remove extra semicolons
>   rtlwifi: rtl8192se: Fix firmware loading
> 
>  drivers/net/wireless/rtlwifi/core.c          |  6 ++++++
>  drivers/net/wireless/rtlwifi/core.h          |  1 +
>  drivers/net/wireless/rtlwifi/rtl8192ce/def.h |  2 ++
>  drivers/net/wireless/rtlwifi/rtl8192ce/sw.c  |  1 +
>  drivers/net/wireless/rtlwifi/rtl8192ce/trx.c |  3 +++
>  drivers/net/wireless/rtlwifi/rtl8192de/sw.c  |  1 +
>  drivers/net/wireless/rtlwifi/rtl8192se/def.h |  2 ++
>  drivers/net/wireless/rtlwifi/rtl8192se/sw.c  | 22 +++-------------------
>  drivers/net/wireless/rtlwifi/rtl8192se/trx.c |  3 +++
>  drivers/net/wireless/rtlwifi/rtl8821ae/phy.c | 12 ++++++------
>  10 files changed, 28 insertions(+), 25 deletions(-)
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: TCP NewReno and single retransmit
From: Marcelo Ricardo Leitner @ 2014-10-30 11:24 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: netdev, Yuchung Cheng, Eric Dumazet
In-Reply-To: <CADVnQynu7+wgkNTkdY=XDBeyWjFxpYYBjx=n98g1_awdG2OpnA@mail.gmail.com>

On 30-10-2014 00:03, Neal Cardwell wrote:
> On Mon, Oct 27, 2014 at 2:49 PM, Marcelo Ricardo Leitner
> <mleitner@redhat.com> wrote:
>> Hi,
>>
>> We have a report from a customer saying that on a very calm connection, like
>> having only a single data packet within some minutes, if this packet gets to
>> be re-transmitted, retrans_stamp is only cleared when the next acked packet
>> is received. But this may make we abort the connection too soon if this next
>> packet also gets lost, because the reference for the initial loss is still
>> for a big while ago..
> ...
>> @@ -2382,31 +2382,32 @@ static inline bool tcp_may_undo(const struct
>> tcp_sock *tp)
>>   static bool tcp_try_undo_recovery(struct sock *sk)
> ...
>>          if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
>>                  /* Hold old state until something *above* high_seq
>>                   * is ACKed. For Reno it is MUST to prevent false
>>                   * fast retransmits (RFC2582). SACK TCP is safe. */
>>                  tcp_moderate_cwnd(tp);
>> +               tp->retrans_stamp = 0;
>>                  return true;
>>          }
>>          tcp_set_ca_state(sk, TCP_CA_Open);
>>          return false;
>>   }
>>
>> We would still hold state, at least part of it.. WDYT?
>
> This approach sounds OK to me as long as we include a check of
> tcp_any_retrans_done(), as we do in the similar code paths (for
> motivation, see the comment above tcp_any_retrans_done()).

Yes, okay. I thought that this would be taken care of already by then but 
reading the code again now after your comment, I can see what you're saying. 
Thanks.

> So it sounds fine to me if you change that one new line to the following 2:
>
> +  if (!tcp_any_retrans_done(sk))
> +    tp->retrans_stamp = 0;

Will do.

> Nice catch!

A good part of it (including the diagram) was done by customer. :)
I'll post the patch as soon as we sync with them (credits).

Marcelo

^ permalink raw reply

* [net 0/4][pull request] Intel Wired LAN Driver Updates 2014-10-30
From: Jeff Kirsher @ 2014-10-30 12:33 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to e1000, igb and ixgbe.

Francesco Ruggeri fixes an issue with e1000 where in a VM the driver did
not support unicast filtering.

Roman Gushchin fixes an issue with igb where the driver was re-using
mapped pages so that packets were still getting dropped even if all
the memory issues are gone and there is free memory.

Junwei Zhang found where in the ixgbe_clean_rx_ring() we were repeating
the assignment of NULL to the receive buffer skb and fixes it.

Emil fixes a race condition between setup_link and SFP detection routine
in the watchdog when setting the advertised speed.

The following are changes since commit d70127e8a942364de8dd140fe73893efda363293:
  inet: frags: remove the WARN_ON from inet_evict_bucket
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net master

Emil Tantilov (1):
  ixgbe: fix race when setting advertised speed

Francesco Ruggeri (1):
  e1000: unset IFF_UNICAST_FLT on WMware 82545EM

Junwei Zhang (1):
  ixgbe: need not repeat init skb with NULL

Roman Gushchin (1):
  igb: don't reuse pages with pfmemalloc flag

 drivers/net/ethernet/intel/e1000/e1000_main.c    | 5 ++++-
 drivers/net/ethernet/intel/igb/igb_main.c        | 6 +++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 ++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 2 +-
 4 files changed, 14 insertions(+), 3 deletions(-)

-- 
1.9.3

^ permalink raw reply

* [net 1/4] e1000: unset IFF_UNICAST_FLT on WMware 82545EM
From: Jeff Kirsher @ 2014-10-30 12:33 UTC (permalink / raw)
  To: davem
  Cc: Francesco Ruggeri, netdev, nhorman, sassmann, jogreene,
	Francesco Ruggeri, Jeff Kirsher
In-Reply-To: <1414672436-20616-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Francesco Ruggeri <fruggeri@aristanetworks.com>

VMWare's e1000 implementation does not seem to support unicast filtering.
This can be observed by configuring a macvlan interface on eth0 in a VM in
VMWare Fusion 5.0.5, and trying to use that interface instead of eth0.
Tested on 3.16.

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 5f6aded..24f3986 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1075,7 +1075,10 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 				  NETIF_F_HW_CSUM |
 				  NETIF_F_SG);
 
-	netdev->priv_flags |= IFF_UNICAST_FLT;
+	/* Do not set IFF_UNICAST_FLT for VMWare's 82545EM */
+	if (hw->device_id != E1000_DEV_ID_82545EM_COPPER ||
+	    hw->subsystem_vendor_id != PCI_VENDOR_ID_VMWARE)
+		netdev->priv_flags |= IFF_UNICAST_FLT;
 
 	adapter->en_mng_pt = e1000_enable_mng_pass_thru(hw);
 
-- 
1.9.3

^ permalink raw reply related

* [net 4/4] ixgbe: fix race when setting advertised speed
From: Jeff Kirsher @ 2014-10-30 12:33 UTC (permalink / raw)
  To: davem; +Cc: Emil Tantilov, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1414672436-20616-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

Following commands:

modprobe ixgbe
ifconfig ethX up
ethtool -s ethX advertise 0x020

can lead to "setup link failed with code -14" error due to the setup_link
call racing with the SFP detection routine in the watchdog.

This patch resolves this issue by protecting the setup_link call with check
for __IXGBE_IN_SFP_INIT.

Reported-by: Scott Harrison <scoharr2@cisco.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 3ce4a25..0ae038b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -342,12 +342,16 @@ static int ixgbe_set_settings(struct net_device *netdev,
 		if (old == advertised)
 			return err;
 		/* this sets the link speed and restarts auto-neg */
+		while (test_and_set_bit(__IXGBE_IN_SFP_INIT, &adapter->state))
+			usleep_range(1000, 2000);
+
 		hw->mac.autotry_restart = true;
 		err = hw->mac.ops.setup_link(hw, advertised, true);
 		if (err) {
 			e_info(probe, "setup link failed with code %d\n", err);
 			hw->mac.ops.setup_link(hw, old, true);
 		}
+		clear_bit(__IXGBE_IN_SFP_INIT, &adapter->state);
 	} else {
 		/* in this case we currently only support 10Gb/FULL */
 		u32 speed = ethtool_cmd_speed(ecmd);
-- 
1.9.3

^ permalink raw reply related

* [net 2/4] igb: don't reuse pages with pfmemalloc flag
From: Jeff Kirsher @ 2014-10-30 12:33 UTC (permalink / raw)
  To: davem; +Cc: Roman Gushchin, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1414672436-20616-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Roman Gushchin <klamm@yandex-team.ru>

Incoming packet is dropped silently by sk_filter(), if the skb was
allocated from pfmemalloc reserves and the corresponding socket is
not marked with the SOCK_MEMALLOC flag.

Igb driver allocates pages for DMA with __skb_alloc_page(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
of OOM condition, igb can get pages with pfmemalloc flag set.

If an incoming packet hits the pfmemalloc page and is large enough
(small packets are copying into the memory, allocated with
netdev_alloc_skb_ip_align(), so they are not affected), it will be
dropped.

This behavior is ok under high memory pressure, but the problem is
that the igb driver reuses these mapped pages. So, packets are still
dropping even if all memory issues are gone and there is a plenty
of free memory.

In my case, some TCP sessions hang on a small percentage (< 0.1%)
of machines days after OOMs.

Fix this by avoiding reuse of such pages.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Tested-by: Aaron Brown "aaron.f.brown@intel.com"
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index a21b144..a2d72a8 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6537,6 +6537,9 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
 	if (unlikely(page_to_nid(page) != numa_node_id()))
 		return false;

+	if (unlikely(page->pfmemalloc))
+		return false;
+
 #if (PAGE_SIZE < 8192)
 	/* if we are only owner of page we can reuse it */
 	if (unlikely(page_count(page) != 1))
@@ -6603,7 +6606,8 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring,
 		memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));

 		/* we can reuse buffer as-is, just make sure it is local */
-		if (likely(page_to_nid(page) == numa_node_id()))
+		if (likely((page_to_nid(page) == numa_node_id()) &&
+			   !page->pfmemalloc))
 			return true;

 		/* this page cannot be reused so discard it */
-- 
1.9.3

^ permalink raw reply related

* [net 3/4] ixgbe: need not repeat init skb with NULL
From: Jeff Kirsher @ 2014-10-30 12:33 UTC (permalink / raw)
  To: davem
  Cc: Junwei Zhang, netdev, nhorman, sassmann, jogreene, Martin Zhang,
	Jeff Kirsher
In-Reply-To: <1414672436-20616-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Junwei Zhang <linggao.zjw@alibaba-inc.com>

Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index fec5212..d2df4e3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4321,8 +4321,8 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 				IXGBE_CB(skb)->page_released = false;
 			}
 			dev_kfree_skb(skb);
+			rx_buffer->skb = NULL;
 		}
-		rx_buffer->skb = NULL;
 		if (rx_buffer->dma)
 			dma_unmap_page(dev, rx_buffer->dma,
 				       ixgbe_rx_pg_size(rx_ring),
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH net] gre: Fix regression in gretap TSO support
From: Neal Cardwell @ 2014-10-30 13:51 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: alexander.duyck, netdev, David Miller, H.K. Jerry Chu,
	Eric Dumazet, Alexander Duyck
In-Reply-To: <CALnjE+oK-O-PJH_u50HqQQLnvh+GyeCnv6tNQf5qzL0o1RiPQg@mail.gmail.com>

On Thu, Oct 30, 2014 at 1:14 AM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Wed, Oct 29, 2014 at 8:26 PM,  <alexander.duyck@gmail.com> wrote:
>> From: Alexander Duyck <alexander.h.duyck@redhat.com>
>>
>> On recent kernels I found that TSO on gretap interfaces didn't work.  After
>> bisecting it I found that commit b884b1a4 had introduced a regression in
>> which the Ethernet header was being included in the GRE header length.
>>
>> This change corrects that by basing the GRE header length on the inner mac
>> header in the case of GRE tunnels using transparent Ethernet bridging, and
>> uses the network header for all other GRE tunnel types.
>>
>> Fixes: b884b1a4 ("gre_offload: simplify GRE header length calculation in gre_gso_segment()")

Hmm. There may be other protocols, either now or in the future, where
we want to be able to have a mac header inside the GRE header, rather
than a network header. AFAICT it would be safer to revert b884b1a4,
and go back to the previous code (from c50cd357), where we parse the
GRE header to figure out its length.

neal

^ permalink raw reply

* [PATCH net 1/2] net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN
From: Or Gerlitz @ 2014-10-30 13:59 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Or Gerlitz
In-Reply-To: <1414677568-28409-1-git-send-email-ogerlitz@mellanox.com>

For VXLAN/NVGRE encapsulation, the current HW doesn't support offloading
both the outer UDP TX checksum and the inner TCP/UDP TX checksum. 

The driver doesn't advertize SKB_GSO_UDP_TUNNEL_CSUM, however we are wrongly
telling the HW to offload the outer UDP checksum for encapsulated packets,
fix that.

Fixes: 837052d0ccc5 ('net/mlx4_en: Add netdev support for TCP/IP
		     offloads of vxlan tunneling')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 34c1378..454d9fe 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -836,8 +836,11 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * whether LSO is used */
 	tx_desc->ctrl.srcrb_flags = priv->ctrl_flags;
 	if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
-		tx_desc->ctrl.srcrb_flags |= cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM |
-							 MLX4_WQE_CTRL_TCP_UDP_CSUM);
+		if (!skb->encapsulation)
+			tx_desc->ctrl.srcrb_flags |= cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM |
+								 MLX4_WQE_CTRL_TCP_UDP_CSUM);
+		else
+			tx_desc->ctrl.srcrb_flags |= cpu_to_be32(MLX4_WQE_CTRL_IP_CSUM);
 		ring->tx_csum++;
 	}

-- 
1.7.1

^ permalink raw reply related

* [PATCH net 0/2] mlx4 driver encapsulation/steering fixes
From: Or Gerlitz @ 2014-10-30 13:59 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Or Gerlitz

Hi Dave,

The 1st patch fixes a bug in the TX path that supports offloading the 
TX checksum of (VXLAN) encapsulated TCP packets. It turns out that the 
bug is revealed only when the receiver runs in non-offloaded mode, so
we somehow missed it so far... please queue it for -stable >= 3.14 

The 2nd patch makes sure not to leak steering entry on error flow, 
please queue it to 3.17-stable 

thanks,

Or.

Or Gerlitz (2):
  net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN
  mlx4: Avoid leaking steering rules on flow creation error flow

 drivers/infiniband/hw/mlx4/main.c          |   10 ++++++++--
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |    7 +++++--
 drivers/net/ethernet/mellanox/mlx4/mcg.c   |    4 ++++
 3 files changed, 17 insertions(+), 4 deletions(-)

^ permalink raw reply

* [PATCH net 2/2] mlx4: Avoid leaking steering rules on flow creation error flow
From: Or Gerlitz @ 2014-10-30 13:59 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Or Gerlitz
In-Reply-To: <1414677568-28409-1-git-send-email-ogerlitz@mellanox.com>

If mlx4_ib_create_flow() attempts to create > 1 rules with the
firmware, and one of these registrations fail, we leaked the
already created flow rules.

One example of the leak is when the registration of the VXLAN ghost
steering rule fails, we didn't unregister the original rule requested
by the user, introduced in commit d2fce8a9060d "mlx4: Set
user-space raw Ethernet QPs to properly handle VXLAN traffic".

While here, add dump of the VXLAN portion of steering rules
so it can actually be seen when flow creation fails.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/infiniband/hw/mlx4/main.c        |   10 ++++++++--
 drivers/net/ethernet/mellanox/mlx4/mcg.c |    4 ++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bda5994..8b72cf3 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1173,18 +1173,24 @@ static struct ib_flow *mlx4_ib_create_flow(struct ib_qp *qp,
 		err = __mlx4_ib_create_flow(qp, flow_attr, domain, type[i],
 					    &mflow->reg_id[i]);
 		if (err)
-			goto err_free;
+			goto err_create_flow;
 		i++;
 	}

 	if (i < ARRAY_SIZE(type) && flow_attr->type == IB_FLOW_ATTR_NORMAL) {
 		err = mlx4_ib_tunnel_steer_add(qp, flow_attr, &mflow->reg_id[i]);
 		if (err)
-			goto err_free;
+			goto err_create_flow;
+		i++;
 	}

 	return &mflow->ibflow;

+err_create_flow:
+	while (i) {
+		(void)__mlx4_ib_destroy_flow(to_mdev(qp->device)->dev, mflow->reg_id[i]);
+		i--;
+	}
 err_free:
 	kfree(mflow);
 	return ERR_PTR(err);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index ca0f98c..8728431 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -955,6 +955,10 @@ static void mlx4_err_rule(struct mlx4_dev *dev, char *str,
 					cur->ib.dst_gid_msk);
 			break;

+		case MLX4_NET_TRANS_RULE_ID_VXLAN:
+			len += snprintf(buf + len, BUF_SIZE - len,
+					"VNID = %d ", be32_to_cpu(cur->vxlan.vni));
+			break;
 		case MLX4_NET_TRANS_RULE_ID_IPV6:
 			break;

-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH net] gre: Fix regression in gretap TSO support
From: Alexander Duyck @ 2014-10-30 14:30 UTC (permalink / raw)
  To: Neal Cardwell, Pravin Shelar
  Cc: alexander.duyck, netdev, David Miller, H.K. Jerry Chu,
	Eric Dumazet
In-Reply-To: <CADVnQykSY_Q3HL0T1uNO4iUheWTpTsLLF7gCnM4SbJyGp3x6Wg@mail.gmail.com>


On 10/30/2014 06:51 AM, Neal Cardwell wrote:
> On Thu, Oct 30, 2014 at 1:14 AM, Pravin Shelar <pshelar@nicira.com> wrote:
>> On Wed, Oct 29, 2014 at 8:26 PM,  <alexander.duyck@gmail.com> wrote:
>>> From: Alexander Duyck <alexander.h.duyck@redhat.com>
>>>
>>> On recent kernels I found that TSO on gretap interfaces didn't work.  After
>>> bisecting it I found that commit b884b1a4 had introduced a regression in
>>> which the Ethernet header was being included in the GRE header length.
>>>
>>> This change corrects that by basing the GRE header length on the inner mac
>>> header in the case of GRE tunnels using transparent Ethernet bridging, and
>>> uses the network header for all other GRE tunnel types.
>>>
>>> Fixes: b884b1a4 ("gre_offload: simplify GRE header length calculation in gre_gso_segment()")
> Hmm. There may be other protocols, either now or in the future, where
> we want to be able to have a mac header inside the GRE header, rather
> than a network header. AFAICT it would be safer to revert b884b1a4,
> and go back to the previous code (from c50cd357), where we parse the
> GRE header to figure out its length.
>
> neal

The change is consistent with how we handle this in other spots 
throughout the kernel.  If nothing else you can just search for 
ETH_P_TEB and you will find multiple spots in the kernel where IP 
tunnels differentiate between transparent Ethernet bridging and regular 
IP in IP tunnels by checking for the protocol ETH_P_TEB.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH net] gre: Fix regression in gretap TSO support
From: Eric Dumazet @ 2014-10-30 15:00 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Neal Cardwell, Pravin Shelar, alexander.duyck, netdev,
	David Miller, H.K. Jerry Chu, Eric Dumazet
In-Reply-To: <54524B6A.3000503@redhat.com>

On Thu, 2014-10-30 at 07:30 -0700, Alexander Duyck wrote:

> The change is consistent with how we handle this in other spots 
> throughout the kernel.  If nothing else you can just search for 
> ETH_P_TEB and you will find multiple spots in the kernel where IP 
> tunnels differentiate between transparent Ethernet bridging and regular 
> IP in IP tunnels by checking for the protocol ETH_P_TEB.

Agreed, I think that GUE might supersedes GRE usage anyway ;)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH net] gre: Fix regression in gretap TSO support
From: Tom Herbert @ 2014-10-30 15:05 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Neal Cardwell, Pravin Shelar, Alexander Duyck, netdev,
	David Miller, H.K. Jerry Chu, Eric Dumazet
In-Reply-To: <54524B6A.3000503@redhat.com>

On Thu, Oct 30, 2014 at 7:30 AM, Alexander Duyck
<alexander.h.duyck@redhat.com> wrote:
>
> On 10/30/2014 06:51 AM, Neal Cardwell wrote:
>>
>> On Thu, Oct 30, 2014 at 1:14 AM, Pravin Shelar <pshelar@nicira.com> wrote:
>>>
>>> On Wed, Oct 29, 2014 at 8:26 PM,  <alexander.duyck@gmail.com> wrote:
>>>>
>>>> From: Alexander Duyck <alexander.h.duyck@redhat.com>
>>>>
>>>> On recent kernels I found that TSO on gretap interfaces didn't work.
>>>> After
>>>> bisecting it I found that commit b884b1a4 had introduced a regression in
>>>> which the Ethernet header was being included in the GRE header length.
>>>>
>>>> This change corrects that by basing the GRE header length on the inner
>>>> mac
>>>> header in the case of GRE tunnels using transparent Ethernet bridging,
>>>> and
>>>> uses the network header for all other GRE tunnel types.
>>>>
>>>> Fixes: b884b1a4 ("gre_offload: simplify GRE header length calculation in
>>>> gre_gso_segment()")
>>
>> Hmm. There may be other protocols, either now or in the future, where
>> we want to be able to have a mac header inside the GRE header, rather
>> than a network header. AFAICT it would be safer to revert b884b1a4,
>> and go back to the previous code (from c50cd357), where we parse the
>> GRE header to figure out its length.
>>
>> neal
>
>
> The change is consistent with how we handle this in other spots throughout
> the kernel.  If nothing else you can just search for ETH_P_TEB and you will
> find multiple spots in the kernel where IP tunnels differentiate between
> transparent Ethernet bridging and regular IP in IP tunnels by checking for
> the protocol ETH_P_TEB.
>
I'm not sure I understand this. We always use inner mac header in
__skb_udp_tunnel_segment for computing tunnel length and don't
distinguish between Ethernet or IP encapsulation. Presumably, in the
case of IP encapsulation inner mac header is equal to inner network
header. Why is this different for GRE?

Thanks,
Tom

> Thanks,
>
> Alex
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next v4 0/4] netns: allow to identify peer netns
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	luto-kltTT9wpgjJwATOyAt5JVQ, cwang-xCSkyg8dI+0RB7SZvlqPiA
In-Reply-To: <1412257690-31253-1-git-send-email-nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>

The goal of this serie is to be able to multicast netlink messages with an
attribute that identify a peer netns.
This is needed by the userland to interpret some informations contained in
netlink messages (like IFLA_LINK value, but also some other attributes in case
of x-netns netdevice (see also
http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).

Ids of peer netns are set by userland via a new genl messages. These ids are
stored per netns and are local (ie only valid in the netns where they are set).
To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
the id of a peer netns. Note that it will be possible to add a table (struct net
-> id) later to optimize this lookup if needed.

Patch 1/4 introduces the netlink API mechanism to set and get these ids.
Patch 2/4 and 3/4 implements an example of how to use these ids in rtnetlink
messages. And patch 4/4 shows that the netlink messages can be symetric between
a GET and a SET.

iproute2 patches are available, I can send them on demand.

Here is a small screenshot to show how it can be used by userland.

First, setup netns and required ids:
$ ip netns add foo
$ ip netns del foo
$ ip netns
$ touch /var/run/netns/init_net
$ mount --bind /proc/1/ns/net /var/run/netns/init_net
$ ip netns add foo
$ ip netns exec foo ip netns set init_net 0
$ ip netns
foo
init_net
$ ip netns exec foo ip netns
foo
init_net (id: 0)

Now, add and display an ipip tunnel, with its link part in init_net (id 0 in
netns foo) and the netdevice in foo:
$ ip netns exec foo ip link add ipip1 link-netnsid 0 type ipip remote 10.16.0.121 local 10.16.0.249
$ ip netns exec foo ip l ls ipip1
6: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default 
    link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 0

The parameter link-netnsid shows us where the interface sends and receives
packets (and thus we know where encapsulated addresses are set).

RFCv3 -> v4:
  rebase on net-next
  add copyright text in the new netns.h file

RFCv2 -> RFCv3:
  ids are now defined by userland (via netlink). Ids are stored in each netns
  (and they are local to this netns).
  add get_link_net support for ip6 tunnels
  netnsid is now a s32 instead of a u32

RFCv1 -> RFCv2:
  remove useless ()
  ids are now stored in the user ns. It's possible to get an id for a peer netns
  only if the current netns and the peer netns have the same user ns parent.

 MAINTAINERS                  |   1 +
 include/net/ip6_tunnel.h     |   1 +
 include/net/ip_tunnels.h     |   1 +
 include/net/net_namespace.h  |   5 ++
 include/net/rtnetlink.h      |   2 +
 include/uapi/linux/Kbuild    |   1 +
 include/uapi/linux/if_link.h |   1 +
 include/uapi/linux/netns.h   |  38 +++++++++
 net/core/net_namespace.c     | 195 +++++++++++++++++++++++++++++++++++++++++++
 net/core/rtnetlink.c         |  38 ++++++++-
 net/ipv4/ip_gre.c            |   2 +
 net/ipv4/ip_tunnel.c         |   8 ++
 net/ipv4/ip_vti.c            |   1 +
 net/ipv4/ipip.c              |   1 +
 net/ipv6/ip6_gre.c           |   1 +
 net/ipv6/ip6_tunnel.c        |   9 ++
 net/ipv6/ip6_vti.c           |   1 +
 net/ipv6/sit.c               |   1 +
 net/netlink/genetlink.c      |   4 +
 19 files changed, 308 insertions(+), 3 deletions(-)

Comments are welcome.

Regards,
Nicolas

^ permalink raw reply

* [PATCH net-next v4 2/4] rtnl: add link netns id to interface messages
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
  Cc: cwang-xCSkyg8dI+0RB7SZvlqPiA, Nicolas Dichtel,
	luto-kltTT9wpgjJwATOyAt5JVQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1414682728-4532-1-git-send-email-nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>

This patch adds a new attribute (IFLA_LINK_NETNSID) which contains the 'link'
netns id when this netns is different from the netns where the interface
stands (for example for x-net interfaces like ip tunnels). When there is no id,
we put NETNSA_NSINDEX_UNKNOWN into this attribute to indicate to userland that
the link netns is different from the interface netns. Hence, userland knows that
some information like IFLA_LINK are not interpretable.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
---
 include/net/rtnetlink.h      |  2 ++
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 13 +++++++++++++
 3 files changed, 16 insertions(+)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index e21b9f9653c0..6c6d5393fc34 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -46,6 +46,7 @@ static inline int rtnl_msg_family(const struct nlmsghdr *nlh)
  *			    to create when creating a new device.
  *	@get_num_rx_queues: Function to determine number of receive queues
  *			    to create when creating a new device.
+ *	@get_link_net: Function to get the i/o netns of the device
  */
 struct rtnl_link_ops {
 	struct list_head	list;
@@ -93,6 +94,7 @@ struct rtnl_link_ops {
 	int			(*fill_slave_info)(struct sk_buff *skb,
 						   const struct net_device *dev,
 						   const struct net_device *slave_dev);
+	struct net		*(*get_link_net)(const struct net_device *dev);
 };
 
 int __rtnl_link_register(struct rtnl_link_ops *ops);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 7072d8325016..d2729f63cf01 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -145,6 +145,7 @@ enum {
 	IFLA_CARRIER,
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
+	IFLA_LINK_NETNSID,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a6882686ca3a..1b9329512496 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -862,6 +862,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
+	       + nla_total_size(4) /* IFLA_LINK_NETNSID */
 	       + nla_total_size(ext_filter_mask
 			        & RTEXT_FILTER_VF ? 4 : 0) /* IFLA_NUM_VF */
 	       + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
@@ -1134,6 +1135,18 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			goto nla_put_failure;
 	}
 
+	if (dev->rtnl_link_ops &&
+	    dev->rtnl_link_ops->get_link_net) {
+		struct net *link_net = dev->rtnl_link_ops->get_link_net(dev);
+
+		if (!net_eq(dev_net(dev), link_net)) {
+			int id = peernet2id(dev_net(dev), link_net);
+
+			if (nla_put_s32(skb, IFLA_LINK_NETNSID, id))
+				goto nla_put_failure;
+		}
+	}
+
 	if (!(af_spec = nla_nest_start(skb, IFLA_AF_SPEC)))
 		goto nla_put_failure;
 
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next v4 3/4] iptunnels: advertise link netns via netlink
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
  Cc: cwang-xCSkyg8dI+0RB7SZvlqPiA, Nicolas Dichtel,
	luto-kltTT9wpgjJwATOyAt5JVQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1414682728-4532-1-git-send-email-nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>

Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
---
 include/net/ip6_tunnel.h | 1 +
 include/net/ip_tunnels.h | 1 +
 net/ipv4/ip_gre.c        | 2 ++
 net/ipv4/ip_tunnel.c     | 8 ++++++++
 net/ipv4/ip_vti.c        | 1 +
 net/ipv4/ipip.c          | 1 +
 net/ipv6/ip6_gre.c       | 1 +
 net/ipv6/ip6_tunnel.c    | 9 +++++++++
 net/ipv6/ip6_vti.c       | 1 +
 net/ipv6/sit.c           | 1 +
 10 files changed, 26 insertions(+)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index a5593dab6af7..8648519f4555 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -69,6 +69,7 @@ int ip6_tnl_xmit_ctl(struct ip6_tnl *t);
 __u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw);
 __u32 ip6_tnl_get_cap(struct ip6_tnl *t, const struct in6_addr *laddr,
 			     const struct in6_addr *raddr);
+struct net *ip6_tnl_get_link_net(const struct net_device *dev);
 
 static inline void ip6tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 {
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 5bc6edeb7143..ce4ff6161fab 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -122,6 +122,7 @@ struct ip_tunnel_net {
 int ip_tunnel_init(struct net_device *dev);
 void ip_tunnel_uninit(struct net_device *dev);
 void  ip_tunnel_dellink(struct net_device *dev, struct list_head *head);
+struct net *ip_tunnel_get_link_net(const struct net_device *dev);
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 		       struct rtnl_link_ops *ops, char *devname);
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 12055fdbe716..9e2e29a8c989 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -827,6 +827,7 @@ static struct rtnl_link_ops ipgre_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
@@ -841,6 +842,7 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipgre_get_size,
 	.fill_info	= ipgre_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __net_init ipgre_tap_init_net(struct net *net)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 0bb8e141eacc..3e1edd544b27 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -972,6 +972,14 @@ void ip_tunnel_dellink(struct net_device *dev, struct list_head *head)
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_dellink);
 
+struct net *ip_tunnel_get_link_net(const struct net_device *dev)
+{
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip_tunnel_get_link_net);
+
 int ip_tunnel_init_net(struct net *net, int ip_tnl_net_id,
 				  struct rtnl_link_ops *ops, char *devname)
 {
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 3e861011e4a3..f0fab26e4ddc 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -530,6 +530,7 @@ static struct rtnl_link_ops vti_link_ops __read_mostly = {
 	.changelink	= vti_changelink,
 	.get_size	= vti_get_size,
 	.fill_info	= vti_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static int __init vti_init(void)
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 37096d64730e..e7a183baba0a 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -498,6 +498,7 @@ static struct rtnl_link_ops ipip_link_ops __read_mostly = {
 	.dellink	= ip_tunnel_dellink,
 	.get_size	= ipip_get_size,
 	.fill_info	= ipip_fill_info,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel ipip_handler __read_mostly = {
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 12c3c8ef3849..5165ac7fde22 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1661,6 +1661,7 @@ static struct rtnl_link_ops ip6gre_link_ops __read_mostly = {
 	.dellink	= ip6gre_dellink,
 	.get_size	= ip6gre_get_size,
 	.fill_info	= ip6gre_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static struct rtnl_link_ops ip6gre_tap_ops __read_mostly = {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 9409887fb664..6b2534ea9c54 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1703,6 +1703,14 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+struct net *ip6_tnl_get_link_net(const struct net_device *dev)
+{
+	struct ip6_tnl *tunnel = netdev_priv(dev);
+
+	return tunnel->net;
+}
+EXPORT_SYMBOL(ip6_tnl_get_link_net);
+
 static const struct nla_policy ip6_tnl_policy[IFLA_IPTUN_MAX + 1] = {
 	[IFLA_IPTUN_LINK]		= { .type = NLA_U32 },
 	[IFLA_IPTUN_LOCAL]		= { .len = sizeof(struct in6_addr) },
@@ -1726,6 +1734,7 @@ static struct rtnl_link_ops ip6_link_ops __read_mostly = {
 	.dellink	= ip6_tnl_dellink,
 	.get_size	= ip6_tnl_get_size,
 	.fill_info	= ip6_tnl_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static struct xfrm6_tunnel ip4ip6_handler __read_mostly = {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d440bb585524..43966dcc9603 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -992,6 +992,7 @@ static struct rtnl_link_ops vti6_link_ops __read_mostly = {
 	.changelink	= vti6_changelink,
 	.get_size	= vti6_get_size,
 	.fill_info	= vti6_fill_info,
+	.get_link_net	= ip6_tnl_get_link_net,
 };
 
 static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 58e5b4710127..c858d0eb267a 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1765,6 +1765,7 @@ static struct rtnl_link_ops sit_link_ops __read_mostly = {
 	.get_size	= ipip6_get_size,
 	.fill_info	= ipip6_fill_info,
 	.dellink	= ipip6_dellink,
+	.get_link_net	= ip_tunnel_get_link_net,
 };
 
 static struct xfrm_tunnel sit_handler __read_mostly = {
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next v4 4/4] rtnl: allow to create device with IFLA_LINK_NETNSID set
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
  Cc: cwang-xCSkyg8dI+0RB7SZvlqPiA, Nicolas Dichtel,
	luto-kltTT9wpgjJwATOyAt5JVQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1414682728-4532-1-git-send-email-nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>

This patch adds the ability to create a netdevice in a specified netns and
then move it into the final netns. In fact, it allows to have a symetry between
get and set rtnl messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>
---
 net/core/rtnetlink.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1b9329512496..57959a85ed2c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1211,6 +1211,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NUM_RX_QUEUES]	= { .type = NLA_U32 },
 	[IFLA_PHYS_PORT_ID]	= { .type = NLA_BINARY, .len = MAX_PHYS_PORT_ID_LEN },
 	[IFLA_CARRIER_CHANGES]	= { .type = NLA_U32 },  /* ignored */
+	[IFLA_LINK_NETNSID]	= { .type = NLA_S32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -1983,7 +1984,7 @@ replay:
 		struct nlattr *slave_attr[m_ops ? m_ops->slave_maxtype + 1 : 0];
 		struct nlattr **data = NULL;
 		struct nlattr **slave_data = NULL;
-		struct net *dest_net;
+		struct net *dest_net, *link_net = NULL;
 
 		if (ops) {
 			if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) {
@@ -2089,7 +2090,18 @@ replay:
 		if (IS_ERR(dest_net))
 			return PTR_ERR(dest_net);
 
-		dev = rtnl_create_link(dest_net, ifname, name_assign_type, ops, tb);
+		if (tb[IFLA_LINK_NETNSID]) {
+			int id = nla_get_s32(tb[IFLA_LINK_NETNSID]);
+
+			link_net = get_net_ns_by_id(dest_net, id);
+			if (link_net == NULL) {
+				err =  -EINVAL;
+				goto out;
+			}
+		}
+
+		dev = rtnl_create_link(link_net ? : dest_net, ifname,
+				       name_assign_type, ops, tb);
 		if (IS_ERR(dev)) {
 			err = PTR_ERR(dev);
 			goto out;
@@ -2117,9 +2129,16 @@ replay:
 			}
 		}
 		err = rtnl_configure_link(dev, ifm);
-		if (err < 0)
+		if (err < 0) {
 			unregister_netdevice(dev);
+			goto out;
+		}
+
+		if (link_net)
+			err = dev_change_net_namespace(dev, dest_net, ifname);
 out:
+		if (link_net)
+			put_net(link_net);
 		put_net(dest_net);
 		return err;
 	}
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH 0/6 3.18] Fixes for iwlwifi drivers
From: Larry Finger @ 2014-10-30 15:27 UTC (permalink / raw)
  To: Luca Coelho
  Cc: linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Murilo Opsfelder Araujo
In-Reply-To: <1414667312.27833.22.camel-XPOmlcxoEMv1KXRcyAk9cg@public.gmane.org>

On 10/30/2014 06:08 AM, Luca Coelho wrote:
> The cover-letter subject is wrong. :) I guess you meant
> s/iwlwifi/rtlwifi/ ;)

Yes, the changes were for rtlwifi, not iwlwifi. Sorry. (:

My laptop has an Intel 7260 card built in, and it is working correctly on both 
2.4 and 5G bands under mainline 3.18-rc2.

Those types of errors are what I get for trying to "work" while on a family 
vacation. Unfortunately, I needed to submit those patches quickly to prevent a 
set of conflicting updates from being accepted, and I made a silly mistake.

Larry

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel
In-Reply-To: <1414682728-4532-1-git-send-email-nicolas.dichtel@6wind.com>

With this patch, a user can define an id for a peer netns by providing a FD or a
PID. These ids are local to netns (ie valid only into one netns).

This will be useful for netlink messages when a x-netns interface is dumped.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 MAINTAINERS                 |   1 +
 include/net/net_namespace.h |   5 ++
 include/uapi/linux/Kbuild   |   1 +
 include/uapi/linux/netns.h  |  38 +++++++++
 net/core/net_namespace.c    | 195 ++++++++++++++++++++++++++++++++++++++++++++
 net/netlink/genetlink.c     |   4 +
 6 files changed, 244 insertions(+)
 create mode 100644 include/uapi/linux/netns.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 43898b1a8a2d..de7e6fcbd5c2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6382,6 +6382,7 @@ F:	include/linux/netdevice.h
 F:	include/uapi/linux/in.h
 F:	include/uapi/linux/net.h
 F:	include/uapi/linux/netdevice.h
+F:	include/uapi/linux/netns.h
 F:	tools/net/
 F:	tools/testing/selftests/net/
 F:	lib/random32.c
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index e0d64667a4b3..0f1367a71b81 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -59,6 +59,7 @@ struct net {
 	struct list_head	exit_list;	/* Use only net_mutex */
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
+	struct idr		netns_ids;
 
 	unsigned int		proc_inum;
 
@@ -289,6 +290,10 @@ static inline struct net *read_pnet(struct net * const *pnet)
 #define __net_initconst	__initconst
 #endif
 
+int peernet2id(struct net *net, struct net *peer);
+struct net *get_net_ns_by_id(struct net *net, int id);
+int netns_genl_register(void);
+
 struct pernet_operations {
 	struct list_head list;
 	int (*init)(struct net *net);
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 6cad97485bad..d7f49c69585a 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -277,6 +277,7 @@ header-y += netfilter_decnet.h
 header-y += netfilter_ipv4.h
 header-y += netfilter_ipv6.h
 header-y += netlink.h
+header-y += netns.h
 header-y += netrom.h
 header-y += nfc.h
 header-y += nfs.h
diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
new file mode 100644
index 000000000000..2edf129377de
--- /dev/null
+++ b/include/uapi/linux/netns.h
@@ -0,0 +1,38 @@
+/* Copyright (c) 2014 6WIND S.A.
+ * Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ */
+#ifndef _UAPI_LINUX_NETNS_H_
+#define _UAPI_LINUX_NETNS_H_
+
+/* Generic netlink messages */
+
+#define NETNS_GENL_NAME			"netns"
+#define NETNS_GENL_VERSION		0x1
+
+/* Commands */
+enum {
+	NETNS_CMD_UNSPEC,
+	NETNS_CMD_NEWID,
+	NETNS_CMD_GETID,
+	__NETNS_CMD_MAX,
+};
+
+#define NETNS_CMD_MAX		(__NETNS_CMD_MAX - 1)
+
+/* Attributes */
+enum {
+	NETNSA_NONE,
+#define NETNSA_NSINDEX_UNKNOWN	-1
+	NETNSA_NSID,
+	NETNSA_PID,
+	NETNSA_FD,
+	__NETNSA_MAX,
+};
+
+#define NETNSA_MAX		(__NETNSA_MAX - 1)
+
+#endif /* _UAPI_LINUX_NETNS_H_ */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7f155175bba8..4a5680ed42fb 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -15,6 +15,8 @@
 #include <linux/file.h>
 #include <linux/export.h>
 #include <linux/user_namespace.h>
+#include <linux/netns.h>
+#include <net/genetlink.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -144,6 +146,50 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
+/* This function is used by idr_for_each(). If net is equal to peer, the
+ * function returns the id so that idr_for_each() stops. Because we cannot
+ * returns the id 0 (idr_for_each() will not stop), we return the magic value
+ * -1 for it.
+ */
+static int net_eq_idr(int id, void *net, void *peer)
+{
+	if (net_eq(net, peer))
+		return id ? : -1;
+	return 0;
+}
+
+/* returns NETNSA_NSINDEX_UNKNOWN if not found */
+int peernet2id(struct net *net, struct net *peer)
+{
+	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
+
+	ASSERT_RTNL();
+
+	/* Magic value for id 0. */
+	if (id == -1)
+		return 0;
+	if (id == 0)
+		return NETNSA_NSINDEX_UNKNOWN;
+
+	return id;
+}
+
+struct net *get_net_ns_by_id(struct net *net, int id)
+{
+	struct net *peer;
+
+	if (id < 0)
+		return NULL;
+
+	rcu_read_lock();
+	peer = idr_find(&net->netns_ids, id);
+	if (peer)
+		get_net(peer);
+	rcu_read_unlock();
+
+	return peer;
+}
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -158,6 +204,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
+	idr_init(&net->netns_ids);
 
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_set(&net->use_count, 0);
@@ -288,6 +335,14 @@ static void cleanup_net(struct work_struct *work)
 	list_for_each_entry(net, &net_kill_list, cleanup_list) {
 		list_del_rcu(&net->list);
 		list_add_tail(&net->exit_list, &net_exit_list);
+		for_each_net(tmp) {
+			int id = peernet2id(tmp, net);
+
+			if (id >= 0)
+				idr_remove(&tmp->netns_ids, id);
+		}
+		idr_destroy(&net->netns_ids);
+
 	}
 	rtnl_unlock();
 
@@ -399,6 +454,146 @@ static struct pernet_operations __net_initdata net_ns_ops = {
 	.exit = net_ns_net_exit,
 };
 
+static struct genl_family netns_genl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= NETNS_GENL_NAME,
+	.version	= NETNS_GENL_VERSION,
+	.hdrsize	= 0,
+	.maxattr	= NETNSA_MAX,
+	.netnsok	= true,
+};
+
+static struct nla_policy netns_nl_policy[NETNSA_MAX + 1] = {
+	[NETNSA_NONE]		= { .type = NLA_UNSPEC },
+	[NETNSA_NSID]		= { .type = NLA_S32 },
+	[NETNSA_PID]		= { .type = NLA_U32 },
+	[NETNSA_FD]		= { .type = NLA_U32 },
+};
+
+static int netns_nl_cmd_newid(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct net *peer;
+	int nsid, err;
+
+	if (!info->attrs[NETNSA_NSID])
+		return -EINVAL;
+	nsid = nla_get_s32(info->attrs[NETNSA_NSID]);
+	if (nsid < 0)
+		return -EINVAL;
+
+	if (info->attrs[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		return -EINVAL;
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	rtnl_lock();
+	if (peernet2id(net, peer) >= 0) {
+		err = -EEXIST;
+		goto out;
+	}
+
+	err = idr_alloc(&net->netns_ids, peer, nsid, nsid + 1, GFP_KERNEL);
+	if (err >= 0)
+		err = 0;
+out:
+	rtnl_unlock();
+	put_net(peer);
+	return err;
+}
+
+static int netns_nl_get_size(void)
+{
+	return nla_total_size(sizeof(s32)) /* NETNSA_NSID */
+	       ;
+}
+
+static int netns_nl_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
+			 int cmd, struct net *net, struct net *peer)
+{
+	void *hdr;
+	int id;
+
+	hdr = genlmsg_put(skb, portid, seq, &netns_genl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	rtnl_lock();
+	id = peernet2id(net, peer);
+	rtnl_unlock();
+	if (nla_put_s32(skb, NETNSA_NSID, id))
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int netns_nl_cmd_getid(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+	struct net *peer;
+
+	if (info->attrs[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		return -EINVAL;
+
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
+	if (!msg) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = netns_nl_fill(msg, info->snd_portid, info->snd_seq,
+			    NLM_F_ACK, NETNS_CMD_GETID, net, peer);
+	if (err < 0)
+		goto err_out;
+
+	err = genlmsg_unicast(net, msg, info->snd_portid);
+	goto out;
+
+err_out:
+	nlmsg_free(msg);
+out:
+	put_net(peer);
+	return err;
+}
+
+static struct genl_ops netns_genl_ops[] = {
+	{
+		.cmd = NETNS_CMD_NEWID,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_newid,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NETNS_CMD_GETID,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_getid,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+int netns_genl_register(void)
+{
+	return genl_register_family_with_ops(&netns_genl_family,
+					     netns_genl_ops);
+}
+
 static int __init net_ns_init(void)
 {
 	struct net_generic *ng;
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 76393f2f4b22..c6f39e40c9f3 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -1029,6 +1029,10 @@ static int __init genl_init(void)
 	if (err)
 		goto problem;
 
+	err = netns_genl_register();
+	if (err < 0)
+		goto problem;
+
 	return 0;
 
 problem:
-- 
2.1.0

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox