Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2 04/10] net: smc911x: use io{read,write}*_rep accessors
From: Will Deacon @ 2012-12-11 14:49 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	benh@kernel.crashing.org, arnd@arndb.de, james.hogan@imgtec.com,
	matthew@mattleach.net, netdev@vger.kernel.org
In-Reply-To: <20121210.154705.1773845547652835423.davem@davemloft.net>

Hi David,

On Mon, Dec 10, 2012 at 08:47:05PM +0000, David Miller wrote:
> From: Will Deacon <will.deacon@arm.com>
> Date: Mon, 10 Dec 2012 19:12:36 +0000
> 
> > From: Matthew Leach <matthew@mattleach.net>
> > 
> > The {read,write}s{b,w,l} operations are not defined by all
> > architectures and are being removed from the asm-generic/io.h
> > interface.
> > 
> > This patch replaces the usage of these string functions in the smc911x
> > accessors with io{read,write}{8,16,32}_rep calls instead.
> > 
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Matthew Leach <matthew@mattleach.net>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> This misses the two uses in smsc911x_tx_writefifo and
> smsc911x_rx_readfifo.

Well spotted, updated patch below.

Cheers,

Will

--->8

>From b46e33465e755e945136d19938c9a8331cbafce7 Mon Sep 17 00:00:00 2001
From: Matthew Leach <matthew@mattleach.net>
Date: Tue, 6 Nov 2012 14:51:11 +0000
Subject: [PATCH v3] net: smc911x: use io{read,write}*_rep accessors

The {read,write}s{b,w,l} operations are not defined by all
architectures and are being removed from the asm-generic/io.h
interface.

This patch replaces the usage of these string functions in the smc911x
accessors with io{read,write}{8,16,32}_rep calls instead.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Matthew Leach <matthew@mattleach.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/net/ethernet/smsc/smc911x.h  | 16 ++++++++--------
 drivers/net/ethernet/smsc/smsc911x.c |  8 ++++----
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc911x.h b/drivers/net/ethernet/smsc/smc911x.h
index 3269292..d51261b 100644
--- a/drivers/net/ethernet/smsc/smc911x.h
+++ b/drivers/net/ethernet/smsc/smc911x.h
@@ -159,12 +159,12 @@ static inline void SMC_insl(struct smc911x_local *lp, int reg,
 	void __iomem *ioaddr = lp->base + reg;
 
 	if (lp->cfg.flags & SMC911X_USE_32BIT) {
-		readsl(ioaddr, addr, count);
+		ioread32_rep(ioaddr, addr, count);
 		return;
 	}
 
 	if (lp->cfg.flags & SMC911X_USE_16BIT) {
-		readsw(ioaddr, addr, count * 2);
+		ioread16_rep(ioaddr, addr, count * 2);
 		return;
 	}
 
@@ -177,12 +177,12 @@ static inline void SMC_outsl(struct smc911x_local *lp, int reg,
 	void __iomem *ioaddr = lp->base + reg;
 
 	if (lp->cfg.flags & SMC911X_USE_32BIT) {
-		writesl(ioaddr, addr, count);
+		iowrite32_rep(ioaddr, addr, count);
 		return;
 	}
 
 	if (lp->cfg.flags & SMC911X_USE_16BIT) {
-		writesw(ioaddr, addr, count * 2);
+		iowrite16_rep(ioaddr, addr, count * 2);
 		return;
 	}
 
@@ -196,14 +196,14 @@ static inline void SMC_outsl(struct smc911x_local *lp, int reg,
 		 writew(v & 0xFFFF, (lp)->base + (r));	 \
 		 writew(v >> 16, (lp)->base + (r) + 2); \
 	 } while (0)
-#define SMC_insl(lp, r, p, l)	 readsw((short*)((lp)->base + (r)), p, l*2)
-#define SMC_outsl(lp, r, p, l)	 writesw((short*)((lp)->base + (r)), p, l*2)
+#define SMC_insl(lp, r, p, l)	 ioread16_rep((short*)((lp)->base + (r)), p, l*2)
+#define SMC_outsl(lp, r, p, l)	 iowrite16_rep((short*)((lp)->base + (r)), p, l*2)
 
 #elif	SMC_USE_32BIT
 #define SMC_inl(lp, r)		 readl((lp)->base + (r))
 #define SMC_outl(v, lp, r)	 writel(v, (lp)->base + (r))
-#define SMC_insl(lp, r, p, l)	 readsl((int*)((lp)->base + (r)), p, l)
-#define SMC_outsl(lp, r, p, l)	 writesl((int*)((lp)->base + (r)), p, l)
+#define SMC_insl(lp, r, p, l)	 ioread32_rep((int*)((lp)->base + (r)), p, l)
+#define SMC_outsl(lp, r, p, l)	 iowrite32_rep((int*)((lp)->base + (r)), p, l)
 
 #endif /* SMC_USE_16BIT */
 #endif /* SMC_DYNAMIC_BUS_CONFIG */
diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index c53c0f4..9d46167 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -253,7 +253,7 @@ smsc911x_tx_writefifo(struct smsc911x_data *pdata, unsigned int *buf,
 	}
 
 	if (pdata->config.flags & SMSC911X_USE_32BIT) {
-		writesl(pdata->ioaddr + TX_DATA_FIFO, buf, wordcount);
+		iowrite32_rep(pdata->ioaddr + TX_DATA_FIFO, buf, wordcount);
 		goto out;
 	}
 
@@ -285,7 +285,7 @@ smsc911x_tx_writefifo_shift(struct smsc911x_data *pdata, unsigned int *buf,
 	}
 
 	if (pdata->config.flags & SMSC911X_USE_32BIT) {
-		writesl(pdata->ioaddr + __smsc_shift(pdata,
+		iowrite32_rep(pdata->ioaddr + __smsc_shift(pdata,
 						TX_DATA_FIFO), buf, wordcount);
 		goto out;
 	}
@@ -319,7 +319,7 @@ smsc911x_rx_readfifo(struct smsc911x_data *pdata, unsigned int *buf,
 	}
 
 	if (pdata->config.flags & SMSC911X_USE_32BIT) {
-		readsl(pdata->ioaddr + RX_DATA_FIFO, buf, wordcount);
+		ioread32_rep(pdata->ioaddr + RX_DATA_FIFO, buf, wordcount);
 		goto out;
 	}
 
@@ -351,7 +351,7 @@ smsc911x_rx_readfifo_shift(struct smsc911x_data *pdata, unsigned int *buf,
 	}
 
 	if (pdata->config.flags & SMSC911X_USE_32BIT) {
-		readsl(pdata->ioaddr + __smsc_shift(pdata,
+		ioread32_rep(pdata->ioaddr + __smsc_shift(pdata,
 						RX_DATA_FIFO), buf, wordcount);
 		goto out;
 	}
-- 
1.8.0

^ permalink raw reply related

* Re: Gianfar driver issue
From: Claudiu Manoil @ 2012-12-11 14:35 UTC (permalink / raw)
  To: Cedric VONCKEN; +Cc: netdev
In-Reply-To: <773DB8A82AB6A046AE0195C68612A31901412124@sbs2003.acksys.local>

On 12/11/2012 11:59 AM, Cedric VONCKEN wrote:
> 	Hi all,
>
> 	I think he have an issue in Gianfar driver.
>
> 	When the Netdev tx queue timeout occurred, the function
> gfar_timeout(..) is called. This function calls indirectly the
> gfar_init_mac(..) function.
>
> 	In this function, the rctrl register is set to a default value.
>
> 	If the Promiscuous is enable on the net dev ( flag IFF_PROMISC
> is set), the gfar_init_function does not reactivate it.
>
> 	The Promiscuous mode is used for example when the netdev is
> bridged.
> 	
> 	I apply this patch to fix it.
>
> 	--- a/drivers/net/ethernet/freescale/gianfar.c.	2012-06-01
> 09:16:13.000000000 +0200
> 	+++ b/drivers/net/ethernet/freescale/gianfar.c	2012-12-11
> 10:38:23.000000000 +0100
> 	@@ -356,6 +356,11 @@
>   	/* Configure the coalescing support */
>   	gfar_configure_coalescing(priv, 0xFF, 0xFF);
>
> +	if (ndev->flags & IFF_PROMISC) {
> +		/* Set RCTRL to PROM */
> +		rctrl |= RCTRL_PROM;
> +	}
> +
>   	if (priv->rx_filer_enable) {
>   		rctrl |= RCTRL_FILREN;
>   		/* Program the RIR0 reg with the required distribution
> */

Hello,

I don't see any issue with this code change, and there are other drivers 
too reconfiguring the promiscuous mode upon tx timeout.
A valid (formatted) patch needs to be sent however.

Thanks,
Claudiu

^ permalink raw reply

* Re: [PATCH] smsc75xx: only set mac address once on bind
From: Steve Glendinning @ 2012-12-11 14:35 UTC (permalink / raw)
  To: Dan Williams; +Cc: Steve Glendinning, netdev, Bjorn Mork
In-Reply-To: <1355180407.19156.32.camel@dcbw.foobar.com>

Hi Dan,

On 10 December 2012 23:00, Dan Williams <dcbw@redhat.com> wrote:
> On Mon, 2012-12-10 at 11:01 +0000, Steve Glendinning wrote:
>> This patch changes when we decide what the device's MAC address
>> is from per ifconfig up to once when the device is connected.
>>
>> Without this patch, a manually forced device MAC is overwritten
>> on ifconfig down/up.  Also devices that have no EEPROM are
>> assigned a new random address on ifconfig down/up instead of
>> persisting the same one.
>
> Does this mean that on devices without EEPROM, ifconfig XXX
> down/ifconfig XXX up will generate a *new* random address?  That seems a
> bit odd; why wouldn't the first random address generated for the device
> persist until either (a) changed by ifconfig or (b) device was
> disconnected?

Sorry if my commit message wasn't clear enough.  That is indeed a bit
odd, and it describes the (buggy) behaviour BEFORE this patch was
applied.

With this patch applied, devices without EEPROM (and without a
manually specified MAC address) will get a randomly generated address
assigned once at bind time, so this MAC will persist for the lifetime
the USB device is connected.  We also now won't trash a manually
specified MAC, for cases where userland sets the MAC before bringing
up the interface.

-- 
Steve Glendinning

^ permalink raw reply

* Re: netconsole fun
From: Neil Horman @ 2012-12-11 14:30 UTC (permalink / raw)
  To: Peter Hurley; +Cc: Cong Wang, netdev
In-Reply-To: <1355235592.2694.5.camel@thor>

On Tue, Dec 11, 2012 at 09:19:52AM -0500, Peter Hurley wrote:
> On Tue, 2012-12-11 at 04:51 +0000, Cong Wang wrote:
> > On Mon, 10 Dec 2012 at 14:17 GMT, Peter Hurley <peter@hurleysoftware.com> wrote:
> > > Now that netpoll has been disabled for slaved devices, is there a
> > > recommended method of running netconsole on a machine that has a slaved
> > > device?
> > >
> > 
> > Yes, running it on the master device instead.
> 
> Thanks for the suggestion, but:
> 
> [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.10.99/br0,30000@192.168.10.100/xx:xx:xx:xx:xx:xx
> ...
> [ 5.289869] netpoll: netconsole: local port 6665
> [ 5.289885] netpoll: netconsole: local IP 192.168.10.99
> [ 5.289892] netpoll: netconsole: interface 'br0'
> [ 5.289898] netpoll: netconsole: remote port 30000
> [ 5.289907] netpoll: netconsole: remote IP 192.168.10.100
> [ 5.289914] netpoll: netconsole: remote ethernet address xx:xx:xx:xx:xx:xx
> [ 5.289922] netpoll: netconsole: br0 doesn't exist, aborting
> [ 5.289929] netconsole: cleaning up
> ...
> [ 9.392291] Bridge firewalling registered
> [ 9.396805] device eth1 entered promiscuous mode
> [ 9.418350] eth1:  setting full-duplex.
> [ 9.421268] br0: port 1(eth1) entered forwarding state
> [ 9.423354] br0: port 1(eth1) entered forwarding state
> 
> 
> Is there a way to control or associate network device names prior to
> udev renaming?
> 
That looks like a systemd problem (or more specifically a boot dependency
problem).  You need to modify your netconsole unit/service file to start after
all your networking is up.  NetworkManager provides a dummy service file for
this purpose, called networkmanager-wait-online.service

Neil

> Thanks again,
> Peter
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: netconsole fun
From: Peter Hurley @ 2012-12-11 14:19 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev
In-Reply-To: <ka6e4e$6k5$1@ger.gmane.org>

On Tue, 2012-12-11 at 04:51 +0000, Cong Wang wrote:
> On Mon, 10 Dec 2012 at 14:17 GMT, Peter Hurley <peter@hurleysoftware.com> wrote:
> > Now that netpoll has been disabled for slaved devices, is there a
> > recommended method of running netconsole on a machine that has a slaved
> > device?
> >
> 
> Yes, running it on the master device instead.

Thanks for the suggestion, but:

[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.10.99/br0,30000@192.168.10.100/xx:xx:xx:xx:xx:xx
...
[ 5.289869] netpoll: netconsole: local port 6665
[ 5.289885] netpoll: netconsole: local IP 192.168.10.99
[ 5.289892] netpoll: netconsole: interface 'br0'
[ 5.289898] netpoll: netconsole: remote port 30000
[ 5.289907] netpoll: netconsole: remote IP 192.168.10.100
[ 5.289914] netpoll: netconsole: remote ethernet address xx:xx:xx:xx:xx:xx
[ 5.289922] netpoll: netconsole: br0 doesn't exist, aborting
[ 5.289929] netconsole: cleaning up
...
[ 9.392291] Bridge firewalling registered
[ 9.396805] device eth1 entered promiscuous mode
[ 9.418350] eth1:  setting full-duplex.
[ 9.421268] br0: port 1(eth1) entered forwarding state
[ 9.423354] br0: port 1(eth1) entered forwarding state


Is there a way to control or associate network device names prior to
udev renaming?

Thanks again,
Peter

^ permalink raw reply

* Re: [PATCH] ipv6: fix the bug when propagating Redirect Message
From: Steffen Klassert @ 2012-12-11 13:45 UTC (permalink / raw)
  To: Duan Jiong; +Cc: davem, netdev
In-Reply-To: <50C72DEC.1000008@gmail.com>

On Tue, Dec 11, 2012 at 08:58:20PM +0800, Duan Jiong wrote:
> 
> Just like you said, i try to use ndisc_parse_options() to instead
> of the loop, but i find the skb->data can't be changed in function
> ndisc_parse_options() due to lack of  arguments. So i think it is
> better to continue to use the loop. How do you think this?
> 

You can change the data pointer after ndisc_parse_options().
Something like the (untested) patch below should do it.

 include/net/ndisc.h |    7 +++++++
 net/ipv6/ndisc.c    |   20 ++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 980d263..c17bccd 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -78,6 +78,13 @@ struct ra_msg {
 	__be32			retrans_timer;
 };
 
+struct rd_msg {
+        struct icmp6hdr	icmph;
+        struct in6_addr	target;
+        struct in6_addr	dest;
+	__u8		opt[0];
+};
+
 struct nd_opt_hdr {
 	__u8		nd_opt_type;
 	__u8		nd_opt_len;
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 2edce30..9afd23f 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1333,6 +1333,12 @@ out:
 
 static void ndisc_redirect_rcv(struct sk_buff *skb)
 {
+	u8 *hdr;
+	struct ndisc_options ndopts;
+	struct rd_msg *msg = (struct rd_msg *) skb_transport_header(skb);
+	u32 ndoptlen = skb->tail - (skb->transport_header +
+				    offsetof(struct rd_msg, opt));
+
 #ifdef CONFIG_IPV6_NDISC_NODETYPE
 	switch (skb->ndisc_nodetype) {
 	case NDISC_NODETYPE_HOST:
@@ -1349,6 +1355,20 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
 		return;
 	}
 
+	if (!ndisc_parse_options(msg->opt, ndoptlen, &ndopts)) {
+		ND_PRINTK(2, warn, "Redirect: invalid ND options\n");
+		return;
+	}
+
+	if (!ndopts.nd_opts_rh)
+		return;
+
+	hdr = (u8 *) ndopts.nd_opts_rh;
+	hdr += 8;
+
+	if (!pskb_pull(skb, hdr - skb_transport_header(skb)))
+		return;
+
 	icmpv6_notify(skb, NDISC_REDIRECT, 0, 0);
 }
 
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH][RFC] smsc95xx: enable dynamic autosuspend (RFC)
From: Steve Glendinning @ 2012-12-11 13:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Steve Glendinning, netdev, linux-usb,
	Greg Kroah-Hartman
In-Reply-To: <CACVXFVMNshNMJ+hYHTCfE2+UOuMgGLb0dKq8qEM-LfUseuy2qA@mail.gmail.com>

On 11 December 2012 12:53, Ming Lei <ming.lei@canonical.com> wrote:
> On Tue, Dec 11, 2012 at 6:27 PM, Oliver Neukum <oliver@neukum.org> wrote:
>> So they can autosuspend if the interface is up and no cable is plugged
>> in?
>
> From the open datasheet, that is the suspend 1 mode, which is supported
> by all LAN95xx devices. Steve, correct me if I am wrong.

All parts support SUSPEND1, but some parts can't 100% reliably wake on
ENERGYON - some link partners will wake them but others won't.  The
driver already detects parts that work reliably with all link partners
and sets the FEATURE_PHY_NLP_CROSSOVER feature flag.

I didn't block these devices from configuring WOL, because they do
work in *some* cases and the user is explicitly requesting to wake the
system so we try to do that (and sometimes succeed).

>>> I suggest to introduce link-off triggered runtime suspend for these
>>> usbnet devices(non-LAN9500A device, devices which don't support
>>> USB auto-suspend), and I have posted one patch set before[1].
>>> If no one objects that, I'd like to post them again with some fix and
>>> update for checking link after link_reset().
>>
>> If you can get rid of a periodic work this would be great.
>
> For the LAN95xx devices, the periodic work isn't needed because
> they may generate remote wakeup when link change is detected.

As above, some parts will do this but some will not.  I think we
should only consider sleeping the part if we're sure it'll wake up
when a cable is connected!

-- 
Steve Glendinning

^ permalink raw reply

* [PATCHv8] virtio-spec: virtio network device multiqueue support
From: Michael S. Tsirkin @ 2012-12-11 13:10 UTC (permalink / raw)
  To: Rusty Russell; +Cc: bhutchings, netdev, kvm, virtualization

Add multiqueue support to virtio network device.
Add a new feature flag VIRTIO_NET_F_MQ for this feature, a new
configuration field max_virtqueue_pairs to detect supported number of
virtqueues as well as a new command VIRTIO_NET_CTRL_MQ to program
packet steering for unidirectional protocols.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

Changes in v8:
- Fix value for CTRL_MQ command

Changes in v7:
- 8000h -> 0x8000 at Rusty's request

Changes in v6:
- rename RFS -> multiqueue to avoid confusion with RFS in linux
  mention automatic receive steering as Rusty suggested

Changes in v5:
- Address Rusty's comments.
  Changes are only in the text, not the ideas.
- Some minor formatting changes.

Changes in v4:
- address Jason's comments
- have configuration specify the number of VQ pairs and not pairs - 1

Changes in v3:
- rename multiqueue -> rfs this is what we support
- Be more explicit about what driver should do.
- Simplify layout making VQs functionality depend on feature.
- Remove unused commands, only leave in programming # of queues

Changes in v2:
Address Jason's comments on v2:
- Changed STEERING_HOST to STEERING_RX_FOLLOWS_TX:
   this is both clearer and easier to support.
   It does not look like we need a separate steering command
   since host can just watch tx packets as they go.
- Moved RX and TX steering sections near each other.
- Add motivation for other changes in v2

Changes in v1 (from Jason's rfc):
- reserved vq 3: this makes all rx vqs even and tx vqs odd, which
   looks nicer to me.
- documented packet steering, added a generalized steering programming
   command. Current modes are single queue and host driven multiqueue,
   but I envision support for guest driven multiqueue in the future.
- make default vqs unused when in mq mode - this wastes some memory
   but makes it more efficient to switch between modes as
   we can avoid this causing packet reordering.

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 83f2771..8d16610 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -59,6 +59,7 @@
 \author -608949062 "Rusty Russell,,," 
 \author -385801441 "Cornelia Huck" cornelia.huck@de.ibm.com
 \author 1531152142 "Paolo Bonzini,,," 
+\author 1986246365 "Michael S. Tsirkin" 
 \end_header
 
 \begin_body
@@ -4170,9 +4171,46 @@ ID 1
 \end_layout
 
 \begin_layout Description
-Virtqueues 0:receiveq.
- 1:transmitq.
- 2:controlq
+Virtqueues 0:receiveq
+\change_inserted 1986246365 1352742829
+0
+\change_unchanged
+.
+ 1:transmitq
+\change_inserted 1986246365 1352742832
+0
+\change_deleted 1986246365 1352742947
+.
+ 
+\change_inserted 1986246365 1352742952
+.
+ ....
+ 2N
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1354531595
+N=0 if VIRTIO_NET_F_MQ is not negotiated, otherwise N is derived from 
+\emph on
+max_virtqueue_pairs
+\emph default
+ control
+\emph on
+ 
+\emph default
+field.
+ 
+\end_layout
+
+\end_inset
+
+: receivqN.
+ 2N+1: transmitqN.
+ 2N+
+\change_unchanged
+2:controlq
 \begin_inset Foot
 status open
 
@@ -4343,6 +4381,16 @@ VIRTIO_NET_F_CTRL_VLAN
 
 \begin_layout Description
 VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous packets.
+\change_inserted 1986246365 1352742767
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1986246365 1352742808
+VIRTIO_NET_F_MQ(22) Device supports multiqueue with automatic receive steering.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -4355,11 +4403,45 @@ configuration
 \begin_inset space ~
 \end_inset
 
-layout Two configuration fields are currently defined.
+layout 
+\change_deleted 1986246365 1352743300
+Two
+\change_inserted 1986246365 1354531413
+Three
+\change_unchanged
+ configuration fields are currently defined.
  The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
  is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
  Two read-only bits are currently defined for the status field: VIRTIO_NET_S_LIN
 K_UP and VIRTIO_NET_S_ANNOUNCE.
+
+\change_inserted 1986246365 1354531470
+ The following read-only field, 
+\emph on
+max_virtqueue_pairs
+\emph default
+ only exists if VIRTIO_NET_F_MQ is set.
+ This field specifies the maximum number of each of transmit and receive
+ virtqueues (receiveq0..receiveq
+\emph on
+N
+\emph default
+ and transmitq0..transmitq
+\emph on
+N
+\emph default
+ respectively; 
+\emph on
+N
+\emph default
+=
+\emph on
+max_virtqueue_pairs - 1
+\emph default
+) that can be configured once VIRTIO_NET_F_MQ is negotiated.
+ Legal values for this field are 1 to 0x8000.
+
+\change_unchanged
  
 \begin_inset listings
 inline false
@@ -4392,6 +4474,17 @@ struct virtio_net_config {
 \begin_layout Plain Layout
 
     u16 status;
+\change_inserted 1986246365 1354531427
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1354531437
+
+    u16 max_virtqueue_pairs;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -4410,7 +4503,24 @@ Device Initialization
 
 \begin_layout Enumerate
 The initialization routine should identify the receive and transmission
- virtqueues.
+ virtqueues
+\change_inserted 1986246365 1352744077
+, up to N+1 of each kind
+\change_unchanged
+.
+
+\change_inserted 1986246365 1352743942
+ If VIRTIO_NET_F_MQ feature bit is negotiated, 
+\emph on
+N=max_virtqueue_pairs-1
+\emph default
+, otherwise identify 
+\emph on
+N=0
+\emph default
+.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Enumerate
@@ -4452,10 +4562,33 @@ status
 
  config field.
  Otherwise, the link should be assumed active.
+\change_inserted 1986246365 1354529306
+
 \end_layout
 
 \begin_layout Enumerate
-The receive virtqueue should be filled with receive buffers.
+
+\change_inserted 1986246365 1354531717
+Only receiveq0, transmitq0 and controlq are used by default.
+ To use more queues driver must negotiate the VIRTIO_NET_F_MQ feature;
+ initialize up to 
+\emph on
+max_virtqueue_pairs
+\emph default
+ of each of transmit and receive queues; execute_VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SE
+T command specifying the number of the transmit and receive queues that
+ is going to be used and wait until the device consumes the controlq buffer
+ and acks this command.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Enumerate
+The receive virtqueue
+\change_inserted 1986246365 1352743953
+s
+\change_unchanged
+ should be filled with receive buffers.
  This is described in detail below in 
 \begin_inset Quotes eld
 \end_inset
@@ -4550,8 +4683,15 @@ Device Operation
 \end_layout
 
 \begin_layout Standard
-Packets are transmitted by placing them in the transmitq, and buffers for
- incoming packets are placed in the receiveq.
+Packets are transmitted by placing them in the transmitq
+\change_inserted 1986246365 1353593685
+0..transmitqN
+\change_unchanged
+, and buffers for incoming packets are placed in the receiveq
+\change_inserted 1986246365 1353593692
+0..receiveqN
+\change_unchanged
+.
  In each case, the packet itself is preceeded by a header:
 \end_layout
 
@@ -4861,6 +5001,17 @@ If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at least the
 struct virtio_net_hdr
 \family default
 .
+\change_inserted 1986246365 1353594518
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353594638
+If VIRTIO_NET_F_MQ is negotiated, each of receiveq0...receiveqN that will
+ be used should be populated with receive buffers.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Subsection*
@@ -5293,8 +5444,151 @@ Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq.
  
 \end_layout
 
-\begin_layout Enumerate
+\begin_layout Subsection*
+
+\change_inserted 1986246365 1353593879
+Automatic receive steering in multiqueue mode
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1354528882
+If the driver negotiates the VIRTIO_NET_F_MQ feature bit (depends on VIRTIO_NET
+_F_CTRL_VQ), it can transmit outgoing packets on one of the multiple transmitq0..t
+ransmitqN and ask the device to queue incoming packets into one the multiple
+ receiveq0..receiveqN depending on the packet flow.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353594292
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594178
+
+struct virtio_net_ctrl_mq {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594212
+
+	u16 virtqueue_pairs;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594172
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594172
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594263
+
+#define VIRTIO_NET_CTRL_MQ    4
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594273
+
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0 
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594273
+
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN        1 
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1986246365 1353594273
+
+ #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX        0x8000
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1354531492
+Multiqueue is disabled by default.
+ Driver enables multiqueue by executing the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command,
+ specifying the number of the transmit and receive queues that will be used;
+ thus transmitq0..transmitqn and receiveq0..receiveqn where 
+\emph on
+n=virtqueue_pairs-1
+\emph default
+ will be used.
+ All these virtqueues must have been pre-configured in advance.
+ The range of legal values for the
+\emph on
+ virtqueue_pairs
+\emph off
+ field is between 1 and 
+\emph on
+max_virtqueue_pairs
+\emph off
+.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1353595328
+When multiqueue is enabled, device uses automatic receive
+steering based on packet flow.
+Programming of the receive steering classificator is implicit.
+ Transmitting a packet of a specific flow on transmitqX will cause incoming
+ packets for this flow to be steered to receiveqX.
+ For uni-directional protocols, or where no packets have been transmitted
+ yet, device will steer a packet to a random queue out of the specified
+ receiveq0..receiveqn.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1986246365 1354528710
+Multiqueue is disabled by setting 
+\emph on
+virtqueue_pairs = 1
+\emph default
+ (this is the default).
+ After the command is consumed by the device, the device will not steer
+ new packets on virtqueues receveq1..receiveqN (i.e.
+ other than receiveq0) nor read from transmitq1..transmitqN (i.e.
+ other than transmitq0); accordingly, driver should not transmit new packets
+ on virtqueues other than transmitq0.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_deleted 1986246365 1353593873
 .
+
+\change_unchanged
  
 \end_layout

^ permalink raw reply related

* Re: [PATCH] ipv6: fix the bug when propagating Redirect Message
From: Duan Jiong @ 2012-12-11 12:58 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: davem, netdev
In-Reply-To: <20121024045410.GF27385@secunet.com>

于 2012/10/24 12:54, Steffen Klassert 写道:
> On Tue, Oct 23, 2012 at 11:26:25PM +0800, Duan Jiong wrote:
>>
>> Before using icmpv6_notify() to propagate redirect, change skb->data
>> to poing the IP packet that triggered the sending of the Redirect.
>>
>> Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
>> ---
>>  net/ipv6/ndisc.c |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
>> index ff36194..0f73303 100644
>> --- a/net/ipv6/ndisc.c
>> +++ b/net/ipv6/ndisc.c
>> @@ -1334,6 +1334,11 @@ out:
>>  
>>  static void ndisc_redirect_rcv(struct sk_buff *skb)
>>  {
>> +	int opt_len;
>> +	int opt_offset;
>> +	int ndisc_head_len;
>> +	struct nd_opt_hdr *nd_opt;
>> +	
>>  #ifdef CONFIG_IPV6_NDISC_NODETYPE
>>  	switch (skb->ndisc_nodetype) {
>>  	case NDISC_NODETYPE_HOST:
>> @@ -1350,6 +1355,40 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
>>  		return;
>>  	}
>>  
>> +	ndisc_head_len = sizeof(struct icmp6hdr) + 2*sizeof(struct in6_addr);
>> +	if (!pskb_may_pull(skb, ndisc_head_len)) {
>> +		return;
>> +	}
>> +
>> +	nd_opt = (struct nd_opt_hdr *)(skb->data + ndisc_head_len);
>> +
>> +	opt_len = skb->tail - skb->transport_header - ndisc_head_len;
>> +	if (opt_len < 0) {
>> +		return;
>> +	}
>> +	while (opt_len) {
>> +		int l;
>> +	
>> +		if (opt_len < sizeof(struct nd_opt_hdr)) {
>> +			return;
>> +		}
>> +		l = nd_opt->nd_opt_len << 3;
>> +		if (opt_len < l || l == 0) {
>> +			return;
>> +		}
>> +		if (nd_opt->nd_opt_type == ND_OPT_REDIRECT_HDR) {
>> +			__skb_pull(skb, ndisc_head_len + opt_offset + 8);
>> +			break;
>> +		}
>> +		opt_len -= l;
>> +		nd_opt = ((void *)nd_opt) + 1;
>> +		opt_offset += 1;
>> +	}
> 
> Instead of the above loop, you could use ndisc_parse_options().
> This does the same what you are doing here and it would make it
> a bit clearer what's going on.
> 
I apologize for not replying to you earlier,and i will continue 
to update my patches. 

Just like you said, i try to use ndisc_parse_options() to instead
of the loop, but i find the skb->data can't be changed in function
ndisc_parse_options() due to lack of  arguments. So i think it is
better to continue to use the loop. How do you think this?

Thanks!

^ permalink raw reply

* Re: [PATCH][RFC] smsc95xx: enable dynamic autosuspend (RFC)
From: Ming Lei @ 2012-12-11 12:53 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Steve Glendinning, Steve Glendinning, netdev,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman
In-Reply-To: <1881907.JmjSTg2PBW-ugxBuEnWX9yG/4A2pS7c2Q@public.gmane.org>

On Tue, Dec 11, 2012 at 6:27 PM, Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org> wrote:
> On Tuesday 11 December 2012 10:24:57 Ming Lei wrote:
>> On Mon, Dec 10, 2012 at 10:18 PM, Steve Glendinning <steve-nksJyM/082jR7s880joybQ@public.gmane.org> wrote:
>
>> > Thanks, so something like this should do the job?
>>
>> This will do, but not simple as clearing .manage_power function
>> pointer in bind(), and still disable runtime suspend for link off case
>> since these devices which don't support suspend 3 can generate
>> remote wakeup for link change event.
>
> So they can autosuspend if the interface is up and no cable is plugged
> in?

>From the open datasheet, that is the suspend 1 mode, which is supported
by all LAN95xx devices. Steve, correct me if I am wrong.

>
>> I suggest to introduce link-off triggered runtime suspend for these
>> usbnet devices(non-LAN9500A device, devices which don't support
>> USB auto-suspend), and I have posted one patch set before[1].
>> If no one objects that, I'd like to post them again with some fix and
>> update for checking link after link_reset().
>
> If you can get rid of a periodic work this would be great.

For the LAN95xx devices, the periodic work isn't needed because
they may generate remote wakeup when link change is detected.

In fact, I have test data which can show a much power save
on OMAP3 based beagle board plus asix usbnet device with
the periodic work. IMO, the power save after introducing periodic
timer depends on the arch or platform, there should be much power
save if the CPU power consumption is very less. So how about letting
module parameter switch on/off the periodic work?


Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next rfc 0/2] Allow unpriveledge user to disable tuntap queue
From: Michael S. Tsirkin @ 2012-12-11 12:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: pmoore, netdev, linux-kernel, mprivozn
In-Reply-To: <1355223827-57290-1-git-send-email-jasowang@redhat.com>

On Tue, Dec 11, 2012 at 07:03:45PM +0800, Jason Wang wrote:
> This series is an rfc that tries to solve the issue that the queues of tuntap
> could not be disabled/enabled by unpriveledged user. This is needed for
> unpriveledge userspace such as qemu since guest may change the number of queues
> at any time, qemu needs to configure the tuntap to disable/enable a specific
> queue.
> 
> Instead of introducting new flag/ioctls, this series tries to re-use the current
> TUNSETQUEUE and IFF_ATTACH_QUEUE/IFF_DETACH_QUEUE. After this change,
> IFF_DETACH_QUEUE is used to disable a specific queue instead of detaching all
> its state from tuntap. IFF_ATTACH_QUEUE is used to do: 1) creating new queue to
> a tuntap device, in this situation, previous DAC check is still done. 2)
> re-enable the queue previously disabled by IFF_DETACH_QUEUE, in this situation,
> we can bypass some checking when we do during queue creating (the check need to
> be done here needs discussion.
> 
> Management software (such as libvirt) then can do:
> - TUNSETIFF to creating device and queue 0
> - TUNSETQUEUE to create the rest of queues
> - Passing them to unpriveledge userspace (such as qemu)

Sorry I find this somewhat confusing.
Why doesn't management call TUNSETIFF to create all queues -
seems cleaner, no? Also has the advantage that it works
without selinux changes.

So why don't we simply fix TUNSETQUEUE such that
1. It only works if already attached to device by TUNSETIFF
2. It does not attach/detach, instead simply enables/disables the queue

This way no new flags, just tweak the semantics of the
existing ones. Need to do this before 3.8 is out though
otherwise we'll end up maintaining the old semantics forever.

> Then the unpriveledge userspace can enable and disable a specific queue through
> IFF_ATTACH_QUEUE and IFF_DETACH_QUEUE.
> 
> This is done by introducing a enabled flags were used to notify whether the
> queue is enabled, and tuntap only send/receive packets when it was enabled.
> 
> Please comment, thanks!
> 
> Jason Wang (2):
>   tuntap: forbid calling TUNSETQUEUE for a persistent device with no
>     queues
>   tuntap: allow unpriveledge user to enable and disable queues
> 
>  drivers/net/tun.c |   78 +++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 73 insertions(+), 5 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next rfc 2/2] tuntap: allow unpriveledge user to enable and disable queues
From: Michael S. Tsirkin @ 2012-12-11 12:30 UTC (permalink / raw)
  To: Jason Wang; +Cc: pmoore, netdev, linux-kernel, mprivozn
In-Reply-To: <1355223827-57290-3-git-send-email-jasowang@redhat.com>

On Tue, Dec 11, 2012 at 07:03:47PM +0800, Jason Wang wrote:
> Currently, when a file is attached to tuntap through TUNSETQUEUE, the uid/gid
> and CAP_NET_ADMIN were checked, and we use this ioctl to create and destroy
> queues. Sometimes, userspace such as qemu need to the ability to enable and
> disable a specific queue without priveledge since guest operating system may
> change the number of queues it want use.
> 
> To support this kind of ability, this patch introduce a flag enabled which is
> used to track whether the queue is enabled by userspace. And also restrict that
> only one deivce could be used for a queue to attach. With this patch, the DAC
> checking when adding queues through IFF_ATTACH_QUEUE is still done and after
> this, IFF_DETACH_QUEUE/IFF_ATTACH_QUEUE  could be used to disable/enable this
> queue.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/tun.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 files changed, 73 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index d593f56..43831a7 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -138,6 +138,7 @@ struct tun_file {
>  	/* only used for fasnyc */
>  	unsigned int flags;
>  	u16 queue_index;
> +	bool enabled;
>  };
>  
>  struct tun_flow_entry {
> @@ -345,9 +346,11 @@ unlock:
>  static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
>  {
>  	struct tun_struct *tun = netdev_priv(dev);
> +	struct tun_file *tfile;
>  	struct tun_flow_entry *e;
>  	u32 txq = 0;
>  	u32 numqueues = 0;
> +	int i;
>  
>  	rcu_read_lock();
>  	numqueues = tun->numqueues;
> @@ -366,6 +369,19 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
>  			txq -= numqueues;
>  	}
>  
> +	tfile = rcu_dereference(tun->tfiles[txq]);
> +	if (unlikely(!tfile->enabled))

This unlikely tag is suspicious. It should be perfectly
legal to use less queues than created.

> +		/* tun_detach() should make sure there's at least one queue
> +		 * could be used to do the tranmission.
> +		 */
> +		for (i = 0; i < numqueues; i++) {
> +			tfile = rcu_dereference(tun->tfiles[i]);
> +			if (tfile->enabled) {
> +				txq = i;
> +				break;
> +			}
> +		}
> +

Worst case this will do a linear scan over all queueus on each packet.
Instead, I think we need a list of all queues and only install
the active ones in the array.

>  	rcu_read_unlock();
>  	return txq;
>  }
> @@ -386,6 +402,36 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
>  	netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
>  }
>  
> +static int tun_enable(struct tun_file *tfile)
> +{
> +	if (tfile->enabled == true)

simply if (tfile->enabled)

> +		return -EINVAL;

Actually it's better to have operations be
idempotent. If it's enabled, enabling should
be a NOP not an error.

> +
> +	tfile->enabled = true;
> +	return 0;
> +}
> +
> +static int tun_disable(struct tun_file *tfile)
> +{
> +	struct tun_struct *tun = rcu_dereference_protected(tfile->tun,
> +							   lockdep_rtnl_is_held());
> +	u16 index = tfile->queue_index;
> +
> +	if (!tun)
> +		return -EINVAL;
> +
> +	if (tun->numqueues == 1)
> +		return -EINVAL;

So if there's a single queue we can't disable it,
but if there are > 1 we can disable them all.
This seems arbitrary.

> +
> +	BUG_ON(index >= tun->numqueues);
> +	tfile->enabled = false;
> +
> +	synchronize_net();
> +	tun_flow_delete_by_queue(tun, index);
> +
> +	return 0;
> +}
> +
>  static void __tun_detach(struct tun_file *tfile, bool clean)
>  {
>  	struct tun_file *ntfile;
> @@ -446,6 +492,7 @@ static void tun_detach_all(struct net_device *dev)
>  		BUG_ON(!tfile);
>  		wake_up_all(&tfile->wq.wait);
>  		rcu_assign_pointer(tfile->tun, NULL);
> +		tfile->enabled = false;
>  		--tun->numqueues;
>  	}
>  	BUG_ON(tun->numqueues != 0);
> @@ -490,6 +537,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>  	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>  	sock_hold(&tfile->sk);
>  	tun->numqueues++;
> +	tfile->enabled = true;
>  
>  	tun_set_real_num_queues(tun);
>  
> @@ -672,6 +720,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>  	if (txq >= tun->numqueues)
>  		goto drop;
>  
> +	/* Drop packet if the queue was not enabled */
> +	if (!tfile->enabled)
> +		goto drop;
> +
>  	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
>  
>  	BUG_ON(!tfile);
> @@ -1010,6 +1062,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  	bool zerocopy = false;
>  	int err;
>  
> +	if (!tfile->enabled)
> +		return -EINVAL;
> +
>  	if (!(tun->flags & TUN_NO_PI)) {
>  		if ((len -= sizeof(pi)) > total_len)
>  			return -EINVAL;
> @@ -1199,6 +1254,9 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>  	struct tun_pi pi = { 0, skb->protocol };
>  	ssize_t total = 0;
>  
> +	if (!tfile->enabled)
> +		return -EINVAL;
> +
>  	if (!(tun->flags & TUN_NO_PI)) {
>  		if ((len -= sizeof(pi)) < 0)
>  			return -EINVAL;
> @@ -1769,15 +1827,21 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>  		if (dev->netdev_ops != &tap_netdev_ops &&
>  			dev->netdev_ops != &tun_netdev_ops)
>  			ret = -EINVAL;
> -		else if (tun_not_capable(tun))
> -			ret = -EPERM;
> -		/* TUNSETIFF is needed to do permission checking */
> -		else if (tun->numqueues == 0)
> -			ret = -EPERM;
> -		else
> -			ret = tun_attach(tun, file);
> +		else {
> +			if (!rcu_dereference(tfile->tun)) {

Should be rcu_dereference_protected.

> +				if (tun_not_capable(tun) ||
> +				    tun->numqueues == 0)
> +					ret = -EPERM;
> +				else
> +					ret = tun_attach(tun, file);
> +			}
> +			else {
> +				/* FIXME: permission check? */
> +				ret = tun_enable(tfile);
> +			}
> +		}
>  	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
> -		__tun_detach(tfile, false);
> +		tun_disable(tfile);
>  	else
>  		ret = -EINVAL;
>  
> @@ -2085,6 +2149,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
>  	tfile->socket.file = file;
>  	tfile->socket.ops = &tun_socket_ops;
>  
> +	tfile->enabled = false;
>  	sock_init_data(&tfile->socket, &tfile->sk);
>  	sk_change_net(&tfile->sk, tfile->net);
>  
> -- 
> 1.7.1

^ permalink raw reply

* [PATCH ETHTOOL] Added dst-mac parameter for L3/L4 flow spec rules. This is usefull in vSwitch configurations.
From: Amir Vadai @ 2012-12-11 12:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Yan Burman
In-Reply-To: <1355227436-18383-1-git-send-email-amirv@mellanox.com>

From: Yan Burman <yanb@mellanox.com>

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 ethtool-copy.h | 11 +++++++----
 ethtool.8.in   |  6 ++++++
 ethtool.c      |  5 +++++
 rxclass.c      | 62 ++++++++++++++++++++++++++++++++++++++++------------------
 4 files changed, 61 insertions(+), 23 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 4801eef..d352f20 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -500,13 +500,15 @@ union ethtool_flow_union {
 	struct ethtool_ah_espip4_spec		esp_ip4_spec;
 	struct ethtool_usrip4_spec		usr_ip4_spec;
 	struct ethhdr				ether_spec;
-	__u8					hdata[60];
+	__u8					hdata[52];
 };
 
 struct ethtool_flow_ext {
-	__be16	vlan_etype;
-	__be16	vlan_tci;
-	__be32	data[2];
+	__u8		padding[2];
+	unsigned char	h_dest[ETH_ALEN];	/* destination eth addr	*/
+	__be16		vlan_etype;
+	__be16		vlan_tci;
+	__be32		data[2];
 };
 
 /**
@@ -1027,6 +1029,7 @@ enum ethtool_sfeatures_retval_bits {
 #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
 /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
 #define	FLOW_EXT	0x80000000
+#define	FLOW_MAC_EXT	0x40000000
 
 /* L3-L4 network traffic flow hash options */
 #define	RXH_L2DA	(1 << 1)
diff --git a/ethtool.8.in b/ethtool.8.in
index e701919..a52e484 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -268,6 +268,7 @@ ethtool \- query or control network driver and hardware settings
 .BM vlan\-etype
 .BM vlan
 .BM user\-def
+.RB [ dst-mac \ \*(MA\ [ m \ \*(MA]]
 .BN action
 .BN loc
 .RB |
@@ -739,6 +740,11 @@ Includes the VLAN tag and an optional mask.
 .BI user\-def \ N \\fR\ [\\fPm \ N \\fR]\\fP
 Includes 64-bits of user-specific data and an optional mask.
 .TP
+.BR dst-mac \ \*(MA\ [ m \ \*(MA]
+Includes the destination MAC address, specified as 6 bytes in hexadecimal
+separated by colons, along with an optional mask.
+Valid for all IPv4 based flow-types.
+.TP
 .BI action \ N
 Specifies the Rx queue to send packets to, or some other action.
 .TS
diff --git a/ethtool.c b/ethtool.c
index 345c21c..55bc082 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3231,6 +3231,10 @@ static int flow_spec_to_ntuple(struct ethtool_rx_flow_spec *fsp,
 	if (fsp->location != RX_CLS_LOC_ANY)
 		return -1;
 
+	/* destination MAC address in L3/L4 rules is not supported by ntuple */
+	if (fsp->flow_type & FLOW_MAC_EXT)
+		return -1;
+
 	/* verify ring cookie can transfer to action */
 	if (fsp->ring_cookie > INT_MAX && fsp->ring_cookie < (u64)(-2))
 		return -1;
@@ -3814,6 +3818,7 @@ static const struct option {
 	  "			[ vlan-etype %x [m %x] ]\n"
 	  "			[ vlan %x [m %x] ]\n"
 	  "			[ user-def %x [m %x] ]\n"
+	  "			[ dst-mac %x:%x:%x:%x:%x:%x [m %x:%x:%x:%x:%x:%x] ]\n"
 	  "			[ action %d ]\n"
 	  "			[ loc %d]] |\n"
 	  "		delete %d\n" },
diff --git a/rxclass.c b/rxclass.c
index e1633a8..1564b62 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -41,26 +41,38 @@ static void rxclass_print_ipv4_rule(__be32 sip, __be32 sipm, __be32 dip,
 
 static void rxclass_print_nfc_spec_ext(struct ethtool_rx_flow_spec *fsp)
 {
-	u64 data, datam;
-	__u16 etype, etypem, tci, tcim;
+	if (fsp->flow_type & FLOW_EXT) {
+		u64 data, datam;
+		__u16 etype, etypem, tci, tcim;
+		etype = ntohs(fsp->h_ext.vlan_etype);
+		etypem = ntohs(~fsp->m_ext.vlan_etype);
+		tci = ntohs(fsp->h_ext.vlan_tci);
+		tcim = ntohs(~fsp->m_ext.vlan_tci);
+		data = (u64)ntohl(fsp->h_ext.data[0]) << 32;
+		data = (u64)ntohl(fsp->h_ext.data[1]);
+		datam = (u64)ntohl(~fsp->m_ext.data[0]) << 32;
+		datam |= (u64)ntohl(~fsp->m_ext.data[1]);
 
-	if (!(fsp->flow_type & FLOW_EXT))
-		return;
+		fprintf(stdout,
+			"\tVLAN EtherType: 0x%x mask: 0x%x\n"
+			"\tVLAN: 0x%x mask: 0x%x\n"
+			"\tUser-defined: 0x%llx mask: 0x%llx\n",
+			etype, etypem, tci, tcim, data, datam);
+	}
 
-	etype = ntohs(fsp->h_ext.vlan_etype);
-	etypem = ntohs(~fsp->m_ext.vlan_etype);
-	tci = ntohs(fsp->h_ext.vlan_tci);
-	tcim = ntohs(~fsp->m_ext.vlan_tci);
-	data = (u64)ntohl(fsp->h_ext.data[0]) << 32;
-	data = (u64)ntohl(fsp->h_ext.data[1]);
-	datam = (u64)ntohl(~fsp->m_ext.data[0]) << 32;
-	datam |= (u64)ntohl(~fsp->m_ext.data[1]);
+	if (fsp->flow_type & FLOW_MAC_EXT) {
+		unsigned char *dmac, *dmacm;
 
-	fprintf(stdout,
-		"\tVLAN EtherType: 0x%x mask: 0x%x\n"
-		"\tVLAN: 0x%x mask: 0x%x\n"
-		"\tUser-defined: 0x%llx mask: 0x%llx\n",
-		etype, etypem, tci, tcim, data, datam);
+		dmac = fsp->h_ext.h_dest;
+		dmacm = fsp->m_ext.h_dest;
+
+		fprintf(stdout,
+			"\tDest MAC addr: %02X:%02X:%02X:%02X:%02X:%02X"
+			" mask: %02X:%02X:%02X:%02X:%02X:%02X\n",
+			dmac[0], dmac[1], dmac[2], dmac[3], dmac[4],
+			dmac[5], dmacm[0], dmacm[1], dmacm[2], dmacm[3],
+			dmacm[4], dmacm[5]);
+	}
 }
 
 static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
@@ -70,7 +82,7 @@ static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
 
 	fprintf(stdout,	"Filter: %d\n", fsp->location);
 
-	flow_type = fsp->flow_type & ~FLOW_EXT;
+	flow_type = fsp->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT);
 
 	invert_flow_mask(fsp);
 
@@ -172,7 +184,7 @@ static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
 static void rxclass_print_rule(struct ethtool_rx_flow_spec *fsp)
 {
 	/* print the rule in this location */
-	switch (fsp->flow_type & ~FLOW_EXT) {
+	switch (fsp->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
 	case TCP_V4_FLOW:
 	case UDP_V4_FLOW:
 	case SCTP_V4_FLOW:
@@ -533,6 +545,7 @@ typedef enum {
 #define NTUPLE_FLAG_VLAN	0x100
 #define NTUPLE_FLAG_UDEF	0x200
 #define NTUPLE_FLAG_VETH	0x400
+#define NFC_FLAG_MAC_ADDR	0x800
 
 struct rule_opts {
 	const char	*name;
@@ -571,6 +584,9 @@ static const struct rule_opts rule_nfc_tcp_ip4[] = {
 	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
 	  offsetof(struct ethtool_rx_flow_spec, h_ext.data),
 	  offsetof(struct ethtool_rx_flow_spec, m_ext.data) },
+	{ "dst-mac", OPT_MAC, NFC_FLAG_MAC_ADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.h_dest),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.h_dest) },
 };
 
 static const struct rule_opts rule_nfc_esp_ip4[] = {
@@ -599,6 +615,9 @@ static const struct rule_opts rule_nfc_esp_ip4[] = {
 	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
 	  offsetof(struct ethtool_rx_flow_spec, h_ext.data),
 	  offsetof(struct ethtool_rx_flow_spec, m_ext.data) },
+	{ "dst-mac", OPT_MAC, NFC_FLAG_MAC_ADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.h_dest),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.h_dest) },
 };
 
 static const struct rule_opts rule_nfc_usr_ip4[] = {
@@ -639,6 +658,9 @@ static const struct rule_opts rule_nfc_usr_ip4[] = {
 	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
 	  offsetof(struct ethtool_rx_flow_spec, h_ext.data),
 	  offsetof(struct ethtool_rx_flow_spec, m_ext.data) },
+	{ "dst-mac", OPT_MAC, NFC_FLAG_MAC_ADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.h_dest),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.h_dest) },
 };
 
 static const struct rule_opts rule_nfc_ether[] = {
@@ -1063,6 +1085,8 @@ int rxclass_parse_ruleopts(struct cmd_context *ctx,
 		fsp->h_u.usr_ip4_spec.ip_ver = ETH_RX_NFC_IP4;
 	if (flags & (NTUPLE_FLAG_VLAN | NTUPLE_FLAG_UDEF | NTUPLE_FLAG_VETH))
 		fsp->flow_type |= FLOW_EXT;
+	if (flags & NFC_FLAG_MAC_ADDR)
+		fsp->flow_type |= FLOW_MAC_EXT;
 
 	return 0;
 
-- 
1.7.11.3

^ permalink raw reply related

* [PATCH net-next 0/3] Add destination MAC address to ethtool flow steering
From: Amir Vadai @ 2012-12-11 12:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Yan Burman

From: Yan Burman <yanb@mellanox.com>

In vSwitch configuration it is often beneficial to create flow steering
rules for L3/L4 traffic based on VM port. This requires destination MAC
address of that port to be present. Note that today the mlx4_en driver 
adds the mac address of itself to the flow spec, where under the new
ethtool flag suggested here it doesn't.

It may also be useful in macvlan devices.

These patches add kernel support for the new field (does not break old
userspace compatibility, so new ethtool will work on old kernels and
old ethtool will work with new kernels).

Also present here is the ethtool userspace patch.

See more details here http ://marc.info/?t=134977576500003

Yan Burman (2):
  net: ethtool: Add destination MAC address to flow steering API
  net/mlx4_en: Add support for destination MAC in steering rules

 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 21 +++++++++++++++------
 include/uapi/linux/ethtool.h                    | 11 +++++++----
 2 files changed, 22 insertions(+), 10 deletions(-)

-- 
1.7.11.3

^ permalink raw reply

* [PATCH net-next 1/2] net: ethtool: Add destination MAC address to flow steering API
From: Amir Vadai @ 2012-12-11 12:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Yan Burman
In-Reply-To: <1355227436-18383-1-git-send-email-amirv@mellanox.com>

From: Yan Burman <yanb@mellanox.com>

Add ability to specify destination MAC address for L3/L4 flow spec
in order to be able to specify action for different VM's under vSwitch
configuration. This change is transparent to older userspace.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 include/uapi/linux/ethtool.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index d3eaaaf..be8c41e 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -500,13 +500,15 @@ union ethtool_flow_union {
 	struct ethtool_ah_espip4_spec		esp_ip4_spec;
 	struct ethtool_usrip4_spec		usr_ip4_spec;
 	struct ethhdr				ether_spec;
-	__u8					hdata[60];
+	__u8					hdata[52];
 };
 
 struct ethtool_flow_ext {
-	__be16	vlan_etype;
-	__be16	vlan_tci;
-	__be32	data[2];
+	__u8		padding[2];
+	unsigned char	h_dest[ETH_ALEN];	/* destination eth addr	*/
+	__be16		vlan_etype;
+	__be16		vlan_tci;
+	__be32		data[2];
 };
 
 /**
@@ -1027,6 +1029,7 @@ enum ethtool_sfeatures_retval_bits {
 #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
 /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
 #define	FLOW_EXT	0x80000000
+#define	FLOW_MAC_EXT	0x40000000
 
 /* L3-L4 network traffic flow hash options */
 #define	RXH_L2DA	(1 << 1)
-- 
1.7.11.3

^ permalink raw reply related

* [PATCH net-next 2/2] net/mlx4_en: Add support for destination MAC in steering rules
From: Amir Vadai @ 2012-12-11 12:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Yan Burman
In-Reply-To: <1355227436-18383-1-git-send-email-amirv@mellanox.com>

From: Yan Burman <yanb@mellanox.com>

Implement destination MAC rule extension for L3/L4 rules in
flow steering. Usefull for vSwitch/macvlan configurations.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 4aaa7c3..87a87a7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -619,7 +619,13 @@ static int mlx4_en_validate_flow(struct net_device *dev,
 	if (cmd->fs.location >= MAX_NUM_OF_FS_RULES)
 		return -EINVAL;
 
-	switch (cmd->fs.flow_type & ~FLOW_EXT) {
+	if (cmd->fs.flow_type & FLOW_MAC_EXT) {
+		/* dest mac mask must be ff:ff:ff:ff:ff:ff */
+		if (memcmp(cmd->fs.m_ext.h_dest, &full_mac, ETH_ALEN))
+			return -EINVAL;
+	}
+
+	switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
 	case TCP_V4_FLOW:
 	case UDP_V4_FLOW:
 		if (cmd->fs.m_u.tcp_ip4_spec.tos)
@@ -747,7 +753,6 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
 					     struct list_head *rule_list_h)
 {
 	int err;
-	u64 mac;
 	__be64 be_mac;
 	struct ethhdr *eth_spec;
 	struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -762,12 +767,16 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
 	if (!spec_l2)
 		return -ENOMEM;
 
-	mac = priv->mac & MLX4_MAC_MASK;
-	be_mac = cpu_to_be64(mac << 16);
+	if (cmd->fs.flow_type & FLOW_MAC_EXT) {
+		memcpy(&be_mac, cmd->fs.h_ext.h_dest, ETH_ALEN);
+	} else {
+		u64 mac = priv->mac & MLX4_MAC_MASK;
+		be_mac = cpu_to_be64(mac << 16);
+	}
 
 	spec_l2->id = MLX4_NET_TRANS_RULE_ID_ETH;
 	memcpy(spec_l2->eth.dst_mac_msk, &mac_msk, ETH_ALEN);
-	if ((cmd->fs.flow_type & ~FLOW_EXT) != ETHER_FLOW)
+	if ((cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) != ETHER_FLOW)
 		memcpy(spec_l2->eth.dst_mac, &be_mac, ETH_ALEN);
 
 	if ((cmd->fs.flow_type & FLOW_EXT) && cmd->fs.m_ext.vlan_tci) {
@@ -777,7 +786,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
 
 	list_add_tail(&spec_l2->list, rule_list_h);
 
-	switch (cmd->fs.flow_type & ~FLOW_EXT) {
+	switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
 	case ETHER_FLOW:
 		eth_spec = &cmd->fs.h_u.ether_spec;
 		memcpy(&spec_l2->eth.dst_mac, eth_spec->h_dest, ETH_ALEN);
-- 
1.7.11.3

^ permalink raw reply related

* Gianfar driver issue
From: Cedric VONCKEN @ 2012-12-11  9:59 UTC (permalink / raw)
  To: netdev

	Hi all, 

	I think he have an issue in Gianfar driver.

	When the Netdev tx queue timeout occurred, the function
gfar_timeout(..) is called. This function calls indirectly the
gfar_init_mac(..) function.

	In this function, the rctrl register is set to a default value.

	If the Promiscuous is enable on the net dev ( flag IFF_PROMISC
is set), the gfar_init_function does not reactivate it. 

	The Promiscuous mode is used for example when the netdev is
bridged. 
	
	I apply this patch to fix it.

	--- a/drivers/net/ethernet/freescale/gianfar.c.	2012-06-01
09:16:13.000000000 +0200
	+++ b/drivers/net/ethernet/freescale/gianfar.c	2012-12-11
10:38:23.000000000 +0100
	@@ -356,6 +356,11 @@
 	/* Configure the coalescing support */
 	gfar_configure_coalescing(priv, 0xFF, 0xFF);
 
+	if (ndev->flags & IFF_PROMISC) {
+		/* Set RCTRL to PROM */
+		rctrl |= RCTRL_PROM;
+	}
+
 	if (priv->rx_filer_enable) {
 		rctrl |= RCTRL_FILREN;
 		/* Program the RIR0 reg with the required distribution
*/



Cedric Voncken 

^ permalink raw reply

* [PATCH net-next rfc 2/2] tuntap: allow unpriveledge user to enable and disable queues
From: Jason Wang @ 2012-12-11 11:03 UTC (permalink / raw)
  To: mst, pmoore, netdev, linux-kernel; +Cc: mprivozn, Jason Wang
In-Reply-To: <1355223827-57290-1-git-send-email-jasowang@redhat.com>

Currently, when a file is attached to tuntap through TUNSETQUEUE, the uid/gid
and CAP_NET_ADMIN were checked, and we use this ioctl to create and destroy
queues. Sometimes, userspace such as qemu need to the ability to enable and
disable a specific queue without priveledge since guest operating system may
change the number of queues it want use.

To support this kind of ability, this patch introduce a flag enabled which is
used to track whether the queue is enabled by userspace. And also restrict that
only one deivce could be used for a queue to attach. With this patch, the DAC
checking when adding queues through IFF_ATTACH_QUEUE is still done and after
this, IFF_DETACH_QUEUE/IFF_ATTACH_QUEUE  could be used to disable/enable this
queue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d593f56..43831a7 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -138,6 +138,7 @@ struct tun_file {
 	/* only used for fasnyc */
 	unsigned int flags;
 	u16 queue_index;
+	bool enabled;
 };
 
 struct tun_flow_entry {
@@ -345,9 +346,11 @@ unlock:
 static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
 	struct tun_struct *tun = netdev_priv(dev);
+	struct tun_file *tfile;
 	struct tun_flow_entry *e;
 	u32 txq = 0;
 	u32 numqueues = 0;
+	int i;
 
 	rcu_read_lock();
 	numqueues = tun->numqueues;
@@ -366,6 +369,19 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
 			txq -= numqueues;
 	}
 
+	tfile = rcu_dereference(tun->tfiles[txq]);
+	if (unlikely(!tfile->enabled))
+		/* tun_detach() should make sure there's at least one queue
+		 * could be used to do the tranmission.
+		 */
+		for (i = 0; i < numqueues; i++) {
+			tfile = rcu_dereference(tun->tfiles[i]);
+			if (tfile->enabled) {
+				txq = i;
+				break;
+			}
+		}
+
 	rcu_read_unlock();
 	return txq;
 }
@@ -386,6 +402,36 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
 	netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
 }
 
+static int tun_enable(struct tun_file *tfile)
+{
+	if (tfile->enabled == true)
+		return -EINVAL;
+
+	tfile->enabled = true;
+	return 0;
+}
+
+static int tun_disable(struct tun_file *tfile)
+{
+	struct tun_struct *tun = rcu_dereference_protected(tfile->tun,
+							   lockdep_rtnl_is_held());
+	u16 index = tfile->queue_index;
+
+	if (!tun)
+		return -EINVAL;
+
+	if (tun->numqueues == 1)
+		return -EINVAL;
+
+	BUG_ON(index >= tun->numqueues);
+	tfile->enabled = false;
+
+	synchronize_net();
+	tun_flow_delete_by_queue(tun, index);
+
+	return 0;
+}
+
 static void __tun_detach(struct tun_file *tfile, bool clean)
 {
 	struct tun_file *ntfile;
@@ -446,6 +492,7 @@ static void tun_detach_all(struct net_device *dev)
 		BUG_ON(!tfile);
 		wake_up_all(&tfile->wq.wait);
 		rcu_assign_pointer(tfile->tun, NULL);
+		tfile->enabled = false;
 		--tun->numqueues;
 	}
 	BUG_ON(tun->numqueues != 0);
@@ -490,6 +537,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
 	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
 	sock_hold(&tfile->sk);
 	tun->numqueues++;
+	tfile->enabled = true;
 
 	tun_set_real_num_queues(tun);
 
@@ -672,6 +720,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (txq >= tun->numqueues)
 		goto drop;
 
+	/* Drop packet if the queue was not enabled */
+	if (!tfile->enabled)
+		goto drop;
+
 	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
 
 	BUG_ON(!tfile);
@@ -1010,6 +1062,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	bool zerocopy = false;
 	int err;
 
+	if (!tfile->enabled)
+		return -EINVAL;
+
 	if (!(tun->flags & TUN_NO_PI)) {
 		if ((len -= sizeof(pi)) > total_len)
 			return -EINVAL;
@@ -1199,6 +1254,9 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	struct tun_pi pi = { 0, skb->protocol };
 	ssize_t total = 0;
 
+	if (!tfile->enabled)
+		return -EINVAL;
+
 	if (!(tun->flags & TUN_NO_PI)) {
 		if ((len -= sizeof(pi)) < 0)
 			return -EINVAL;
@@ -1769,15 +1827,21 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
 		if (dev->netdev_ops != &tap_netdev_ops &&
 			dev->netdev_ops != &tun_netdev_ops)
 			ret = -EINVAL;
-		else if (tun_not_capable(tun))
-			ret = -EPERM;
-		/* TUNSETIFF is needed to do permission checking */
-		else if (tun->numqueues == 0)
-			ret = -EPERM;
-		else
-			ret = tun_attach(tun, file);
+		else {
+			if (!rcu_dereference(tfile->tun)) {
+				if (tun_not_capable(tun) ||
+				    tun->numqueues == 0)
+					ret = -EPERM;
+				else
+					ret = tun_attach(tun, file);
+			}
+			else {
+				/* FIXME: permission check? */
+				ret = tun_enable(tfile);
+			}
+		}
 	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
-		__tun_detach(tfile, false);
+		tun_disable(tfile);
 	else
 		ret = -EINVAL;
 
@@ -2085,6 +2149,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 	tfile->socket.file = file;
 	tfile->socket.ops = &tun_socket_ops;
 
+	tfile->enabled = false;
 	sock_init_data(&tfile->socket, &tfile->sk);
 	sk_change_net(&tfile->sk, tfile->net);
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next rfc 1/2] tuntap: forbid calling TUNSETQUEUE for a persistent device with no queues
From: Jason Wang @ 2012-12-11 11:03 UTC (permalink / raw)
  To: mst, pmoore, netdev, linux-kernel; +Cc: mprivozn, Jason Wang
In-Reply-To: <1355223827-57290-1-git-send-email-jasowang@redhat.com>

When re-establish to a persistent deivce wihout queues attached, TUNSETIFF
should be called instead of TUNSETQUEUE to do the proper permission checking.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 14a0454..d593f56 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1771,6 +1771,9 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
 			ret = -EINVAL;
 		else if (tun_not_capable(tun))
 			ret = -EPERM;
+		/* TUNSETIFF is needed to do permission checking */
+		else if (tun->numqueues == 0)
+			ret = -EPERM;
 		else
 			ret = tun_attach(tun, file);
 	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next rfc 0/2] Allow unpriveledge user to disable tuntap queue
From: Jason Wang @ 2012-12-11 11:03 UTC (permalink / raw)
  To: mst, pmoore, netdev, linux-kernel; +Cc: mprivozn, Jason Wang

This series is an rfc that tries to solve the issue that the queues of tuntap
could not be disabled/enabled by unpriveledged user. This is needed for
unpriveledge userspace such as qemu since guest may change the number of queues
at any time, qemu needs to configure the tuntap to disable/enable a specific
queue.

Instead of introducting new flag/ioctls, this series tries to re-use the current
TUNSETQUEUE and IFF_ATTACH_QUEUE/IFF_DETACH_QUEUE. After this change,
IFF_DETACH_QUEUE is used to disable a specific queue instead of detaching all
its state from tuntap. IFF_ATTACH_QUEUE is used to do: 1) creating new queue to
a tuntap device, in this situation, previous DAC check is still done. 2)
re-enable the queue previously disabled by IFF_DETACH_QUEUE, in this situation,
we can bypass some checking when we do during queue creating (the check need to
be done here needs discussion.

Management software (such as libvirt) then can do:
- TUNSETIFF to creating device and queue 0
- TUNSETQUEUE to create the rest of queues
- Passing them to unpriveledge userspace (such as qemu)

Then the unpriveledge userspace can enable and disable a specific queue through
IFF_ATTACH_QUEUE and IFF_DETACH_QUEUE.

This is done by introducing a enabled flags were used to notify whether the
queue is enabled, and tuntap only send/receive packets when it was enabled.

Please comment, thanks!

Jason Wang (2):
  tuntap: forbid calling TUNSETQUEUE for a persistent device with no
    queues
  tuntap: allow unpriveledge user to enable and disable queues

 drivers/net/tun.c |   78 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 73 insertions(+), 5 deletions(-)

^ permalink raw reply

* Re: [PATCH][RFC] smsc95xx: enable dynamic autosuspend (RFC)
From: Oliver Neukum @ 2012-12-11 10:27 UTC (permalink / raw)
  To: Ming Lei
  Cc: Steve Glendinning, Steve Glendinning, netdev, linux-usb,
	Greg Kroah-Hartman
In-Reply-To: <CACVXFVOBgwoSyUDzxD+ghjkpFu5PL7MuaS5UJOZYvVnV0dwhGg@mail.gmail.com>

On Tuesday 11 December 2012 10:24:57 Ming Lei wrote:
> On Mon, Dec 10, 2012 at 10:18 PM, Steve Glendinning <steve@shawell.net> wrote:

> > Thanks, so something like this should do the job?
> 
> This will do, but not simple as clearing .manage_power function
> pointer in bind(), and still disable runtime suspend for link off case
> since these devices which don't support suspend 3 can generate
> remote wakeup for link change event.

So they can autosuspend if the interface is up and no cable is plugged
in?

> I suggest to introduce link-off triggered runtime suspend for these
> usbnet devices(non-LAN9500A device, devices which don't support
> USB auto-suspend), and I have posted one patch set before[1].
> If no one objects that, I'd like to post them again with some fix and
> update for checking link after link_reset().

If you can get rid of a periodic work this would be great.

	Regards
		Oliver

^ permalink raw reply

* Re: [PATCH RFC 0/5] Containerize syslog
From: Glauber Costa @ 2012-12-11  8:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge Hallyn, Andrew Morton, Rui Xiang, netdev, containers
In-Reply-To: <87r4n1buuw.fsf@xmission.com>

On 12/07/2012 10:05 PM, Eric W. Biederman wrote:
> Glauber Costa <glommer@parallels.com> writes:
> 
>> I keep asking myself if it isn't the case of forwarding to a container
>> all messages printed in process context. That will obviously exclude all
>> messages resulting from kthreads - that will always be in the initial
>> namespace anyway, interrupts, etc. There is no harm, for instance, in
>> delivering the same message twice: one to the container, and the other
>> to the host system.
> 
> Except that there is harm in double printing.  One of the better
> justifications for doing something with the kernel log is that it is
> possible to overflow the kernel log with operations performed
> exclusively in a container.
> 
I don't agree with you here.

If we are double printing, we are using up more memory, but we also have
an extra buffer anyway. The messages are print on behalf of the user,
but still, by the kernel.

So one of the following will necessarily hold:

1) There is no way that the process can overflow the main log, and as a
consequence, the container log, that has less messages than it.

2) The process will overflow the main log. But since we are not printing
anything extra to the main log compared to the scenario in which the
process lives in the main namespace, this would already be a problem
independent of namespaces. And needs to be fixed.

IOW, double printing should not print anything *extra* to the main log.
It just prints to the container log, and leaves a copy to the box admin
to see. I think it is very reasonable to imagine that the main admin
would like to see anything the kernel has to tell him about the box.

> I do think the idea of process context printks going to the current
> container one worth playing with.
> 

It still leaves the problem of prinkts outside process context that
should go to a namespace open. But it is easy to extend this idea to do
both.

^ permalink raw reply

* Re: [PATCH v4 1/5] net: Add support for hardware-offloaded encapsulation
From: saeed bishara @ 2012-12-11  8:11 UTC (permalink / raw)
  To: Dmitry Kravkov
  Cc: Joseph Gasparakis, davem@davemloft.net, shemminger@vyatta.com,
	chrisw@sous-sol.org, gospo@redhat.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bhutchings@solarflare.com,
	Peter P Waskiewicz Jr, Alexander Duyck
In-Reply-To: <504C9EFCA2D0054393414C9CB605C37F1BFC3555@SJEXCHMB06.corp.ad.broadcom.com>

On Mon, Dec 10, 2012 at 9:58 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
>> -----Original Message-----
>> From: saeed bishara [mailto:saeed.bishara@gmail.com]
>> Sent: Monday, December 10, 2012 12:04 PM
>> To: Joseph Gasparakis
>> Cc: davem@davemloft.net; shemminger@vyatta.com; chrisw@sous-sol.org;
>> gospo@redhat.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>> Dmitry Kravkov; bhutchings@solarflare.com; Peter P Waskiewicz Jr; Alexander
>> Duyck
>> Subject: Re: [PATCH v4 1/5] net: Add support for hardware-offloaded
>> encapsulation
>>
>> > +static inline struct iphdr *inner_ip_hdr(const struct sk_buff *skb)
>> > +{
>> > +       return (struct iphdr *)skb_inner_network_header(skb);
>> > +}
>>
>> Hi,
>> I'm a little bit bothered because of those inner_ functions, what
>> about the following approach:
>> 1. the skb will have a new state, that state can be outer (normal
>> mode) and inner.
>> 2. when you change the state to inner, all the helper functions such
>> as ip_hdr will return the innter header.
>>
>> that's ofcourse the API side. the implementation may still use the
>> fields you added to the skb.
>>
>> what you think?
>> saeed
>
> Some drivers will probably need both inner_ and other_ in same flow, switching between two states will consume cpu cycles.
from performance perspective, I'm not sure the switching is worse, it
may be better as it reduces code size. please have a look at patch
2/5, with switching you can avoid doing the following change -> less
code, less if-else.
-                               skb_set_transport_header(skb,
-                                       skb_checksum_start_offset(skb));
+                               if (skb->encapsulation)
+                                       skb_set_inner_transport_header(skb,
+                                               skb_checksum_start_offset(skb));
+                               else
+                                       skb_set_transport_header(skb,
+                                               skb_checksum_start_offset(skb));
                                if (!(features & NETIF_F_ALL_CSUM) &&

I think also that from (stack) maintenance perspective, less code is better.

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Jason Wang @ 2012-12-11  6:41 UTC (permalink / raw)
  To: Paul Moore
  Cc: Michael S. Tsirkin, netdev, linux-security-module, selinux,
	mprivozn
In-Reply-To: <1963349.P9uq3yvlyR@sifl>

On Monday, December 10, 2012 05:43:49 PM Paul Moore wrote:
> On Monday, December 10, 2012 07:50:35 PM Michael S. Tsirkin wrote:
> > On Mon, Dec 10, 2012 at 12:33:49PM -0500, Paul Moore wrote:
> > > On Monday, December 10, 2012 07:26:56 PM Michael S. Tsirkin wrote:
> > > > On Mon, Dec 10, 2012 at 12:04:35PM -0500, Paul Moore wrote:
> > > > > On Friday, December 07, 2012 02:25:16 PM Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 06, 2012 at 04:09:51PM -0500, Paul Moore wrote:
> > > > > > > On Thursday, December 06, 2012 10:57:16 PM Michael S. Tsirkin
> 
> wrote:
> > > > > > > > On Thu, Dec 06, 2012 at 11:56:45AM -0500, Paul Moore wrote:
> > > > > > > > > The SETQUEUE/tun_socket:create_queue permissions do not yet
> > > > > > > > > exist
> > > > > > > > > in any released SELinux policy as we are just now adding
> > > > > > > > > them
> > > > > > > > > with
> > > > > > > > > this patchset. With current policies loaded into a kernel
> > > > > > > > > with
> > > > > > > > > this patchset applied the SETQUEUE/tun_socket:create_queue
> > > > > > > > > permission would be treated according to the policy's
> > > > > > > > > unknown
> > > > > > > > > permission setting.
> > > > > > > > 
> > > > > > > > OK I think we need to rethink what we are doing here: what you
> > > > > > > > sent
> > > > > > > > addresses the problem as stated but I think we mis-stated it.
> > > > > > > > Let
> > > > > > > > me try to restate the problem: it is not just selinux problem.
> > > > > > > > Let's
> > > > > > > > assume qemu wants to use tun, I (libvirt) don't want to run it
> > > > > > > > as
> > > > > > > > root.
> > > > > > > > 
> > > > > > > > 1. TUNSETIFF: I can open tun, attach an fd and pass it to
> > > > > > > > qemu.
> > > > > > > > Now, qemu does not invoke TUNSETIFF so it can run without
> > > > > > > > kernel priveledges.
> > > > > > > 
> > > > > > > Correct me if I'm wrong, but I believe libvirt does this while
> > > > > > > running
> > > > > > > as root.  Assuming that is the case, why not simply
> > > > > > > setuid()/setgid()
> > > > > > > to the same credentials as the QEMU instance before creating the
> > > > > > > TUN
> > > > > > > device? You can always (re)configure the device afterwards while
> > > > > > > running as root/CAP_NET_ADMIN.
> > > > > > 
> > > > > > We want isolation between qemu instances.
> > > > > 
> > > > > Understood, I agree.
> > > > > 
> > > > > Achieving separation via SELinux is easily done, with libvirt/sVirt
> > > > > already doing this for us automatically in most cases; the only
> > > > > thing
> > > > > we
> > > > > will want to do is make sure the SELinux policy is aware of the new
> > > > > permission.
> > > > > 
> > > > > Achieving separation via DAC should also be easily done, simply run
> > > > > each
> > > > > QEMU instance with a separate UID and/or GID.
> > > > > 
> > > > > > Giving qemu right to open tun and SETIFF would give it rights
> > > > > > to access any tun device.
> > > > > 
> > > > > I'm quickly looked at tun_chr_open() again and I don't see any
> > > > > special
> > > > > rights/privileges required, the same for tun_chr_ioctl() and
> > > > > __tun_chr_ioctl().  Looking at tun_set_queue() I see we call
> > > > > tun_not_capable() which does a simple DAC check; it must have the
> > > > > same
> > > > > UID/GID or have CAP_NET_ADMIN.
> > > > > 
> > > > > I'm having a hard time seeing the problem you are describing; help
> > > > > me
> > > > > understand.
> > > > 
> > > > The issue is guest controls the number of queues in use.
> > > > So qemu would be required to be allowed to call tun_set_queue.
> > > > If we allow this we have a problem as one qemu will be
> > > > able to access any tun.
> > > 
> > > QEMU can call tun_set_queue() as long as it satisfies tun_not_capable(),
> > > which from a practical point of view means that the TUN device was
> > > created with the same UID/GID as the QEMU instance.  If you want TUN
> > > device separation between QEMU instances using DAC you need to run each
> > > QEMU instance with a different UID/GID (which you should be doing anyway
> > > if you want DAC enforced general separation).
> > > 
> > > I believe I've stated this point several times now and I don't feel
> > > you've
> > > addressed it properly.
> > 
> > Look at how it works at the moment:
> > a priveledged libvirt server calls tun_set_iff
> > and passes the fd to qemu which is not priveledged.
> > 
> > The result is isolation between qemu instances without
> > need to create uid per qemu instance.
> 
> Okay, good.  That is my understanding.
> 
> > How do we create multiple queues? It makes sense to
> > follow this model and pass in fds for individual queues.
> 
> Okay.
> 
> > However they need to be disabled initially
> > so libvirt can not do tun_set_queue for us.
> 
> Unrelated question: why do the queues need to be disabled initially?  Is
> this to prevent traffic from being queued up?  Some other reason?  I'm jus
> curious as to the reason ...

Only one queue is used by default, so queues other than 0 should be disabled 
after creating by either libvirt or qemu. There're several choices:

A. libvirt only calls TUNSETIFF, and passing this fd to qemu. Qemu creates the 
rest of the queues through TUNSETQUEUE, and also disable them by default
B. libvirt calls TUNSETIFF and creates queues through TUNSETQUEUE, then it 
passes all file descriptors to qemu. Qemu disables queues other than 0 by 
default.
C. libvirt call TUNSETIFF, TUNSETQUEUE to create queues and disable all queues 
other than queue 0. Then it can pass all the file descriptors to qemu.

Since qemu is not priveledged, method A is not applicable, since creating 
queues needs CAT_NET_ADMIN. Either B or C is ok if we add an extra flags to 
disable/enable the queue.
> 
> > When qemu later calls tun_set_queue it will fail which means we
> > can't utilize multiqueue.
> 
> I still don't understand why in the multiqueue case libvirt doesn't just
> change it's effective UID/GID when creating the TUN device, or just use the
> TUNSETOWNER/TUNSETGROUP commands. This would solve the problem you describe
> above and - at least to me - seems like a better solution conceptually.

I think it make sense to do this. Have a quick glance on libvirt code, looks 
like it does not call TUNSETOWNER/TUNSETGROUP. Maybe libvirt guys (cc'ed) can 
answer this question.
> 
> Help me understand why you believe that will not work.
> 
> Do you not want to give ownership of the TUN device to QEMU?  That would be
> the only reason I can think of, but all of your comments that I can recall
> have been about isolation between QEMU instances and not access control
> between a QEMU instance and its assigned TUN device.
> 
> > My solution is an unpriveledged variant
> > of tun_set_queue that only enables/disables
> > a queue without attach/detach.

^ permalink raw reply

* [PATCH 2/2] net: remove obsolete simple_strto<foo>
From: Abhijit Pawar @ 2012-12-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Pablo Neira Ayuso, Patrick McHardy,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	John W. Linville, Johannes Berg, Cong Wang, Eric Dumazet,
	Neil Horman, Joe Perches
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
	netfilter-u79uwXL29TY76Z2rM5mHXA, coreteam-Cap9r6Oaw4JrovVCs/uTlw,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, Abhijit Pawar

This patch removes the redundant occurences of simple_strto<foo>

Signed-off-by: Abhijit Pawar <abhi.c.pawar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 net/core/netpoll.c                |    1 -
 net/mac80211/debugfs_sta.c        |    1 -
 net/netfilter/nf_conntrack_core.c |    1 -
 3 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 12c129f..3151acf 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -706,7 +706,6 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 		*delim = 0;
 		if (*cur == ' ' || *cur == '\t')
 			np_info(np, "warning: whitespace is not allowed\n");
-		np->remote_port = simple_strtol(cur, NULL, 10);
 		if (kstrtou16(cur, 10, &np->remote_port))
 			goto parse_failed;
 		cur = delim;
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 0dedb4b..6fb1168 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -220,7 +220,6 @@ static ssize_t sta_agg_status_write(struct file *file, const char __user *userbu
 	} else
 		return -EINVAL;
 
-	tid = simple_strtoul(buf, NULL, 0);
 	ret = kstrtoul(buf, 0, &tid);
 	if (ret)
 		return ret;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 37d9e62..08cdc71 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1422,7 +1422,6 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 	if (!nf_conntrack_htable_size)
 		return param_set_uint(val, kp);
 
-	hashsize = simple_strtoul(val, NULL, 0);
 	rc = kstrtouint(val, 0, &hashsize);
 	if (rc)
 		return rc;
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox