Netdev List


* Re: [PATCH RESEND] net: remove obsolete simple_strto<foo>
From: Abhijit Pawar @ 2012-12-11  3:33 UTC (permalink / raw)
  To: David Miller
  Cc: abhi.c.pawar, pablo, kaber, kuznet, jmorris, yoshfuji, linville,
	johannes, amwang, edumazet, nhorman, joe, netdev, linux-kernel,
	netfilter-devel, netfilter, coreteam, linux-wireless
In-Reply-To: <20121210.141002.450030380391247897.davem@davemloft.net>

On 12/11/2012 12:40 AM, David Miller wrote:
> From: Abhijit Pawar <abhi.c.pawar@gmail.com>
> Date: Mon, 10 Dec 2012 14:42:28 +0530
> 
>> This patch replaces the obsolete simple_strto<foo> with kstrto<foo>
>>
>> Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
> 
> Applied.
> 
Hi David,
It seems that a couple of files still contain occurrences of
simple_strto* that this patch did not remove correctly. I will send a
modified patch shortly. Please revert this commit and merge the newly
sent patch into the tree instead.

-- 
-
Abhijit

^ permalink raw reply

* [PATCH RESEND RESEND] net: remove obsolete simple_strto<foo>
From: Abhijit Pawar @ 2012-12-11  3:34 UTC (permalink / raw)
  To: David S. Miller, Pablo Neira Ayuso, Patrick McHardy,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	John W. Linville, Johannes Berg, Cong Wang, Eric Dumazet,
	Neil Horman, Joe Perches
  Cc: netdev, linux-kernel, netfilter-devel, netfilter, coreteam,
	linux-wireless, Abhijit Pawar

This patch replaces the obsolete simple_strto<foo> with kstrto<foo>

Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
---
 net/core/netpoll.c                 |    6 ++++--
 net/ipv4/netfilter/ipt_CLUSTERIP.c |    9 +++++++--
 net/mac80211/debugfs_sta.c         |    4 +++-
 net/netfilter/nf_conntrack_core.c  |    6 ++++--
 4 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 77a0388..3151acf 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -674,7 +674,8 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 		if ((delim = strchr(cur, '@')) == NULL)
 			goto parse_failed;
 		*delim = 0;
-		np->local_port = simple_strtol(cur, NULL, 10);
+		if (kstrtou16(cur, 10, &np->local_port))
+			goto parse_failed;
 		cur = delim;
 	}
 	cur++;
@@ -705,7 +706,8 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 		*delim = 0;
 		if (*cur == ' ' || *cur == '\t')
 			np_info(np, "warning: whitespace is not allowed\n");
-		np->remote_port = simple_strtol(cur, NULL, 10);
+		if (kstrtou16(cur, 10, &np->remote_port))
+			goto parse_failed;
 		cur = delim;
 	}
 	cur++;
diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index fe5daea..75e33a7 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -661,6 +661,7 @@ static ssize_t clusterip_proc_write(struct file *file, const char __user *input,
 #define PROC_WRITELEN	10
 	char buffer[PROC_WRITELEN+1];
 	unsigned long nodenum;
+	int rc;
 
 	if (size > PROC_WRITELEN)
 		return -EIO;
@@ -669,11 +670,15 @@ static ssize_t clusterip_proc_write(struct file *file, const char __user *input,
 	buffer[size] = 0;
 
 	if (*buffer == '+') {
-		nodenum = simple_strtoul(buffer+1, NULL, 10);
+		rc = kstrtoul(buffer+1, 10, &nodenum);
+		if (rc)
+			return rc;
 		if (clusterip_add_node(c, nodenum))
 			return -ENOMEM;
 	} else if (*buffer == '-') {
-		nodenum = simple_strtoul(buffer+1, NULL,10);
+		rc = kstrtoul(buffer+1, 10, &nodenum);
+		if (rc)
+			return rc;
 		if (clusterip_del_node(c, nodenum))
 			return -ENOENT;
 	} else
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 49a1c70..6fb1168 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -220,7 +220,9 @@ static ssize_t sta_agg_status_write(struct file *file, const char __user *userbu
 	} else
 		return -EINVAL;
 
-	tid = simple_strtoul(buf, NULL, 0);
+	ret = kstrtoul(buf, 0, &tid);
+	if (ret)
+		return ret;
 
 	if (tid >= IEEE80211_NUM_TIDS)
 		return -EINVAL;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index af17516..08cdc71 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1409,7 +1409,7 @@ EXPORT_SYMBOL_GPL(nf_ct_alloc_hashtable);
 
 int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 {
-	int i, bucket;
+	int i, bucket, rc;
 	unsigned int hashsize, old_size;
 	struct hlist_nulls_head *hash, *old_hash;
 	struct nf_conntrack_tuple_hash *h;
@@ -1422,7 +1422,9 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 	if (!nf_conntrack_htable_size)
 		return param_set_uint(val, kp);
 
-	hashsize = simple_strtoul(val, NULL, 0);
+	rc = kstrtouint(val, 0, &hashsize);
+	if (rc)
+		return rc;
 	if (!hashsize)
 		return -EINVAL;
 
-- 
1.7.7.6
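
For readers following the conversion above, here is a minimal userspace
sketch of why the kstrto<foo> callers need an error path at all. It is not
kernel code: parse_u16_strict() is a hypothetical stand-in written on top of
strtoul(), and it only approximates the strictness of kstrtou16() (whole
string must parse, one trailing newline tolerated, range checked).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for a strict parser in the style of
 * kstrtou16(): the entire string must be a number that fits in 16 bits.
 * Returns 0 on success, -EINVAL for malformed input, -ERANGE on overflow.
 */
int parse_u16_strict(const char *s, int base, uint16_t *out)
{
	char *end;
	unsigned long val;

	assert(s != NULL && out != NULL);

	/* A single leading '+' is tolerated. */
	if (*s == '+')
		s++;
	/* strtoul() would skip whitespace and accept '-'; reject both here. */
	if (!isalnum((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits consumed */
	if (*end == '\n')
		end++;			/* tolerate one trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	if (errno == ERANGE || val > UINT16_MAX)
		return -ERANGE;

	*out = (uint16_t)val;
	return 0;
}
```

With a lenient parser such as strtol(s, NULL, 10), the input "6665abc"
silently yields 6665; the strict variant reports -EINVAL, which is what the
"goto parse_failed" and "return rc" branches in the patch act on.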

^ permalink raw reply related

* [PATCH v2 1/1] net: ethernet: davinci_cpdma: Add boundary for rx and tx descriptors
From: Mugunthan V N @ 2012-12-11  3:43 UTC (permalink / raw)
  To: netdev; +Cc: davem, linux-arm-kernel, linux-omap, s.hauer, Mugunthan V N

When there is heavy transmission traffic in the CPDMA, the rx descriptor
memory is also used as tx descriptor memory. The driver then loses all rx
descriptors and stops working.

This patch adds a boundary between tx and rx descriptors in BD RAM, dividing
the descriptor memory so that heavy transmission cannot consume rx
descriptors.

This fix is already applied to the davinci_emac driver. Since CPSW and
davinci_emac share the same CPDMA, this patch moves the boundary separation
from the Davinci EMAC driver to the CPDMA driver. The original fix was made
in the following commit:

commit 86d8c07ff2448eb4e860e50f34ef6ee78e45c40c
Author: Sascha Hauer <s.hauer@pengutronix.de>
Date:   Tue Jan 3 05:27:47 2012 +0000

    net/davinci: do not use all descriptors for tx packets

    The driver uses a shared pool for both rx and tx descriptors.
    During open it queues fixed number of 128 descriptors for receive
    packets. For each received packet it tries to queue another
    descriptor. If this fails the descriptor is lost for rx.
    The driver has no limitation on tx descriptors to use, so it
    can happen during a nmap / ping -f attack that the driver
    allocates all descriptors for tx and looses all rx descriptors.
    The driver stops working then.
    To fix this limit the number of tx descriptors used to half of
    the descriptors available, the rx path uses the other half.

    Tested on a custom board using nmap / ping -f to the board from
    two different hosts.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
---
Changes from initial patch:
* Modified commit message with proper description as in previous commit message

 drivers/net/ethernet/ti/davinci_cpdma.c |   20 ++++++++++++++------
 drivers/net/ethernet/ti/davinci_emac.c  |    8 --------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 4995673..d37f546 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -105,13 +105,13 @@ struct cpdma_ctlr {
 };
 
 struct cpdma_chan {
+	struct cpdma_desc __iomem	*head, *tail;
+	void __iomem			*hdp, *cp, *rxfree;
 	enum cpdma_state		state;
 	struct cpdma_ctlr		*ctlr;
 	int				chan_num;
 	spinlock_t			lock;
-	struct cpdma_desc __iomem	*head, *tail;
 	int				count;
-	void __iomem			*hdp, *cp, *rxfree;
 	u32				mask;
 	cpdma_handler_fn		handler;
 	enum dma_data_direction		dir;
@@ -217,7 +217,7 @@ desc_from_phys(struct cpdma_desc_pool *pool, dma_addr_t dma)
 }
 
 static struct cpdma_desc __iomem *
-cpdma_desc_alloc(struct cpdma_desc_pool *pool, int num_desc)
+cpdma_desc_alloc(struct cpdma_desc_pool *pool, int num_desc, bool is_rx)
 {
 	unsigned long flags;
 	int index;
@@ -225,8 +225,14 @@ cpdma_desc_alloc(struct cpdma_desc_pool *pool, int num_desc)
 
 	spin_lock_irqsave(&pool->lock, flags);
 
-	index = bitmap_find_next_zero_area(pool->bitmap, pool->num_desc, 0,
-					   num_desc, 0);
+	if (is_rx) {
+		index = bitmap_find_next_zero_area(pool->bitmap,
+				pool->num_desc/2, 0, num_desc, 0);
+	} else {
+		index = bitmap_find_next_zero_area(pool->bitmap,
+				pool->num_desc, pool->num_desc/2, num_desc, 0);
+	}
+
 	if (index < pool->num_desc) {
 		bitmap_set(pool->bitmap, index, num_desc);
 		desc = pool->iomap + pool->desc_size * index;
@@ -660,6 +666,7 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 	unsigned long			flags;
 	u32				mode;
 	int				ret = 0;
+	bool				is_rx;
 
 	spin_lock_irqsave(&chan->lock, flags);
 
@@ -668,7 +675,8 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 		goto unlock_ret;
 	}
 
-	desc = cpdma_desc_alloc(ctlr->pool, 1);
+	is_rx = (chan->rxfree != 0);
+	desc = cpdma_desc_alloc(ctlr->pool, 1, is_rx);
 	if (!desc) {
 		chan->stats.desc_alloc_fail++;
 		ret = -ENOMEM;
diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index fce89a0..f349273 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -120,7 +120,6 @@ static const char emac_version_string[] = "TI DaVinci EMAC Linux v6.1";
 #define EMAC_DEF_TX_CH			(0) /* Default 0th channel */
 #define EMAC_DEF_RX_CH			(0) /* Default 0th channel */
 #define EMAC_DEF_RX_NUM_DESC		(128)
-#define EMAC_DEF_TX_NUM_DESC		(128)
 #define EMAC_DEF_MAX_TX_CH		(1) /* Max TX channels configured */
 #define EMAC_DEF_MAX_RX_CH		(1) /* Max RX channels configured */
 #define EMAC_POLL_WEIGHT		(64) /* Default NAPI poll weight */
@@ -342,7 +341,6 @@ struct emac_priv {
 	u32 mac_hash2;
 	u32 multicast_hash_cnt[EMAC_NUM_MULTICAST_BITS];
 	u32 rx_addr_type;
-	atomic_t cur_tx;
 	const char *phy_id;
 #ifdef CONFIG_OF
 	struct device_node *phy_node;
@@ -1050,9 +1048,6 @@ static void emac_tx_handler(void *token, int len, int status)
 {
 	struct sk_buff		*skb = token;
 	struct net_device	*ndev = skb->dev;
-	struct emac_priv	*priv = netdev_priv(ndev);
-
-	atomic_dec(&priv->cur_tx);
 
 	if (unlikely(netif_queue_stopped(ndev)))
 		netif_start_queue(ndev);
@@ -1101,9 +1096,6 @@ static int emac_dev_xmit(struct sk_buff *skb, struct net_device *ndev)
 		goto fail_tx;
 	}
 
-	if (atomic_inc_return(&priv->cur_tx) >= EMAC_DEF_TX_NUM_DESC)
-		netif_stop_queue(ndev);
-
 	return NETDEV_TX_OK;
 
 fail_tx:
-- 
1.7.9.5
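
As an illustration of the partitioning idea in the patch above, here is a
small userspace sketch of a descriptor pool split into an rx half and a tx
half. It is a model under stated assumptions, not the driver code: struct
desc_pool, POOL_DESCS, pool_find_free() and pool_alloc() are hypothetical
names, and a plain bool array stands in for the kernel bitmap. One detail
the sketch makes explicit is that a failed search is detected by comparing
against the end of the searched region; the posted hunk still compares
against pool->num_desc, and whether that is sufficient depends on what
bitmap_find_next_zero_area() returns on failure when given a reduced size.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_DESCS 8	/* hypothetical pool size, kept small on purpose */

struct desc_pool {
	bool used[POOL_DESCS];	/* stand-in for the kernel bitmap */
};

/*
 * Find `count` consecutive free slots in [start, end).
 * Returns the first index of the run, or `end` if no run fits.
 */
int pool_find_free(const struct desc_pool *pool, int start, int end, int count)
{
	int run = 0;

	for (int i = start; i < end; i++) {
		run = pool->used[i] ? 0 : run + 1;
		if (run == count)
			return i - count + 1;
	}
	return end;
}

/*
 * rx allocations come from the lower half, tx from the upper half.
 * Returns the allocated index, or -1 if the caller's half is exhausted.
 */
int pool_alloc(struct desc_pool *pool, int count, bool is_rx)
{
	int start = is_rx ? 0 : POOL_DESCS / 2;
	int end = is_rx ? POOL_DESCS / 2 : POOL_DESCS;
	int index;

	assert(count > 0);
	index = pool_find_free(pool, start, end, count);

	/* Check against this region's end so rx never spills into tx. */
	if (index >= end)
		return -1;

	for (int i = 0; i < count; i++)
		pool->used[index + i] = true;
	return index;
}
```

In this model, exhausting the tx half leaves every rx slot available, which
is the property the commit message asks for.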


^ permalink raw reply related

* Re: [PATCH RESEND] net: remove obsolete simple_strto<foo>
From: David Miller @ 2012-12-11  4:43 UTC (permalink / raw)
  To: abhi.c.pawar
  Cc: amwang, joe, jmorris, edumazet, pablo, kaber, coreteam, yoshfuji,
	netfilter-devel, johannes, linux-kernel, netfilter, linville,
	linux-wireless, netdev, nhorman, kuznet
In-Reply-To: <CA+kxV1GBxdZKnjqs1bm1qBJE8dGXtOfqP_oYbB-hw5Ga1xE65g@mail.gmail.com>

From: Abhijit Pawar <abhi.c.pawar@gmail.com>
Date: Tue, 11 Dec 2012 06:36:59 +0530

> It looks like there are two occurrences of simple_strtoul which have not
> been removed cleanly by the patch.
> They are in netpoll.c and debugfs_sta.c
> I will send the corrected, clean patch shortly.

You can't simply send me a replacement patch, since I already applied
the original one and that patch will not be reverted.

^ permalink raw reply

* Re: [PATCH RESEND] net: remove obsolete simple_strto<foo>
From: David Miller @ 2012-12-11  4:48 UTC (permalink / raw)
  To: abhi.c.pawar
  Cc: pablo, kaber, kuznet, jmorris, yoshfuji, linville, johannes,
	amwang, edumazet, nhorman, joe, netdev, linux-kernel,
	netfilter-devel, netfilter, coreteam, linux-wireless
In-Reply-To: <50C6A995.1020908@gmail.com>

From: Abhijit Pawar <abhi.c.pawar@gmail.com>
Date: Tue, 11 Dec 2012 09:03:41 +0530

> On 12/11/2012 12:40 AM, David Miller wrote:
>> From: Abhijit Pawar <abhi.c.pawar@gmail.com>
>> Date: Mon, 10 Dec 2012 14:42:28 +0530
>> 
>>> This patch replaces the obsolete simple_strto<foo> with kstrto<foo>
>>>
>>> Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
>> 
>> Applied.
>> 
> Hi David,
> It seems that a couple of files still contain occurrences of
> simple_strto* that this patch did not remove correctly. I will send a
> modified patch shortly. Please revert this commit and merge the newly
> sent patch into the tree instead.

Again, you cannot send "modified" patches.

When I say I've applied your patch, that cannot be undone.

You must therefore send me fixup patches relative to the ones
I've applied already.

^ permalink raw reply

* Re: [PATCH RESEND RESEND] net: remove obsolete simple_strto<foo>
From: David Miller @ 2012-12-11  4:49 UTC (permalink / raw)
  To: abhi.c.pawar
  Cc: pablo, kaber, kuznet, jmorris, yoshfuji, linville, johannes,
	amwang, edumazet, nhorman, joe, netdev, linux-kernel,
	netfilter-devel, netfilter, coreteam, linux-wireless
In-Reply-To: <1355196860-10708-1-git-send-email-abhi.c.pawar@gmail.com>

From: Abhijit Pawar <abhi.c.pawar@gmail.com>
Date: Tue, 11 Dec 2012 09:04:20 +0530

> This patch replaces the obsolete simple_strto<foo> with kstrto<foo>
> 
> Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>

You can't submit replacement patches for ones which I have already
applied.

Patches I apply are permanently applied, and therefore you must submit
changes relative to the ones I've applied already.

^ permalink raw reply

* Re: netconsole fun
From: Cong Wang @ 2012-12-11  4:51 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1355149033.3142.14.camel@thor>

On Mon, 10 Dec 2012 at 14:17 GMT, Peter Hurley <peter@hurleysoftware.com> wrote:
> Now that netpoll has been disabled for slaved devices, is there a
> recommended method of running netconsole on a machine that has a slaved
> device?
>

Yes, run it on the master device instead.

^ permalink raw reply

* Re: [PATCH RESEND RESEND] net: remove obsolete simple_strto<foo>
From: Abhijit Pawar @ 2012-12-11  5:29 UTC (permalink / raw)
  To: David Miller
  Cc: abhi.c.pawar, pablo, kaber, kuznet, jmorris, yoshfuji, linville,
	johannes, amwang, edumazet, nhorman, joe, netdev, linux-kernel,
	netfilter-devel, netfilter, coreteam, linux-wireless
In-Reply-To: <20121210.234923.94776338088489257.davem@davemloft.net>

On 12/11/2012 10:19 AM, David Miller wrote:
> From: Abhijit Pawar <abhi.c.pawar@gmail.com>
> Date: Tue, 11 Dec 2012 09:04:20 +0530
> 
>> This patch replaces the obsolete simple_strto<foo> with kstrto<foo>
>>
>> Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
> 
> You can't submit replacement patches for ones which I have already
> applied.
> 
> Patches I apply are permanently applied, and therefore you must submit
> changes relative to the ones I've applied already.
> 
I am sorry for creating this confusion. I have created and sent a new
patch which you can apply on top of the old one to fix the issues.

-- 
-
Abhijit

^ permalink raw reply

* [PATCH 2/2] net: remove obsolete simple_strto<foo>
From: Abhijit Pawar @ 2012-12-11  5:30 UTC (permalink / raw)
  To: David S. Miller, Pablo Neira Ayuso, Patrick McHardy,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	John W. Linville, Johannes Berg, Cong Wang, Eric Dumazet,
	Neil Horman, Joe Perches
  Cc: netdev, linux-kernel, netfilter-devel, netfilter, coreteam,
	linux-wireless, Abhijit Pawar

This patch removes the redundant occurrences of simple_strto<foo>

Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
---
 net/core/netpoll.c                |    1 -
 net/mac80211/debugfs_sta.c        |    1 -
 net/netfilter/nf_conntrack_core.c |    1 -
 3 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 12c129f..3151acf 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -706,7 +706,6 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 		*delim = 0;
 		if (*cur == ' ' || *cur == '\t')
 			np_info(np, "warning: whitespace is not allowed\n");
-		np->remote_port = simple_strtol(cur, NULL, 10);
 		if (kstrtou16(cur, 10, &np->remote_port))
 			goto parse_failed;
 		cur = delim;
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 0dedb4b..6fb1168 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -220,7 +220,6 @@ static ssize_t sta_agg_status_write(struct file *file, const char __user *userbu
 	} else
 		return -EINVAL;
 
-	tid = simple_strtoul(buf, NULL, 0);
 	ret = kstrtoul(buf, 0, &tid);
 	if (ret)
 		return ret;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 37d9e62..08cdc71 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1422,7 +1422,6 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 	if (!nf_conntrack_htable_size)
 		return param_set_uint(val, kp);
 
-	hashsize = simple_strtoul(val, NULL, 0);
 	rc = kstrtouint(val, 0, &hashsize);
 	if (rc)
 		return rc;
-- 
1.7.7.6


^ permalink raw reply related

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Jason Wang @ 2012-12-11  6:41 UTC (permalink / raw)
  To: Paul Moore
  Cc: Michael S. Tsirkin, netdev, linux-security-module, selinux,
	mprivozn
In-Reply-To: <1963349.P9uq3yvlyR@sifl>

On Monday, December 10, 2012 05:43:49 PM Paul Moore wrote:
> On Monday, December 10, 2012 07:50:35 PM Michael S. Tsirkin wrote:
> > On Mon, Dec 10, 2012 at 12:33:49PM -0500, Paul Moore wrote:
> > > On Monday, December 10, 2012 07:26:56 PM Michael S. Tsirkin wrote:
> > > > On Mon, Dec 10, 2012 at 12:04:35PM -0500, Paul Moore wrote:
> > > > > On Friday, December 07, 2012 02:25:16 PM Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 06, 2012 at 04:09:51PM -0500, Paul Moore wrote:
> > > > > > > > On Thursday, December 06, 2012 10:57:16 PM Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Dec 06, 2012 at 11:56:45AM -0500, Paul Moore wrote:
> > > > > > > > > > The SETQUEUE/tun_socket:create_queue permissions do not yet
> > > > > > > > > > exist in any released SELinux policy as we are just now
> > > > > > > > > > adding them with this patchset. With current policies loaded
> > > > > > > > > > into a kernel with this patchset applied the
> > > > > > > > > > SETQUEUE/tun_socket:create_queue permission would be treated
> > > > > > > > > > according to the policy's unknown permission setting.
> > > > > > > > 
> > > > > > > > > OK I think we need to rethink what we are doing here: what you
> > > > > > > > > sent addresses the problem as stated but I think we mis-stated
> > > > > > > > > it. Let me try to restate the problem: it is not just a selinux
> > > > > > > > > problem. Let's assume qemu wants to use tun, I (libvirt) don't
> > > > > > > > > want to run it as root.
> > > > > > > > > 
> > > > > > > > > 1. TUNSETIFF: I can open tun, attach an fd and pass it to qemu.
> > > > > > > > > Now, qemu does not invoke TUNSETIFF so it can run without
> > > > > > > > > kernel privileges.
> > > > > > > 
> > > > > > > > Correct me if I'm wrong, but I believe libvirt does this while
> > > > > > > > running as root.  Assuming that is the case, why not simply
> > > > > > > > setuid()/setgid() to the same credentials as the QEMU instance
> > > > > > > > before creating the TUN device? You can always (re)configure the
> > > > > > > > device afterwards while running as root/CAP_NET_ADMIN.
> > > > > > 
> > > > > > We want isolation between qemu instances.
> > > > > 
> > > > > Understood, I agree.
> > > > > 
> > > > > > Achieving separation via SELinux is easily done, with libvirt/sVirt
> > > > > > already doing this for us automatically in most cases; the only
> > > > > > thing we will want to do is make sure the SELinux policy is aware
> > > > > > of the new permission.
> > > > > > 
> > > > > > Achieving separation via DAC should also be easily done, simply run
> > > > > > each QEMU instance with a separate UID and/or GID.
> > > > > 
> > > > > > Giving qemu right to open tun and SETIFF would give it rights
> > > > > > to access any tun device.
> > > > > 
> > > > > > I quickly looked at tun_chr_open() again and I don't see any
> > > > > > special rights/privileges required, the same for tun_chr_ioctl()
> > > > > > and __tun_chr_ioctl().  Looking at tun_set_queue() I see we call
> > > > > > tun_not_capable() which does a simple DAC check; it must have the
> > > > > > same UID/GID or have CAP_NET_ADMIN.
> > > > > > 
> > > > > > I'm having a hard time seeing the problem you are describing; help
> > > > > > me understand.
> > > > 
> > > > The issue is guest controls the number of queues in use.
> > > > So qemu would be required to be allowed to call tun_set_queue.
> > > > If we allow this we have a problem as one qemu will be
> > > > able to access any tun.
> > > 
> > > QEMU can call tun_set_queue() as long as it satisfies tun_not_capable(),
> > > which from a practical point of view means that the TUN device was
> > > created with the same UID/GID as the QEMU instance.  If you want TUN
> > > device separation between QEMU instances using DAC you need to run each
> > > QEMU instance with a different UID/GID (which you should be doing anyway
> > > if you want DAC enforced general separation).
> > > 
> > > > I believe I've stated this point several times now and I don't feel
> > > > you've addressed it properly.
> > 
> > Look at how it works at the moment:
> > > a privileged libvirt server calls tun_set_iff
> > > and passes the fd to qemu which is not privileged.
> > 
> > The result is isolation between qemu instances without
> > need to create uid per qemu instance.
> 
> Okay, good.  That is my understanding.
> 
> > How do we create multiple queues? It makes sense to
> > follow this model and pass in fds for individual queues.
> 
> Okay.
> 
> > However they need to be disabled initially
> > so libvirt can not do tun_set_queue for us.
> 
> Unrelated question: why do the queues need to be disabled initially?  Is
> this to prevent traffic from being queued up?  Some other reason?  I'm just
> curious as to the reason ...

Only one queue is used by default, so queues other than 0 should be disabled
after creation, by either libvirt or qemu. There are several choices:

A. libvirt only calls TUNSETIFF and passes this fd to qemu. Qemu creates the
rest of the queues through TUNSETQUEUE and disables them by default.
B. libvirt calls TUNSETIFF and creates the queues through TUNSETQUEUE, then
passes all file descriptors to qemu. Qemu disables queues other than 0 by
default.
C. libvirt calls TUNSETIFF and TUNSETQUEUE to create the queues and disables
all queues other than queue 0. It can then pass all the file descriptors to
qemu.

Since qemu is not privileged, method A is not applicable, because creating
queues needs CAP_NET_ADMIN. Either B or C is ok if we add an extra flag to
disable/enable the queue.
> 
> > When qemu later calls tun_set_queue it will fail which means we
> > can't utilize multiqueue.
> 
> I still don't understand why in the multiqueue case libvirt doesn't just
> change its effective UID/GID when creating the TUN device, or just use the
> TUNSETOWNER/TUNSETGROUP commands. This would solve the problem you describe
> above and - at least to me - seems like a better solution conceptually.

I think it makes sense to do this. From a quick glance at the libvirt code,
it looks like it does not call TUNSETOWNER/TUNSETGROUP. Maybe the libvirt
guys (cc'ed) can answer this question.
> 
> Help me understand why you believe that will not work.
> 
> Do you not want to give ownership of the TUN device to QEMU?  That would be
> the only reason I can think of, but all of your comments that I can recall
> have been about isolation between QEMU instances and not access control
> between a QEMU instance and its assigned TUN device.
> 
> > My solution is an unprivileged variant
> > of tun_set_queue that only enables/disables
> > a queue without attach/detach.

^ permalink raw reply

* Re: [PATCH v4 1/5] net: Add support for hardware-offloaded encapsulation
From: saeed bishara @ 2012-12-11  8:11 UTC (permalink / raw)
  To: Dmitry Kravkov
  Cc: Joseph Gasparakis, davem@davemloft.net, shemminger@vyatta.com,
	chrisw@sous-sol.org, gospo@redhat.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bhutchings@solarflare.com,
	Peter P Waskiewicz Jr, Alexander Duyck
In-Reply-To: <504C9EFCA2D0054393414C9CB605C37F1BFC3555@SJEXCHMB06.corp.ad.broadcom.com>

On Mon, Dec 10, 2012 at 9:58 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
>> -----Original Message-----
>> From: saeed bishara [mailto:saeed.bishara@gmail.com]
>> Sent: Monday, December 10, 2012 12:04 PM
>> To: Joseph Gasparakis
>> Cc: davem@davemloft.net; shemminger@vyatta.com; chrisw@sous-sol.org;
>> gospo@redhat.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>> Dmitry Kravkov; bhutchings@solarflare.com; Peter P Waskiewicz Jr; Alexander
>> Duyck
>> Subject: Re: [PATCH v4 1/5] net: Add support for hardware-offloaded
>> encapsulation
>>
>> > +static inline struct iphdr *inner_ip_hdr(const struct sk_buff *skb)
>> > +{
>> > +       return (struct iphdr *)skb_inner_network_header(skb);
>> > +}
>>
>> Hi,
>> I'm a little bit bothered because of those inner_ functions, what
>> about the following approach:
>> 1. the skb will have a new state, that state can be outer (normal
>> mode) and inner.
>> 2. when you change the state to inner, all the helper functions such
>> as ip_hdr will return the inner header.
>>
>> that's of course the API side. the implementation may still use the
>> fields you added to the skb.
>>
>> what do you think?
>> saeed
>
> Some drivers will probably need both inner_ and outer_ in the same flow;
> switching between the two states will consume CPU cycles.
From a performance perspective, I'm not sure the switching is worse; it
may be better, as it reduces code size. Please have a look at patch
2/5: with switching you can avoid the following change, which means less
code and fewer if-else branches.
-                               skb_set_transport_header(skb,
-                                       skb_checksum_start_offset(skb));
+                               if (skb->encapsulation)
+                                       skb_set_inner_transport_header(skb,
+                                               skb_checksum_start_offset(skb));
+                               else
+                                       skb_set_transport_header(skb,
+                                               skb_checksum_start_offset(skb));
                                if (!(features & NETIF_F_ALL_CSUM) &&

I also think that, from a (stack) maintenance perspective, less code is better.
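
To make the two API shapes under discussion concrete, here is a toy
userspace model. Everything in it is hypothetical: struct toy_skb and the
toy_* accessors are illustration-only names, not the real struct sk_buff or
its helpers. The first pair of functions mirrors the explicit inner_/outer
accessors from the patch; the last one mirrors the proposed mode switch,
where a single accessor returns whichever header the current state selects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct sk_buff; every field name is hypothetical. */
struct toy_skb {
	uint8_t *head;			/* start of the packet buffer */
	size_t network_header;		/* offset of the outer IP header */
	size_t inner_network_header;	/* offset of the inner IP header */
	bool inner_mode;		/* the proposed per-skb state */
};

/* Explicit accessors, in the style of the patch under review. */
uint8_t *toy_outer_ip_hdr(const struct toy_skb *skb)
{
	assert(skb != NULL && skb->head != NULL);
	return skb->head + skb->network_header;
}

uint8_t *toy_inner_ip_hdr(const struct toy_skb *skb)
{
	assert(skb != NULL && skb->head != NULL);
	return skb->head + skb->inner_network_header;
}

/* Mode-dependent accessor, in the style of the proposal: one name,
 * and the skb state decides which header it returns. */
uint8_t *toy_ip_hdr(const struct toy_skb *skb)
{
	return skb->inner_mode ? toy_inner_ip_hdr(skb) : toy_outer_ip_hdr(skb);
}
```

The trade-off raised in the thread is visible here: the mode-dependent
accessor keeps call sites unchanged, but code that needs both headers in
one flow has to toggle inner_mode around each access.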

^ permalink raw reply

* Re: [PATCH RFC 0/5] Containerize syslog
From: Glauber Costa @ 2012-12-11  8:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge Hallyn, Andrew Morton, Rui Xiang, netdev, containers
In-Reply-To: <87r4n1buuw.fsf@xmission.com>

On 12/07/2012 10:05 PM, Eric W. Biederman wrote:
> Glauber Costa <glommer@parallels.com> writes:
> 
>> I keep asking myself if it isn't the case of forwarding to a container
>> all messages printed in process context. That will obviously exclude all
>> messages resulting from kthreads - that will always be in the initial
>> namespace anyway, interrupts, etc. There is no harm, for instance, in
>> delivering the same message twice: one to the container, and the other
>> to the host system.
> 
> Except that there is harm in double printing.  One of the better
> justifications for doing something with the kernel log is that it is
> possible to overflow the kernel log with operations performed
> exclusively in a container.
> 
I don't agree with you here.

If we are double printing, we are using up more memory, but we also have
an extra buffer anyway. The messages are printed on behalf of the user,
but still, by the kernel.

So one of the following will necessarily hold:

1) There is no way that the process can overflow the main log, and as a
consequence, the container log, which has fewer messages than the main log.

2) The process will overflow the main log. But since we are not printing
anything extra to the main log compared to the scenario in which the
process lives in the main namespace, this would already be a problem
independent of namespaces. And needs to be fixed.

IOW, double printing should not print anything *extra* to the main log.
It just prints to the container log, and leaves a copy to the box admin
to see. I think it is very reasonable to imagine that the main admin
would like to see anything the kernel has to tell him about the box.

> I do think the idea of process context printks going to the current
> container is one worth playing with.
> 

It still leaves open the problem of printks outside process context that
should go to a namespace. But it is easy to extend this idea to do
both.

^ permalink raw reply

* Re: [PATCH][RFC] smsc95xx: enable dynamic autosuspend (RFC)
From: Oliver Neukum @ 2012-12-11 10:27 UTC (permalink / raw)
  To: Ming Lei
  Cc: Steve Glendinning, Steve Glendinning, netdev, linux-usb,
	Greg Kroah-Hartman
In-Reply-To: <CACVXFVOBgwoSyUDzxD+ghjkpFu5PL7MuaS5UJOZYvVnV0dwhGg@mail.gmail.com>

On Tuesday 11 December 2012 10:24:57 Ming Lei wrote:
> On Mon, Dec 10, 2012 at 10:18 PM, Steve Glendinning <steve@shawell.net> wrote:

> > Thanks, so something like this should do the job?
> 
> This will do, but not simple as clearing .manage_power function
> pointer in bind(), and still disable runtime suspend for link off case
> since these devices which don't support suspend 3 can generate
> remote wakeup for link change event.

So they can autosuspend if the interface is up and no cable is plugged
in?

> I suggest to introduce link-off triggered runtime suspend for these
> usbnet devices(non-LAN9500A device, devices which don't support
> USB auto-suspend), and I have posted one patch set before[1].
> If no one objects that, I'd like to post them again with some fix and
> update for checking link after link_reset().

If you can get rid of a periodic work this would be great.

	Regards
		Oliver

^ permalink raw reply

* [PATCH net-next rfc 0/2] Allow unpriveledge user to disable tuntap queue
From: Jason Wang @ 2012-12-11 11:03 UTC (permalink / raw)
  To: mst, pmoore, netdev, linux-kernel; +Cc: mprivozn, Jason Wang

This series is an RFC that tries to solve the issue that the queues of a
tuntap device cannot be disabled/enabled by an unprivileged user. This is
needed for unprivileged userspace such as qemu: since the guest may change
the number of queues at any time, qemu needs to configure the tuntap device
to disable/enable a specific queue.

Instead of introducing new flags/ioctls, this series tries to re-use the
current TUNSETQUEUE and IFF_ATTACH_QUEUE/IFF_DETACH_QUEUE. After this change,
IFF_DETACH_QUEUE is used to disable a specific queue instead of detaching all
its state from tuntap. IFF_ATTACH_QUEUE is used to: 1) create a new queue for
a tuntap device; in this situation, the previous DAC check is still done. 2)
re-enable a queue previously disabled by IFF_DETACH_QUEUE; in this situation,
we can bypass some of the checks done during queue creation (which checks
need to be done here needs discussion).

Management software (such as libvirt) can then do:
- TUNSETIFF to create the device and queue 0
- TUNSETQUEUE to create the rest of the queues
- Pass them to unprivileged userspace (such as qemu)

Then the unprivileged userspace can enable and disable a specific queue
through IFF_ATTACH_QUEUE and IFF_DETACH_QUEUE.

This is done by introducing an enabled flag that indicates whether the
queue is enabled; tuntap only sends/receives packets on a queue when it is
enabled.

Please comment, thanks!

Jason Wang (2):
  tuntap: forbid calling TUNSETQUEUE for a persistent device with no
    queues
  tuntap: allow unpriveledge user to enable and disable queues

 drivers/net/tun.c |   78 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 73 insertions(+), 5 deletions(-)

^ permalink raw reply

* [PATCH net-next rfc 1/2] tuntap: forbid calling TUNSETQUEUE for a persistent device with no queues
From: Jason Wang @ 2012-12-11 11:03 UTC (permalink / raw)
  To: mst, pmoore, netdev, linux-kernel; +Cc: mprivozn, Jason Wang
In-Reply-To: <1355223827-57290-1-git-send-email-jasowang@redhat.com>

When re-attaching to a persistent device without queues attached, TUNSETIFF
should be called instead of TUNSETQUEUE to do the proper permission checking.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 14a0454..d593f56 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1771,6 +1771,9 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
 			ret = -EINVAL;
 		else if (tun_not_capable(tun))
 			ret = -EPERM;
+		/* TUNSETIFF is needed to do permission checking */
+		else if (tun->numqueues == 0)
+			ret = -EPERM;
 		else
 			ret = tun_attach(tun, file);
 	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next rfc 2/2] tuntap: allow unpriveledge user to enable and disable queues
From: Jason Wang @ 2012-12-11 11:03 UTC (permalink / raw)
  To: mst, pmoore, netdev, linux-kernel; +Cc: mprivozn, Jason Wang
In-Reply-To: <1355223827-57290-1-git-send-email-jasowang@redhat.com>

Currently, when a file is attached to tuntap through TUNSETQUEUE, the uid/gid
and CAP_NET_ADMIN are checked, and we use this ioctl to create and destroy
queues. Sometimes userspace such as qemu needs the ability to enable and
disable a specific queue without privilege, since the guest operating system
may change the number of queues it wants to use.

To support this, this patch introduces a flag, enabled, which is used to track
whether the queue is enabled by userspace. It also restricts a queue to
attaching to only one device. With this patch, the DAC
checking when adding queues through IFF_ATTACH_QUEUE is still done, and after
this, IFF_DETACH_QUEUE/IFF_ATTACH_QUEUE can be used to disable/enable this
queue.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d593f56..43831a7 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -138,6 +138,7 @@ struct tun_file {
 	/* only used for fasnyc */
 	unsigned int flags;
 	u16 queue_index;
+	bool enabled;
 };
 
 struct tun_flow_entry {
@@ -345,9 +346,11 @@ unlock:
 static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
 	struct tun_struct *tun = netdev_priv(dev);
+	struct tun_file *tfile;
 	struct tun_flow_entry *e;
 	u32 txq = 0;
 	u32 numqueues = 0;
+	int i;
 
 	rcu_read_lock();
 	numqueues = tun->numqueues;
@@ -366,6 +369,19 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
 			txq -= numqueues;
 	}
 
+	tfile = rcu_dereference(tun->tfiles[txq]);
+	if (unlikely(!tfile->enabled))
+		/* tun_detach() should make sure there's at least one queue
+		 * could be used to do the tranmission.
+		 */
+		for (i = 0; i < numqueues; i++) {
+			tfile = rcu_dereference(tun->tfiles[i]);
+			if (tfile->enabled) {
+				txq = i;
+				break;
+			}
+		}
+
 	rcu_read_unlock();
 	return txq;
 }
@@ -386,6 +402,36 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
 	netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
 }
 
+static int tun_enable(struct tun_file *tfile)
+{
+	if (tfile->enabled == true)
+		return -EINVAL;
+
+	tfile->enabled = true;
+	return 0;
+}
+
+static int tun_disable(struct tun_file *tfile)
+{
+	struct tun_struct *tun = rcu_dereference_protected(tfile->tun,
+							   lockdep_rtnl_is_held());
+	u16 index = tfile->queue_index;
+
+	if (!tun)
+		return -EINVAL;
+
+	if (tun->numqueues == 1)
+		return -EINVAL;
+
+	BUG_ON(index >= tun->numqueues);
+	tfile->enabled = false;
+
+	synchronize_net();
+	tun_flow_delete_by_queue(tun, index);
+
+	return 0;
+}
+
 static void __tun_detach(struct tun_file *tfile, bool clean)
 {
 	struct tun_file *ntfile;
@@ -446,6 +492,7 @@ static void tun_detach_all(struct net_device *dev)
 		BUG_ON(!tfile);
 		wake_up_all(&tfile->wq.wait);
 		rcu_assign_pointer(tfile->tun, NULL);
+		tfile->enabled = false;
 		--tun->numqueues;
 	}
 	BUG_ON(tun->numqueues != 0);
@@ -490,6 +537,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
 	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
 	sock_hold(&tfile->sk);
 	tun->numqueues++;
+	tfile->enabled = true;
 
 	tun_set_real_num_queues(tun);
 
@@ -672,6 +720,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (txq >= tun->numqueues)
 		goto drop;
 
+	/* Drop packet if the queue was not enabled */
+	if (!tfile->enabled)
+		goto drop;
+
 	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
 
 	BUG_ON(!tfile);
@@ -1010,6 +1062,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	bool zerocopy = false;
 	int err;
 
+	if (!tfile->enabled)
+		return -EINVAL;
+
 	if (!(tun->flags & TUN_NO_PI)) {
 		if ((len -= sizeof(pi)) > total_len)
 			return -EINVAL;
@@ -1199,6 +1254,9 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	struct tun_pi pi = { 0, skb->protocol };
 	ssize_t total = 0;
 
+	if (!tfile->enabled)
+		return -EINVAL;
+
 	if (!(tun->flags & TUN_NO_PI)) {
 		if ((len -= sizeof(pi)) < 0)
 			return -EINVAL;
@@ -1769,15 +1827,21 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
 		if (dev->netdev_ops != &tap_netdev_ops &&
 			dev->netdev_ops != &tun_netdev_ops)
 			ret = -EINVAL;
-		else if (tun_not_capable(tun))
-			ret = -EPERM;
-		/* TUNSETIFF is needed to do permission checking */
-		else if (tun->numqueues == 0)
-			ret = -EPERM;
-		else
-			ret = tun_attach(tun, file);
+		else {
+			if (!rcu_dereference(tfile->tun)) {
+				if (tun_not_capable(tun) ||
+				    tun->numqueues == 0)
+					ret = -EPERM;
+				else
+					ret = tun_attach(tun, file);
+			}
+			else {
+				/* FIXME: permission check? */
+				ret = tun_enable(tfile);
+			}
+		}
 	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
-		__tun_detach(tfile, false);
+		tun_disable(tfile);
 	else
 		ret = -EINVAL;
 
@@ -2085,6 +2149,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 	tfile->socket.file = file;
 	tfile->socket.ops = &tun_socket_ops;
 
+	tfile->enabled = false;
 	sock_init_data(&tfile->socket, &tfile->sk);
 	sk_change_net(&tfile->sk, tfile->net);
 
-- 
1.7.1

^ permalink raw reply related

* Gianfar driver issue
From: Cedric VONCKEN @ 2012-12-11  9:59 UTC (permalink / raw)
  To: netdev

	Hi all,

	I think we have an issue in the Gianfar driver.

	When a netdev tx queue timeout occurs, the function
gfar_timeout(..) is called. This function indirectly calls the
gfar_init_mac(..) function.

	In this function, the rctrl register is set to a default value.

	If promiscuous mode is enabled on the netdev (flag IFF_PROMISC
is set), gfar_init_mac(..) does not reactivate it.

	Promiscuous mode is used, for example, when the netdev is
bridged.

	I applied this patch to fix it.

	--- a/drivers/net/ethernet/freescale/gianfar.c.	2012-06-01 09:16:13.000000000 +0200
	+++ b/drivers/net/ethernet/freescale/gianfar.c	2012-12-11 10:38:23.000000000 +0100
	@@ -356,6 +356,11 @@
 	/* Configure the coalescing support */
 	gfar_configure_coalescing(priv, 0xFF, 0xFF);
 
+	if (ndev->flags & IFF_PROMISC) {
+		/* Set RCTRL to PROM */
+		rctrl |= RCTRL_PROM;
+	}
+
 	if (priv->rx_filer_enable) {
 		rctrl |= RCTRL_FILREN;
 		/* Program the RIR0 reg with the required distribution */
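
To make the idea behind the fix easier to see outside the driver, here is a
small standalone sketch. It assumes the reset path rebuilds rctrl from
scratch, so any mode bit not re-derived from the device flags is lost. All
DEMO_* names and bit values are invented for illustration and are not the
real gianfar register layout.

```c
#include <stdint.h>

/* Illustrative values only; not the real gianfar/netdev definitions. */
#define DEMO_IFF_PROMISC	0x100u
#define DEMO_RCTRL_PROM		0x08u
#define DEMO_RCTRL_FILREN	0x1000u

/* Rebuild the receive control word from current device state, the way a
 * reset path has to: start from the default and re-apply every mode bit.
 */
static uint32_t demo_build_rctrl(unsigned int dev_flags, int rx_filer_enable)
{
	uint32_t rctrl = 0;

	/* Without this step, promiscuous mode is silently dropped on reset. */
	if (dev_flags & DEMO_IFF_PROMISC)
		rctrl |= DEMO_RCTRL_PROM;

	if (rx_filer_enable)
		rctrl |= DEMO_RCTRL_FILREN;

	return rctrl;
}
```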



Cedric Voncken 

^ permalink raw reply

* [PATCH net-next 2/2] net/mlx4_en: Add support for destination MAC in steering rules
From: Amir Vadai @ 2012-12-11 12:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Yan Burman
In-Reply-To: <1355227436-18383-1-git-send-email-amirv@mellanox.com>

From: Yan Burman <yanb@mellanox.com>

Implement destination MAC rule extension for L3/L4 rules in
flow steering. Useful for vSwitch/macvlan configurations.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 4aaa7c3..87a87a7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -619,7 +619,13 @@ static int mlx4_en_validate_flow(struct net_device *dev,
 	if (cmd->fs.location >= MAX_NUM_OF_FS_RULES)
 		return -EINVAL;
 
-	switch (cmd->fs.flow_type & ~FLOW_EXT) {
+	if (cmd->fs.flow_type & FLOW_MAC_EXT) {
+		/* dest mac mask must be ff:ff:ff:ff:ff:ff */
+		if (memcmp(cmd->fs.m_ext.h_dest, &full_mac, ETH_ALEN))
+			return -EINVAL;
+	}
+
+	switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
 	case TCP_V4_FLOW:
 	case UDP_V4_FLOW:
 		if (cmd->fs.m_u.tcp_ip4_spec.tos)
@@ -747,7 +753,6 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
 					     struct list_head *rule_list_h)
 {
 	int err;
-	u64 mac;
 	__be64 be_mac;
 	struct ethhdr *eth_spec;
 	struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -762,12 +767,16 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
 	if (!spec_l2)
 		return -ENOMEM;
 
-	mac = priv->mac & MLX4_MAC_MASK;
-	be_mac = cpu_to_be64(mac << 16);
+	if (cmd->fs.flow_type & FLOW_MAC_EXT) {
+		memcpy(&be_mac, cmd->fs.h_ext.h_dest, ETH_ALEN);
+	} else {
+		u64 mac = priv->mac & MLX4_MAC_MASK;
+		be_mac = cpu_to_be64(mac << 16);
+	}
 
 	spec_l2->id = MLX4_NET_TRANS_RULE_ID_ETH;
 	memcpy(spec_l2->eth.dst_mac_msk, &mac_msk, ETH_ALEN);
-	if ((cmd->fs.flow_type & ~FLOW_EXT) != ETHER_FLOW)
+	if ((cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) != ETHER_FLOW)
 		memcpy(spec_l2->eth.dst_mac, &be_mac, ETH_ALEN);
 
 	if ((cmd->fs.flow_type & FLOW_EXT) && cmd->fs.m_ext.vlan_tci) {
@@ -777,7 +786,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
 
 	list_add_tail(&spec_l2->list, rule_list_h);
 
-	switch (cmd->fs.flow_type & ~FLOW_EXT) {
+	switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
 	case ETHER_FLOW:
 		eth_spec = &cmd->fs.h_u.ether_spec;
 		memcpy(&spec_l2->eth.dst_mac, eth_spec->h_dest, ETH_ALEN);
-- 
1.7.11.3

^ permalink raw reply related

* [PATCH net-next 1/2] net: ethtool: Add destination MAC address to flow steering API
From: Amir Vadai @ 2012-12-11 12:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Yan Burman
In-Reply-To: <1355227436-18383-1-git-send-email-amirv@mellanox.com>

From: Yan Burman <yanb@mellanox.com>

Add ability to specify destination MAC address for L3/L4 flow spec
in order to be able to specify action for different VM's under vSwitch
configuration. This change is transparent to older userspace.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 include/uapi/linux/ethtool.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index d3eaaaf..be8c41e 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -500,13 +500,15 @@ union ethtool_flow_union {
 	struct ethtool_ah_espip4_spec		esp_ip4_spec;
 	struct ethtool_usrip4_spec		usr_ip4_spec;
 	struct ethhdr				ether_spec;
-	__u8					hdata[60];
+	__u8					hdata[52];
 };
 
 struct ethtool_flow_ext {
-	__be16	vlan_etype;
-	__be16	vlan_tci;
-	__be32	data[2];
+	__u8		padding[2];
+	unsigned char	h_dest[ETH_ALEN];	/* destination eth addr	*/
+	__be16		vlan_etype;
+	__be16		vlan_tci;
+	__be32		data[2];
 };
 
 /**
@@ -1027,6 +1029,7 @@ enum ethtool_sfeatures_retval_bits {
 #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
 /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
 #define	FLOW_EXT	0x80000000
+#define	FLOW_MAC_EXT	0x40000000
 
 /* L3-L4 network traffic flow hash options */
 #define	RXH_L2DA	(1 << 1)
-- 
1.7.11.3

^ permalink raw reply related

* [PATCH net-next 0/3] Add destination MAC address to ethtool flow steering
From: Amir Vadai @ 2012-12-11 12:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Yan Burman

From: Yan Burman <yanb@mellanox.com>

In a vSwitch configuration it is often beneficial to create flow steering
rules for L3/L4 traffic based on the VM port. This requires the destination
MAC address of that port to be present. Note that today the mlx4_en driver
adds its own MAC address to the flow spec, whereas with the new
ethtool flag suggested here it does not.

It may also be useful in macvlan devices.

These patches add kernel support for the new field (does not break old
userspace compatibility, so new ethtool will work on old kernels and
old ethtool will work with new kernels).
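
The compatibility claim rests on the layout change in patch 1/2: hdata
shrinks by 8 bytes and the extension struct grows by 8 bytes at its front, so
the existing fields should stay at the same offsets. The following sketch
checks that arithmetic with stand-in types. The demo_* names are invented
here, and the unions are reduced to an alignment member plus hdata, which is
an assumption about what determines their size.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_ALEN 6

/* Old layout: 60-byte union followed by a 12-byte extension. */
union demo_flow_union_old { uint32_t align; uint8_t hdata[60]; };
struct demo_flow_ext_old {
	uint16_t vlan_etype;
	uint16_t vlan_tci;
	uint32_t data[2];
};

/* New layout: 52-byte union followed by a 20-byte extension. */
union demo_flow_union_new { uint32_t align; uint8_t hdata[52]; };
struct demo_flow_ext_new {
	uint8_t padding[2];
	unsigned char h_dest[DEMO_ETH_ALEN];
	uint16_t vlan_etype;
	uint16_t vlan_tci;
	uint32_t data[2];
};

struct demo_pair_old { union demo_flow_union_old u; struct demo_flow_ext_old ext; };
struct demo_pair_new { union demo_flow_union_new u; struct demo_flow_ext_new ext; };

/* Total size and the offsets of the pre-existing fields must not move. */
_Static_assert(sizeof(struct demo_pair_old) == sizeof(struct demo_pair_new),
	       "union + ext size changed");
_Static_assert(offsetof(struct demo_pair_old, ext.vlan_etype) ==
	       offsetof(struct demo_pair_new, ext.vlan_etype),
	       "vlan_etype moved");
_Static_assert(offsetof(struct demo_pair_old, ext.vlan_tci) ==
	       offsetof(struct demo_pair_new, ext.vlan_tci),
	       "vlan_tci moved");
_Static_assert(offsetof(struct demo_pair_old, ext.data) ==
	       offsetof(struct demo_pair_new, ext.data),
	       "data moved");
```

If these assertions hold, the new h_dest field occupies bytes that were
previously the tail of hdata, which would explain why old binaries are
unaffected as long as they never set FLOW_MAC_EXT.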

Also present here is the ethtool userspace patch.

See more details here: http://marc.info/?t=134977576500003

Yan Burman (2):
  net: ethtool: Add destination MAC address to flow steering API
  net/mlx4_en: Add support for destination MAC in steering rules

 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 21 +++++++++++++++------
 include/uapi/linux/ethtool.h                    | 11 +++++++----
 2 files changed, 22 insertions(+), 10 deletions(-)

-- 
1.7.11.3

^ permalink raw reply

* [PATCH ETHTOOL] Added dst-mac parameter for L3/L4 flow spec rules. This is usefull in vSwitch configurations.
From: Amir Vadai @ 2012-12-11 12:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Yan Burman
In-Reply-To: <1355227436-18383-1-git-send-email-amirv@mellanox.com>

From: Yan Burman <yanb@mellanox.com>

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 ethtool-copy.h | 11 +++++++----
 ethtool.8.in   |  6 ++++++
 ethtool.c      |  5 +++++
 rxclass.c      | 62 ++++++++++++++++++++++++++++++++++++++++------------------
 4 files changed, 61 insertions(+), 23 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 4801eef..d352f20 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -500,13 +500,15 @@ union ethtool_flow_union {
 	struct ethtool_ah_espip4_spec		esp_ip4_spec;
 	struct ethtool_usrip4_spec		usr_ip4_spec;
 	struct ethhdr				ether_spec;
-	__u8					hdata[60];
+	__u8					hdata[52];
 };
 
 struct ethtool_flow_ext {
-	__be16	vlan_etype;
-	__be16	vlan_tci;
-	__be32	data[2];
+	__u8		padding[2];
+	unsigned char	h_dest[ETH_ALEN];	/* destination eth addr	*/
+	__be16		vlan_etype;
+	__be16		vlan_tci;
+	__be32		data[2];
 };
 
 /**
@@ -1027,6 +1029,7 @@ enum ethtool_sfeatures_retval_bits {
 #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
 /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
 #define	FLOW_EXT	0x80000000
+#define	FLOW_MAC_EXT	0x40000000
 
 /* L3-L4 network traffic flow hash options */
 #define	RXH_L2DA	(1 << 1)
diff --git a/ethtool.8.in b/ethtool.8.in
index e701919..a52e484 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -268,6 +268,7 @@ ethtool \- query or control network driver and hardware settings
 .BM vlan\-etype
 .BM vlan
 .BM user\-def
+.RB [ dst-mac \ \*(MA\ [ m \ \*(MA]]
 .BN action
 .BN loc
 .RB |
@@ -739,6 +740,11 @@ Includes the VLAN tag and an optional mask.
 .BI user\-def \ N \\fR\ [\\fPm \ N \\fR]\\fP
 Includes 64-bits of user-specific data and an optional mask.
 .TP
+.BR dst-mac \ \*(MA\ [ m \ \*(MA]
+Includes the destination MAC address, specified as 6 bytes in hexadecimal
+separated by colons, along with an optional mask.
+Valid for all IPv4 based flow-types.
+.TP
 .BI action \ N
 Specifies the Rx queue to send packets to, or some other action.
 .TS
diff --git a/ethtool.c b/ethtool.c
index 345c21c..55bc082 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -3231,6 +3231,10 @@ static int flow_spec_to_ntuple(struct ethtool_rx_flow_spec *fsp,
 	if (fsp->location != RX_CLS_LOC_ANY)
 		return -1;
 
+	/* destination MAC address in L3/L4 rules is not supported by ntuple */
+	if (fsp->flow_type & FLOW_MAC_EXT)
+		return -1;
+
 	/* verify ring cookie can transfer to action */
 	if (fsp->ring_cookie > INT_MAX && fsp->ring_cookie < (u64)(-2))
 		return -1;
@@ -3814,6 +3818,7 @@ static const struct option {
 	  "			[ vlan-etype %x [m %x] ]\n"
 	  "			[ vlan %x [m %x] ]\n"
 	  "			[ user-def %x [m %x] ]\n"
+	  "			[ dst-mac %x:%x:%x:%x:%x:%x [m %x:%x:%x:%x:%x:%x] ]\n"
 	  "			[ action %d ]\n"
 	  "			[ loc %d]] |\n"
 	  "		delete %d\n" },
diff --git a/rxclass.c b/rxclass.c
index e1633a8..1564b62 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -41,26 +41,38 @@ static void rxclass_print_ipv4_rule(__be32 sip, __be32 sipm, __be32 dip,
 
 static void rxclass_print_nfc_spec_ext(struct ethtool_rx_flow_spec *fsp)
 {
-	u64 data, datam;
-	__u16 etype, etypem, tci, tcim;
+	if (fsp->flow_type & FLOW_EXT) {
+		u64 data, datam;
+		__u16 etype, etypem, tci, tcim;
+		etype = ntohs(fsp->h_ext.vlan_etype);
+		etypem = ntohs(~fsp->m_ext.vlan_etype);
+		tci = ntohs(fsp->h_ext.vlan_tci);
+		tcim = ntohs(~fsp->m_ext.vlan_tci);
+		data = (u64)ntohl(fsp->h_ext.data[0]) << 32;
+		data = (u64)ntohl(fsp->h_ext.data[1]);
+		datam = (u64)ntohl(~fsp->m_ext.data[0]) << 32;
+		datam |= (u64)ntohl(~fsp->m_ext.data[1]);
 
-	if (!(fsp->flow_type & FLOW_EXT))
-		return;
+		fprintf(stdout,
+			"\tVLAN EtherType: 0x%x mask: 0x%x\n"
+			"\tVLAN: 0x%x mask: 0x%x\n"
+			"\tUser-defined: 0x%llx mask: 0x%llx\n",
+			etype, etypem, tci, tcim, data, datam);
+	}
 
-	etype = ntohs(fsp->h_ext.vlan_etype);
-	etypem = ntohs(~fsp->m_ext.vlan_etype);
-	tci = ntohs(fsp->h_ext.vlan_tci);
-	tcim = ntohs(~fsp->m_ext.vlan_tci);
-	data = (u64)ntohl(fsp->h_ext.data[0]) << 32;
-	data = (u64)ntohl(fsp->h_ext.data[1]);
-	datam = (u64)ntohl(~fsp->m_ext.data[0]) << 32;
-	datam |= (u64)ntohl(~fsp->m_ext.data[1]);
+	if (fsp->flow_type & FLOW_MAC_EXT) {
+		unsigned char *dmac, *dmacm;
 
-	fprintf(stdout,
-		"\tVLAN EtherType: 0x%x mask: 0x%x\n"
-		"\tVLAN: 0x%x mask: 0x%x\n"
-		"\tUser-defined: 0x%llx mask: 0x%llx\n",
-		etype, etypem, tci, tcim, data, datam);
+		dmac = fsp->h_ext.h_dest;
+		dmacm = fsp->m_ext.h_dest;
+
+		fprintf(stdout,
+			"\tDest MAC addr: %02X:%02X:%02X:%02X:%02X:%02X"
+			" mask: %02X:%02X:%02X:%02X:%02X:%02X\n",
+			dmac[0], dmac[1], dmac[2], dmac[3], dmac[4],
+			dmac[5], dmacm[0], dmacm[1], dmacm[2], dmacm[3],
+			dmacm[4], dmacm[5]);
+	}
 }
 
 static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
@@ -70,7 +82,7 @@ static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
 
 	fprintf(stdout,	"Filter: %d\n", fsp->location);
 
-	flow_type = fsp->flow_type & ~FLOW_EXT;
+	flow_type = fsp->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT);
 
 	invert_flow_mask(fsp);
 
@@ -172,7 +184,7 @@ static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
 static void rxclass_print_rule(struct ethtool_rx_flow_spec *fsp)
 {
 	/* print the rule in this location */
-	switch (fsp->flow_type & ~FLOW_EXT) {
+	switch (fsp->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
 	case TCP_V4_FLOW:
 	case UDP_V4_FLOW:
 	case SCTP_V4_FLOW:
@@ -533,6 +545,7 @@ typedef enum {
 #define NTUPLE_FLAG_VLAN	0x100
 #define NTUPLE_FLAG_UDEF	0x200
 #define NTUPLE_FLAG_VETH	0x400
+#define NFC_FLAG_MAC_ADDR	0x800
 
 struct rule_opts {
 	const char	*name;
@@ -571,6 +584,9 @@ static const struct rule_opts rule_nfc_tcp_ip4[] = {
 	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
 	  offsetof(struct ethtool_rx_flow_spec, h_ext.data),
 	  offsetof(struct ethtool_rx_flow_spec, m_ext.data) },
+	{ "dst-mac", OPT_MAC, NFC_FLAG_MAC_ADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.h_dest),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.h_dest) },
 };
 
 static const struct rule_opts rule_nfc_esp_ip4[] = {
@@ -599,6 +615,9 @@ static const struct rule_opts rule_nfc_esp_ip4[] = {
 	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
 	  offsetof(struct ethtool_rx_flow_spec, h_ext.data),
 	  offsetof(struct ethtool_rx_flow_spec, m_ext.data) },
+	{ "dst-mac", OPT_MAC, NFC_FLAG_MAC_ADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.h_dest),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.h_dest) },
 };
 
 static const struct rule_opts rule_nfc_usr_ip4[] = {
@@ -639,6 +658,9 @@ static const struct rule_opts rule_nfc_usr_ip4[] = {
 	{ "user-def", OPT_BE64, NTUPLE_FLAG_UDEF,
 	  offsetof(struct ethtool_rx_flow_spec, h_ext.data),
 	  offsetof(struct ethtool_rx_flow_spec, m_ext.data) },
+	{ "dst-mac", OPT_MAC, NFC_FLAG_MAC_ADDR,
+	  offsetof(struct ethtool_rx_flow_spec, h_ext.h_dest),
+	  offsetof(struct ethtool_rx_flow_spec, m_ext.h_dest) },
 };
 
 static const struct rule_opts rule_nfc_ether[] = {
@@ -1063,6 +1085,8 @@ int rxclass_parse_ruleopts(struct cmd_context *ctx,
 		fsp->h_u.usr_ip4_spec.ip_ver = ETH_RX_NFC_IP4;
 	if (flags & (NTUPLE_FLAG_VLAN | NTUPLE_FLAG_UDEF | NTUPLE_FLAG_VETH))
 		fsp->flow_type |= FLOW_EXT;
+	if (flags & NFC_FLAG_MAC_ADDR)
+		fsp->flow_type |= FLOW_MAC_EXT;
 
 	return 0;
 
-- 
1.7.11.3

^ permalink raw reply related

* Re: [PATCH net-next rfc 2/2] tuntap: allow unpriveledge user to enable and disable queues
From: Michael S. Tsirkin @ 2012-12-11 12:30 UTC (permalink / raw)
  To: Jason Wang; +Cc: pmoore, netdev, linux-kernel, mprivozn
In-Reply-To: <1355223827-57290-3-git-send-email-jasowang@redhat.com>

On Tue, Dec 11, 2012 at 07:03:47PM +0800, Jason Wang wrote:
> Currently, when a file is attached to tuntap through TUNSETQUEUE, the uid/gid
> and CAP_NET_ADMIN are checked, and we use this ioctl to create and destroy
> queues. Sometimes userspace such as qemu needs the ability to enable and
> disable a specific queue without privilege, since the guest operating system
> may change the number of queues it wants to use.
> 
> To support this, this patch introduces a flag, enabled, which is used to track
> whether the queue is enabled by userspace. It also restricts a queue to
> attaching to only one device. With this patch, the DAC
> checking when adding queues through IFF_ATTACH_QUEUE is still done, and after
> this, IFF_DETACH_QUEUE/IFF_ATTACH_QUEUE can be used to disable/enable this
> queue.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/tun.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 files changed, 73 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index d593f56..43831a7 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -138,6 +138,7 @@ struct tun_file {
>  	/* only used for fasnyc */
>  	unsigned int flags;
>  	u16 queue_index;
> +	bool enabled;
>  };
>  
>  struct tun_flow_entry {
> @@ -345,9 +346,11 @@ unlock:
>  static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
>  {
>  	struct tun_struct *tun = netdev_priv(dev);
> +	struct tun_file *tfile;
>  	struct tun_flow_entry *e;
>  	u32 txq = 0;
>  	u32 numqueues = 0;
> +	int i;
>  
>  	rcu_read_lock();
>  	numqueues = tun->numqueues;
> @@ -366,6 +369,19 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
>  			txq -= numqueues;
>  	}
>  
> +	tfile = rcu_dereference(tun->tfiles[txq]);
> +	if (unlikely(!tfile->enabled))

This unlikely tag is suspicious. It should be perfectly
legal to use fewer queues than were created.

> +		/* tun_detach() should make sure there's at least one queue
> +		 * could be used to do the tranmission.
> +		 */
> +		for (i = 0; i < numqueues; i++) {
> +			tfile = rcu_dereference(tun->tfiles[i]);
> +			if (tfile->enabled) {
> +				txq = i;
> +				break;
> +			}
> +		}
> +

In the worst case this will do a linear scan over all queues on each packet.
Instead, I think we need a list of all queues and only install
the active ones in the array.
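
One possible shape for that, sketched in plain C without any of the RCU or
locking the real driver would need: keep a dense array holding only the
enabled queues, and use swap-with-last removal so that selection stays a
single modulo and index. The demo-style names (struct queue, queue_set,
queue_enable, and so on) are invented for this sketch and are not tun.c
identifiers.

```c
#include <stddef.h>

#define MAX_QUEUES 8

struct queue {
	int id;
	int slot;	/* index in active[], or -1 while disabled */
};

struct queue_set {
	struct queue *active[MAX_QUEUES];	/* enabled queues only, dense */
	unsigned int nactive;
};

/* Enabling an already-enabled queue is a no-op that returns 0. */
static int queue_enable(struct queue_set *qs, struct queue *q)
{
	if (q->slot >= 0)
		return 0;
	if (qs->nactive == MAX_QUEUES)
		return -1;
	q->slot = (int)qs->nactive;
	qs->active[qs->nactive++] = q;
	return 0;
}

/* Remove q by moving the last active queue into its slot. */
static int queue_disable(struct queue_set *qs, struct queue *q)
{
	struct queue *last;

	if (q->slot < 0)
		return 0;
	last = qs->active[--qs->nactive];
	qs->active[q->slot] = last;
	last->slot = q->slot;
	q->slot = -1;
	return 0;
}

/* O(1) selection: only enabled queues are ever candidates. */
static struct queue *queue_select(const struct queue_set *qs, unsigned int hash)
{
	if (qs->nactive == 0)
		return NULL;
	return qs->active[hash % qs->nactive];
}
```

The cost moves from the per-packet path to the enable/disable path, which is
rare by comparison.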

>  	rcu_read_unlock();
>  	return txq;
>  }
> @@ -386,6 +402,36 @@ static void tun_set_real_num_queues(struct tun_struct *tun)
>  	netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
>  }
>  
> +static int tun_enable(struct tun_file *tfile)
> +{
> +	if (tfile->enabled == true)

simply if (tfile->enabled)

> +		return -EINVAL;

Actually it's better to have operations be
idempotent. If it's enabled, enabling should
be a NOP not an error.

> +
> +	tfile->enabled = true;
> +	return 0;
> +}
> +
> +static int tun_disable(struct tun_file *tfile)
> +{
> +	struct tun_struct *tun = rcu_dereference_protected(tfile->tun,
> +							   lockdep_rtnl_is_held());
> +	u16 index = tfile->queue_index;
> +
> +	if (!tun)
> +		return -EINVAL;
> +
> +	if (tun->numqueues == 1)
> +		return -EINVAL;

So if there's a single queue we can't disable it,
but if there are > 1 we can disable them all.
This seems arbitrary.

> +
> +	BUG_ON(index >= tun->numqueues);
> +	tfile->enabled = false;
> +
> +	synchronize_net();
> +	tun_flow_delete_by_queue(tun, index);
> +
> +	return 0;
> +}
> +
>  static void __tun_detach(struct tun_file *tfile, bool clean)
>  {
>  	struct tun_file *ntfile;
> @@ -446,6 +492,7 @@ static void tun_detach_all(struct net_device *dev)
>  		BUG_ON(!tfile);
>  		wake_up_all(&tfile->wq.wait);
>  		rcu_assign_pointer(tfile->tun, NULL);
> +		tfile->enabled = false;
>  		--tun->numqueues;
>  	}
>  	BUG_ON(tun->numqueues != 0);
> @@ -490,6 +537,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>  	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
>  	sock_hold(&tfile->sk);
>  	tun->numqueues++;
> +	tfile->enabled = true;
>  
>  	tun_set_real_num_queues(tun);
>  
> @@ -672,6 +720,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>  	if (txq >= tun->numqueues)
>  		goto drop;
>  
> +	/* Drop packet if the queue was not enabled */
> +	if (!tfile->enabled)
> +		goto drop;
> +
>  	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);
>  
>  	BUG_ON(!tfile);
> @@ -1010,6 +1062,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  	bool zerocopy = false;
>  	int err;
>  
> +	if (!tfile->enabled)
> +		return -EINVAL;
> +
>  	if (!(tun->flags & TUN_NO_PI)) {
>  		if ((len -= sizeof(pi)) > total_len)
>  			return -EINVAL;
> @@ -1199,6 +1254,9 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>  	struct tun_pi pi = { 0, skb->protocol };
>  	ssize_t total = 0;
>  
> +	if (!tfile->enabled)
> +		return -EINVAL;
> +
>  	if (!(tun->flags & TUN_NO_PI)) {
>  		if ((len -= sizeof(pi)) < 0)
>  			return -EINVAL;
> @@ -1769,15 +1827,21 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>  		if (dev->netdev_ops != &tap_netdev_ops &&
>  			dev->netdev_ops != &tun_netdev_ops)
>  			ret = -EINVAL;
> -		else if (tun_not_capable(tun))
> -			ret = -EPERM;
> -		/* TUNSETIFF is needed to do permission checking */
> -		else if (tun->numqueues == 0)
> -			ret = -EPERM;
> -		else
> -			ret = tun_attach(tun, file);
> +		else {
> +			if (!rcu_dereference(tfile->tun)) {

Should be rcu_dereference_protected.

> +				if (tun_not_capable(tun) ||
> +				    tun->numqueues == 0)
> +					ret = -EPERM;
> +				else
> +					ret = tun_attach(tun, file);
> +			}
> +			else {
> +				/* FIXME: permission check? */
> +				ret = tun_enable(tfile);
> +			}
> +		}
>  	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
> -		__tun_detach(tfile, false);
> +		tun_disable(tfile);
>  	else
>  		ret = -EINVAL;
>  
> @@ -2085,6 +2149,7 @@ static int tun_chr_open(struct inode *inode, struct file * file)
>  	tfile->socket.file = file;
>  	tfile->socket.ops = &tun_socket_ops;
>  
> +	tfile->enabled = false;
>  	sock_init_data(&tfile->socket, &tfile->sk);
>  	sk_change_net(&tfile->sk, tfile->net);
>  
> -- 
> 1.7.1

^ permalink raw reply

* Re: [PATCH net-next rfc 0/2] Allow unpriveledge user to disable tuntap queue
From: Michael S. Tsirkin @ 2012-12-11 12:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: pmoore, netdev, linux-kernel, mprivozn
In-Reply-To: <1355223827-57290-1-git-send-email-jasowang@redhat.com>

On Tue, Dec 11, 2012 at 07:03:45PM +0800, Jason Wang wrote:
> This series is an RFC that tries to solve the problem that tuntap queues
> cannot be disabled/enabled by an unprivileged user. This is needed for
> unprivileged userspace such as qemu: since the guest may change the number of
> queues at any time, qemu needs to configure tuntap to disable/enable a specific
> queue.
> 
> Instead of introducing new flags/ioctls, this series tries to reuse the current
> TUNSETQUEUE and IFF_ATTACH_QUEUE/IFF_DETACH_QUEUE. After this change,
> IFF_DETACH_QUEUE is used to disable a specific queue instead of detaching all
> its state from tuntap. IFF_ATTACH_QUEUE is used to: 1) create a new queue on
> a tuntap device; in this situation, the previous DAC check is still done. 2)
> re-enable a queue previously disabled by IFF_DETACH_QUEUE; in this situation,
> we can bypass some of the checks done during queue creation (which checks need
> to be done here needs discussion).
> 
> Management software (such as libvirt) can then do:
> - TUNSETIFF to create the device and queue 0
> - TUNSETQUEUE to create the rest of the queues
> - Pass them to unprivileged userspace (such as qemu)

Sorry I find this somewhat confusing.
Why doesn't management call TUNSETIFF to create all queues -
seems cleaner, no? Also has the advantage that it works
without selinux changes.

So why don't we simply fix TUNSETQUEUE such that
1. It only works if already attached to device by TUNSETIFF
2. It does not attach/detach, instead simply enables/disables the queue

This way no new flags, just tweak the semantics of the
existing ones. Need to do this before 3.8 is out though
otherwise we'll end up maintaining the old semantics forever.
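
To make the two rules above concrete, here is a tiny model of the proposed
TUNSETQUEUE behaviour. It is only a sketch: struct demo_tfile,
demo_set_queue() and the choice of -EINVAL for the unattached case are
assumptions made for illustration, not code from tun.c.

```c
#include <errno.h>
#include <stdbool.h>

/* Per-file state in this model. */
struct demo_tfile {
	bool attached;	/* set once by the TUNSETIFF path */
	bool enabled;	/* toggled by the TUNSETQUEUE path */
};

enum demo_queue_op { DEMO_ATTACH_QUEUE, DEMO_DETACH_QUEUE };

/* Rule 1: only valid if TUNSETIFF already attached the file.
 * Rule 2: never attaches or detaches, only flips the enabled state.
 * Repeating the same request is harmless and returns 0.
 */
static int demo_set_queue(struct demo_tfile *f, enum demo_queue_op op)
{
	if (!f->attached)
		return -EINVAL;

	f->enabled = (op == DEMO_ATTACH_QUEUE);
	return 0;
}
```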

> Then the unprivileged userspace can enable and disable a specific queue through
> IFF_ATTACH_QUEUE and IFF_DETACH_QUEUE.
> 
> This is done by introducing an enabled flag that records whether the
> queue is enabled; tuntap only sends/receives packets on a queue while it is
> enabled.
> 
> Please comment, thanks!
> 
> Jason Wang (2):
>   tuntap: forbid calling TUNSETQUEUE for a persistent device with no
>     queues
>   tuntap: allow unpriveledge user to enable and disable queues
> 
>  drivers/net/tun.c |   78 +++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 73 insertions(+), 5 deletions(-)

^ permalink raw reply

* Re: [PATCH][RFC] smsc95xx: enable dynamic autosuspend (RFC)
From: Ming Lei @ 2012-12-11 12:53 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Steve Glendinning, Steve Glendinning, netdev,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Greg Kroah-Hartman
In-Reply-To: <1881907.JmjSTg2PBW-ugxBuEnWX9yG/4A2pS7c2Q@public.gmane.org>

On Tue, Dec 11, 2012 at 6:27 PM, Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org> wrote:
> On Tuesday 11 December 2012 10:24:57 Ming Lei wrote:
>> On Mon, Dec 10, 2012 at 10:18 PM, Steve Glendinning <steve-nksJyM/082jR7s880joybQ@public.gmane.org> wrote:
>
>> > Thanks, so something like this should do the job?
>>
>> This will do, but not simple as clearing .manage_power function
>> pointer in bind(), and still disable runtime suspend for link off case
>> since these devices which don't support suspend 3 can generate
>> remote wakeup for link change event.
>
> So they can autosuspend if the interface is up and no cable is plugged
> in?

From the open datasheet, that is the suspend 1 mode, which is supported
by all LAN95xx devices. Steve, correct me if I am wrong.

>
>> I suggest introducing link-off triggered runtime suspend for these
>> usbnet devices (non-LAN9500A devices, i.e. devices which don't support
>> USB auto-suspend), and I have posted a patch set for this before[1].
>> If no one objects, I'd like to post it again with some fixes and an
>> update for checking the link after link_reset().
>
> If you can get rid of a periodic work this would be great.

For the LAN95xx devices, the periodic work isn't needed because
they can generate remote wakeup when a link change is detected.

In fact, I have test data showing a significant power saving
on an OMAP3-based BeagleBoard plus an asix usbnet device with
the periodic work. IMO, the power saving from introducing the periodic
timer depends on the arch or platform; the saving should be large
if the CPU's own power consumption is very low. So how about letting
a module parameter switch the periodic work on/off?


Thanks,
--
Ming Lei

^ permalink raw reply

* Re: [PATCH] ipv6: fix the bug when propagating Redirect Message
From: Duan Jiong @ 2012-12-11 12:58 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: davem, netdev
In-Reply-To: <20121024045410.GF27385@secunet.com>

On 2012/10/24 12:54, Steffen Klassert wrote:
> On Tue, Oct 23, 2012 at 11:26:25PM +0800, Duan Jiong wrote:
>>
>> Before using icmpv6_notify() to propagate redirect, change skb->data
>> to point to the IP packet that triggered the sending of the Redirect.
>>
>> Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
>> ---
>>  net/ipv6/ndisc.c |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
>> index ff36194..0f73303 100644
>> --- a/net/ipv6/ndisc.c
>> +++ b/net/ipv6/ndisc.c
>> @@ -1334,6 +1334,11 @@ out:
>>  
>>  static void ndisc_redirect_rcv(struct sk_buff *skb)
>>  {
>> +	int opt_len;
>> +	int opt_offset = 0;
>> +	int ndisc_head_len;
>> +	struct nd_opt_hdr *nd_opt;
>> +	
>>  #ifdef CONFIG_IPV6_NDISC_NODETYPE
>>  	switch (skb->ndisc_nodetype) {
>>  	case NDISC_NODETYPE_HOST:
>> @@ -1350,6 +1355,40 @@ static void ndisc_redirect_rcv(struct sk_buff *skb)
>>  		return;
>>  	}
>>  
>> +	ndisc_head_len = sizeof(struct icmp6hdr) + 2*sizeof(struct in6_addr);
>> +	if (!pskb_may_pull(skb, ndisc_head_len)) {
>> +		return;
>> +	}
>> +
>> +	nd_opt = (struct nd_opt_hdr *)(skb->data + ndisc_head_len);
>> +
>> +	opt_len = skb->tail - skb->transport_header - ndisc_head_len;
>> +	if (opt_len < 0) {
>> +		return;
>> +	}
>> +	while (opt_len) {
>> +		int l;
>> +	
>> +		if (opt_len < sizeof(struct nd_opt_hdr)) {
>> +			return;
>> +		}
>> +		l = nd_opt->nd_opt_len << 3;
>> +		if (opt_len < l || l == 0) {
>> +			return;
>> +		}
>> +		if (nd_opt->nd_opt_type == ND_OPT_REDIRECT_HDR) {
>> +			__skb_pull(skb, ndisc_head_len + opt_offset + 8);
>> +			break;
>> +		}
>> +		opt_len -= l;
>> +		nd_opt = ((void *)nd_opt) + l;
>> +		opt_offset += l;
>> +	}
> 
> Instead of the above loop, you could use ndisc_parse_options().
> This does the same what you are doing here and it would make it
> a bit clearer what's going on.
> 
I apologize for not replying to you earlier, and I will continue
to update my patches.

As you suggested, I tried to use ndisc_parse_options() instead
of the loop, but I found that skb->data can't be changed inside
ndisc_parse_options() because it is not passed the skb. So I think it is
better to keep the loop. What do you think?
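
To make the question concrete, here is a small standalone sketch (plain userspace C, not kernel code) of an option walk that only reports where the Redirected Header payload starts, leaving it to the caller to move its own data pointer. The function name nd_find_redirected_packet() and the constants' names are made up for illustration; the option layout (type byte, length byte in units of 8 octets, 6 reserved bytes, then the original packet) is assumed from RFC 4861.

```c
#include <stddef.h>
#include <stdint.h>

#define ND_OPT_HDR_LEN      2	/* type byte + length byte */
#define ND_OPT_REDIRECT_HDR 4	/* Redirected Header option type */
#define ND_OPT_RH_PREFIX    8	/* type, length and 6 reserved bytes */

/* Walk the options area and return the offset of the packet embedded
 * in the Redirected Header option, or -1 if the option is absent or
 * the area is malformed. The buffer itself is never modified. */
long nd_find_redirected_packet(const uint8_t *opts, size_t len)
{
	size_t off = 0;

	while (len - off >= ND_OPT_HDR_LEN) {
		/* option length is expressed in units of 8 octets */
		size_t l = (size_t)opts[off + 1] << 3;

		if (l == 0 || l > len - off)
			return -1;
		if (opts[off] == ND_OPT_REDIRECT_HDR)
			return (long)(off + ND_OPT_RH_PREFIX);
		off += l;	/* advance by the whole option */
	}
	return -1;
}
```

With a parser shaped like this, the parsing step does not need to touch skb->data at all: the caller can take the returned position and do the pull itself afterwards.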

Thanks!

^ permalink raw reply
