Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 1/4] ipip: allow to deactivate the creation of fb dev
From: David Miller @ 2012-11-20  8:34 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: shemminger, netdev
In-Reply-To: <50AB3F99.7070701@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Tue, 20 Nov 2012 09:30:17 +0100

> Le 20/11/2012 01:07, David Miller a écrit :
>> Why not put this aside and submit the other parts of your
>> patch set on their own, since those looked fine to me?
> Ok, no problem. I wanted to get feedback about this kind of idea, I
> understand the point.
> 
> When you said "the other parts", you mean only patch 2/4 (sit: allow
> to configure 6rd tunnels via netlink) or 3/4 and 4/4 too?
> Because 3/4 and 4/4 are equivalent to this one.

I meant patch #2.

^ permalink raw reply

* Re: [Xen-devel] compound skb frag pages appearing in start_xmit
From: Stefan Bader @ 2012-11-20  8:30 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: ANNIE LI, Ian Campbell, Eric Dumazet, netdev@vger.kernel.org,
	Marcos E. Matsunaga, xen-devel, Konrad Rzeszutek Wilk,
	Eric Dumazet
In-Reply-To: <1333495676.20121119164304@eikelenboom.it>

[-- Attachment #1: Type: text/plain, Size: 2248 bytes --]

On 19.11.2012 16:43, Sander Eikelenboom wrote:
> 
> Thursday, November 15, 2012, 3:31:42 AM, you wrote:
> 
>> On 2012-10-11 18:14, Ian Campbell wrote:
>>> On Thu, 2012-10-11 at 11:05 +0100, Eric Dumazet wrote:
>>>> On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote:
>>>>
>>>>> Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON.
>>>>> And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly.
>>>> xennet_make_frags() is able to split the skb->head in multiple page-size
>>>> chunks.
>>>>
>>>> It should do the same for fragments
>>> Right, I just want to be reproduce the issue so I can know I've fixed it
>>> properly ;-)
>> Hi Ian,
> 
>> I can reproduce this BUG_ON when running netperf/netserver test between 
>> two domus running on the same dom0. The domu and dom0 all use v3.7-rc1.
> 
>> When I tried to rebase my persistent grant netfront/netback patch on 
>> latest kernel, netperf/netserver test never succeeded. I did some test 
>> to find out that v3.6-rc7 works fine, but v3.7-rc1, v3.7-rc2 and 
>> v3.7-rc4 does not succeed in netperf/netserver test. So I keep my 
>> persistent grant patch only based on v3.4-rc3 now.
> 
>> Konrad thought about commit 6a8ed462f16b8455eec5ae00eb6014159a6721f0 in 
>> v3.7-rc1, and suggested me to test your debug patch in netfront. This 
>> BUG_ON happens soon after running the netperf/netserver test case.
> 
>> Thanks
>> Annie
> 
> Is there any progression with this bug (rc6 is out the door, so the release of 3.7-final seems to be eminent and this bug completely cripples any networking with guests) ?
> 

+1 on that. I was testing yesterday with a PVM domU running 3.7-rc5 on Xen 4.2
(but also reported from EC2 running Xen 3.4.3) c with one VCPU. I actually can
trigger it by just ssh'ing into the domU (from another machine) and then run
"find /". Output starts to stutter and then stops completely. When this happens
a new connection still can be made and as long as only shorter output is
generated the ssh connection is ok. From a dump taken it looks like user-space
is waiting in some select call (without any warnon I rather won't see the tx path).

-Stefan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 897 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 1/4] ipip: allow to deactivate the creation of fb dev
From: Nicolas Dichtel @ 2012-11-20  8:30 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, netdev
In-Reply-To: <20121119.190753.856116050444408241.davem@davemloft.net>

Le 20/11/2012 01:07, David Miller a écrit :
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Fri, 16 Nov 2012 17:46:11 +0100
>
>> By default, the fb device is created, so there is no change if you
>> don't set explicitly setup_fb to 0.
>
>
> Nicolas, this idea is contentous to me too.
>
> Why not put this aside and submit the other parts of your
> patch set on their own, since those looked fine to me?
Ok, no problem. I wanted to get feedback about this kind of idea, I understand 
the point.

When you said "the other parts", you mean only patch 2/4 (sit: allow to 
configure 6rd tunnels via netlink) or 3/4 and 4/4 too?
Because 3/4 and 4/4 are equivalent to this one.

^ permalink raw reply

* [PATCH 7/7] netprio_cgroup: allow nesting and inherit config on cgroup creation
From: Tejun Heo @ 2012-11-20  8:30 UTC (permalink / raw)
  To: daniel.wagner-98C5kh4wR6ohFhg+JK9F0w,
	serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1353400211-5182-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Inherit netprio configuration from ->css_online(), allow nesting and
remove .broken_hierarchy marking.  This makes netprio_cgroup's
behavior match netcls_cgroup's.

Note that this patch changes userland-visible behavior.  Nesting is
allowed and the first level cgroups below the root cgroup behave
differently - they inherit priorities from the root cgroup on creation
instead of starting with 0.  This is unfortunate but not doing so is
much crazier.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 Documentation/cgroups/net_prio.txt |  2 ++
 net/core/netprio_cgroup.c          | 42 ++++++++++++++++++++++----------------
 2 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/Documentation/cgroups/net_prio.txt b/Documentation/cgroups/net_prio.txt
index 01b3226..a82cbd2 100644
--- a/Documentation/cgroups/net_prio.txt
+++ b/Documentation/cgroups/net_prio.txt
@@ -51,3 +51,5 @@ One usage for the net_prio cgroup is with mqprio qdisc allowing application
 traffic to be steered to hardware/driver based traffic classes. These mappings
 can then be managed by administrators or other networking protocols such as
 DCBX.
+
+A new net_prio cgroup inherits the parent's configuration.
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index b2af0d0..bde53da 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -136,9 +136,6 @@ static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp)
 {
 	struct cgroup_netprio_state *cs;
 
-	if (cgrp->parent && cgrp->parent->id)
-		return ERR_PTR(-EINVAL);
-
 	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
 	if (!cs)
 		return ERR_PTR(-ENOMEM);
@@ -146,16 +143,34 @@ static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp)
 	return &cs->css;
 }
 
-static void cgrp_css_free(struct cgroup *cgrp)
+static int cgrp_css_online(struct cgroup *cgrp)
 {
-	struct cgroup_netprio_state *cs = cgrp_netprio_state(cgrp);
+	struct cgroup *parent = cgrp->parent;
 	struct net_device *dev;
+	int ret = 0;
+
+	if (!parent)
+		return 0;
 
 	rtnl_lock();
-	for_each_netdev(&init_net, dev)
-		WARN_ON_ONCE(netprio_set_prio(cgrp, dev, 0));
+	/*
+	 * Inherit prios from the parent.  As all prios are set during
+	 * onlining, there is no need to clear them on offline.
+	 */
+	for_each_netdev(&init_net, dev) {
+		u32 prio = netprio_prio(parent, dev);
+
+		ret = netprio_set_prio(cgrp, dev, prio);
+		if (ret)
+			break;
+	}
 	rtnl_unlock();
-	kfree(cs);
+	return ret;
+}
+
+static void cgrp_css_free(struct cgroup *cgrp)
+{
+	kfree(cgrp_netprio_state(cgrp));
 }
 
 static u64 read_prioidx(struct cgroup *cgrp, struct cftype *cft)
@@ -237,21 +252,12 @@ static struct cftype ss_files[] = {
 struct cgroup_subsys net_prio_subsys = {
 	.name		= "net_prio",
 	.css_alloc	= cgrp_css_alloc,
+	.css_online	= cgrp_css_online,
 	.css_free	= cgrp_css_free,
 	.attach		= net_prio_attach,
 	.subsys_id	= net_prio_subsys_id,
 	.base_cftypes	= ss_files,
 	.module		= THIS_MODULE,
-
-	/*
-	 * net_prio has artificial limit on the number of cgroups and
-	 * disallows nesting making it impossible to co-mount it with other
-	 * hierarchical subsystems.  Remove the artificially low PRIOIDX_SZ
-	 * limit and properly nest configuration such that children follow
-	 * their parents' configurations by default and are allowed to
-	 * override and remove the following.
-	 */
-	.broken_hierarchy = true,
 };
 
 static int netprio_device_event(struct notifier_block *unused,
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 6/7] netprio_cgroup: implement netprio[_set]_prio() helpers
From: Tejun Heo @ 2012-11-20  8:30 UTC (permalink / raw)
  To: daniel.wagner-98C5kh4wR6ohFhg+JK9F0w,
	serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1353400211-5182-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Introduce two helpers - netprio_prio() and netprio_set_prio() - which
hide the details of priomap access and expansion.  This will help
implementing hierarchy support.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
---
 net/core/netprio_cgroup.c | 72 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 22 deletions(-)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 9409cdf..b2af0d0 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -87,6 +87,51 @@ static int extend_netdev_table(struct net_device *dev, u32 target_idx)
 	return 0;
 }
 
+/**
+ * netprio_prio - return the effective netprio of a cgroup-net_device pair
+ * @cgrp: cgroup part of the target pair
+ * @dev: net_device part of the target pair
+ *
+ * Should be called under RCU read or rtnl lock.
+ */
+static u32 netprio_prio(struct cgroup *cgrp, struct net_device *dev)
+{
+	struct netprio_map *map = rcu_dereference_rtnl(dev->priomap);
+
+	if (map && cgrp->id < map->priomap_len)
+		return map->priomap[cgrp->id];
+	return 0;
+}
+
+/**
+ * netprio_set_prio - set netprio on a cgroup-net_device pair
+ * @cgrp: cgroup part of the target pair
+ * @dev: net_device part of the target pair
+ * @prio: prio to set
+ *
+ * Set netprio to @prio on @cgrp-@dev pair.  Should be called under rtnl
+ * lock and may fail under memory pressure for non-zero @prio.
+ */
+static int netprio_set_prio(struct cgroup *cgrp, struct net_device *dev,
+			    u32 prio)
+{
+	struct netprio_map *map;
+	int ret;
+
+	/* avoid extending priomap for zero writes */
+	map = rtnl_dereference(dev->priomap);
+	if (!prio && (!map || map->priomap_len <= cgrp->id))
+		return 0;
+
+	ret = extend_netdev_table(dev, cgrp->id);
+	if (ret)
+		return ret;
+
+	map = rtnl_dereference(dev->priomap);
+	map->priomap[cgrp->id] = prio;
+	return 0;
+}
+
 static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp)
 {
 	struct cgroup_netprio_state *cs;
@@ -105,14 +150,10 @@ static void cgrp_css_free(struct cgroup *cgrp)
 {
 	struct cgroup_netprio_state *cs = cgrp_netprio_state(cgrp);
 	struct net_device *dev;
-	struct netprio_map *map;
 
 	rtnl_lock();
-	for_each_netdev(&init_net, dev) {
-		map = rtnl_dereference(dev->priomap);
-		if (map && cgrp->id < map->priomap_len)
-			map->priomap[cgrp->id] = 0;
-	}
+	for_each_netdev(&init_net, dev)
+		WARN_ON_ONCE(netprio_set_prio(cgrp, dev, 0));
 	rtnl_unlock();
 	kfree(cs);
 }
@@ -126,16 +167,10 @@ static int read_priomap(struct cgroup *cont, struct cftype *cft,
 			struct cgroup_map_cb *cb)
 {
 	struct net_device *dev;
-	u32 id = cont->id;
-	u32 priority;
-	struct netprio_map *map;
 
 	rcu_read_lock();
-	for_each_netdev_rcu(&init_net, dev) {
-		map = rcu_dereference(dev->priomap);
-		priority = (map && id < map->priomap_len) ? map->priomap[id] : 0;
-		cb->fill(cb, dev->name, priority);
-	}
+	for_each_netdev_rcu(&init_net, dev)
+		cb->fill(cb, dev->name, netprio_prio(cont, dev));
 	rcu_read_unlock();
 	return 0;
 }
@@ -145,7 +180,6 @@ static int write_priomap(struct cgroup *cgrp, struct cftype *cft,
 {
 	char devname[IFNAMSIZ + 1];
 	struct net_device *dev;
-	struct netprio_map *map;
 	u32 prio;
 	int ret;
 
@@ -158,14 +192,8 @@ static int write_priomap(struct cgroup *cgrp, struct cftype *cft,
 
 	rtnl_lock();
 
-	ret = extend_netdev_table(dev, cgrp->id);
-	if (ret)
-		goto out_unlock;
+	ret = netprio_set_prio(cgrp, dev, prio);
 
-	map = rtnl_dereference(dev->priomap);
-	if (map)
-		map->priomap[cgrp->id] = prio;
-out_unlock:
 	rtnl_unlock();
 	dev_put(dev);
 	return ret;
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 5/7] netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
From: Tejun Heo @ 2012-11-20  8:30 UTC (permalink / raw)
  To: daniel.wagner-98C5kh4wR6ohFhg+JK9F0w,
	serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1353400211-5182-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

With priomap expansion no longer depending on knowing max id
allocated, netprio_cgroup can use cgroup->id insted of cs->prioidx.
Drop prioidx alloc/free logic and convert all uses to cgroup->id.

* In cgrp_css_alloc(), parent->id test is moved above @cs allocation
  to simplify error path.

* In cgrp_css_free(), @cs assignment is made initialization.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
---
 include/net/netprio_cgroup.h | 11 +++-----
 net/core/netprio_cgroup.c    | 65 ++++++++------------------------------------
 2 files changed, 15 insertions(+), 61 deletions(-)

diff --git a/include/net/netprio_cgroup.h b/include/net/netprio_cgroup.h
index 2760f4f..1d04b6f 100644
--- a/include/net/netprio_cgroup.h
+++ b/include/net/netprio_cgroup.h
@@ -27,7 +27,6 @@ struct netprio_map {
 
 struct cgroup_netprio_state {
 	struct cgroup_subsys_state css;
-	u32 prioidx;
 };
 
 extern void sock_update_netprioidx(struct sock *sk, struct task_struct *task);
@@ -36,13 +35,12 @@ extern void sock_update_netprioidx(struct sock *sk, struct task_struct *task);
 
 static inline u32 task_netprioidx(struct task_struct *p)
 {
-	struct cgroup_netprio_state *state;
+	struct cgroup_subsys_state *css;
 	u32 idx;
 
 	rcu_read_lock();
-	state = container_of(task_subsys_state(p, net_prio_subsys_id),
-			     struct cgroup_netprio_state, css);
-	idx = state->prioidx;
+	css = task_subsys_state(p, net_prio_subsys_id);
+	idx = css->cgroup->id;
 	rcu_read_unlock();
 	return idx;
 }
@@ -57,8 +55,7 @@ static inline u32 task_netprioidx(struct task_struct *p)
 	rcu_read_lock();
 	css = task_subsys_state(p, net_prio_subsys_id);
 	if (css)
-		idx = container_of(css,
-				   struct cgroup_netprio_state, css)->prioidx;
+		idx = css->cgroup->id;
 	rcu_read_unlock();
 	return idx;
 }
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 569d83d..9409cdf 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -28,10 +28,6 @@
 #include <linux/fdtable.h>
 
 #define PRIOMAP_MIN_SZ		128
-#define PRIOIDX_SZ 128
-
-static unsigned long prioidx_map[PRIOIDX_SZ];
-static DEFINE_SPINLOCK(prioidx_map_lock);
 
 static inline struct cgroup_netprio_state *cgrp_netprio_state(struct cgroup *cgrp)
 {
@@ -39,32 +35,6 @@ static inline struct cgroup_netprio_state *cgrp_netprio_state(struct cgroup *cgr
 			    struct cgroup_netprio_state, css);
 }
 
-static int get_prioidx(u32 *prio)
-{
-	unsigned long flags;
-	u32 prioidx;
-
-	spin_lock_irqsave(&prioidx_map_lock, flags);
-	prioidx = find_first_zero_bit(prioidx_map, sizeof(unsigned long) * PRIOIDX_SZ);
-	if (prioidx == sizeof(unsigned long) * PRIOIDX_SZ) {
-		spin_unlock_irqrestore(&prioidx_map_lock, flags);
-		return -ENOSPC;
-	}
-	set_bit(prioidx, prioidx_map);
-	spin_unlock_irqrestore(&prioidx_map_lock, flags);
-	*prio = prioidx;
-	return 0;
-}
-
-static void put_prioidx(u32 idx)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&prioidx_map_lock, flags);
-	clear_bit(idx, prioidx_map);
-	spin_unlock_irqrestore(&prioidx_map_lock, flags);
-}
-
 /*
  * Extend @dev->priomap so that it's large enough to accomodate
  * @target_idx.  @dev->priomap.priomap_len > @target_idx after successful
@@ -120,62 +90,50 @@ static int extend_netdev_table(struct net_device *dev, u32 target_idx)
 static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp)
 {
 	struct cgroup_netprio_state *cs;
-	int ret = -EINVAL;
+
+	if (cgrp->parent && cgrp->parent->id)
+		return ERR_PTR(-EINVAL);
 
 	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
 	if (!cs)
 		return ERR_PTR(-ENOMEM);
 
-	if (cgrp->parent && cgrp_netprio_state(cgrp->parent)->prioidx)
-		goto out;
-
-	ret = get_prioidx(&cs->prioidx);
-	if (ret < 0) {
-		pr_warn("No space in priority index array\n");
-		goto out;
-	}
-
 	return &cs->css;
-out:
-	kfree(cs);
-	return ERR_PTR(ret);
 }
 
 static void cgrp_css_free(struct cgroup *cgrp)
 {
-	struct cgroup_netprio_state *cs;
+	struct cgroup_netprio_state *cs = cgrp_netprio_state(cgrp);
 	struct net_device *dev;
 	struct netprio_map *map;
 
-	cs = cgrp_netprio_state(cgrp);
 	rtnl_lock();
 	for_each_netdev(&init_net, dev) {
 		map = rtnl_dereference(dev->priomap);
-		if (map && cs->prioidx < map->priomap_len)
-			map->priomap[cs->prioidx] = 0;
+		if (map && cgrp->id < map->priomap_len)
+			map->priomap[cgrp->id] = 0;
 	}
 	rtnl_unlock();
-	put_prioidx(cs->prioidx);
 	kfree(cs);
 }
 
 static u64 read_prioidx(struct cgroup *cgrp, struct cftype *cft)
 {
-	return (u64)cgrp_netprio_state(cgrp)->prioidx;
+	return cgrp->id;
 }
 
 static int read_priomap(struct cgroup *cont, struct cftype *cft,
 			struct cgroup_map_cb *cb)
 {
 	struct net_device *dev;
-	u32 prioidx = cgrp_netprio_state(cont)->prioidx;
+	u32 id = cont->id;
 	u32 priority;
 	struct netprio_map *map;
 
 	rcu_read_lock();
 	for_each_netdev_rcu(&init_net, dev) {
 		map = rcu_dereference(dev->priomap);
-		priority = (map && prioidx < map->priomap_len) ? map->priomap[prioidx] : 0;
+		priority = (map && id < map->priomap_len) ? map->priomap[id] : 0;
 		cb->fill(cb, dev->name, priority);
 	}
 	rcu_read_unlock();
@@ -185,7 +143,6 @@ static int read_priomap(struct cgroup *cont, struct cftype *cft,
 static int write_priomap(struct cgroup *cgrp, struct cftype *cft,
 			 const char *buffer)
 {
-	u32 prioidx = cgrp_netprio_state(cgrp)->prioidx;
 	char devname[IFNAMSIZ + 1];
 	struct net_device *dev;
 	struct netprio_map *map;
@@ -201,13 +158,13 @@ static int write_priomap(struct cgroup *cgrp, struct cftype *cft,
 
 	rtnl_lock();
 
-	ret = extend_netdev_table(dev, prioidx);
+	ret = extend_netdev_table(dev, cgrp->id);
 	if (ret)
 		goto out_unlock;
 
 	map = rtnl_dereference(dev->priomap);
 	if (map)
-		map->priomap[prioidx] = prio;
+		map->priomap[cgrp->id] = prio;
 out_unlock:
 	rtnl_unlock();
 	dev_put(dev);
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 4/7] netprio_cgroup: reimplement priomap expansion
From: Tejun Heo @ 2012-11-20  8:30 UTC (permalink / raw)
  To: daniel.wagner-98C5kh4wR6ohFhg+JK9F0w,
	serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1353400211-5182-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

netprio kept track of the highest prioidx allocated and resized
priomaps accordingly when necessary.  This makes it necessary to keep
track of prioidx allocation and may end up resizing on every new
prioidx.

Update extend_netdev_table() such that it takes @target_idx which the
priomap should be able to accomodate.  If the priomap is large enough,
nothing happens; otherwise, the size is doubled until @target_idx can
be accomodated.

This makes max_prioidx and write_update_netdev_table() unnecessary.
write_priomap() now calls extend_netdev_table() directly.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
---
 net/core/netprio_cgroup.c | 56 ++++++++++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 23 deletions(-)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 92cc54c..569d83d 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -27,11 +27,11 @@
 
 #include <linux/fdtable.h>
 
+#define PRIOMAP_MIN_SZ		128
 #define PRIOIDX_SZ 128
 
 static unsigned long prioidx_map[PRIOIDX_SZ];
 static DEFINE_SPINLOCK(prioidx_map_lock);
-static atomic_t max_prioidx = ATOMIC_INIT(0);
 
 static inline struct cgroup_netprio_state *cgrp_netprio_state(struct cgroup *cgrp)
 {
@@ -51,8 +51,6 @@ static int get_prioidx(u32 *prio)
 		return -ENOSPC;
 	}
 	set_bit(prioidx, prioidx_map);
-	if (atomic_read(&max_prioidx) < prioidx)
-		atomic_set(&max_prioidx, prioidx);
 	spin_unlock_irqrestore(&prioidx_map_lock, flags);
 	*prio = prioidx;
 	return 0;
@@ -67,15 +65,40 @@ static void put_prioidx(u32 idx)
 	spin_unlock_irqrestore(&prioidx_map_lock, flags);
 }
 
-static int extend_netdev_table(struct net_device *dev, u32 new_len)
+/*
+ * Extend @dev->priomap so that it's large enough to accomodate
+ * @target_idx.  @dev->priomap.priomap_len > @target_idx after successful
+ * return.  Must be called under rtnl lock.
+ */
+static int extend_netdev_table(struct net_device *dev, u32 target_idx)
 {
-	size_t new_size = sizeof(struct netprio_map) +
-			   ((sizeof(u32) * new_len));
-	struct netprio_map *new = kzalloc(new_size, GFP_KERNEL);
-	struct netprio_map *old;
+	struct netprio_map *old, *new;
+	size_t new_sz, new_len;
 
+	/* is the existing priomap large enough? */
 	old = rtnl_dereference(dev->priomap);
+	if (old && old->priomap_len > target_idx)
+		return 0;
+
+	/*
+	 * Determine the new size.  Let's keep it power-of-two.  We start
+	 * from PRIOMAP_MIN_SZ and double it until it's large enough to
+	 * accommodate @target_idx.
+	 */
+	new_sz = PRIOMAP_MIN_SZ;
+	while (true) {
+		new_len = (new_sz - offsetof(struct netprio_map, priomap)) /
+			sizeof(new->priomap[0]);
+		if (new_len > target_idx)
+			break;
+		new_sz *= 2;
+		/* overflowed? */
+		if (WARN_ON(new_sz < PRIOMAP_MIN_SZ))
+			return -ENOSPC;
+	}
 
+	/* allocate & copy */
+	new = kzalloc(new_sz, GFP_KERNEL);
 	if (!new) {
 		pr_warn("Unable to alloc new priomap!\n");
 		return -ENOMEM;
@@ -87,26 +110,13 @@ static int extend_netdev_table(struct net_device *dev, u32 new_len)
 
 	new->priomap_len = new_len;
 
+	/* install the new priomap */
 	rcu_assign_pointer(dev->priomap, new);
 	if (old)
 		kfree_rcu(old, rcu);
 	return 0;
 }
 
-static int write_update_netdev_table(struct net_device *dev)
-{
-	int ret = 0;
-	u32 max_len;
-	struct netprio_map *map;
-
-	max_len = atomic_read(&max_prioidx) + 1;
-	map = rtnl_dereference(dev->priomap);
-	if (!map || map->priomap_len < max_len)
-		ret = extend_netdev_table(dev, max_len);
-
-	return ret;
-}
-
 static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp)
 {
 	struct cgroup_netprio_state *cs;
@@ -191,7 +201,7 @@ static int write_priomap(struct cgroup *cgrp, struct cftype *cft,
 
 	rtnl_lock();
 
-	ret = write_update_netdev_table(dev);
+	ret = extend_netdev_table(dev, prioidx);
 	if (ret)
 		goto out_unlock;
 
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 3/7] netprio_cgroup: shorten variable names in extend_netdev_table()
From: Tejun Heo @ 2012-11-20  8:30 UTC (permalink / raw)
  To: daniel.wagner-98C5kh4wR6ohFhg+JK9F0w,
	serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1353400211-5182-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

The function is about to go through a rewrite.  In preparation,
shorten the variable names so that we don't repeat "priomap" so often.

This patch is cosmetic.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
---
 net/core/netprio_cgroup.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 66d98da..92cc54c 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -71,26 +71,25 @@ static int extend_netdev_table(struct net_device *dev, u32 new_len)
 {
 	size_t new_size = sizeof(struct netprio_map) +
 			   ((sizeof(u32) * new_len));
-	struct netprio_map *new_priomap = kzalloc(new_size, GFP_KERNEL);
-	struct netprio_map *old_priomap;
+	struct netprio_map *new = kzalloc(new_size, GFP_KERNEL);
+	struct netprio_map *old;
 
-	old_priomap  = rtnl_dereference(dev->priomap);
+	old = rtnl_dereference(dev->priomap);
 
-	if (!new_priomap) {
+	if (!new) {
 		pr_warn("Unable to alloc new priomap!\n");
 		return -ENOMEM;
 	}
 
-	if (old_priomap)
-		memcpy(new_priomap->priomap, old_priomap->priomap,
-		       old_priomap->priomap_len *
-		       sizeof(old_priomap->priomap[0]));
+	if (old)
+		memcpy(new->priomap, old->priomap,
+		       old->priomap_len * sizeof(old->priomap[0]));
 
-	new_priomap->priomap_len = new_len;
+	new->priomap_len = new_len;
 
-	rcu_assign_pointer(dev->priomap, new_priomap);
-	if (old_priomap)
-		kfree_rcu(old_priomap, rcu);
+	rcu_assign_pointer(dev->priomap, new);
+	if (old)
+		kfree_rcu(old, rcu);
 	return 0;
 }
 
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 2/7] netprio_cgroup: simplify write_priomap()
From: Tejun Heo @ 2012-11-20  8:30 UTC (permalink / raw)
  To: daniel.wagner-98C5kh4wR6ohFhg+JK9F0w,
	serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1353400211-5182-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

sscanf() doesn't bite.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
---
 net/core/netprio_cgroup.c | 56 ++++++++++-------------------------------------
 1 file changed, 11 insertions(+), 45 deletions(-)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index f0b6b0d..66d98da 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -176,66 +176,32 @@ static int read_priomap(struct cgroup *cont, struct cftype *cft,
 static int write_priomap(struct cgroup *cgrp, struct cftype *cft,
 			 const char *buffer)
 {
-	char *devname = kstrdup(buffer, GFP_KERNEL);
-	int ret = -EINVAL;
 	u32 prioidx = cgrp_netprio_state(cgrp)->prioidx;
-	unsigned long priority;
-	char *priostr;
+	char devname[IFNAMSIZ + 1];
 	struct net_device *dev;
 	struct netprio_map *map;
+	u32 prio;
+	int ret;
 
-	if (!devname)
-		return -ENOMEM;
-
-	/*
-	 * Minimally sized valid priomap string
-	 */
-	if (strlen(devname) < 3)
-		goto out_free_devname;
-
-	priostr = strstr(devname, " ");
-	if (!priostr)
-		goto out_free_devname;
-
-	/*
-	 *Separate the devname from the associated priority
-	 *and advance the priostr pointer to the priority value
-	 */
-	*priostr = '\0';
-	priostr++;
-
-	/*
-	 * If the priostr points to NULL, we're at the end of the passed
-	 * in string, and its not a valid write
-	 */
-	if (*priostr == '\0')
-		goto out_free_devname;
-
-	ret = kstrtoul(priostr, 10, &priority);
-	if (ret < 0)
-		goto out_free_devname;
-
-	ret = -ENODEV;
+	if (sscanf(buffer, "%"__stringify(IFNAMSIZ)"s %u", devname, &prio) != 2)
+		return -EINVAL;
 
 	dev = dev_get_by_name(&init_net, devname);
 	if (!dev)
-		goto out_free_devname;
+		return -ENODEV;
 
 	rtnl_lock();
+
 	ret = write_update_netdev_table(dev);
-	if (ret < 0)
-		goto out_put_dev;
+	if (ret)
+		goto out_unlock;
 
 	map = rtnl_dereference(dev->priomap);
 	if (map)
-		map->priomap[prioidx] = priority;
-
-out_put_dev:
+		map->priomap[prioidx] = prio;
+out_unlock:
 	rtnl_unlock();
 	dev_put(dev);
-
-out_free_devname:
-	kfree(devname);
 	return ret;
 }
 
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 1/7] netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking
From: Tejun Heo @ 2012-11-20  8:30 UTC (permalink / raw)
  To: daniel.wagner-98C5kh4wR6ohFhg+JK9F0w,
	serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tejun Heo,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1353400211-5182-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

It turns out that we'll have to live with attributes which are
inherited at cgroup creation time but not affected by further updates
to the parent afterwards - such attributes are already in wide use
e.g. for cpuset.

So, there's nothing to do for netcls_cgroup for hierarchy support.
Its current behavior - inherit only during creation - is good enough.

Move config inheriting from ->css_alloc() to ->css_online() for
consistency, which doesn't change behavior at all, and remove
.broken_hierarchy marking.

Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 net/sched/cls_cgroup.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 8cdc18e..31f06b6 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -41,11 +41,15 @@ static struct cgroup_subsys_state *cgrp_css_alloc(struct cgroup *cgrp)
 	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
 	if (!cs)
 		return ERR_PTR(-ENOMEM);
+	return &cs->css;
+}
 
+static int cgrp_css_online(struct cgroup *cgrp)
+{
 	if (cgrp->parent)
-		cs->classid = cgrp_cls_state(cgrp->parent)->classid;
-
-	return &cs->css;
+		cgrp_cls_state(cgrp)->classid =
+			cgrp_cls_state(cgrp->parent)->classid;
+	return 0;
 }
 
 static void cgrp_css_free(struct cgroup *cgrp)
@@ -76,19 +80,11 @@ static struct cftype ss_files[] = {
 struct cgroup_subsys net_cls_subsys = {
 	.name		= "net_cls",
 	.css_alloc	= cgrp_css_alloc,
+	.css_online	= cgrp_css_online,
 	.css_free	= cgrp_css_free,
 	.subsys_id	= net_cls_subsys_id,
 	.base_cftypes	= ss_files,
 	.module		= THIS_MODULE,
-
-	/*
-	 * While net_cls cgroup has the rudimentary hierarchy support of
-	 * inheriting the parent's classid on cgroup creation, it doesn't
-	 * properly propagates config changes in ancestors to their
-	 * descendents.  A child should follow the parent's configuration
-	 * but be allowed to override it.  Fix it and remove the following.
-	 */
-	.broken_hierarchy = true,
 };
 
 struct cls_cgroup_head {
-- 
1.7.11.7

^ permalink raw reply related

* [PATCHSET REPOST v2 cgroup/for-3.8] netcls/prio_cgroup: update hierarchy support
From: Tejun Heo @ 2012-11-20  8:30 UTC (permalink / raw)
  To: daniel.wagner-98C5kh4wR6ohFhg+JK9F0w,
	serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w, nhorman-2XuSBdqkA4R54TAoqtyWWQ,
	tgraf-G/eBtMaohhA
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, lizefan-hv44wF8Li93QT0dZR+AlfA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

(Reposting w/ cc to netdev added as requested by David.  Sorry about
the noise.)

Hello, guys.

This patchset replaces the following two patchsets.

 [1] "[PATCHSET cgroup/for-3.8] netprio_cgroup: implement hierarchy support"
 [2] "[PATCHSET cgroup/for-3.8] netcls_cgroup: implement hierarchy support"

It turns out that we'll have to live with attributes which are
inherited only while creating a new cgroup - such attributes are
already in wide use in cpuset, and there doesn't seem to be much point
in trying to implement fully hierarchical behavior on all attributes
especially as it requires an otherwise-unnecessary userland behavior
change on netcls_cgroup.

This patchset drop .broken_hierarchy from netcls_cgroup while
retaining the current behavior, and implements the same behavior on
netprio_cgroup.

 0001-netcls_cgroup-move-config-inheritance-to-css_online-.patch
 0002-netprio_cgroup-simplify-write_priomap.patch
 0003-netprio_cgroup-shorten-variable-names-in-extend_netd.patch
 0004-netprio_cgroup-reimplement-priomap-expansion.patch
 0005-netprio_cgroup-use-cgroup-id-instead-of-cgroup_netpr.patch
 0006-netprio_cgroup-implement-netprio-_set-_prio-helpers.patch
 0007-netprio_cgroup-allow-nesting-and-inherit-config-on-c.patch

0001 drops .broken_hierarchy from netcls_cgroup.

0002-0006 prepare for inheritance implementation.  These patches are
identical from the previous posting[1].

0007 implements inheritance on cgroup creation.

This patchset is on top of cgroup/for-3.8 0a950f65e1 ("cgroup: add
cgroup->id").

and available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-net-cls-prio-hierarchy

diffstat follows.

 Documentation/cgroups/net_prio.txt |    2
 include/net/netprio_cgroup.h       |   11 -
 net/core/netprio_cgroup.c          |  254 ++++++++++++++++---------------------
 net/sched/cls_cgroup.c             |   20 +-
 4 files changed, 124 insertions(+), 163 deletions(-)

Thanks.

--
tejun

[1] https://lkml.org/lkml/2012/11/16/514
[2] http://thread.gmane.org/gmane.linux.kernel.cgroups/5094

^ permalink raw reply

* Re: New ASIX USB3 product
From: Christian Riesch @ 2012-11-20  8:15 UTC (permalink / raw)
  To: Jérôme Poulin; +Cc: netdev
In-Reply-To: <CALJXSJp+m7FewpW2i6wJoFAN0oGwao4-KjWuUX0F_dcE5vsoPQ@mail.gmail.com>

Jérôme,

On 2012-11-19 20:02, Jérôme Poulin wrote:
> I just bought a gigabit ethernet USB3 adapter, it is a StarTech
> USB31000S adapter and it does not seem to be un ASIX product database
> yet.

The AX88179 device is not yet supported by the mainline asix drivers. 
You could try to use the driver from the ASIX website [1]. Or you could 
compare this driver to the code in drivers/net/usb/a* and see if one of 
these drivers would work for the AX88179 and add an entry for the 
AX88179 in usb_device_id struct in asix_devices.c.

Regards, Christian

[1] 
http://www.asix.com.tw/products.php?op=pItemdetail&PItemID=131;71;112&PLine=71

^ permalink raw reply

* Re: unable to handle paging request, arm, at aio/tcp code, only 3.6
From: Lluís Batlle i Rossell @ 2012-11-20  6:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1353360858.10798.86.camel@edumazet-glaptop>

On Mon, Nov 19, 2012 at 01:34:18PM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> > Thanks for the report.
> > 
> > I believe this is a regression of commit
> > 35ad9b9cf7d8a2e6259a0d24022e910adb6f3489
> > (ipv6: Add helper inet6_csk_update_pmtu().)
> > 
> > I'll prepare a patch to fix this.
> 
> Please try the following fix.
> 
> Thanks !

Ok, I'm running it. Let's see if it crashes today.

Thank you!

^ permalink raw reply

* Re: [Suggestion] net/tipc: can delete checking: (if_local_len > TIPC_MAX_IF_NAME) || (if_peer_len  > TIPC_MAX_IF_NAME)
From: Chen Gang @ 2012-11-20  5:25 UTC (permalink / raw)
  To: Xue Ying; +Cc: David Miller, Shan Wei, Eric Dumazet, netdev
In-Reply-To: <50AB12EE.6050802@gmail.com>

于 2012年11月20日 13:19, Xue Ying 写道:
> Chen Gang wrote:
>> Hello David Miller:
>>
>>   at net/tipc/link.c:212,213, we can delete the check (if_local_len > TIPC_MAX_IF_NAME) and (if_peer_len  > TIPC_MAX_IF_NAME)
>>   for the total buffer length is no more than TIPC_MAX_IF_NAME, already (at line 182..186).
>>
>>   
> 
> The length of name_copy array is TIPC_MAX_LINK_NAME(i.e, 60) while
> TIPC_MAX_IF_NAME is defined to 16, why does the former is more than the
> latter?
> 
> Regards,
> Ying
> 

  It is my fault. thanks.

  Regards,

  gchen


>>   suggest to modify it, although it is minor.
>>
>> thanks.
>>
>>
>>  167 static int link_name_validate(const char *name,
>>  168                                 struct tipc_link_name *name_parts)
>>  169 {
>>  170         char name_copy[TIPC_MAX_LINK_NAME];
>>  171         char *addr_local;
>>  172         char *if_local;
>>  173         char *addr_peer;
>>  174         char *if_peer;
>>  175         char dummy;
>>  176         u32 z_local, c_local, n_local;
>>  177         u32 z_peer, c_peer, n_peer;
>>  178         u32 if_local_len;
>>  179         u32 if_peer_len;
>>  180 
>>  181         /* copy link name & ensure length is OK */
>>  182         name_copy[TIPC_MAX_LINK_NAME - 1] = 0;
>>  183         /* need above in case non-Posix strncpy() doesn't pad with nulls */
>>  184         strncpy(name_copy, name, TIPC_MAX_LINK_NAME);
>>  185         if (name_copy[TIPC_MAX_LINK_NAME - 1] != 0)
>>  186                 return 0;
>>  187 
>>  188         /* ensure all component parts of link name are present */
>>  189         addr_local = name_copy;
>>  190         if_local = strchr(addr_local, ':');
>>  191         if (if_local == NULL)
>>  192                 return 0;
>>  193         *(if_local++) = 0;
>>  194         addr_peer = strchr(if_local, '-');
>>  195         if (addr_peer == NULL)
>>  196                 return 0;
>>  197         *(addr_peer++) = 0;
>>  198         if_local_len = addr_peer - if_local;
>>  199         if_peer = strchr(addr_peer, ':');
>>  200         if (if_peer == NULL)
>>  201                 return 0;
>>  202         *(if_peer++) = 0;
>>  203         if_peer_len = strlen(if_peer) + 1;
>>  204 
>>  205         /* validate component parts of link name */
>>  206         if ((sscanf(addr_local, "%u.%u.%u%c",
>>  207                     &z_local, &c_local, &n_local, &dummy) != 3) ||
>>  208             (sscanf(addr_peer, "%u.%u.%u%c",
>>  209                     &z_peer, &c_peer, &n_peer, &dummy) != 3) ||
>>  210             (z_local > 255) || (c_local > 4095) || (n_local > 4095) ||
>>  211             (z_peer  > 255) || (c_peer  > 4095) || (n_peer  > 4095) ||
>>  212             (if_local_len <= 1) || (if_local_len > TIPC_MAX_IF_NAME) ||
>>  213             (if_peer_len  <= 1) || (if_peer_len  > TIPC_MAX_IF_NAME))
>>  214                 return 0;
>>  215 
>>  216         /* return link name components, if necessary */
>>  217         if (name_parts) {
>>  218                 name_parts->addr_local = tipc_addr(z_local, c_local, n_local);
>>  219                 strcpy(name_parts->if_local, if_local);
>>  220                 name_parts->addr_peer = tipc_addr(z_peer, c_peer, n_peer);
>>  221                 strcpy(name_parts->if_peer, if_peer);
>>  222         }
>>  223         return 1;
>>  224 }
>>
>>
>>
>>   
> 
> 
> 


-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* Re: [Suggestion] net/tipc: can delete checking: (if_local_len > TIPC_MAX_IF_NAME) || (if_peer_len  > TIPC_MAX_IF_NAME)
From: Xue Ying @ 2012-11-20  5:19 UTC (permalink / raw)
  To: Chen Gang; +Cc: David Miller, Shan Wei, Eric Dumazet, netdev
In-Reply-To: <50AB0249.20802@asianux.com>

Chen Gang wrote:
> Hello David Miller:
>
>   at net/tipc/link.c:212,213, we can delete the check (if_local_len > TIPC_MAX_IF_NAME) and (if_peer_len  > TIPC_MAX_IF_NAME)
>   for the total buffer length is no more than TIPC_MAX_IF_NAME, already (at line 182..186).
>
>   

The length of name_copy array is TIPC_MAX_LINK_NAME(i.e, 60) while
TIPC_MAX_IF_NAME is defined to 16, why does the former is more than the
latter?

Regards,
Ying

>   suggest to modify it, although it is minor.
>
> thanks.
>
>
>  167 static int link_name_validate(const char *name,
>  168                                 struct tipc_link_name *name_parts)
>  169 {
>  170         char name_copy[TIPC_MAX_LINK_NAME];
>  171         char *addr_local;
>  172         char *if_local;
>  173         char *addr_peer;
>  174         char *if_peer;
>  175         char dummy;
>  176         u32 z_local, c_local, n_local;
>  177         u32 z_peer, c_peer, n_peer;
>  178         u32 if_local_len;
>  179         u32 if_peer_len;
>  180 
>  181         /* copy link name & ensure length is OK */
>  182         name_copy[TIPC_MAX_LINK_NAME - 1] = 0;
>  183         /* need above in case non-Posix strncpy() doesn't pad with nulls */
>  184         strncpy(name_copy, name, TIPC_MAX_LINK_NAME);
>  185         if (name_copy[TIPC_MAX_LINK_NAME - 1] != 0)
>  186                 return 0;
>  187 
>  188         /* ensure all component parts of link name are present */
>  189         addr_local = name_copy;
>  190         if_local = strchr(addr_local, ':');
>  191         if (if_local == NULL)
>  192                 return 0;
>  193         *(if_local++) = 0;
>  194         addr_peer = strchr(if_local, '-');
>  195         if (addr_peer == NULL)
>  196                 return 0;
>  197         *(addr_peer++) = 0;
>  198         if_local_len = addr_peer - if_local;
>  199         if_peer = strchr(addr_peer, ':');
>  200         if (if_peer == NULL)
>  201                 return 0;
>  202         *(if_peer++) = 0;
>  203         if_peer_len = strlen(if_peer) + 1;
>  204 
>  205         /* validate component parts of link name */
>  206         if ((sscanf(addr_local, "%u.%u.%u%c",
>  207                     &z_local, &c_local, &n_local, &dummy) != 3) ||
>  208             (sscanf(addr_peer, "%u.%u.%u%c",
>  209                     &z_peer, &c_peer, &n_peer, &dummy) != 3) ||
>  210             (z_local > 255) || (c_local > 4095) || (n_local > 4095) ||
>  211             (z_peer  > 255) || (c_peer  > 4095) || (n_peer  > 4095) ||
>  212             (if_local_len <= 1) || (if_local_len > TIPC_MAX_IF_NAME) ||
>  213             (if_peer_len  <= 1) || (if_peer_len  > TIPC_MAX_IF_NAME))
>  214                 return 0;
>  215 
>  216         /* return link name components, if necessary */
>  217         if (name_parts) {
>  218                 name_parts->addr_local = tipc_addr(z_local, c_local, n_local);
>  219                 strcpy(name_parts->if_local, if_local);
>  220                 name_parts->addr_peer = tipc_addr(z_peer, c_peer, n_peer);
>  221                 strcpy(name_parts->if_peer, if_peer);
>  222         }
>  223         return 1;
>  224 }
>
>
>
>   

^ permalink raw reply

* [Suggestion] net/tipc: can delete checking: (if_local_len > TIPC_MAX_IF_NAME) || (if_peer_len  > TIPC_MAX_IF_NAME)
From: Chen Gang @ 2012-11-20  4:08 UTC (permalink / raw)
  To: David Miller; +Cc: Shan Wei, Eric Dumazet, netdev

Hello David Miller:

  at net/tipc/link.c:212,213, we can delete the check (if_local_len > TIPC_MAX_IF_NAME) and (if_peer_len  > TIPC_MAX_IF_NAME)
  for the total buffer length is no more than TIPC_MAX_IF_NAME, already (at line 182..186).

  suggest to modify it, although it is minor.

thanks.


 167 static int link_name_validate(const char *name,
 168                                 struct tipc_link_name *name_parts)
 169 {
 170         char name_copy[TIPC_MAX_LINK_NAME];
 171         char *addr_local;
 172         char *if_local;
 173         char *addr_peer;
 174         char *if_peer;
 175         char dummy;
 176         u32 z_local, c_local, n_local;
 177         u32 z_peer, c_peer, n_peer;
 178         u32 if_local_len;
 179         u32 if_peer_len;
 180 
 181         /* copy link name & ensure length is OK */
 182         name_copy[TIPC_MAX_LINK_NAME - 1] = 0;
 183         /* need above in case non-Posix strncpy() doesn't pad with nulls */
 184         strncpy(name_copy, name, TIPC_MAX_LINK_NAME);
 185         if (name_copy[TIPC_MAX_LINK_NAME - 1] != 0)
 186                 return 0;
 187 
 188         /* ensure all component parts of link name are present */
 189         addr_local = name_copy;
 190         if_local = strchr(addr_local, ':');
 191         if (if_local == NULL)
 192                 return 0;
 193         *(if_local++) = 0;
 194         addr_peer = strchr(if_local, '-');
 195         if (addr_peer == NULL)
 196                 return 0;
 197         *(addr_peer++) = 0;
 198         if_local_len = addr_peer - if_local;
 199         if_peer = strchr(addr_peer, ':');
 200         if (if_peer == NULL)
 201                 return 0;
 202         *(if_peer++) = 0;
 203         if_peer_len = strlen(if_peer) + 1;
 204 
 205         /* validate component parts of link name */
 206         if ((sscanf(addr_local, "%u.%u.%u%c",
 207                     &z_local, &c_local, &n_local, &dummy) != 3) ||
 208             (sscanf(addr_peer, "%u.%u.%u%c",
 209                     &z_peer, &c_peer, &n_peer, &dummy) != 3) ||
 210             (z_local > 255) || (c_local > 4095) || (n_local > 4095) ||
 211             (z_peer  > 255) || (c_peer  > 4095) || (n_peer  > 4095) ||
 212             (if_local_len <= 1) || (if_local_len > TIPC_MAX_IF_NAME) ||
 213             (if_peer_len  <= 1) || (if_peer_len  > TIPC_MAX_IF_NAME))
 214                 return 0;
 215 
 216         /* return link name components, if necessary */
 217         if (name_parts) {
 218                 name_parts->addr_local = tipc_addr(z_local, c_local, n_local);
 219                 strcpy(name_parts->if_local, if_local);
 220                 name_parts->addr_peer = tipc_addr(z_peer, c_peer, n_peer);
 221                 strcpy(name_parts->if_peer, if_peer);
 222         }
 223         return 1;
 224 }



-- 
Chen Gang

Asianux Corporation

^ permalink raw reply

* Re: [PATCH] bonding: rlb mode of bond should not alter ARP originating via bridge
From: Jay Vosburgh @ 2012-11-20  1:02 UTC (permalink / raw)
  To: Zheng Li; +Cc: netdev, andy, linux-kernel, davem, joe.jin
In-Reply-To: <1352972861-17577-1-git-send-email-zheng.x.li@oracle.com>

Zheng Li <zheng.x.li@oracle.com> wrote:

>ARP traffic passing through a bridge and out via the bond (when the bond is a 
>port of the bridge) should not have its source MAC address adjusted by the 
>receive load balance code in rlb_arp_xmit.

	This patch differs from prior versions in that it does more than
what's described here; it also disables the receive load balance logic
for any ARPs (request or reply) that are passing through the bond (not
of local origin).  For ARP replies, that's mostly harmless, as the ARPs
passing through will simply always be sent from one particular slave
(the active slave) instead of being balanced.

	For ARP requests, though, they are already always sent via the
active slave, but there is also some logic in rlb_arp_xmit to limit the
side effects from the broadcast ARP, in particular this part:

		/* The ARP reply packets must be delayed so that
		 * they can cancel out the influence of the ARP request.
		 */
		bond->alb_info.rlb_update_delay_counter = RLB_UPDATE_DELAY;

		/* arp requests are broadcast and are sent on the primary
		 * the arp request will collapse all clients on the subnet to
		 * the primary slave. We must register these clients to be
		 * updated with their assigned mac.
		 */
		rlb_req_update_subnet_clients(bond, arp->ip_src);

	that arranges for clients to be given ARP updates for their
slave assignments (which may change to the active slave, due to the ARP
broadcast being sent via the active slave).

	I think the ARP reply side of this is fine (and is what is
described in teh changelog), but the ARP request behavior change is new
with this version.

	Since prior versions of the patch didn't cause this code to be
skipped, is this change intentional?

	Did you check to see if the above logic is necessary for ARP
requests passing through via a bridge to prevent peers from "stacking"
(in terms of load balance assignment) on the active slave due to bridged
ARP traffic?

	-J

>Signed-off-by: Zheng Li <zheng.x.li@oracle.com>
>Cc: Jay Vosburgh <fubar@us.ibm.com>
>Cc: Andy Gospodarek <andy@greyhouse.net>
>Cc: "David S. Miller" <davem@davemloft.net>
>
>---
> drivers/net/bonding/bond_alb.c |    6 ++++++
> drivers/net/bonding/bonding.h  |   13 +++++++++++++
> 2 files changed, 19 insertions(+), 0 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index e15cc11..75f6f0d 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -694,6 +694,12 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
> 	struct arp_pkt *arp = arp_pkt(skb);
> 	struct slave *tx_slave = NULL;
>
>+	/* Only modify ARP's MAC if it originates locally;
>+	 * don't change ARPs arriving via a bridge.
>+	 */
>+	if (!bond_slave_has_mac(bond, arp->mac_src))
>+		return NULL;
>+
> 	if (arp->op_code == htons(ARPOP_REPLY)) {
> 		/* the arp must be sent on the selected
> 		* rx channel
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index f8af2fc..6dded56 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -22,6 +22,7 @@
> #include <linux/in6.h>
> #include <linux/netpoll.h>
> #include <linux/inetdevice.h>
>+#include <linux/etherdevice.h>
> #include "bond_3ad.h"
> #include "bond_alb.h"
>
>@@ -450,6 +451,18 @@ static inline void bond_destroy_proc_dir(struct bond_net *bn)
> }
> #endif
>
>+static inline struct slave *bond_slave_has_mac(struct bonding *bond,
>+					       const u8 *mac)
>+{
>+	int i = 0;
>+	struct slave *tmp;
>+
>+	bond_for_each_slave(bond, tmp, i)
>+		if (ether_addr_equal_64bits(mac, tmp->dev->dev_addr))
>+			return tmp;
>+
>+	return NULL;
>+}
>
> /* exported from bond_main.c */
> extern int bond_net_id;
>-- 
>1.7.6.5

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH net-next 0/3] Remove inet_lro remnants
From: David Miller @ 2012-11-20  0:14 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, eric.dumazet
In-Reply-To: <1353105635.2743.49.camel@bwh-desktop.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 16 Nov 2012 22:40:35 +0000

> Eric's comments on LRO/GRO conversion prompted me to check what was
> still using inet_lro.  Not very many drivers, but there are a few
> unnecessary references outside the code.

Series applied, thanks Ben.

^ permalink raw reply

* Re: [PATCH] netfilter: Remove the spurious \ in __ip_vs_lblc_init
From: David Miller @ 2012-11-20  0:12 UTC (permalink / raw)
  To: horms; +Cc: ebiederm, netdev, pablo
In-Reply-To: <20121120000649.GC13829@verge.net.au>

From: Simon Horman <horms@verge.net.au>
Date: Tue, 20 Nov 2012 09:06:49 +0900

> On Mon, Nov 19, 2012 at 07:26:30AM -0800, Eric W. Biederman wrote:
>> 
>> In (464dc801c76a net: Don't export sysctls to unprivileged users)
>> I typoed and introduced a spurious backslash.  Delete it.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Signed-off-by: Simon Horman <horms@verge.net.au>
> 
> Pablo, can you take this directly through your tree?

I already applied this to net-next yesterday.

^ permalink raw reply

* Re: [PATCH] netfilter: Remove the spurious \ in __ip_vs_lblc_init
From: Simon Horman @ 2012-11-20  0:12 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, netdev, Pablo Neira Ayuso
In-Reply-To: <20121120000649.GC13829@verge.net.au>

On Tue, Nov 20, 2012 at 09:06:49AM +0900, Simon Horman wrote:
> On Mon, Nov 19, 2012 at 07:26:30AM -0800, Eric W. Biederman wrote:
> > 
> > In (464dc801c76a net: Don't export sysctls to unprivileged users)
> > I typoed and introduced a spurious backslash.  Delete it.
> > 
> > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Signed-off-by: Simon Horman <horms@verge.net.au>
> 
> Pablo, can you take this directly through your tree?

Scratch that, I see that David has already applied this change.

^ permalink raw reply

* Re: [PATCH] net: remove unnecessary wireless includes
From: David Miller @ 2012-11-20  0:10 UTC (permalink / raw)
  To: johannes; +Cc: linux-wireless, netdev, johannes.berg
In-Reply-To: <1353099561-14339-1-git-send-email-johannes@sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Fri, 16 Nov 2012 21:59:21 +0100

> From: Johannes Berg <johannes.berg@intel.com>
> 
> The wireless and wext includes in net-sysfs.c aren't
> needed, so remove them.
> 
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Applied, thanks Johannes.

^ permalink raw reply

* Re: [PATCH net-next] gro: Handle inline VLAN tags
From: David Miller @ 2012-11-20  0:10 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bhutchings, netdev, linux-net-drivers, gallatin, herbert
In-Reply-To: <1353114559.10798.66.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 16 Nov 2012 17:09:19 -0800

> On Sat, 2012-11-17 at 00:32 +0000, Ben Hutchings wrote:
>> On Fri, 2012-11-16 at 16:16 -0800, Eric Dumazet wrote:
>> > On Sat, 2012-11-17 at 00:00 +0000, Ben Hutchings wrote:
>> > 
>> > > I'm not sure what you mean by this.  Is your point that the
>> > > copy-on-write is never needed?  It is still possible for pskb_may_pull()
>> > > to fail.
>> > > 
>> > 
>> > A packet sniffer should have a copy of bad frames, even if dropped later
>> > in our stacks.
>> > 
>> > GRO layer is not allowed to drop a frame, even if not 'correct'.
>> 
>> What do you think the accelerated hardware does with frames that have a
>> truncated VLAN tag?
> 
> The hardware should send us the frame, exactly like when RX checksum is
> wrong.

I agree with Eric, and therefore will not apply this patch.

^ permalink raw reply

* Re: [PATCH net-next 1/4] ipip: allow to deactivate the creation of fb dev
From: David Miller @ 2012-11-20  0:07 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: shemminger, netdev
In-Reply-To: <50A66DD3.20807@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Fri, 16 Nov 2012 17:46:11 +0100

> By default, the fb device is created, so there is no change if you
> don't set explicitly setup_fb to 0.


Nicolas, this idea is contentous to me too.

Why not put this aside and submit the other parts of your
patch set on their own, since those looked fine to me?

Thanks.

^ permalink raw reply

* Re: [PATCH] netfilter: Remove the spurious \ in __ip_vs_lblc_init
From: Simon Horman @ 2012-11-20  0:06 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, netdev, Pablo Neira Ayuso
In-Reply-To: <87zk2d7gwp.fsf@xmission.com>

On Mon, Nov 19, 2012 at 07:26:30AM -0800, Eric W. Biederman wrote:
> 
> In (464dc801c76a net: Don't export sysctls to unprivileged users)
> I typoed and introduced a spurious backslash.  Delete it.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Signed-off-by: Simon Horman <horms@verge.net.au>

Pablo, can you take this directly through your tree?

> ---
>  net/netfilter/ipvs/ip_vs_lblc.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
> index d742aa9..fdd89b9 100644
> --- a/net/netfilter/ipvs/ip_vs_lblc.c
> +++ b/net/netfilter/ipvs/ip_vs_lblc.c
> @@ -574,7 +574,7 @@ static int __net_init __ip_vs_lblc_init(struct net *net)
>  		register_net_sysctl(net, "net/ipv4/vs", ipvs->lblc_ctl_table);
>  	if (!ipvs->lblc_ctl_header) {
>  		if (!net_eq(net, &init_net))
> -			kfree(ipvs->lblc_ctl_table);\
> +			kfree(ipvs->lblc_ctl_table);
>  		return -ENOMEM;
>  	}
>  
> -- 
> 1.7.5.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH net-next 1/4] ipip: allow to deactivate the creation of fb dev
From: David Miller @ 2012-11-20  0:06 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <1353082456-21234-2-git-send-email-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Fri, 16 Nov 2012 17:14:13 +0100

> Now that tunnels can be configured via rtnetlink, this device is not mandatory.
> The default is conservative.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

I'm not too thrilled about this change, mostly because of my dislike of
module parameters in general.

But in this case there appears to be real bugs in the two sets of changes
where you add this setup_fb thing.

> @@ -1057,7 +1066,8 @@ static void __net_exit ipip_exit_net(struct net *net)
>  
>  	rtnl_lock();
>  	ipip_destroy_tunnels(ipn, &list);
> -	unregister_netdevice_queue(ipn->fb_tunnel_dev, &list);
> +	if (setup_fb)
> +		unregister_netdevice_queue(ipn->fb_tunnel_dev, &list);
>  	unregister_netdevice_many(&list);
>  	rtnl_unlock();
>  }

Users can modify module parameter values via sysfs after the module
is loaded, so you need a more internal and protected state to use
to decide whether you really need to unregister the thing or not.

But to me it's just symptomatic of what a bad idea this is in the
first place.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox