Netdev List


* Re: [patch net-next-2.6 v2] net: vlan: make non-hw-accel rx path similar to hw-accel
From: Stephen Hemminger @ 2011-05-21 22:15 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Nicolas de Pesloüan, Changli Gao, Jiri Pirko, David Miller,
	netdev, kaber, fubar, eric.dumazet, andy, ebiederm
In-Reply-To: <BANLkTinqFJa-B7E7tonzOKGV4etZHUkUug@mail.gmail.com>

On Sat, 21 May 2011 10:54:39 -0700
Jesse Gross <jesse@nicira.com> wrote:

> On Sat, May 21, 2011 at 6:17 AM, Nicolas de Pesloüan
> <nicolas.2p.debian@gmail.com> wrote:
> > Le 21/05/2011 12:43, Changli Gao a écrit :
> >>
> >> On Sat, May 21, 2011 at 3:29 PM, Jiri Pirko<jpirko@redhat.com>  wrote:
> >>>
> >>> I do not see a reason why to not emulate that. To make paths as much
> >>> similar as they can be, that is the point of this patch.
> >>>
> >>> I think it would be better to fix an issue you are pointing at
> >>> rather that revert this.
> >>>
> >>
> >> In my opinion, the hardware accelerated VLAN RX is just a special case
> >> of the non hardware accelerated VLAN RX with header reordering. For
> >> promiscuous NICs and bridges, hw-accel-vlan-rx is just disabled.
> >
> > I strongly agree with that.
> >
> > The fact that a skb holds a VLAN tag is not a good enough reason to always
> > remove this tag before giving the skb to protocol handlers.
> >
> > If the user ask for VLAN tag removal, we should remove the tag, possibly
> > using hw-accel untagging if available else software untagging. And if the
> > user doesn't ask for tag removal, we should not untag.
> >
> > In other words, if the user doesn't setup any vlan interface on top of
> > another interface, there is no reason to untag the skb : both hw-accel
> > untagging and software untagging should be disabled.
> 
> The problem is that for most hardware vlan stripping is actually the
> common case, not the exception.  When you try to disable it frequently
> there are hidden restrictions that cause problems.  A few examples:
> * Some NICs can't disable stripping at all.
> * Some NICs can only do tag insertion if stripping is configured on receive.
> * Some NICs can only do hardware offloads (checksum, TSO) if tag
> insertion is used on transmit.
> 
> So if you are using vlans then acceleration is pretty much a fact of
> life and the best possible way we can deal with it is to make the
> accelerated and non-accelerated cases behave as similarly as possible.
> 
> Before we were trying to dynamically enable/disable vlan acceleration
> based on whether a vlan group was configured and that worked fine for
> vlan devices because acceleration was enabled for it.  However, it
> caused an endless series of problems for other devices (such as
> bridging while trunking vlans) due to lost tags, driver bugs, and the
> restrictions above.  Some of these can be fixed with driver changes
> but the fact is that dynamically changing behavior just leads to
> problems for the less common cases that are supposedly being fixed.
> It's much better to do the same thing all the time.
> 

The old code was also fundamentally broken for any kind of QoS,
because the TC filter had to know whether the skb carried the extra
overhead of a VLAN tag at the start. This meant the TC filter setup
had to differ depending on whether the hardware supported VLAN
acceleration or not.
-- 

^ permalink raw reply
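
Editor's sketch: the point about TC filter offsets can be illustrated in
userspace C. The helper below is hypothetical (l3_offset is not a kernel
API, and the SKETCH_* constants are defined locally). It computes where the
L3 header starts in a raw Ethernet frame. With the tag left in-band the
offset is 18; with the tag already stripped it is 14, so a filter written
against fixed offsets cannot serve both cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN    14      /* dst MAC + src MAC + EtherType */
#define SKETCH_VLAN_HLEN   4       /* one 802.1Q tag: TPID + TCI */
#define SKETCH_ETH_P_8021Q 0x8100  /* TPID of a single 802.1Q tag */

/*
 * Return the byte offset of the L3 header inside a raw Ethernet frame,
 * or -1 if the frame is too short to tell.
 */
static int l3_offset(const uint8_t *frame, size_t len)
{
	uint16_t ethertype;

	assert(frame != NULL);
	if (len < SKETCH_ETH_HLEN)
		return -1;

	/* EtherType/TPID is stored big-endian at bytes 12..13. */
	ethertype = (uint16_t)(frame[12] << 8 | frame[13]);
	if (ethertype == SKETCH_ETH_P_8021Q) {
		if (len < SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN)
			return -1;
		return SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN;
	}
	return SKETCH_ETH_HLEN;
}
```

A filter that matches on an IP field would add its field offset to the
value returned here, which is the per-hardware difference described above.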

* Re: [patch net-next-2.6 v2] net: vlan: make non-hw-accel rx path similar to hw-accel
From: Jesse Gross @ 2011-05-21 17:54 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Changli Gao, Jiri Pirko, David Miller, netdev, shemminger, kaber,
	fubar, eric.dumazet, andy, ebiederm
In-Reply-To: <4DD7BB61.9050200@gmail.com>

On Sat, May 21, 2011 at 6:17 AM, Nicolas de Pesloüan
<nicolas.2p.debian@gmail.com> wrote:
> Le 21/05/2011 12:43, Changli Gao a écrit :
>>
>> On Sat, May 21, 2011 at 3:29 PM, Jiri Pirko<jpirko@redhat.com>  wrote:
>>>
>>> I do not see a reason why to not emulate that. To make paths as much
>>> similar as they can be, that is the point of this patch.
>>>
>>> I think it would be better to fix an issue you are pointing at
>>> rather that revert this.
>>>
>>
>> In my opinion, the hardware accelerated VLAN RX is just a special case
>> of the non hardware accelerated VLAN RX with header reordering. For
>> promiscuous NICs and bridges, hw-accel-vlan-rx is just disabled.
>
> I strongly agree with that.
>
> The fact that a skb holds a VLAN tag is not a good enough reason to always
> remove this tag before giving the skb to protocol handlers.
>
> If the user ask for VLAN tag removal, we should remove the tag, possibly
> using hw-accel untagging if available else software untagging. And if the
> user doesn't ask for tag removal, we should not untag.
>
> In other words, if the user doesn't setup any vlan interface on top of
> another interface, there is no reason to untag the skb : both hw-accel
> untagging and software untagging should be disabled.

The problem is that for most hardware, vlan stripping is actually the
common case, not the exception.  When you try to disable it, there are
frequently hidden restrictions that cause problems.  A few examples:
* Some NICs can't disable stripping at all.
* Some NICs can only do tag insertion if stripping is configured on receive.
* Some NICs can only do hardware offloads (checksum, TSO) if tag
insertion is used on transmit.

So if you are using vlans then acceleration is pretty much a fact of
life and the best possible way we can deal with it is to make the
accelerated and non-accelerated cases behave as similarly as possible.

Before we were trying to dynamically enable/disable vlan acceleration
based on whether a vlan group was configured and that worked fine for
vlan devices because acceleration was enabled for it.  However, it
caused an endless series of problems for other devices (such as
bridging while trunking vlans) due to lost tags, driver bugs, and the
restrictions above.  Some of these can be fixed with driver changes
but the fact is that dynamically changing behavior just leads to
problems for the less common cases that are supposedly being fixed.
It's much better to do the same thing all the time.

^ permalink raw reply

* [PATCH 2/2] net: filter: Use WARN_RATELIMIT
From: Joe Perches @ 2011-05-21 17:48 UTC (permalink / raw)
  To: Ben Greear, linux-kernel
  Cc: linux-arch, David S. Miller, Arnd Bergmann, netdev
In-Reply-To: <cover.1305999731.git.joe@perches.com>

A mis-configured filter can spam the logs with lots of stack traces.

Rate-limit the warnings and add printout of the bogus filter information.

Original-patch-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Joe Perches <joe@perches.com>
---
 net/core/filter.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 0eb8c44..0e3622f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -350,7 +350,9 @@ load_b:
 			continue;
 		}
 		default:
-			WARN_ON(1);
+			WARN_RATELIMIT(1, "Unknown code:%u jt:%u jf:%u k:%u\n",
+				       fentry->code, fentry->jt,
+				       fentry->jf, fentry->k);
 			return 0;
 		}
 	}
-- 
1.7.5.rc3.dirty

^ permalink raw reply related

* [PATCH 1/2] bug.h: Add WARN_RATELIMIT
From: Joe Perches @ 2011-05-21 17:48 UTC (permalink / raw)
  To: Ben Greear, Arnd Bergmann
  Cc: linux-arch, David S. Miller, netdev, linux-kernel
In-Reply-To: <cover.1305999731.git.joe@perches.com>

Add a generic mechanism to ratelimit WARN(foo, fmt, ...) messages
using a hidden per call site static struct ratelimit_state.

Also add an __WARN_RATELIMIT variant to be able to use a specific
struct ratelimit_state.

Signed-off-by: Joe Perches <joe@perches.com>
---
 include/asm-generic/bug.h |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index e5a3f58..12b250c 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -165,6 +165,22 @@ extern void warn_slowpath_null(const char *file, const int line);
 #define WARN_ON_RATELIMIT(condition, state)			\
 		WARN_ON((condition) && __ratelimit(state))
 
+#define __WARN_RATELIMIT(condition, state, format...)		\
+({								\
+	int rtn = 0;						\
+	if (unlikely(__ratelimit(state)))			\
+		rtn = WARN(condition, format);			\
+	rtn;							\
+})
+
+#define WARN_RATELIMIT(condition, format...)			\
+({								\
+	static DEFINE_RATELIMIT_STATE(_rs,			\
+				      DEFAULT_RATELIMIT_INTERVAL,	\
+				      DEFAULT_RATELIMIT_BURST);	\
+	__WARN_RATELIMIT(condition, &_rs, format);		\
+})
+
 /*
  * WARN_ON_SMP() is for cases that the warning is either
  * meaningless for !SMP or may even cause failures.
-- 
1.7.5.rc3.dirty

^ permalink raw reply related
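
Editor's sketch: a minimal userspace model of the idea in this patch, one
hidden static rate-limit state per call site. Everything named here
(struct rl_state, rl_allow, WARN_RATELIMIT_SKETCH) is hypothetical and only
mimics the shape of the kernel facility. Time is passed in by the caller, in
seconds, so the behaviour is deterministic; the interval of 5 and burst of
10 are values assumed for illustration. The sketch tests the condition
before consulting the limiter, which is a choice made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a rate-limit state. */
struct rl_state {
	long interval;  /* window length, in seconds */
	int burst;      /* messages allowed per window */
	long begin;     /* start of the current window */
	int printed;    /* messages emitted in the current window */
	int started;    /* 0 until the first call opens a window */
};

/* Return 1 if a message may be emitted at time `now`, 0 if suppressed. */
static int rl_allow(struct rl_state *rs, long now)
{
	assert(rs != NULL);
	if (!rs->started || now - rs->begin >= rs->interval) {
		/* Open a new window and reset the counter. */
		rs->begin = now;
		rs->printed = 0;
		rs->started = 1;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	return 0;
}

/*
 * Print a warning at most `burst` times per window from this call site.
 * The static state is private to each expansion of the macro.
 */
#define WARN_RATELIMIT_SKETCH(cond, now, ...)				\
	do {								\
		static struct rl_state _rs = { 5, 10, 0, 0, 0 };	\
		if ((cond) && rl_allow(&_rs, (now)))			\
			fprintf(stderr, __VA_ARGS__);			\
	} while (0)
```

A call such as WARN_RATELIMIT_SKETCH(code > 255, now, "Unknown code:%u\n",
code) would then log at most ten times in any five-second window.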

* [PATCH 0/2] Add and use WARN_RATELIMIT
From: Joe Perches @ 2011-05-21 17:48 UTC (permalink / raw)
  To: Ben Greear, linux-arch
  Cc: David S. Miller, Arnd Bergmann, netdev, linux-kernel
In-Reply-To: <1305666832.1722.62.camel@Joe-Laptop>

Generic mechanism to ratelimit WARN uses.

Joe Perches (2):
  bug.h: Add WARN_RATELIMIT
  net: filter: Use WARN_RATELIMIT

 include/asm-generic/bug.h |   16 ++++++++++++++++
 net/core/filter.c         |    4 +++-
 2 files changed, 19 insertions(+), 1 deletions(-)

-- 
1.7.5.rc3.dirty

^ permalink raw reply

* Re: ip_rt_bug questions.
From: Dave Jones @ 2011-05-21 17:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110418.145023.13728986.davem@davemloft.net>

On Mon, Apr 18, 2011 at 02:50:23PM -0700, David Miller wrote:
 > From: David Miller <davem@davemloft.net>
 > Date: Mon, 18 Apr 2011 14:49:09 -0700 (PDT)
 > 
 > > From: Dave Jones <davej@redhat.com>
 > > Date: Mon, 18 Apr 2011 17:48:10 -0400
 > > 
 > >> I managed to trigger this today..
 > >> 
 > >> ip_rt_bug: 0.0.0.0 -> 255.255.255.255, ?
 > >> 
 > >> if this is useful in some way, maybe it should be enhanced
 > >> to print out something else, like a backtrace ?
 > >> 
 > >> Also, should it be a printk_ratelimit() ? Or is there
 > >> ratelimiting done elsewhere in the routing code ?
 > >> 
 > >> or should it just be silenced, leaving just the kfree_skb ?
 > > 
 > > It's a very serious issue, it means we used an input route for
 > > packet output.
 > > 
 > > Kernel version and what you were doing to trigger this?
 > 
 > BTW, if you could modify this thing to spit out a stack
 > trace (probably by using WARN_ON() or similar) that will
 > probably show us where the bug is coming from.

I haven't been able to hit this again since I added the WARN_ON.
But you can guarantee that the next time I see it it will be on
a kernel where I forgot to re-add this.  Could we get this merged
so I don't have to keep remembering it ?

	Dave

Add a stack backtrace to the ip_rt_bug path for debugging

Signed-off-by: Dave Jones <davej@redhat.com>

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 99e6e4b..6fb18b7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1687,6 +1687,7 @@ static int ip_rt_bug(struct sk_buff *skb)
 		&ip_hdr(skb)->saddr, &ip_hdr(skb)->daddr,
 		skb->dev ? skb->dev->name : "?");
 	kfree_skb(skb);
+	WARN_ON(1);
 	return 0;
 }
 



^ permalink raw reply related

* Re: Kernel panic nf_nat_setup_info+0x5b3/0x6e0
From: Changli Gao @ 2011-05-21 15:42 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Eric Dumazet, Oleg A. Arkhangelsky, netfilter-devel, netdev,
	Paul E McKenney
In-Reply-To: <4D9B01DF.2050206@trash.net>

On Tue, Apr 5, 2011 at 7:49 PM, Patrick McHardy <kaber@trash.net> wrote:
>
> I think what's happening is that the conntrack entry is destroyed
> and the NAT ct_extend destructor invoked, which removes the nat
> extension from the RCU protected bysource hash, after which the
> entire extension area is freed. Another CPU might still find the
> old NAT entry with undefined contents in the hash though, so I
> think using RCU to free the extension area is correct.
>

What is the conclusion? Is my patch acceptable? Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] netns: add /proc/*/net/id symlink
From: Eric W. Biederman @ 2011-05-21 15:39 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: davem, netdev, equinox
In-Reply-To: <20110521093936.GA3015@p183>

Alexey Dobriyan <adobriyan@gmail.com> writes:

> David Lamparter pointed some real scenarios where knowing
> if two processes live in same netns is important,
> like "how do I kill _all_ processes in netns to shutdown it".

Today the way I do this is to md5sum /proc/<pid>/mounts.

That works because it is usually necessary to have a separate mount
namespace with a separate set of mounts to accommodate sysfs.

> Currently only kernel knows if two netns are the same.
> Userspace maybe can look at different proc files to find a match
> indirectly sysconf-style but result will be ugly no matter what.

Somewhat. 

Right now, without patches, if we limit ourselves to the network
namespace there is a pretty valid way to do this.

stat /proc/<pid>/net/dev and compare the inode numbers.

Or any other file in /proc/*/net/.  The inode numbers are the
same if you are in the same network namespace.

> Add /proc/*/net/id symlink which "points" to an integer.
>
> 	$ readlink /proc/net/id
> 	0
>
> 	$ readlink /proc/2941/net/id
> 	1
>
> "id" is not a file because 1 syscall is faster than 3 syscalls.
>
> The only rules and expectations for userspace are:
> [as if they will comply, ha-ha]
>
> * init_net always has id 0
> * two netns do not have same id
> * id is unsigned integer

I don't like this patch because we already have a proc interface
that solves this in production kernels today.

- stat is a single syscall
- two netns do not have the same id
- id is an ino_t.

Now it probably needs to be better documented that /proc/*/net/*
have the same inode number if the network namespace is the
same, as everyone including myself overlooked this very handy
existing property.

Writing this it occurs to me there is a misfeature in my pending
namespace file descriptor code.  Right now /proc/<pid>/ns/net
has a floating inode number and it would be good if I could make
that inode number the same for every file that refers to
the same network namespace. Ugh.

Eric

^ permalink raw reply
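
Editor's sketch: the stat-based comparison described in this message could
look like the following in C. same_inode is a hypothetical helper name.
Comparing st_dev in addition to st_ino is an extra guard added in this
sketch, not something the message asks for.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Return 1 if both paths resolve to the same inode on the same device,
 * 0 if they differ, and -1 if either stat() call fails.
 */
static int same_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a != NULL && b != NULL);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With placeholder PIDs, same_inode("/proc/1234/net/dev",
"/proc/5678/net/dev") would return 1 when, as stated above, both processes
are in the same network namespace.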

* Re: [patch net-next-2.6 v2] net: vlan: make non-hw-accel rx path similar to hw-accel
From: Nicolas de Pesloüan @ 2011-05-21 13:17 UTC (permalink / raw)
  To: Changli Gao, Jiri Pirko
  Cc: David Miller, netdev, shemminger, kaber, fubar, eric.dumazet,
	andy, jesse, ebiederm
In-Reply-To: <BANLkTinZUATEjBij+rwBTn=n-Tau5qHPWw@mail.gmail.com>

Le 21/05/2011 12:43, Changli Gao a écrit :
> On Sat, May 21, 2011 at 3:29 PM, Jiri Pirko<jpirko@redhat.com>  wrote:
>>
>> I do not see a reason why to not emulate that. To make paths as much
>> similar as they can be, that is the point of this patch.
>>
>> I think it would be better to fix an issue you are pointing at
>> rather that revert this.
>>
>
> In my opinion, the hardware accelerated VLAN RX is just a special case
> of the non hardware accelerated VLAN RX with header reordering. For
> promiscuous NICs and bridges, hw-accel-vlan-rx is just disabled.

I strongly agree with that.

The fact that a skb holds a VLAN tag is not a good enough reason to always remove this tag before
giving the skb to protocol handlers.

If the user asks for VLAN tag removal, we should remove the tag, possibly using hw-accel untagging if
available, else software untagging. And if the user doesn't ask for tag removal, we should not untag.

In other words, if the user doesn't set up any vlan interface on top of another interface, there is
no reason to untag the skb: both hw-accel untagging and software untagging should be disabled.

Also, the skb should be delivered untagged or tagged to protocol handlers, depending on the
particular device the protocol handlers registered at. The same skb might need to be delivered
tagged to a ptype_all handler registered at eth0 and untagged to a ptype_base handler registered at
eth0.100.

rx_handler still sounds like the right place to do software untagging, because software untagging is a
per-device process. A vlan_untagging rx_handler should be installed on devices that have a vlan
child device and that either lack hw-accel or have hw-accel disabled. This would also keep
__netif_receive_skb() free of vlan-specific code, which is cleaner.

I fully understand that this might require several rx_handlers per device for advanced setups,
but I have long thought that supporting several rx_handlers is a powerful feature we need.

	Nicolas.

^ permalink raw reply
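
Editor's sketch: to make "software untagging" concrete, here is a userspace
example that strips an 802.1Q tag from a raw Ethernet frame in place and
hands back the TCI. sw_untag is a hypothetical helper, not the kernel's
implementation; it assumes a single tag with TPID 0x8100 and works on a
plain byte buffer rather than an skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Strip one 802.1Q tag in place.
 * Returns 4 if a tag was removed (the untagged frame then starts at
 * frame + 4 and *tci holds the tag control information), 0 if the frame
 * carried no tag, and -1 if the frame is too short.
 */
static int sw_untag(uint8_t *frame, size_t len, uint16_t *tci)
{
	assert(frame != NULL && tci != NULL);
	if (len < 14)
		return -1;
	if (frame[12] != 0x81 || frame[13] != 0x00)
		return 0;               /* not tagged, nothing to do */
	if (len < 18)
		return -1;

	/* TCI is big-endian at bytes 14..15; the low 12 bits are the VID. */
	*tci = (uint16_t)(frame[14] << 8 | frame[15]);

	/* Slide both MAC addresses over the tag so that the inner
	 * EtherType at bytes 16..17 follows them directly. */
	memmove(frame + 4, frame, 12);
	return 4;
}
```

For a frame tagged with VID 100, the case that would feed eth0.100 in the
message above, (tci & 0x0fff) evaluates to 100 after the call.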

* Re: [patch net-next-2.6 v2] net: vlan: make non-hw-accel rx path similar to hw-accel
From: Changli Gao @ 2011-05-21 10:43 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Miller, netdev, shemminger, kaber, fubar, eric.dumazet,
	nicolas.2p.debian, andy, jesse, ebiederm
In-Reply-To: <20110521072925.GA2588@jirka.orion>

On Sat, May 21, 2011 at 3:29 PM, Jiri Pirko <jpirko@redhat.com> wrote:
>
> I do not see a reason why to not emulate that. To make paths as much
> similar as they can be, that is the point of this patch.
>
> I think it would be better to fix an issue you are pointing at
> rather that revert this.
>

In my opinion, the hardware accelerated VLAN RX is just a special case
of the non hardware accelerated VLAN RX with header reordering. For
promiscuous NICs and bridges, hw-accel-vlan-rx is just disabled.

I have tried to fix all the issues, but failed to do so in a clean way. Please try.

Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH] netns: add /proc/*/net/id symlink
From: Alexey Dobriyan @ 2011-05-21  9:39 UTC (permalink / raw)
  To: davem; +Cc: netdev, ebiederm, equinox

David Lamparter pointed out some real scenarios where knowing
whether two processes live in the same netns is important,
like "how do I kill _all_ processes in netns to shutdown it".

Currently only the kernel knows whether two netns are the same.
Userspace can perhaps look at different proc files to find a match
indirectly, sysconf-style, but the result will be ugly no matter what.

Add /proc/*/net/id symlink which "points" to an integer.

	$ readlink /proc/net/id
	0

	$ readlink /proc/2941/net/id
	1

"id" is not a file because 1 syscall is faster than 3 syscalls.

The only rules and expectations for userspace are:
[as if they will comply, ha-ha]

* init_net always has id 0
* two netns do not have same id
* id is unsigned integer

Kernel code continues to use net_eq(); there is no need
to compare net->id inside the kernel, because that is slower than net_eq().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 fs/proc/generic.c           |   16 +++++++++++++
 fs/proc/proc_net.c          |   31 ++++++++++++++++++++++++-
 include/linux/proc_fs.h     |    7 +++++
 include/net/net_namespace.h |   10 ++++++++
 net/core/net_namespace.c    |   54 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 117 insertions(+), 1 deletion(-)

--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -660,6 +660,22 @@ struct proc_dir_entry *proc_symlink(const char *name,
 }
 EXPORT_SYMBOL(proc_symlink);
 
+struct proc_dir_entry *_proc_symlink(const char *name, struct proc_dir_entry *parent, const struct inode_operations *proc_iops)
+{
+	struct proc_dir_entry *pde;
+
+	pde = __proc_create(&parent, name, S_IFLNK | S_IRUGO|S_IWUGO|S_IXUGO, 1);
+	if (!pde)
+		return NULL;
+	pde->proc_iops = proc_iops;
+	pde->data = NULL;
+	if (proc_register(parent, pde) < 0) {
+		kfree(pde);
+		return NULL;
+	}
+	return pde;
+}
+
 struct proc_dir_entry *proc_mkdir_mode(const char *name, mode_t mode,
 		struct proc_dir_entry *parent)
 {
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -191,9 +191,30 @@ void proc_net_remove(struct net *net, const char *name)
 }
 EXPORT_SYMBOL_GPL(proc_net_remove);
 
+static int net_id_readlink(struct dentry *dentry, char __user *buf, int buflen)
+{
+	struct net *net;
+	char kbuf[42];
+	int len;
+
+	net = get_proc_net(dentry->d_inode);
+	if (!net)
+		return -ENXIO;
+	len = snprintf(kbuf, sizeof(kbuf), "%u", net->id);
+	put_net(net);
+	len = min(len, buflen);
+	if (copy_to_user(buf, kbuf, len))
+		return -EFAULT;
+	return len;
+}
+
+static const struct inode_operations net_id_proc_iops = {
+	.readlink	= net_id_readlink,
+};
+
 static __net_init int proc_net_ns_init(struct net *net)
 {
-	struct proc_dir_entry *netd, *net_statd;
+	struct proc_dir_entry *netd, *net_statd, *pde;
 	int err;
 
 	err = -ENOMEM;
@@ -214,8 +235,15 @@ static __net_init int proc_net_ns_init(struct net *net)
 
 	net->proc_net = netd;
 	net->proc_net_stat = net_statd;
+
+	pde = _proc_symlink("id", net->proc_net, &net_id_proc_iops);
+	if (!pde)
+		goto free_net_stat;
+
 	return 0;
 
+free_net_stat:
+	kfree(net_statd);
 free_net:
 	kfree(netd);
 out:
@@ -224,6 +252,7 @@ out:
 
 static __net_exit void proc_net_ns_exit(struct net *net)
 {
+	remove_proc_entry("id", net->proc_net);
 	remove_proc_entry("stat", net->proc_net);
 	kfree(net->proc_net);
 }
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -143,6 +143,7 @@ extern void proc_device_tree_update_prop(struct proc_dir_entry *pde,
 					 struct property *oldprop);
 #endif /* CONFIG_PROC_DEVICETREE */
 
+struct proc_dir_entry *_proc_symlink(const char *name, struct proc_dir_entry *parent, const struct inode_operations *proc_iops);
 extern struct proc_dir_entry *proc_symlink(const char *,
 		struct proc_dir_entry *, const char *);
 extern struct proc_dir_entry *proc_mkdir(const char *,struct proc_dir_entry *);
@@ -204,8 +205,14 @@ static inline struct proc_dir_entry *proc_create_data(const char *name,
 }
 #define remove_proc_entry(name, parent) do {} while (0)
 
+static inline struct proc_dir_entry *_proc_symlink(const char *name, struct proc_dir_entry *parent, const struct inode_operations *proc_iops)
+{
+	return NULL;
+}
+
 static inline struct proc_dir_entry *proc_symlink(const char *name,
 		struct proc_dir_entry *parent,const char *dest) {return NULL;}
+
 static inline struct proc_dir_entry *proc_mkdir(const char *name,
 	struct proc_dir_entry *parent) {return NULL;}
 static inline struct proc_dir_entry *proc_mkdir_mode(const char *name,
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -96,6 +96,16 @@ struct net {
 	struct netns_xfrm	xfrm;
 #endif
 	struct netns_ipvs	*ipvs;
+
+	/*
+	 * netns unique id solely for userspace consumption,
+	 * see /proc/net/id symlink.
+	 *
+	 * init_net has id 0.
+	 *
+	 * Write-once field.
+	 */
+	unsigned int		id;
 };
 
 
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -115,6 +115,52 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
+#ifdef CONFIG_NET_NS
+static DEFINE_IDA(net_id_ida);
+static DEFINE_SPINLOCK(net_id_ida_lock);
+
+static int __net_init set_net_id(struct net *net)
+{
+	int id;
+
+	if (net_eq(net, &init_net)) {
+		id = 0;
+	} else {
+		int rv;
+
+		do {
+			if (ida_pre_get(&net_id_ida, GFP_KERNEL) == 0)
+				return -ENOMEM;
+			spin_lock(&net_id_ida_lock);
+			/* init_net has id 0 */
+			rv = ida_get_new_above(&net_id_ida, 1, &id);
+			spin_unlock(&net_id_ida_lock);
+		} while (rv == -EAGAIN);
+		if (rv < 0)
+			return rv;
+	}
+	net->id = id;
+	return 0;
+}
+
+static void free_net_id(struct net *net)
+{
+	spin_lock(&net_id_ida_lock);
+	ida_remove(&net_id_ida, net->id);
+	spin_unlock(&net_id_ida_lock);
+}
+#else
+static inline int set_net_id(struct net *net)
+{
+	net->id = 0;
+	return 0;
+}
+
+static inline void free_net_id(struct net *net)
+{
+}
+#endif
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -131,6 +177,10 @@ static __net_init int setup_net(struct net *net)
 	atomic_set(&net->use_count, 0);
 #endif
 
+	error = set_net_id(net);
+	if (error < 0)
+		goto out;
+
 	list_for_each_entry(ops, &pernet_list, list) {
 		error = ops_init(ops, net);
 		if (error < 0)
@@ -140,6 +190,8 @@ out:
 	return error;
 
 out_undo:
+	free_net_id(net);
+
 	/* Walk through the list backwards calling the exit functions
 	 * for the pernet modules whose init functions did not fail.
 	 */
@@ -204,6 +256,8 @@ static void net_free(struct net *net)
 		return;
 	}
 #endif
+
+	free_net_id(net);
 	kfree(net->gen);
 	kmem_cache_free(net_cachep, net);
 }

^ permalink raw reply
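
Editor's sketch: from userspace, consuming the proposed symlink would take
a single readlink() call. The code below assumes the /proc/*/net/id symlink
from this patch exists and that its target is a decimal unsigned integer;
read_id_link is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a symlink whose target is a decimal unsigned integer.
 * Returns 0 and stores the value in *id on success, -1 on any failure.
 */
static int read_id_link(const char *path, unsigned int *id)
{
	char buf[32];
	char *end;
	unsigned long v;
	ssize_t n;

	assert(path != NULL && id != NULL);

	/* readlink() does not NUL-terminate, so leave room for it. */
	n = readlink(path, buf, sizeof(buf) - 1);
	if (n <= 0)
		return -1;
	buf[n] = '\0';

	/* Reject signs and whitespace that strtoul() would accept. */
	if (buf[0] < '0' || buf[0] > '9')
		return -1;

	errno = 0;
	v = strtoul(buf, &end, 10);
	if (errno != 0 || *end != '\0' || v > UINT_MAX)
		return -1;
	*id = (unsigned int)v;
	return 0;
}
```

Two processes would then be in the same netns when read_id_link() yields
the same id for both of their /proc/<pid>/net/id paths.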

* Re: [patch net-next-2.6 v2] net: vlan: make non-hw-accel rx path similar to hw-accel
From: Jiri Pirko @ 2011-05-21  7:29 UTC (permalink / raw)
  To: Changli Gao
  Cc: David Miller, netdev, shemminger, kaber, fubar, eric.dumazet,
	nicolas.2p.debian, andy, jesse, ebiederm
In-Reply-To: <BANLkTikSKiYoiOLB=i7qjR0N--CAQ2dHWw@mail.gmail.com>

Sat, May 21, 2011 at 03:11:05AM CEST, xiaosuo@gmail.com wrote:
>On Wed, Apr 13, 2011 at 5:16 AM, David Miller <davem@davemloft.net> wrote:
>> From: Jiri Pirko <jpirko@redhat.com>
>> Date: Fri,  8 Apr 2011 07:48:33 +0200
>>
>>> Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is
>>> enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into
>>> vlan code in __netif_receive_skb - vlan_hwaccel_do_receive.
>>>
>>> For non-rx-vlan-hw-accel, however, the tagged skb goes through the whole
>>> __netif_receive_skb; it's untagged in the ptype_base handler and reinjected.
>>>
>>> This inconsistency is fixed by this patch. Vlan untagging happens early in
>>> __netif_receive_skb so the rest of the code (ptype_all handlers, rx_handlers)
>>> sees the skb as if it were untagged by hw.
>>>
>>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>>
>>> v1->v2:
>>>       remove "inline" from vlan_core.c functions
>>
>> Ok, I've applied this, let's see what happens :-)
>>
>
>I think we should revert it.
>
>File: net/8021q/vlan_core.c:
>
>161         skb_pull_rcsum(skb, VLAN_HLEN);
>
>skb->data and skb->len are updated, but network_header and
>transport_header are left unchanged. This will break the assumption in
>net_sched.
>
>  for example:
>  file: cls_u32.c
>  104         unsigned int off = skb_network_offset(skb);
>  After this patch, skb_network_offset may be negative.
>
>162         vlan_set_encap_proto(skb, vhdr);
>163
>164         skb = vlan_check_reorder_header(skb);
>vlan_check_reorder_header assume skb->dev is a vlan_dev. Even though
>the correct dev is assigned temporarily, we should not reorder the
>header here as HW accelerated vlan RX does, as this may breaks the
>bridging comes later.
>
>165         if (unlikely(!skb))
>166                 goto err_free;
>
>The hardware accelerated vlan RX doesn't always do the "right" things
>as it strips the vlan header, so we should not emulate it in software
>all the time.

I do not see a reason not to emulate that. Making the paths as
similar as they can be is the point of this patch.

I think it would be better to fix the issue you are pointing at
rather than revert this.

Jirka

>
>-- 
>Regards,
>Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply
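
Editor's sketch: the negative skb_network_offset() concern raised in the
quoted review can be modeled with plain integers. struct mini_skb and its
helpers are hypothetical stand-ins, not kernel code; the model assumes
network_header was recorded before the VLAN header was pulled.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_VLAN_HLEN 4

/* Hypothetical, heavily simplified skb: all fields are byte offsets
 * from the start of the buffer. */
struct mini_skb {
	int data;            /* where the current payload starts */
	int len;             /* bytes remaining from data onward */
	int network_header;  /* where the network header was recorded */
};

/* Distance from data to the recorded network header. */
static int mini_network_offset(const struct mini_skb *skb)
{
	assert(skb != NULL);
	return skb->network_header - skb->data;
}

/* Advance data by n bytes; network_header is deliberately left alone. */
static void mini_pull(struct mini_skb *skb, int n)
{
	assert(skb != NULL && n >= 0 && n <= skb->len);
	skb->data += n;
	skb->len -= n;
}

/* Re-record the network header at the current data position. */
static void mini_reset_network_header(struct mini_skb *skb)
{
	assert(skb != NULL);
	skb->network_header = skb->data;
}
```

Pulling SKETCH_VLAN_HLEN bytes without the reset leaves the offset at -4 in
this model; calling mini_reset_network_header() afterwards brings it back
to 0.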

* [GIT] Networking
From: David Miller @ 2011-05-21  6:20 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel

I wanted to push this quickly to fix the build fallout:

1) SCTP build failed due to intersection of two commits
   happening in two different trees.

2) ipv6 RTA_PREFSRC support doesn't propagate the prefsrc
   value into new copied routes.

3) garp can use kfree_rcu() too

Please pull, thanks a lot!

The following changes since commit 557eed603159b4e007c57d97fad1333ecebd3c2e:

  Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev (2011-05-20 14:31:27 -0700)

are available in the git repository at:

  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

David S. Miller (1):
      sctp: Fix build failure.

Eric Dumazet (1):
      garp: use kfree_rcu()

Florian Westphal (1):
      ipv6: copy prefsrc setting when copying route entry

 net/802/garp.c       |   20 ++------------------
 net/ipv6/route.c     |    1 +
 net/sctp/bind_addr.c |    2 +-
 3 files changed, 4 insertions(+), 19 deletions(-)

^ permalink raw reply

* Re: [PATCH] garp: use kfree_rcu()
From: David Miller @ 2011-05-21  6:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1305952290.2862.2.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 21 May 2011 06:31:30 +0200

> Use kfree_rcu() instead of call_rcu(), remove garp_cleanup_module()
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply
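
Editor's sketch: the call_rcu() callbacks removed by this patch have the
form kfree(container_of(head, struct garp_port, rcu)). As a userspace
illustration of that container_of step only (no RCU is involved), the code
below recovers an enclosing structure from a pointer to an embedded member.
All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-creation of the container_of idea: subtract the member's
 * offset from the member pointer to get the enclosing object. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for an embedded callback head. */
struct sketch_rcu_head {
	void *next;
};

struct sketch_port {
	int id;
	struct sketch_rcu_head rcu;  /* embedded, not at offset 0 */
};

/* Map a pointer to the embedded head back to its sketch_port. */
static struct sketch_port *port_from_rcu(struct sketch_rcu_head *head)
{
	assert(head != NULL);
	return sketch_container_of(head, struct sketch_port, rcu);
}
```

A callback that only frees the enclosing object needs nothing beyond this
pointer arithmetic, which is why such callbacks are simple to replace.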

* Re: [PATCH 1/1] ipv6: copy prefsrc setting when copying route entry
From: David Miller @ 2011-05-21  6:06 UTC (permalink / raw)
  To: fw; +Cc: netdev, sahne
In-Reply-To: <1305926844-12995-1-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Fri, 20 May 2011 23:27:24 +0200

> commit c3968a857a6b6c3d2ef4ead35776b055fb664d74
> ('ipv6: RTA_PREFSRC support for ipv6 route source address selection')
> added support for ipv6 prefsrc as an alternative to ipv6 addrlabels,
> but it did not work because the prefsrc entry was not copied.
> 
> Cc: Daniel Walter <sahne@0x90.at>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied.

^ permalink raw reply

* Re: [PATCH FINAL] SCTP: fix race between sctp_bind_addr_free() and sctp_bind_addr_conflict()
From: David Miller @ 2011-05-21  6:05 UTC (permalink / raw)
  To: davej; +Cc: difrost.kernel, vladislav.yasevich, eric.dumazet, netdev
In-Reply-To: <20110520232717.GA5038@redhat.com>

From: Dave Jones <davej@redhat.com>
Date: Fri, 20 May 2011 19:27:17 -0400

> Just saw this land in Linus tree, and it broke the build for me..
> 
> net/sctp/bind_addr.c: In function ‘sctp_bind_addr_clean’:
> net/sctp/bind_addr.c:148:24: error: ‘sctp_local_addr_free’ undeclared (first use in this function)
> net/sctp/bind_addr.c:148:24: note: each undeclared identifier is reported only once for each function it appears in
> make[2]: *** [net/sctp/bind_addr.o] Error 1

Yes, this interacted and merged badly with the kfree_rcu() changes.
I'll fix this up, thanks.

^ permalink raw reply

* [PATCH] garp: use kfree_rcu()
From: Eric Dumazet @ 2011-05-21  4:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Use kfree_rcu() instead of call_rcu(), remove garp_cleanup_module()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/802/garp.c |   20 ++------------------
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/net/802/garp.c b/net/802/garp.c
index f8300a8..1610295 100644
--- a/net/802/garp.c
+++ b/net/802/garp.c
@@ -544,11 +544,6 @@ static int garp_init_port(struct net_device *dev)
 	return 0;
 }
 
-static void garp_kfree_rcu(struct rcu_head *head)
-{
-	kfree(container_of(head, struct garp_port, rcu));
-}
-
 static void garp_release_port(struct net_device *dev)
 {
 	struct garp_port *port = rtnl_dereference(dev->garp_port);
@@ -559,7 +554,7 @@ static void garp_release_port(struct net_device *dev)
 			return;
 	}
 	rcu_assign_pointer(dev->garp_port, NULL);
-	call_rcu(&port->rcu, garp_kfree_rcu);
+	kfree_rcu(port, rcu);
 }
 
 int garp_init_applicant(struct net_device *dev, struct garp_application *appl)
@@ -603,11 +598,6 @@ err1:
 }
 EXPORT_SYMBOL_GPL(garp_init_applicant);
 
-static void garp_app_kfree_rcu(struct rcu_head *head)
-{
-	kfree(container_of(head, struct garp_applicant, rcu));
-}
-
 void garp_uninit_applicant(struct net_device *dev, struct garp_application *appl)
 {
 	struct garp_port *port = rtnl_dereference(dev->garp_port);
@@ -625,7 +615,7 @@ void garp_uninit_applicant(struct net_device *dev, struct garp_application *appl
 	garp_queue_xmit(app);
 
 	dev_mc_del(dev, appl->proto.group_address);
-	call_rcu(&app->rcu, garp_app_kfree_rcu);
+	kfree_rcu(app, rcu);
 	garp_release_port(dev);
 }
 EXPORT_SYMBOL_GPL(garp_uninit_applicant);
@@ -643,9 +633,3 @@ void garp_unregister_application(struct garp_application *appl)
 	stp_proto_unregister(&appl->proto);
 }
 EXPORT_SYMBOL_GPL(garp_unregister_application);
-
-static void __exit garp_cleanup_module(void)
-{
-	rcu_barrier(); /* Wait for completion of call_rcu()'s */
-}
-module_exit(garp_cleanup_module);
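
To make the conversion in the diff above easier to follow, here is a minimal
userspace sketch of the two patterns. It is not kernel code: every toy_* name
is hypothetical, the "grace period" is just a function call, and the way the
real kfree_rcu() records what to free is an implementation detail that this
sketch only approximates by storing the field offset. The old pattern needs
one callback per structure type; the new one needs only the pointer and the
name of the rcu_head field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an RCU callback head (hypothetical, not the kernel's). */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head); /* NULL means "free by offset" */
	size_t offset;                           /* offset of the head in its object */
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_rcu_head *toy_pending;  /* callbacks waiting for the grace period */
int toy_freed_count;               /* how many objects have been freed */

/* Shared queueing helper for both patterns. */
void toy_queue(struct toy_rcu_head *head,
	       void (*func)(struct toy_rcu_head *), size_t offset)
{
	assert(head != NULL);
	head->func = func;
	head->offset = offset;
	head->next = toy_pending;
	toy_pending = head;
}

/* Old pattern: the caller supplies a per-type callback. */
void toy_call_rcu(struct toy_rcu_head *head,
		  void (*func)(struct toy_rcu_head *))
{
	toy_queue(head, func, 0);
}

/* New pattern: pointer plus field name, no per-type callback needed. */
#define toy_kfree_rcu(ptr, member) \
	toy_queue(&(ptr)->member, NULL, \
		  (size_t)((char *)&(ptr)->member - (char *)(ptr)))

/* Pretend a grace period has elapsed: run or free everything queued. */
void toy_grace_period_elapsed(void)
{
	while (toy_pending) {
		struct toy_rcu_head *head = toy_pending;

		toy_pending = head->next;
		if (head->func) {
			head->func(head);
		} else {
			free((char *)head - head->offset);
			toy_freed_count++;
		}
	}
}

/* Example object, loosely modelled on the shape of struct garp_port. */
struct toy_port {
	int id;
	struct toy_rcu_head rcu;
};

/* The kind of boilerplate callback that the patch removes. */
void toy_port_free_cb(struct toy_rcu_head *head)
{
	free(toy_container_of(head, struct toy_port, rcu));
	toy_freed_count++;
}
```

In this sketch, toy_call_rcu(&port->rcu, toy_port_free_cb) and
toy_kfree_rcu(port, rcu) both defer the free until toy_grace_period_elapsed()
runs; the second form simply avoids writing toy_port_free_cb at all.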



^ permalink raw reply related

* Re: [PATCHv2 06/14] virtio: add api for delayed callbacks
From: Rusty Russell @ 2011-05-21  2:33 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <8f343dcaa996f6b10499468c49508ba9d6fb6f5a.1305846412.git.mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Fri, 20 May 2011 02:11:14 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Add an API that tells the other side that callbacks
> should be delayed until a lot of work has been done.
> Implement using the new event_idx feature.
> 
> Note: it might seem advantageous to let the drivers
> ask for a callback after a specific capacity has
> been reached. However, as a single head can
> free many entries in the descriptor table,
> we don't really have a clue about capacity
> until get_buf is called. The API is the simplest
> to implement at the moment, we'll see what kind of
> hints drivers can pass when there's more than one
> user of the feature.
> 
> Signed-off-by: Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Yes, I've applied this (and the next one, which uses it in virtio_net),
despite my reservations about the API.  But that is fixable...

Thanks,
Rusty.

^ permalink raw reply

* Re: [PATCHv2 05/14] virtio_test: support event index
From: Rusty Russell @ 2011-05-21  2:32 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <fbed57582b9e8d97c11f889937ea65f42eb03da2.1305846412.git.mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Fri, 20 May 2011 02:11:05 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Add the ability to test the new event idx feature;
> enabled by default.

Applied.

Thanks,
Rusty.

^ permalink raw reply

* Re: [PATCHv2 04/14] vhost: support event index
From: Rusty Russell @ 2011-05-21  2:31 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <b227febf884dcf82dee9233e581c6216d0e9daa5.1305846412.git.mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Fri, 20 May 2011 02:10:54 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Support the new event index feature. When acked,
> utilize it to reduce the number of interrupts sent to the guest.
> 
> Signed-off-by: Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Applied. Even though it'd normally be in your tree, it's easier for me
to push everything together.

Thanks,
Rusty.

^ permalink raw reply

* Re: [PATCHv2 03/14] virtio_ring: support event idx feature
From: Rusty Russell @ 2011-05-21  2:31 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <960f3e3b260844011b004c81dbda0661c977b79a.1305846412.git.mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Fri, 20 May 2011 02:10:44 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Support for the new event idx feature:
> 1. When enabling interrupts, publish the current avail index
>    value to the host to get interrupts on the next update.
> 2. Use the new avail_event feature to reduce the number
>    of exits from the guest.

Applied.

Thanks,
Rusty.

^ permalink raw reply

* Re: [PATCHv2 02/14] virtio ring: inline function to check for events
From: Rusty Russell @ 2011-05-21  2:29 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <16ce853af7a80d0f7cb0c1118ba8e19adc184ad0.1305846412.git.mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Fri, 20 May 2011 02:10:27 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> With the new used_event and avail_event features, both
> host and guest need similar logic to check whether events are
> enabled, so it helps to put the common code in the header.
> 
> Note that Xen has similar logic for notification hold-off
> in include/xen/interface/io/ring.h with req_event and req_prod
> corresponding to event_idx + 1 and new_idx respectively.
> The +1 comes from the fact that req_event and req_prod in Xen start at 1,
> while event index in virtio starts at 0.
> 
> Signed-off-by: Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Applied.

Thanks,
Rusty.
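
The check described in the quoted patch can be sketched in userspace as
follows. This is an illustration under assumptions, not the patch itself:
the name toy_need_event and its parameter names are hypothetical, and the
sketch assumes 16-bit ring indices that wrap around. It answers one
question: given that the index moved from old_idx to new_idx, did it pass
the event_idx that the other side asked to be notified at?

```c
#include <stdint.h>

/*
 * Returns nonzero when event_idx lies in the half-open window
 * [old_idx, new_idx), evaluated modulo 2^16 so that index wrap-around
 * is handled without any special cases.
 *
 * Both sides of the comparison are truncated to 16 bits:
 *   new_idx - event_idx - 1  is how far past event_idx we now are,
 *   new_idx - old_idx        is how many entries were just added.
 */
int toy_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old_idx);
}
```

For example, with old_idx 5 and new_idx 10 the function reports an event
for event_idx values 5 through 9 and none for 4 or 10; when nothing was
added (new_idx equal to old_idx) it never reports one.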

^ permalink raw reply
