* Re: [PATCH 16/16] net: netlink support for moving devices between network namespaces.
From: David Miller @ 2007-09-12 11:57 UTC (permalink / raw)
To: ebiederm; +Cc: netdev, containers
In-Reply-To: <m1tzq4u92n.fsf_-_@ebiederm.dsl.xmission.com>
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:43:44 -0600
>
> The simplest thing to implement is moving network devices between
> namespaces. However with the same attribute IFLA_NET_NS_PID we can
> easily implement creating devices in the destination network
> namespace as well. However that is a little bit trickier so this
> patch sticks to what is simple and easy.
>
> A pid is used to identify a process that happens to be a member
> of the network namespace we want to move the network device to.
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Applied to net-2.6.24, thanks.
* Re: [PATCH 17/16] net: Disable netfilter sockopts when not in the initial network namespace
From: David Miller @ 2007-09-12 11:59 UTC (permalink / raw)
To: ebiederm; +Cc: netdev, containers
In-Reply-To: <m1ps0su8wv.fsf_-_@ebiederm.dsl.xmission.com>
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:47:12 -0600
>
> Until we support multiple network namespaces with netfilter only allow
> netfilter configuration in the initial network namespace.
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Applied to net-2.6.24, thanks.
* Re: [PATCH 17/16] net: Disable netfilter sockopts when not in the initial network namespace
From: David Miller @ 2007-09-12 12:03 UTC (permalink / raw)
To: ebiederm; +Cc: netdev, containers
In-Reply-To: <20070912.045903.104046208.davem@davemloft.net>
I added the following patch to net-2.6.24 to kill a warning
since net_alloc() has no users (yet).
commit f444fa9b5d70b3d431e1554e0975e012514c39f3
Author: David S. Miller <davem@kimchee.(none)>
Date: Wed Sep 12 14:01:08 2007 +0200
[NET]: #if 0 out net_alloc() for now.
We will undo this once it is actually used.
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index f259a9b..1fc513c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -32,10 +32,12 @@ void net_unlock(void)
mutex_unlock(&net_list_mutex);
}
+#if 0
static struct net *net_alloc(void)
{
return kmem_cache_alloc(net_cachep, GFP_KERNEL);
}
+#endif
static void net_free(struct net *net)
{
* [PATCH 1/3] sk98lin: restore driver
From: Stephen Hemminger @ 2007-09-12 12:05 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linux Netdev
In-Reply-To: <6278d2220707271227g51cb147ch2f6dbc05c73f618d@mail.gmail.com>
This reverts commit e1abecc48938fbe1966ea6e78267fc673fa59295.
The driver works on some hardware that skge doesn't handle yet.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
-----------
Patch too large for mailing list. Download from:
http://developer.osdl.org/shemminger/patches/sk98lin-2.6.23-restore.patch
* [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)
From: Wolfgang Walter @ 2007-09-12 12:07 UTC (permalink / raw)
To: trond.myklebust, bfields; +Cc: netdev, nfs, linux-kernel
Hello,
As already described, lockd's old temporary sockets (those whose client is
gone) are not closed, even after some time. So, with enough clients and
enough time, there are 80 dangling open sockets and you start getting
messages of the form:
lockd: too many open TCP sockets, consider increasing the number of nfsd threads.
If I understand the code correctly, the intention was that the server closes
temporary sockets after about 6 to 12 minutes:
  a timer is started which calls svc_age_temp_sockets every 6 minutes.
  svc_age_temp_sockets:
    if a socket is marked OLD, it gets closed.
    sockets which are not marked OLD are marked OLD.
  every time a socket receives something, OLD is cleared.
But svc_age_temp_sockets never actually closes any socket, because it only
closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
Here is a patch against 2.6.22.6 which changes the test to
svsk->sk_inuse <= 0, which was probably meant. The patched kernel runs fine
here. Unused sockets get closed (after 6 to 12 minutes).
Signed-off-by: Wolfgang Walter <wolfgang.walter@studentenwerk.mhn.de>
--- ../linux-2.6.22.6/net/sunrpc/svcsock.c 2007-08-27 18:10:14.000000000 +0200
+++ net/sunrpc/svcsock.c 2007-09-11 11:07:13.000000000 +0200
@@ -1572,7 +1575,7 @@
if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
continue;
- if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
+ if (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
continue;
atomic_inc(&svsk->sk_inuse);
list_move(le, &to_be_aged);
As svc_age_temp_sockets did not do anything before, this change may trigger
hidden bugs.
To be honest, I don't see why this check
(atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
is needed at all (it can only be an optimization), as these fields can change
after the check. In svc_tcp_accept there is no such check when a temporary
socket is closed.
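The aging scheme described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the sunrpc code: `struct temp_sock`, `sock_received()` and `age_temp_socks()` are hypothetical names, and the sk_inuse / SK_BUSY checks discussed in this mail are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a temporary server socket. */
struct temp_sock {
	bool old;	/* set by one aging pass, cleared on receive */
	bool closed;	/* set once the socket has been aged out */
};

/* Any traffic on the socket clears the OLD mark. */
static void sock_received(struct temp_sock *s)
{
	s->old = false;
}

/*
 * One aging pass, meant to run periodically (e.g. every 6 minutes).
 * A socket not yet marked OLD is marked and left alone; a socket
 * still marked OLD from the previous pass is closed.
 * Returns the number of sockets closed by this pass.
 */
static int age_temp_socks(struct temp_sock *socks, size_t n)
{
	int closed = 0;

	for (size_t i = 0; i < n; i++) {
		struct temp_sock *s = &socks[i];

		if (s->closed)
			continue;
		if (!s->old) {
			s->old = true;	/* first pass since last activity */
			continue;
		}
		s->closed = true;	/* idle for a whole interval */
		closed++;
	}
	return closed;
}
```

With this scheme an idle socket is closed on the second pass after its last activity, that is after between one and two timer intervals, which is where the "6 to 12 minutes" figure comes from.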
Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
* Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
From: jamal @ 2007-09-12 12:12 UTC (permalink / raw)
To: Bill Fink
Cc: James Chapman, netdev, davem, jeff, mandeep.baines, ossthema,
Stephen Hemminger
In-Reply-To: <20070912030428.16059af6.billfink@mindspring.com>
On Wed, 2007-12-09 at 03:04 -0400, Bill Fink wrote:
> On Fri, 07 Sep 2007, jamal wrote:
> > I am going to be the devil's advocate[1]:
>
> So let me be the angel's advocate. :-)
I think this would make you God's advocate ;->
(http://en.wikipedia.org/wiki/God%27s_advocate)
> I view his results much more favorably.
The challenge is, under _low traffic_: bad bad CPU use.
Thats what is at stake, correct?
Lets bury the stats for a sec ...
1) Has that CPU situation improved? No, it has gotten worse.
2) Was there a throughput problem? No.
Remember, this is _low traffic_ and the complaint is not that NAPI doesn't do
high throughput. I am not willing to spend 34% more CPU to get a few
hundred pps (under low traffic!).
3) Latency improvement is good. But is a 34% cost worthwhile for the corner
case of low traffic?
Heres an analogy:
I went to buy bread and complained that 66cents was too much for such
a tiny sliced loaf.
You tell me you have solved my problem: asking me to pay a dollar
because you made the bread slices crispier. I was complaining on the _66
cents price_ not on the crispiness of the slices ;-> Crispier slices are
good - but am i, the person who was complaining about price, willing to
pay 40-50% more? People are bitching about NAPI abusing CPU, is the
answer to abuse more CPU than NAPI?;->
The answer could be "I am not solving that problem anymore" - at least
thats what James is saying;->
Note: I am not saying theres no problem - just saying the result is not
addressing the problem.
> You can't always improve on all metrics of a workload.
But you gotta try to be consistent.
If, for example, one packet size/rate got negative results but the next
got positive results - thats lacking consistency.
> Sometimes there
> are tradeoffs to be made to be decided by the user based on what's most
> important to that user and his specific workload. And the suggested
> ethtool option (defaulting to current behavior) would enable the user
> to make that decision.
And the challenge is:
What workload is willing to invest that much cpu for low traffic?
Can you name one? One that may come close is database benchmarks for
latency - but those folks wouldnt touch this with a mile-long pole if
you told them their cpu use is going to get worse than what NAPI (that
big bad CPU hog under low traffic) is giving them.
>
> P.S. I agree that some tests run in parallel with some CPU hogs also
> running might be beneficial and enlightening.
indeed.
cheers,
jamal
* Re: [PATCH 07/16] net: Make /proc/net per network namespace
From: Daniel Lezcano @ 2007-09-12 12:12 UTC (permalink / raw)
To: David Miller; +Cc: ebiederm, containers, netdev
In-Reply-To: <20070912.030211.102564635.davem@davemloft.net>
David Miller wrote:
> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Sat, 08 Sep 2007 15:20:36 -0600
>
>> This patch makes /proc/net per network namespace. It modifies the global
>> variables proc_net and proc_net_stat to be per network namespace.
>> The proc_net file helpers are modified to take a network namespace argument,
>> and all of their callers are fixed to pass &init_net for that argument.
>> This ensures that all of the /proc/net files are only visible and
>> usable in the initial network namespace until the code behind them
>> has been updated to handle multiple network namespaces.
>>
>> Making /proc/net per namespace is necessary as at least some files
>> in /proc/net depend upon the set of network devices which is per
>> network namespace, and even more files in /proc/net have contents
>> that are relevant to a single network namespace.
>>
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>
> Patch applied, thanks.
Hi Dave,
it seems the fs/proc/proc_net.c was not added to the git repository.
Regards.
-- Daniel
* Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
From: Eric Dumazet @ 2007-09-12 12:16 UTC (permalink / raw)
To: David Miller; +Cc: herbert, netdev, Christoph Hellwig
In-Reply-To: <20070912.041200.15253216.davem@davemloft.net>
On Wed, 12 Sep 2007 04:12:00 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Wed, 12 Sep 2007 12:08:45 +0200
>
> > Unfortunately, there is no equivalent for this one.
> > This gives on my Opterons a nice "prefetchnta"
> >
> > prefetch(addr) is more like __builtin_prefetch(addr, 0, 3)
> >
> > I would like to avoid zapping the L2 cache with useless data.
> >
> > __builtin_prefetch() is included from gcc 3.1 (2002), so every
> > platform should support it, as linux-2.6 requires gcc 3.2 at least.
> >
> > I guess you are going to tell me to first publish a patch to lkml :)
>
> Basically, yes :-) You won't be the only person to find this
> useful.
OK, let's try a normal prefetch(), I'll change it later when/if a
new generic macro is added. I added the missing 'static' and a comment
about the "struct {} dst_garbage". I also corrected spelling error on
patch title (collection)
Thank you
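For readers unfamiliar with the builtin discussed above, here is a minimal sketch of prefetching during a linked-list walk. It assumes GCC or a compatible compiler; `struct node` and `sum_list()` are hypothetical names, not kernel code. The third argument of `__builtin_prefetch()` is the temporal-locality hint: 3 asks for the data to be kept in all cache levels, while 0 says it need not stay cached after the access (the variant that, per the quoted mail, produces "prefetchnta" on Opterons).

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
	struct node *next;
	long value;
};

/*
 * Walk the list once, hinting the CPU to fetch the following node
 * while the current one is processed.
 * __builtin_prefetch(addr, rw, locality): rw 0 = read,
 * locality 0 = no need to keep the line cached afterwards.
 */
static long sum_list(const struct node *n)
{
	long sum = 0;

	while (n) {
		const struct node *next = n->next;

		if (next)
			__builtin_prefetch(next, 0, 0);
		sum += n->value;
		n = next;
	}
	return sum;
}
```

The prefetch is only a hint and does not change the result; whether it helps depends on list length and on how much work is done per node.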
[PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue
When the periodic IP route cache flush is done (every 600 seconds in the
default configuration), some hosts suffer a lot and eventually trigger
the "soft lockup" message.
dst_run_gc() scans a possibly huge list of dst_entries,
eventually freeing some (less than 1%) of them, while holding the
dst_lock spinlock for the whole scan.
Then it rearms a timer to redo the full thing 1/10 s later...
The slowdown can last one minute or so, depending on how active
the TCP sessions are.
This second version of the patch converts the processing from a
softirq-based one to a workqueue.
Even if the list of entries in garbage_list is huge, the host is still
responsive to softirqs and can make progress.
Instead of resetting the gc timer to 0.1 second if one entry was freed in a
gc run, we do this if more than 10% of entries were freed.
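The rescheduling rule just described can be sketched as a small standalone function. This is a sketch under stated assumptions rather than the patch itself: `struct gc_timer` and `gc_next_delay()` are hypothetical names, the constants are assumed values in jiffies for HZ = 250 (chosen because they agree with the trace in this mail), and the empty-list case and the rounding to whole seconds done by the real code are left out.

```c
/* Assumed values, in jiffies, for HZ = 250 (hypothetical). */
#define GC_MIN	25	/* 0.1 s */
#define GC_INC	125	/* 0.5 s */
#define GC_MAX	30000	/* 120 s */

struct gc_timer {
	unsigned long inc;	/* step added on the next back-off */
	unsigned long expires;	/* delay before the next gc run */
};

/*
 * Pick the delay before the next gc run.
 * freed:   entries destroyed by this run
 * delayed: entries still referenced, kept for a later run
 */
static unsigned long gc_next_delay(struct gc_timer *t, int freed, int delayed)
{
	if (freed <= delayed / 10) {
		/* Little progress: back off, growing the step each time. */
		t->expires += t->inc;
		if (t->expires > GC_MAX)
			t->expires = GC_MAX;
		t->inc += GC_INC;
	} else {
		/* More than 10% of the entries were freed: retry soon. */
		t->inc = GC_INC;
		t->expires = GC_MIN;
	}
	return t->expires;
}
```

Starting from the reset state (inc = 125, expires = 25), successive runs with little progress give delays of 150, 400 and 775 jiffies, and a run that frees more than 10% drops back to 25, which is the shape visible in the "expires:" column of the trace; the larger values in the trace differ slightly because the real code rounds delays above 4 seconds.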
Before patch :
Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0!
Aug 16 06:21:37 SRV1 kernel:
Aug 16 06:21:37 SRV1 kernel: Call Trace:
Aug 16 06:21:37 SRV1 kernel: <IRQ> [<ffffffff802286f0>] wake_up_process+0x10/0x20
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80251e09>] softlockup_tick+0xe9/0x110
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd380>] dst_run_gc+0x0/0x140
Aug 16 06:21:37 SRV1 kernel: [<ffffffff802376f3>] run_local_timers+0x13/0x20
Aug 16 06:21:37 SRV1 kernel: [<ffffffff802379c7>] update_process_times+0x57/0x90
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80216034>] smp_local_timer_interrupt+0x34/0x60
Aug 16 06:21:37 SRV1 kernel: [<ffffffff802165cc>] smp_apic_timer_interrupt+0x5c/0x80
Aug 16 06:21:37 SRV1 kernel: [<ffffffff8020a816>] apic_timer_interrupt+0x66/0x70
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd3d3>] dst_run_gc+0x53/0x140
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd3c6>] dst_run_gc+0x46/0x140
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80237148>] run_timer_softirq+0x148/0x1c0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff8023340c>] __do_softirq+0x6c/0xe0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff8020ad6c>] call_softirq+0x1c/0x30
Aug 16 06:21:37 SRV1 kernel: <EOI> [<ffffffff8020cb34>] do_softirq+0x34/0x90
Aug 16 06:21:37 SRV1 kernel: [<ffffffff802331cf>] local_bh_enable_ip+0x3f/0x60
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80422913>] _spin_unlock_bh+0x13/0x20
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803dfde8>] rt_garbage_collect+0x1d8/0x320
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803cd4dd>] dst_alloc+0x1d/0xa0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803e1433>] __ip_route_output_key+0x573/0x800
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803c02e2>] sock_common_recvmsg+0x32/0x50
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803e16dc>] ip_route_output_flow+0x1c/0x60
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80400160>] tcp_v4_connect+0x150/0x610
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803ebf07>] inet_bind_bucket_create+0x17/0x60
Aug 16 06:21:37 SRV1 kernel: [<ffffffff8040cd16>] inet_stream_connect+0xa6/0x2c0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80422981>] _spin_lock_bh+0x11/0x30
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803c0bdf>] lock_sock_nested+0xcf/0xe0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80422981>] _spin_lock_bh+0x11/0x30
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803be551>] sys_connect+0x71/0xa0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803eee3f>] tcp_setsockopt+0x1f/0x30
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803c030f>] sock_common_setsockopt+0xf/0x20
Aug 16 06:21:37 SRV1 kernel: [<ffffffff803be4bd>] sys_setsockopt+0x9d/0xc0
Aug 16 06:21:37 SRV1 kernel: [<ffffffff8028881e>] sys_ioctl+0x5e/0x80
Aug 16 06:21:37 SRV1 kernel: [<ffffffff80209c4e>] system_call+0x7e/0x83
After patch : (RT_CACHE_DEBUG set to 2 to get following traces)
dst_total: 75469 delayed: 74109 work_perf: 141 expires: 150 elapsed: 8092 us
dst_total: 78725 delayed: 73366 work_perf: 743 expires: 400 elapsed: 8542 us
dst_total: 86126 delayed: 71844 work_perf: 1522 expires: 775 elapsed: 8849 us
dst_total: 100173 delayed: 68791 work_perf: 3053 expires: 1256 elapsed: 9748 us
dst_total: 121798 delayed: 64711 work_perf: 4080 expires: 1997 elapsed: 10146 us
dst_total: 154522 delayed: 58316 work_perf: 6395 expires: 25 elapsed: 11402 us
dst_total: 154957 delayed: 58252 work_perf: 64 expires: 150 elapsed: 6148 us
dst_total: 157377 delayed: 57843 work_perf: 409 expires: 400 elapsed: 6350 us
dst_total: 163745 delayed: 56679 work_perf: 1164 expires: 775 elapsed: 7051 us
dst_total: 176577 delayed: 53965 work_perf: 2714 expires: 1389 elapsed: 8120 us
dst_total: 198993 delayed: 49627 work_perf: 4338 expires: 1997 elapsed: 8909 us
dst_total: 226638 delayed: 46865 work_perf: 2762 expires: 2748 elapsed: 7351 us
I successfully reduced the IP route cache of many hosts by a factor of four
thanks to this patch. Previously, I had to disable "ip route flush cache"
to avoid crashes.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
diff --git a/net/core/dst.c b/net/core/dst.c
index c6a0587..e250e01 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -9,6 +9,7 @@
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/kernel.h>
+#include <linux/workqueue.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/netdevice.h>
@@ -18,50 +19,72 @@
#include <net/dst.h>
-/* Locking strategy:
- * 1) Garbage collection state of dead destination cache
- * entries is protected by dst_lock.
- * 2) GC is run only from BH context, and is the only remover
- * of entries.
- * 3) Entries are added to the garbage list from both BH
- * and non-BH context, so local BH disabling is needed.
- * 4) All operations modify state, so a spinlock is used.
+/*
+ * Theory of operations:
+ * 1) We use a list, protected by a spinlock, to add
+ * new entries from both BH and non-BH context.
+ * 2) In order to keep spinlock held for a small delay,
+ * we use a second list where are stored long lived
+ * entries, that are handled by the garbage collect thread
+ * fired by a workqueue.
+ * 3) This list is guarded by a mutex,
+ * so that the gc_task and dst_dev_event() can be synchronized.
*/
-static struct dst_entry *dst_garbage_list;
#if RT_CACHE_DEBUG >= 2
static atomic_t dst_total = ATOMIC_INIT(0);
#endif
-static DEFINE_SPINLOCK(dst_lock);
-static unsigned long dst_gc_timer_expires;
-static unsigned long dst_gc_timer_inc = DST_GC_MAX;
-static void dst_run_gc(unsigned long);
+/*
+ * We want to keep lock & list close together
+ * to dirty as few cache lines as possible in __dst_free().
+ * As this is not a very strong hint, we dont force an alignment on SMP.
+ */
+static struct {
+ spinlock_t lock;
+ struct dst_entry *list;
+ unsigned long timer_inc;
+ unsigned long timer_expires;
+} dst_garbage = {
+ .lock = __SPIN_LOCK_UNLOCKED(dst_garbage.lock),
+ .timer_inc = DST_GC_MAX,
+};
+static void dst_gc_task(struct work_struct *work);
static void ___dst_free(struct dst_entry * dst);
-static DEFINE_TIMER(dst_gc_timer, dst_run_gc, DST_GC_MIN, 0);
+static DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);
-static void dst_run_gc(unsigned long dummy)
+static DEFINE_MUTEX(dst_gc_mutex);
+/*
+ * long lived entries are maintained in this list, guarded by dst_gc_mutex
+ */
+static struct dst_entry *dst_busy_list;
+
+static void dst_gc_task(struct work_struct *work)
{
int delayed = 0;
- int work_performed;
- struct dst_entry * dst, **dstp;
+ int work_performed = 0;
+ unsigned long expires = ~0L;
+ struct dst_entry *dst, *next, head;
+ struct dst_entry *last = &head;
+#if RT_CACHE_DEBUG >= 2
+ ktime_t time_start = ktime_get();
+ struct timespec elapsed;
+#endif
- if (!spin_trylock(&dst_lock)) {
- mod_timer(&dst_gc_timer, jiffies + HZ/10);
- return;
- }
+ mutex_lock(&dst_gc_mutex);
+ next = dst_busy_list;
- del_timer(&dst_gc_timer);
- dstp = &dst_garbage_list;
- work_performed = 0;
- while ((dst = *dstp) != NULL) {
- if (atomic_read(&dst->__refcnt)) {
- dstp = &dst->next;
+loop:
+ while ((dst = next) != NULL) {
+ next = dst->next;
+ prefetch(&next->next);
+ if (likely(atomic_read(&dst->__refcnt))) {
+ last->next = dst;
+ last = dst;
delayed++;
continue;
}
- *dstp = dst->next;
- work_performed = 1;
+ work_performed++;
dst = dst_destroy(dst);
if (dst) {
@@ -77,38 +100,56 @@ static void dst_run_gc(unsigned long dummy)
continue;
___dst_free(dst);
- dst->next = *dstp;
- *dstp = dst;
- dstp = &dst->next;
+ dst->next = next;
+ next = dst;
}
}
- if (!dst_garbage_list) {
- dst_gc_timer_inc = DST_GC_MAX;
- goto out;
+
+ spin_lock_bh(&dst_garbage.lock);
+ next = dst_garbage.list;
+ if (next) {
+ dst_garbage.list = NULL;
+ spin_unlock_bh(&dst_garbage.lock);
+ goto loop;
}
- if (!work_performed) {
- if ((dst_gc_timer_expires += dst_gc_timer_inc) > DST_GC_MAX)
- dst_gc_timer_expires = DST_GC_MAX;
- dst_gc_timer_inc += DST_GC_INC;
- } else {
- dst_gc_timer_inc = DST_GC_INC;
- dst_gc_timer_expires = DST_GC_MIN;
+ last->next = NULL;
+ dst_busy_list = head.next;
+ if (!dst_busy_list)
+ dst_garbage.timer_inc = DST_GC_MAX;
+ else {
+ /*
+ * if we freed less than 1/10 of delayed entries,
+ * we can sleep longer.
+ */
+ if (work_performed <= delayed/10) {
+ dst_garbage.timer_expires += dst_garbage.timer_inc;
+ if (dst_garbage.timer_expires > DST_GC_MAX)
+ dst_garbage.timer_expires = DST_GC_MAX;
+ dst_garbage.timer_inc += DST_GC_INC;
+ } else {
+ dst_garbage.timer_inc = DST_GC_INC;
+ dst_garbage.timer_expires = DST_GC_MIN;
+ }
+ expires = dst_garbage.timer_expires;
+ /*
+ * if the next desired timer is more than 4 seconds in the future
+ * then round the timer to whole seconds
+ */
+ if (expires > 4*HZ)
+ expires = round_jiffies_relative(expires);
+ schedule_delayed_work(&dst_gc_work, expires);
}
+
+ spin_unlock_bh(&dst_garbage.lock);
+ mutex_unlock(&dst_gc_mutex);
#if RT_CACHE_DEBUG >= 2
- printk("dst_total: %d/%d %ld\n",
- atomic_read(&dst_total), delayed, dst_gc_timer_expires);
+ elapsed = ktime_to_timespec(ktime_sub(ktime_get(), time_start));
+ printk(KERN_DEBUG "dst_total: %d delayed: %d work_perf: %d"
+ " expires: %lu elapsed: %lu us\n",
+ atomic_read(&dst_total), delayed, work_performed,
+ expires,
+ elapsed.tv_sec * USEC_PER_SEC + elapsed.tv_nsec / NSEC_PER_USEC);
#endif
- /* if the next desired timer is more than 4 seconds in the future
- * then round the timer to whole seconds
- */
- if (dst_gc_timer_expires > 4*HZ)
- mod_timer(&dst_gc_timer,
- round_jiffies(jiffies + dst_gc_timer_expires));
- else
- mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);
-
-out:
- spin_unlock(&dst_lock);
}
static int dst_discard(struct sk_buff *skb)
@@ -153,16 +194,16 @@ static void ___dst_free(struct dst_entry * dst)
void __dst_free(struct dst_entry * dst)
{
- spin_lock_bh(&dst_lock);
+ spin_lock_bh(&dst_garbage.lock);
___dst_free(dst);
- dst->next = dst_garbage_list;
- dst_garbage_list = dst;
- if (dst_gc_timer_inc > DST_GC_INC) {
- dst_gc_timer_inc = DST_GC_INC;
- dst_gc_timer_expires = DST_GC_MIN;
- mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);
+ dst->next = dst_garbage.list;
+ dst_garbage.list = dst;
+ if (dst_garbage.timer_inc > DST_GC_INC) {
+ dst_garbage.timer_inc = DST_GC_INC;
+ dst_garbage.timer_expires = DST_GC_MIN;
+ schedule_delayed_work(&dst_gc_work, dst_garbage.timer_expires);
}
- spin_unlock_bh(&dst_lock);
+ spin_unlock_bh(&dst_garbage.lock);
}
struct dst_entry *dst_destroy(struct dst_entry * dst)
@@ -250,16 +291,30 @@ static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
static int dst_dev_event(struct notifier_block *this, unsigned long event, void *ptr)
{
struct net_device *dev = ptr;
- struct dst_entry *dst;
+ struct dst_entry *dst, *last = NULL;
switch (event) {
case NETDEV_UNREGISTER:
case NETDEV_DOWN:
- spin_lock_bh(&dst_lock);
- for (dst = dst_garbage_list; dst; dst = dst->next) {
+ mutex_lock(&dst_gc_mutex);
+ for (dst = dst_busy_list; dst; dst = dst->next) {
+ last = dst;
+ dst_ifdown(dst, dev, event != NETDEV_DOWN);
+ }
+
+ spin_lock_bh(&dst_garbage.lock);
+ dst = dst_garbage.list;
+ dst_garbage.list = NULL;
+ spin_unlock_bh(&dst_garbage.lock);
+
+ if (last)
+ last->next = dst;
+ else
+ dst_busy_list = dst;
+ for (; dst; dst = dst->next) {
dst_ifdown(dst, dev, event != NETDEV_DOWN);
}
- spin_unlock_bh(&dst_lock);
+ mutex_unlock(&dst_gc_mutex);
break;
}
return NOTIFY_DONE;
* Re: [PATCH 17/16] net: Disable netfilter sockopts when not in the initial network namespace
From: Eric W. Biederman @ 2007-09-12 12:16 UTC (permalink / raw)
To: David Miller; +Cc: netdev, containers
In-Reply-To: <20070912.050311.78725619.davem@davemloft.net>
David Miller <davem@davemloft.net> writes:
> I added the following patch to net-2.6.24 to kill a warning
> since net_alloc() has no users (yet).
Reasonable, and thanks for merging these.
Having a solid place to start helps a lot.
I will see if I can get the /proc races fixed shortly.
Eric
* Re: [PATCH 07/16] net: Make /proc/net per network namespace
From: David Miller @ 2007-09-12 12:19 UTC (permalink / raw)
To: dlezcano; +Cc: ebiederm, containers, netdev
In-Reply-To: <46E7D794.5090006@fr.ibm.com>
From: Daniel Lezcano <dlezcano@fr.ibm.com>
Date: Wed, 12 Sep 2007 14:12:04 +0200
> it seems the fs/proc/proc_net.c was not added to the git repository.
Fixed, thanks for catching that.
* Re: [PATCH 4/6] [IPROUTE2]: Overhead calculation is now done in the kernel
From: Jesper Dangaard Brouer @ 2007-09-12 12:22 UTC (permalink / raw)
To: Stephen Hemminger
Cc: netdev@vger.kernel.org, Patrick McHardy, David S. Miller
In-Reply-To: <20070912130509.0a6dea32@oldman>
On Wed, 2007-09-12 at 13:05 +0200, Stephen Hemminger wrote:
> How is this binary compatible with older kernels?
It will be binary compatible, as I use/rename some unused variables in
struct tc_ratespec.
--
Med venlig hilsen / Best regards
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
* Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
From: David Miller @ 2007-09-12 12:30 UTC (permalink / raw)
To: dada1; +Cc: herbert, netdev, hch
In-Reply-To: <20070912141656.394e1f28.dada1@cosmosbay.com>
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Wed, 12 Sep 2007 14:16:56 +0200
> OK, let's try a normal prefetch(), I'll change it later when/if a
> new generic macro is added. I added the missing 'static' and a comment
> about the "struct {} dst_garbage". I also corrected spelling error on
> patch title (collection)
I sorted out the conflicts with the network namespace stuff
I just checked in and added your patch to net-2.6.24
Thanks!
* Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170
From: David Miller @ 2007-09-12 12:34 UTC (permalink / raw)
To: johannes
Cc: herbert, satyam, flo, linux-kernel, netdev, linux-wireless,
michal.k.k.piotrowski, ipw3945-devel, yi.zhu
In-Reply-To: <1189091995.28781.99.camel@johannes.berg>
From: Johannes Berg <johannes@sipsolutions.net>
Date: Thu, 06 Sep 2007 17:19:55 +0200
>
> Oh btw. Can we stick a might_sleep() into dev_close() *before* the test
> whether the device is up? That way, we'd have seen the bug, but
> apparently nobody before Florian ever did a 'ip link set wmaster0 down'
> while the other interfaces were still open.
I've added this to net-2.6.24
* [NETLINK]: Introduce nested and byteorder flag to netlink attribute
From: Thomas Graf @ 2007-09-12 12:41 UTC (permalink / raw)
To: davem; +Cc: netdev, netfilter-devel
This change allows the generic attribute interface to be used within
the netfilter subsystem where this flag was initially introduced.
The byte-order flag is as yet unused; its intended use is to
allow automatic byte-order conversions for all atomic types.
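As a rough userspace illustration of the bit layout this patch introduces, the sketch below masks the two flag bits off a raw 16-bit type value. The flag macros follow the patch (with an extra pair of outer parentheses around the mask); `attr_type()` and `attr_is_nested()` are hypothetical helper names operating on a plain integer, whereas the patch's own `nla_type()` takes a `struct nlattr` pointer.

```c
#include <stdint.h>

#define NLA_F_NESTED		(1 << 15)
#define NLA_F_NET_BYTEORDER	(1 << 14)
#define NLA_TYPE_MASK		(~(NLA_F_NESTED | NLA_F_NET_BYTEORDER))

/* Attribute type with both flag bits stripped (hypothetical helper). */
static inline int attr_type(uint16_t raw)
{
	return raw & NLA_TYPE_MASK;
}

/* Non-zero if the N (nested) flag is set (hypothetical helper). */
static inline int attr_is_nested(uint16_t raw)
{
	return (raw & NLA_F_NESTED) != 0;
}
```

Comparing the raw field directly against an attribute number would stop matching as soon as a sender sets one of the flags, which is presumably why every `nla->nla_type` comparison in the diff is routed through the accessor.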
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Index: net-2.6.24/include/linux/netlink.h
===================================================================
--- net-2.6.24.orig/include/linux/netlink.h 2007-09-12 13:29:49.000000000 +0200
+++ net-2.6.24/include/linux/netlink.h 2007-09-12 13:59:41.000000000 +0200
@@ -129,6 +129,20 @@
__u16 nla_type;
};
+/*
+ * nla_type (16 bits)
+ * +---+---+-------------------------------+
+ * | N | O | Attribute Type |
+ * +---+---+-------------------------------+
+ * N := Carries nested attributes
+ * O := Payload stored in network byte order
+ *
+ * Note: The N and O flag are mutually exclusive.
+ */
+#define NLA_F_NESTED (1 << 15)
+#define NLA_F_NET_BYTEORDER (1 << 14)
+#define NLA_TYPE_MASK ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
+
#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN ((int) NLA_ALIGN(sizeof(struct nlattr)))
Index: net-2.6.24/include/net/netlink.h
===================================================================
--- net-2.6.24.orig/include/net/netlink.h 2007-09-12 13:29:50.000000000 +0200
+++ net-2.6.24/include/net/netlink.h 2007-09-12 14:17:56.000000000 +0200
@@ -667,6 +667,15 @@
}
/**
+ * nla_type - attribute type
+ * @nla: netlink attribute
+ */
+static inline int nla_type(const struct nlattr *nla)
+{
+ return nla->nla_type & NLA_TYPE_MASK;
+}
+
+/**
* nla_data - head of payload
* @nla: netlink attribute
*/
Index: net-2.6.24/net/ipv4/fib_frontend.c
===================================================================
--- net-2.6.24.orig/net/ipv4/fib_frontend.c 2007-09-12 13:29:51.000000000 +0200
+++ net-2.6.24/net/ipv4/fib_frontend.c 2007-09-12 13:59:41.000000000 +0200
@@ -487,7 +487,7 @@
}
nlmsg_for_each_attr(attr, nlh, sizeof(struct rtmsg), remaining) {
- switch (attr->nla_type) {
+ switch (nla_type(attr)) {
case RTA_DST:
cfg->fc_dst = nla_get_be32(attr);
break;
Index: net-2.6.24/net/ipv4/fib_semantics.c
===================================================================
--- net-2.6.24.orig/net/ipv4/fib_semantics.c 2007-09-12 13:29:51.000000000 +0200
+++ net-2.6.24/net/ipv4/fib_semantics.c 2007-09-12 13:59:41.000000000 +0200
@@ -743,7 +743,7 @@
int remaining;
nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
- int type = nla->nla_type;
+ int type = nla_type(nla);
if (type) {
if (type > RTAX_MAX)
Index: net-2.6.24/net/ipv6/route.c
===================================================================
--- net-2.6.24.orig/net/ipv6/route.c 2007-09-12 13:29:51.000000000 +0200
+++ net-2.6.24/net/ipv6/route.c 2007-09-12 13:59:41.000000000 +0200
@@ -1278,7 +1278,7 @@
int remaining;
nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
- int type = nla->nla_type;
+ int type = nla_type(nla);
if (type) {
if (type > RTAX_MAX) {
Index: net-2.6.24/net/netlabel/netlabel_cipso_v4.c
===================================================================
--- net-2.6.24.orig/net/netlabel/netlabel_cipso_v4.c 2007-09-12 13:29:51.000000000 +0200
+++ net-2.6.24/net/netlabel/netlabel_cipso_v4.c 2007-09-12 13:59:41.000000000 +0200
@@ -130,7 +130,7 @@
return -EINVAL;
nla_for_each_nested(nla, info->attrs[NLBL_CIPSOV4_A_TAGLST], nla_rem)
- if (nla->nla_type == NLBL_CIPSOV4_A_TAG) {
+ if (nla_type(nla) == NLBL_CIPSOV4_A_TAG) {
if (iter >= CIPSO_V4_TAG_MAXCNT)
return -EINVAL;
doi_def->tags[iter++] = nla_get_u8(nla);
@@ -192,13 +192,13 @@
nla_for_each_nested(nla_a,
info->attrs[NLBL_CIPSOV4_A_MLSLVLLST],
nla_a_rem)
- if (nla_a->nla_type == NLBL_CIPSOV4_A_MLSLVL) {
+ if (nla_type(nla_a) == NLBL_CIPSOV4_A_MLSLVL) {
if (nla_validate_nested(nla_a,
NLBL_CIPSOV4_A_MAX,
netlbl_cipsov4_genl_policy) != 0)
goto add_std_failure;
nla_for_each_nested(nla_b, nla_a, nla_b_rem)
- switch (nla_b->nla_type) {
+ switch (nla_type(nla_b)) {
case NLBL_CIPSOV4_A_MLSLVLLOC:
if (nla_get_u32(nla_b) >
CIPSO_V4_MAX_LOC_LVLS)
@@ -240,7 +240,7 @@
nla_for_each_nested(nla_a,
info->attrs[NLBL_CIPSOV4_A_MLSLVLLST],
nla_a_rem)
- if (nla_a->nla_type == NLBL_CIPSOV4_A_MLSLVL) {
+ if (nla_type(nla_a) == NLBL_CIPSOV4_A_MLSLVL) {
struct nlattr *lvl_loc;
struct nlattr *lvl_rem;
@@ -265,13 +265,13 @@
nla_for_each_nested(nla_a,
info->attrs[NLBL_CIPSOV4_A_MLSCATLST],
nla_a_rem)
- if (nla_a->nla_type == NLBL_CIPSOV4_A_MLSCAT) {
+ if (nla_type(nla_a) == NLBL_CIPSOV4_A_MLSCAT) {
if (nla_validate_nested(nla_a,
NLBL_CIPSOV4_A_MAX,
netlbl_cipsov4_genl_policy) != 0)
goto add_std_failure;
nla_for_each_nested(nla_b, nla_a, nla_b_rem)
- switch (nla_b->nla_type) {
+ switch (nla_type(nla_b)) {
case NLBL_CIPSOV4_A_MLSCATLOC:
if (nla_get_u32(nla_b) >
CIPSO_V4_MAX_LOC_CATS)
@@ -315,7 +315,7 @@
nla_for_each_nested(nla_a,
info->attrs[NLBL_CIPSOV4_A_MLSCATLST],
nla_a_rem)
- if (nla_a->nla_type == NLBL_CIPSOV4_A_MLSCAT) {
+ if (nla_type(nla_a) == NLBL_CIPSOV4_A_MLSCAT) {
struct nlattr *cat_loc;
struct nlattr *cat_rem;
Index: net-2.6.24/net/netlink/attr.c
===================================================================
--- net-2.6.24.orig/net/netlink/attr.c 2007-09-12 13:29:51.000000000 +0200
+++ net-2.6.24/net/netlink/attr.c 2007-09-12 14:10:51.000000000 +0200
@@ -27,12 +27,12 @@
const struct nla_policy *policy)
{
const struct nla_policy *pt;
- int minlen = 0, attrlen = nla_len(nla);
+ int minlen = 0, attrlen = nla_len(nla), type = nla_type(nla);
- if (nla->nla_type <= 0 || nla->nla_type > maxtype)
+ if (type <= 0 || type > maxtype)
return 0;
- pt = &policy[nla->nla_type];
+ pt = &policy[type];
BUG_ON(pt->type > NLA_TYPE_MAX);
@@ -149,7 +149,7 @@
memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1));
nla_for_each_attr(nla, head, len, rem) {
- u16 type = nla->nla_type;
+ u16 type = nla_type(nla);
if (type > 0 && type <= maxtype) {
if (policy) {
@@ -185,7 +185,7 @@
int rem;
nla_for_each_attr(nla, head, len, rem)
- if (nla->nla_type == attrtype)
+ if (nla_type(nla) == attrtype)
return nla;
return NULL;
^ permalink raw reply
* Re: [NETLINK]: Introduce nested and byteorder flag to netlink attribute
From: David Miller @ 2007-09-12 12:45 UTC (permalink / raw)
To: tgraf; +Cc: netdev, netfilter-devel
In-Reply-To: <20070912124145.GC18480@postel.suug.ch>
From: Thomas Graf <tgraf@suug.ch>
Date: Wed, 12 Sep 2007 14:41:45 +0200
> This change allows the generic attribute interface to be used within
> the netfilter subsystem where this flag was initially introduced.
>
> The byte-order flag is not yet used; its intended use is to
> allow automatic byte-order conversions for all atomic types.
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
Applied to net-2.6.24, thanks Thomas.
^ permalink raw reply
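The nla_type() conversions in the diff above follow from the flag patch: once the upper bits of nla_type can carry flags, comparing the raw field against an attribute number is no longer safe. Below is a minimal userspace sketch of the accessor. The struct is a stand-in for the kernel's struct nlattr, and the bit positions (15 and 14) are an assumption taken from the kernel headers as I understand them, not from the messages in this thread. attr_is() is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct nlattr (same two fields). */
struct nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Flag bits carried in the top two bits of nla_type (assumed positions). */
#define NLA_F_NESTED		(1 << 15)
#define NLA_F_NET_BYTEORDER	(1 << 14)
#define NLA_TYPE_MASK		(~(NLA_F_NESTED | NLA_F_NET_BYTEORDER))

/* Return the attribute type with the flag bits stripped. */
static inline int nla_type(const struct nlattr *nla)
{
	return nla->nla_type & NLA_TYPE_MASK;
}

/* Hypothetical helper: does this attribute carry the given type number? */
static int attr_is(const struct nlattr *nla, int attrtype)
{
	/* Comparing nla->nla_type directly would fail when a flag is set. */
	return nla_type(nla) == attrtype;
}
```

With a nested attribute of type 3, the raw field holds 3 | NLA_F_NESTED, so a direct comparison against 3 fails while attr_is() still matches.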
* [net-2.6.24][NETNS][patch 0/3] fixes for the core network namespace
From: dlezcano @ 2007-09-12 12:38 UTC (permalink / raw)
To: davem; +Cc: ebiederm, containers, netdev
The following patches fix some compilation errors and boot problems
related to the network namespace patchset.
They apply to net-2.6.24.
--
^ permalink raw reply
* [net-2.6.24][NETNS][patch 3/3] fix bad macro definition
From: dlezcano @ 2007-09-12 12:38 UTC (permalink / raw)
To: davem; +Cc: ebiederm, containers, netdev, Benjamin Thery
In-Reply-To: <20070912123811.075269132@mai.toulouse-stg.fr.ibm.com>
[-- Attachment #1: net-fix-llc-core-init-panic.patch --]
[-- Type: text/plain, Size: 2205 bytes --]
From: Daniel Lezcano <dlezcano@fr.ibm.com>
The macro definition is broken. When next_net_device is called with
an argument named "dev", the macro expands to:
struct net_device *dev = dev
so the local variable is initialized from itself, which leads to
unexpected behavior. In particular, when llc_core is compiled in, the
kernel panics at boot time.
This patch replaces the macro definitions with static inline
functions, as they were defined before.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
include/linux/netdevice.h | 35 +++++++++++++++++------------------
1 file changed, 17 insertions(+), 18 deletions(-)
Index: net-2.6.24/include/linux/netdevice.h
===================================================================
--- net-2.6.24.orig/include/linux/netdevice.h
+++ net-2.6.24/include/linux/netdevice.h
@@ -41,7 +41,8 @@
#include <linux/dmaengine.h>
#include <linux/workqueue.h>
-struct net;
+#include <net/net_namespace.h>
+
struct vlan_group;
struct ethtool_ops;
struct netpoll_info;
@@ -739,23 +740,21 @@
list_for_each_entry_continue(d, &(net)->dev_base_head, dev_list)
#define net_device_entry(lh) list_entry(lh, struct net_device, dev_list)
-#define next_net_device(d) \
-({ \
- struct net_device *dev = d; \
- struct list_head *lh; \
- struct net *net; \
- \
- net = dev->nd_net; \
- lh = dev->dev_list.next; \
- lh == &net->dev_base_head ? NULL : net_device_entry(lh); \
-})
-
-#define first_net_device(N) \
-({ \
- struct net *NET = (N); \
- list_empty(&NET->dev_base_head) ? NULL : \
- net_device_entry(NET->dev_base_head.next); \
-})
+static inline struct net_device *next_net_device(struct net_device *dev)
+{
+ struct list_head *lh;
+ struct net *net;
+
+ net = dev->nd_net;
+ lh = dev->dev_list.next;
+ return lh == &net->dev_base_head ? NULL : net_device_entry(lh);
+}
+
+static inline struct net_device *first_net_device(struct net *net)
+{
+ return list_empty(&net->dev_base_head) ? NULL :
+ net_device_entry(net->dev_base_head.next);
+}
extern int netdev_boot_setup_check(struct net_device *dev);
extern unsigned long netdev_boot_base(const char *prefix, int unit);
--
^ permalink raw reply
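The name-capture problem fixed by patch 3/3 can be illustrated outside the kernel. The sketch below is a userspace model, not the kernel code: struct list_head, struct net and struct net_device are cut down to the fields the helper needs, and demo_init_net() and demo_add_device() are hypothetical helpers that exist only to build a test list. The inline function takes its argument as a real parameter, so a caller's variable named dev cannot collide with a local of the same name.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

struct net { struct list_head dev_base_head; };

struct net_device {
	struct net *nd_net;
	struct list_head dev_list;
};

/* Recover the net_device that embeds a given list node. */
#define net_device_entry(lh) \
	((struct net_device *)((char *)(lh) - offsetof(struct net_device, dev_list)))

/*
 * The removed macro started with "struct net_device *dev = d;". A caller
 * passing a variable named dev got "struct net_device *dev = dev;", which
 * initializes the new local from itself. A function parameter is bound
 * before the body runs, so the same call is safe here.
 */
static inline struct net_device *next_net_device(struct net_device *dev)
{
	struct list_head *lh = dev->dev_list.next;

	return lh == &dev->nd_net->dev_base_head ? NULL : net_device_entry(lh);
}

/* Hypothetical demo helper: make the namespace's device list empty. */
static void demo_init_net(struct net *net)
{
	net->dev_base_head.next = &net->dev_base_head;
	net->dev_base_head.prev = &net->dev_base_head;
}

/* Hypothetical demo helper: append dev to the tail of net's device list. */
static void demo_add_device(struct net *net, struct net_device *dev)
{
	struct list_head *head = &net->dev_base_head;

	dev->nd_net = net;
	dev->dev_list.next = head;
	dev->dev_list.prev = head->prev;
	head->prev->next = &dev->dev_list;
	head->prev = &dev->dev_list;
}
```

A caller can now write dev = next_net_device(dev) in a loop and get NULL once the walk wraps back to the list head.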
* [net-2.6.24][NETNS][patch 2/3] fix loopback network namespace initialization
From: dlezcano @ 2007-09-12 12:38 UTC (permalink / raw)
To: davem; +Cc: ebiederm, containers, netdev
In-Reply-To: <20070912123811.075269132@mai.toulouse-stg.fr.ibm.com>
[-- Attachment #1: net-fix-kernel-bug-dev-nd-net-null.patch --]
[-- Type: text/plain, Size: 833 bytes --]
From: Daniel Lezcano <dlezcano@fr.ibm.com>
The core patchset of the network namespace sent by
Eric Biederman does not do dynamic loopback creation.
So there is no call to alloc_netdev_mq which fills the
network namespace field of the netdevice.
This patch assigns the loopback device to the init network namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
drivers/net/loopback.c | 1 +
1 file changed, 1 insertion(+)
Index: net-2.6.24/drivers/net/loopback.c
===================================================================
--- net-2.6.24.orig/drivers/net/loopback.c
+++ net-2.6.24/drivers/net/loopback.c
@@ -225,6 +225,7 @@
| NETIF_F_LLTX
| NETIF_F_NETNS_LOCAL,
.ethtool_ops = &loopback_ethtool_ops,
+ .nd_net = &init_net,
};
/* Setup and register the loopback device. */
--
^ permalink raw reply
* [net-2.6.24][NETNS][patch 1/3] fix export symbols
From: dlezcano @ 2007-09-12 12:38 UTC (permalink / raw)
To: davem; +Cc: ebiederm, containers, netdev, Mark Nelson, Benjamin Thery
In-Reply-To: <20070912123811.075269132@mai.toulouse-stg.fr.ibm.com>
[-- Attachment #1: net-fix-export-symbol.patch --]
[-- Type: text/plain, Size: 1116 bytes --]
From: Daniel Lezcano <dlezcano@fr.ibm.com>
Add the appropriate EXPORT_SYMBOLS for proc_net_create,
proc_net_fops_create and proc_net_remove to fix errors when
compiling allmodconfig
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
---
fs/proc/proc_net.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: net-2.6.24/fs/proc/proc_net.c
===================================================================
--- net-2.6.24.orig/fs/proc/proc_net.c
+++ net-2.6.24/fs/proc/proc_net.c
@@ -31,6 +31,7 @@
{
return create_proc_info_entry(name,mode, net->proc_net, get_info);
}
+EXPORT_SYMBOL_GPL(proc_net_create);
struct proc_dir_entry *proc_net_fops_create(struct net *net,
const char *name, mode_t mode, const struct file_operations *fops)
@@ -42,12 +43,13 @@
res->proc_fops = fops;
return res;
}
+EXPORT_SYMBOL_GPL(proc_net_fops_create);
void proc_net_remove(struct net *net, const char *name)
{
remove_proc_entry(name, net->proc_net);
}
-
+EXPORT_SYMBOL_GPL(proc_net_remove);
static struct proc_dir_entry *proc_net_shadow;
--
^ permalink raw reply
* Re: [PATCH net-2.6.23-rc5] ipsec interfamily route handling fix
From: David Miller @ 2007-09-12 12:46 UTC (permalink / raw)
To: joakim.koskela; +Cc: netdev
In-Reply-To: <200709061900.10508.joakim.koskela@hiit.fi>
From: Joakim Koskela <joakim.koskela@hiit.fi>
Date: Thu, 6 Sep 2007 19:00:10 +0300
> This patch addresses a couple of issues related to interfamily ipsec
> modes. The problem is that the structure of the routing info changes
> with the family during the __xfrmX_bundle_create, which hasn't been
> taken properly into account. Seems that by coincidence it hasn't
> caused problems on 32bit platforms, but crashes for example on x86_64
> in 6-4 around line 209 of xfrm6_policy.c as rt doesn't point to a
> rt6_info anymore, but actually a struct rtable. With 64bit pointers,
> the rt->rt6i_node pointer seems to hit something usually not null in
> the rtable that rt now points to, making it go for the path_cookie
> assignment and subsequently crashing.
>
> Tested on both 32/64bit with all four (44/46/64/66) combinations of
> transformation. I'm still a bit worried about how for example nested
> transformations work with all of this and would appreciate if someone
> more familiar with the details of these structs could comment.
>
> Signed-off-by: Joakim Koskela <jookos@gmail.com>
This fix basically looks fine to me, but I'd like at least one
other person to review it too.
^ permalink raw reply
* Re: new NAPI interface broken
From: David Miller @ 2007-09-12 12:50 UTC (permalink / raw)
To: ossthema; +Cc: shemminger, netdev, themann, raisch
In-Reply-To: <200709071137.02801.ossthema@de.ibm.com>
From: Jan-Bernd Themann <ossthema@de.ibm.com>
Date: Fri, 7 Sep 2007 11:37:02 +0200
> 2) On SMP systems: after netif_rx_complete has been called on CPU1
> (+interrupts enabled), netif_rx_schedule could be called on CPU2
> (irq handler) before net_rx_action on CPU1 has checked NAPI_STATE_SCHED.
> In that case the device would be added to poll lists of CPU1 and CPU2
> as net_rx_action would see NAPI_STATE_SCHED set.
> This must not happen. It will be caught when netif_rx_complete is
> called the second time (BUG() called)
>
> This would mean we have a problem on all SMP machines right now.
This is not a correct statement.
Only on your platform do network device interrupts get moved
around, no other platform does this.
Sparc64 doesn't, all interrupts stay in one location after
the cpu is initially chosen.
x86 and x86_64 specifically do not move around network
device interrupts, even though other device types do
get dynamic IRQ cpu distribution.
That's why you are the only person seeing this problem.
I agree that it should be fixed, but we should also fix the IRQ
distribution scheme used on powerpc platforms which is totally
broken in these cases.
^ permalink raw reply
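The interleaving in the quoted report can be replayed in a deliberately simplified, single-threaded model. Everything below is hypothetical: the demo_* names, the per-CPU counters standing in for poll lists, and the boolean standing in for NAPI_STATE_SCHED. It does not reproduce the kernel's list handling or locking. It only follows the sequence of steps as the report describes them, with CPU indices 0 and 1 standing in for the report's CPU1 and CPU2.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 2

/* Hypothetical, heavily simplified stand-in for a NAPI context. */
struct demo_napi {
	bool sched;                /* models NAPI_STATE_SCHED */
	int queued[DEMO_NR_CPUS];  /* times queued on each CPU's poll list */
};

/* Models netif_rx_schedule(): queue on this CPU only if not yet scheduled. */
static void demo_schedule(struct demo_napi *n, int cpu)
{
	if (!n->sched) {
		n->sched = true;
		n->queued[cpu]++;
	}
}

/* Models netif_rx_complete(): leave this CPU's poll list, clear the bit. */
static void demo_complete(struct demo_napi *n, int cpu)
{
	assert(n->sched);  /* stands in for the BUG() mentioned in the report */
	n->queued[cpu]--;
	n->sched = false;
}

/* Models the check after poll in net_rx_action(): requeue if still scheduled. */
static void demo_post_poll_check(struct demo_napi *n, int cpu)
{
	if (n->sched)
		n->queued[cpu]++;
}

/* Replays the reported interleaving; returns how many poll lists hold the device. */
static int demo_replay_race(void)
{
	struct demo_napi n = { .sched = false, .queued = { 0, 0 } };

	demo_schedule(&n, 0);        /* IRQ on CPU 0 schedules the device */
	demo_complete(&n, 0);        /* poll on CPU 0 finishes, IRQs re-enabled */
	demo_schedule(&n, 1);        /* the next IRQ is delivered to CPU 1 */
	demo_post_poll_check(&n, 0); /* CPU 0 sees the bit that CPU 1 just set */

	return n.queued[0] + n.queued[1];
}
```

In this model the device ends up queued on both CPUs, which matches the report's description. The model also shows why the interrupt has to move to another CPU between the two steps for the problem to appear.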
* Re: [net-2.6.24][NETNS][patch 1/3] fix export symbols
From: David Miller @ 2007-09-12 12:53 UTC (permalink / raw)
To: dlezcano; +Cc: ebiederm, containers, netdev, markn, benjamin.thery
In-Reply-To: <20070912124414.640551634@mai.toulouse-stg.fr.ibm.com>
From: dlezcano@fr.ibm.com
Date: Wed, 12 Sep 2007 14:38:12 +0200
> From: Daniel Lezcano <dlezcano@fr.ibm.com>
>
> Add the appropriate EXPORT_SYMBOLS for proc_net_create,
> proc_net_fops_create and proc_net_remove to fix errors when
> compiling allmodconfig
>
> Signed-off-by: Mark Nelson <markn@au1.ibm.com>
> Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Applied to net-2.6.24, thanks.
Why aren't you signing off on these patches? Please
do so in the future.
"From: " usually means you are the patch author, so I can't tell
who wrote these patches: you, or the other people listed in the
signoff area.
^ permalink raw reply
* Re: [net-2.6.24][NETNS][patch 2/3] fix loopback network namespace initialization
From: David Miller @ 2007-09-12 12:55 UTC (permalink / raw)
To: dlezcano; +Cc: ebiederm, containers, netdev
In-Reply-To: <20070912124419.150545693@mai.toulouse-stg.fr.ibm.com>
From: dlezcano@fr.ibm.com
Date: Wed, 12 Sep 2007 14:38:13 +0200
> From: Daniel Lezcano <dlezcano@fr.ibm.com>
>
> The core patchset of the network namespace sent by
> Eric Biederman does not do dynamic loopback creation.
> So there is no call to alloc_netdev_mq which fills the
> network namespace field of the netdevice.
>
> This patch assigns the loopback device to the init network namespace.
>
> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Applied, thanks.
^ permalink raw reply
* [PATCH 1/4] [IPROUTE2] Revert "Make ip utility veth driver aware"
From: Eric W. Biederman @ 2007-09-12 12:55 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, Pavel Emelyanov, Patrick McHardy
Stephen, it looks like you weren't cc'd on the latest version
of the veth support. So this patchset first reverts the old
version of the veth support you merged, then merges a tested
version of the veth support.
This reverts commit 4ed390ce43d1ec7c881721f312260df901d8390d.
Conflicts:
ip/ip.c
---
ip/Makefile | 2 +-
ip/ip.c | 4 +-
ip/veth.c | 196 -----------------------------------------------------------
ip/veth.h | 17 -----
4 files changed, 2 insertions(+), 217 deletions(-)
delete mode 100644 ip/veth.c
delete mode 100644 ip/veth.h
diff --git a/ip/Makefile b/ip/Makefile
index 209c5c8..9a5bfe3 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -1,7 +1,7 @@
IPOBJ=ip.o ipaddress.o iproute.o iprule.o \
rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
- ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o veth.o
+ ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o
RTMONOBJ=rtmon.o
diff --git a/ip/ip.c b/ip/ip.c
index 829fc64..4bdb83b 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -27,7 +27,6 @@
#include "SNAPSHOT.h"
#include "utils.h"
#include "ip_common.h"
-#include "veth.h"
int preferred_family = AF_UNSPEC;
int show_stats = 0;
@@ -48,7 +47,7 @@ static void usage(void)
"Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
" ip [ -force ] [-batch filename\n"
"where OBJECT := { link | addr | route | rule | neigh | ntable | tunnel |\n"
-" maddr | mroute | monitor | xfrm | veth }\n"
+" maddr | mroute | monitor | xfrm }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
" -f[amily] { inet | inet6 | ipx | dnet | link } |\n"
" -o[neline] | -t[imestamp] }\n");
@@ -78,7 +77,6 @@ static const struct cmd {
{ "monitor", do_ipmonitor },
{ "xfrm", do_xfrm },
{ "mroute", do_multiroute },
- { "veth", do_veth },
{ "help", do_help },
{ 0 }
};
diff --git a/ip/veth.c b/ip/veth.c
deleted file mode 100644
index d4eecc8..0000000
--- a/ip/veth.c
+++ /dev/null
@@ -1,196 +0,0 @@
-/*
- * veth.c "ethernet tunnel"
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *
- * Authors: Pavel Emelianov, <xemul@openvz.org>
- *
- */
-
-#include <stdio.h>
-#include <string.h>
-#include <unistd.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <linux/genetlink.h>
-
-#include "utils.h"
-#include "veth.h"
-
-#define GENLMSG_DATA(glh) ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN))
-#define NLA_DATA(na) ((void *)((char*)(na) + NLA_HDRLEN))
-
-static int do_veth_help(void)
-{
- fprintf(stderr, "Usage: ip veth add DEVICE PEER_NAME\n");
- fprintf(stderr, " del DEVICE\n");
- exit(-1);
-}
-
-static int genl_ctrl_resolve_family(const char *family)
-{
- struct rtnl_handle rth;
- struct nlmsghdr *nlh;
- struct genlmsghdr *ghdr;
- int ret = 0;
- struct {
- struct nlmsghdr n;
- char buf[4096];
- } req;
-
- memset(&req, 0, sizeof(req));
-
- nlh = &req.n;
- nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
- nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
- nlh->nlmsg_type = GENL_ID_CTRL;
-
- ghdr = NLMSG_DATA(&req.n);
- ghdr->cmd = CTRL_CMD_GETFAMILY;
-
- if (rtnl_open_byproto(&rth, 0, NETLINK_GENERIC) < 0) {
- fprintf(stderr, "Cannot open generic netlink socket\n");
- exit(1);
- }
-
- addattr_l(nlh, 128, CTRL_ATTR_FAMILY_NAME, family, strlen(family) + 1);
-
- if (rtnl_talk(&rth, nlh, 0, 0, nlh, NULL, NULL) < 0) {
- fprintf(stderr, "Error talking to the kernel\n");
- goto errout;
- }
-
- {
- struct rtattr *tb[CTRL_ATTR_MAX + 1];
- struct genlmsghdr *ghdr = NLMSG_DATA(nlh);
- int len = nlh->nlmsg_len;
- struct rtattr *attrs;
-
- if (nlh->nlmsg_type != GENL_ID_CTRL) {
- fprintf(stderr, "Not a controller message, nlmsg_len=%d "
- "nlmsg_type=0x%x\n", nlh->nlmsg_len, nlh->nlmsg_type);
- goto errout;
- }
-
- if (ghdr->cmd != CTRL_CMD_NEWFAMILY) {
- fprintf(stderr, "Unkown controller command %d\n", ghdr->cmd);
- goto errout;
- }
-
- len -= NLMSG_LENGTH(GENL_HDRLEN);
-
- if (len < 0) {
- fprintf(stderr, "wrong controller message len %d\n", len);
- return -1;
- }
-
- attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
- parse_rtattr(tb, CTRL_ATTR_MAX, attrs, len);
-
- if (tb[CTRL_ATTR_FAMILY_ID] == NULL) {
- fprintf(stderr, "Missing family id TLV\n");
- goto errout;
- }
-
- ret = *(__u16 *) RTA_DATA(tb[CTRL_ATTR_FAMILY_ID]);
- }
-
-errout:
- rtnl_close(&rth);
- return ret;
-}
-
-static int do_veth_operate(char *dev, char *peer, int cmd)
-{
- struct rtnl_handle rth;
- struct nlmsghdr *nlh;
- struct genlmsghdr *ghdr;
- struct nlattr *attr;
- struct {
- struct nlmsghdr n;
- struct genlmsghdr h;
- char bug[1024];
- } req;
- int family, len;
- int err = 0;
-
- family = genl_ctrl_resolve_family("veth");
- if (family == 0) {
- fprintf(stderr, "veth: Can't resolve family\n");
- exit(1);
- }
-
- if (rtnl_open_byproto(&rth, 0, NETLINK_GENERIC) < 0)
- exit(1);
-
- nlh = &req.n;
- nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
- nlh->nlmsg_flags = NLM_F_REQUEST;
- nlh->nlmsg_type = family;
- nlh->nlmsg_seq = 0;
-
- ghdr = &req.h;
- ghdr->cmd = cmd;
-
- attr = (struct nlattr *) GENLMSG_DATA(&req);
- len = strlen(dev);
- attr->nla_type = VETH_ATTR_DEVNAME;
- attr->nla_len = len + 1 + NLA_HDRLEN;
- memcpy(NLA_DATA(attr), dev, len);
- nlh->nlmsg_len += NLMSG_ALIGN(attr->nla_len);
-
- if (peer) {
- attr = (struct nlattr *)((char *)attr +
- NLMSG_ALIGN(attr->nla_len));
- len = strlen(peer);
- attr->nla_type = VETH_ATTR_PEERNAME;
- attr->nla_len = len + 1 + NLA_HDRLEN;
- memcpy(NLA_DATA(attr), peer, len);
- nlh->nlmsg_len += NLMSG_ALIGN(attr->nla_len);
- }
-
- if (rtnl_send(&rth, (char *) &req, nlh->nlmsg_len) < 0) {
- err = -1;
- fprintf(stderr, "Error talking to the kernel (add)\n");
- }
-
- rtnl_close(&rth);
- return err;
-}
-
-static int do_veth_add(int argc, char **argv)
-{
- if (argc < 2)
- return do_veth_help();
-
- return do_veth_operate(argv[0], argv[1], VETH_CMD_ADD);
-}
-
-static int do_veth_del(int argc, char **argv)
-{
- char *name;
-
- if (argc < 1)
- return do_veth_help();
-
- return do_veth_operate(argv[0], NULL, VETH_CMD_DEL);
-}
-
-int do_veth(int argc, char **argv)
-{
- if (argc == 0)
- return do_veth_help();
-
- if (strcmp(*argv, "add") == 0 || strcmp(*argv, "a") == 0)
- return do_veth_add(argc - 1, argv + 1);
- if (strcmp(*argv, "del") == 0 || strcmp(*argv, "d") == 0)
- return do_veth_del(argc - 1, argv + 1);
- if (strcmp(*argv, "help") == 0)
- return do_veth_help();
-
- fprintf(stderr, "Command \"%s\" is unknown, try \"ip veth help\".\n", *argv);
- exit(-1);
-}
diff --git a/ip/veth.h b/ip/veth.h
deleted file mode 100644
index 4d7b357..0000000
--- a/ip/veth.h
+++ /dev/null
@@ -1,17 +0,0 @@
-int do_veth(int argc, char **argv);
-
-enum {
- VETH_CMD_UNSPEC,
- VETH_CMD_ADD,
- VETH_CMD_DEL,
-
- VETH_CMD_MAX
-};
-
-enum {
- VETH_ATTR_UNSPEC,
- VETH_ATTR_DEVNAME,
- VETH_ATTR_PEERNAME,
-
- VETH_ATTR_MAX
-};
--
1.5.3.rc6.17.g1911
^ permalink raw reply related
* Re: [PATCH 1/4] [IPROUTE2] Revert "Make ip utility veth driver aware"
From: Pavel Emelyanov @ 2007-09-12 12:55 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Stephen Hemminger, netdev, Patrick McHardy
In-Reply-To: <m14pi0cac5.fsf@ebiederm.dsl.xmission.com>
Eric W. Biederman wrote:
> Stephen it looks like you weren't cc'd on the latest version
> of the veth support. So this patchset first reverts the old
He was. The latest version looks completely different from what
is reverted in this patch.
> version of the veth support you merged. Then merges a tested
> version of the veth support.
>
> This reverts commit 4ed390ce43d1ec7c881721f312260df901d8390d.
>
> Conflicts:
>
> ip/ip.c
> ---
> ip/Makefile | 2 +-
> ip/ip.c | 4 +-
> ip/veth.c | 196 -----------------------------------------------------------
> ip/veth.h | 17 -----
> 4 files changed, 2 insertions(+), 217 deletions(-)
> delete mode 100644 ip/veth.c
> delete mode 100644 ip/veth.h
>
> diff --git a/ip/Makefile b/ip/Makefile
> index 209c5c8..9a5bfe3 100644
> --- a/ip/Makefile
> +++ b/ip/Makefile
> @@ -1,7 +1,7 @@
> IPOBJ=ip.o ipaddress.o iproute.o iprule.o \
> rtm_map.o iptunnel.o ip6tunnel.o tunnel.o ipneigh.o ipntable.o iplink.o \
> ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o \
> - ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o veth.o
> + ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o
>
> RTMONOBJ=rtmon.o
>
> diff --git a/ip/ip.c b/ip/ip.c
> index 829fc64..4bdb83b 100644
> --- a/ip/ip.c
> +++ b/ip/ip.c
> @@ -27,7 +27,6 @@
> #include "SNAPSHOT.h"
> #include "utils.h"
> #include "ip_common.h"
> -#include "veth.h"
>
> int preferred_family = AF_UNSPEC;
> int show_stats = 0;
> @@ -48,7 +47,7 @@ static void usage(void)
> "Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
> " ip [ -force ] [-batch filename\n"
> "where OBJECT := { link | addr | route | rule | neigh | ntable | tunnel |\n"
> -" maddr | mroute | monitor | xfrm | veth }\n"
> +" maddr | mroute | monitor | xfrm }\n"
> " OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
> " -f[amily] { inet | inet6 | ipx | dnet | link } |\n"
> " -o[neline] | -t[imestamp] }\n");
> @@ -78,7 +77,6 @@ static const struct cmd {
> { "monitor", do_ipmonitor },
> { "xfrm", do_xfrm },
> { "mroute", do_multiroute },
> - { "veth", do_veth },
> { "help", do_help },
> { 0 }
> };
> diff --git a/ip/veth.c b/ip/veth.c
> deleted file mode 100644
> index d4eecc8..0000000
> --- a/ip/veth.c
> +++ /dev/null
> @@ -1,196 +0,0 @@
> -/*
> - * veth.c "ethernet tunnel"
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public License
> - * as published by the Free Software Foundation; either version
> - * 2 of the License, or (at your option) any later version.
> - *
> - * Authors: Pavel Emelianov, <xemul@openvz.org>
> - *
> - */
> -
> -#include <stdio.h>
> -#include <string.h>
> -#include <unistd.h>
> -#include <sys/types.h>
> -#include <sys/socket.h>
> -#include <linux/genetlink.h>
> -
> -#include "utils.h"
> -#include "veth.h"
> -
> -#define GENLMSG_DATA(glh) ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN))
> -#define NLA_DATA(na) ((void *)((char*)(na) + NLA_HDRLEN))
> -
> -static int do_veth_help(void)
> -{
> - fprintf(stderr, "Usage: ip veth add DEVICE PEER_NAME\n");
> - fprintf(stderr, " del DEVICE\n");
> - exit(-1);
> -}
> -
> -static int genl_ctrl_resolve_family(const char *family)
> -{
> - struct rtnl_handle rth;
> - struct nlmsghdr *nlh;
> - struct genlmsghdr *ghdr;
> - int ret = 0;
> - struct {
> - struct nlmsghdr n;
> - char buf[4096];
> - } req;
> -
> - memset(&req, 0, sizeof(req));
> -
> - nlh = &req.n;
> - nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
> - nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
> - nlh->nlmsg_type = GENL_ID_CTRL;
> -
> - ghdr = NLMSG_DATA(&req.n);
> - ghdr->cmd = CTRL_CMD_GETFAMILY;
> -
> - if (rtnl_open_byproto(&rth, 0, NETLINK_GENERIC) < 0) {
> - fprintf(stderr, "Cannot open generic netlink socket\n");
> - exit(1);
> - }
> -
> - addattr_l(nlh, 128, CTRL_ATTR_FAMILY_NAME, family, strlen(family) + 1);
> -
> - if (rtnl_talk(&rth, nlh, 0, 0, nlh, NULL, NULL) < 0) {
> - fprintf(stderr, "Error talking to the kernel\n");
> - goto errout;
> - }
> -
> - {
> - struct rtattr *tb[CTRL_ATTR_MAX + 1];
> - struct genlmsghdr *ghdr = NLMSG_DATA(nlh);
> - int len = nlh->nlmsg_len;
> - struct rtattr *attrs;
> -
> - if (nlh->nlmsg_type != GENL_ID_CTRL) {
> - fprintf(stderr, "Not a controller message, nlmsg_len=%d "
> - "nlmsg_type=0x%x\n", nlh->nlmsg_len, nlh->nlmsg_type);
> - goto errout;
> - }
> -
> - if (ghdr->cmd != CTRL_CMD_NEWFAMILY) {
> - fprintf(stderr, "Unkown controller command %d\n", ghdr->cmd);
> - goto errout;
> - }
> -
> - len -= NLMSG_LENGTH(GENL_HDRLEN);
> -
> - if (len < 0) {
> - fprintf(stderr, "wrong controller message len %d\n", len);
> - return -1;
> - }
> -
> - attrs = (struct rtattr *) ((char *) ghdr + GENL_HDRLEN);
> - parse_rtattr(tb, CTRL_ATTR_MAX, attrs, len);
> -
> - if (tb[CTRL_ATTR_FAMILY_ID] == NULL) {
> - fprintf(stderr, "Missing family id TLV\n");
> - goto errout;
> - }
> -
> - ret = *(__u16 *) RTA_DATA(tb[CTRL_ATTR_FAMILY_ID]);
> - }
> -
> -errout:
> - rtnl_close(&rth);
> - return ret;
> -}
> -
> -static int do_veth_operate(char *dev, char *peer, int cmd)
> -{
> - struct rtnl_handle rth;
> - struct nlmsghdr *nlh;
> - struct genlmsghdr *ghdr;
> - struct nlattr *attr;
> - struct {
> - struct nlmsghdr n;
> - struct genlmsghdr h;
> - char bug[1024];
> - } req;
> - int family, len;
> - int err = 0;
> -
> - family = genl_ctrl_resolve_family("veth");
> - if (family == 0) {
> - fprintf(stderr, "veth: Can't resolve family\n");
> - exit(1);
> - }
> -
> - if (rtnl_open_byproto(&rth, 0, NETLINK_GENERIC) < 0)
> - exit(1);
> -
> - nlh = &req.n;
> - nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
> - nlh->nlmsg_flags = NLM_F_REQUEST;
> - nlh->nlmsg_type = family;
> - nlh->nlmsg_seq = 0;
> -
> - ghdr = &req.h;
> - ghdr->cmd = cmd;
> -
> - attr = (struct nlattr *) GENLMSG_DATA(&req);
> - len = strlen(dev);
> - attr->nla_type = VETH_ATTR_DEVNAME;
> - attr->nla_len = len + 1 + NLA_HDRLEN;
> - memcpy(NLA_DATA(attr), dev, len);
> - nlh->nlmsg_len += NLMSG_ALIGN(attr->nla_len);
> -
> - if (peer) {
> - attr = (struct nlattr *)((char *)attr +
> - NLMSG_ALIGN(attr->nla_len));
> - len = strlen(peer);
> - attr->nla_type = VETH_ATTR_PEERNAME;
> - attr->nla_len = len + 1 + NLA_HDRLEN;
> - memcpy(NLA_DATA(attr), peer, len);
> - nlh->nlmsg_len += NLMSG_ALIGN(attr->nla_len);
> - }
> -
> - if (rtnl_send(&rth, (char *) &req, nlh->nlmsg_len) < 0) {
> - err = -1;
> - fprintf(stderr, "Error talking to the kernel (add)\n");
> - }
> -
> - rtnl_close(&rth);
> - return err;
> -}
> -
> -static int do_veth_add(int argc, char **argv)
> -{
> - if (argc < 2)
> - return do_veth_help();
> -
> - return do_veth_operate(argv[0], argv[1], VETH_CMD_ADD);
> -}
> -
> -static int do_veth_del(int argc, char **argv)
> -{
> - char *name;
> -
> - if (argc < 1)
> - return do_veth_help();
> -
> - return do_veth_operate(argv[0], NULL, VETH_CMD_DEL);
> -}
> -
> -int do_veth(int argc, char **argv)
> -{
> - if (argc == 0)
> - return do_veth_help();
> -
> - if (strcmp(*argv, "add") == 0 || strcmp(*argv, "a") == 0)
> - return do_veth_add(argc - 1, argv + 1);
> - if (strcmp(*argv, "del") == 0 || strcmp(*argv, "d") == 0)
> - return do_veth_del(argc - 1, argv + 1);
> - if (strcmp(*argv, "help") == 0)
> - return do_veth_help();
> -
> - fprintf(stderr, "Command \"%s\" is unknown, try \"ip veth help\".\n", *argv);
> - exit(-1);
> -}
> diff --git a/ip/veth.h b/ip/veth.h
> deleted file mode 100644
> index 4d7b357..0000000
> --- a/ip/veth.h
> +++ /dev/null
> @@ -1,17 +0,0 @@
> -int do_veth(int argc, char **argv);
> -
> -enum {
> - VETH_CMD_UNSPEC,
> - VETH_CMD_ADD,
> - VETH_CMD_DEL,
> -
> - VETH_CMD_MAX
> -};
> -
> -enum {
> - VETH_ATTR_UNSPEC,
> - VETH_ATTR_DEVNAME,
> - VETH_ATTR_PEERNAME,
> -
> - VETH_ATTR_MAX
> -};
^ permalink raw reply