Netdev List
* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: Timo Teräs @ 2008-01-17  8:11 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, hadi, netdev
In-Reply-To: <20080116.235923.208347316.davem@davemloft.net>

David Miller wrote:
> Doing anything other than "life support" bug fixes for AF_KEY is
> inappropriate.

Yes. I thought my patch would qualify as a "life support" bug fix.
Currently racoon fails to work if there are too many SPDs or SAs
because the kernel cannot handle the dump request properly. And
this is what my patch fixes for pfkey. It adds no new features or
functionality; just makes the dumping work with large databases.

Then there's also the xfrm dumping changes which change the
algorithm from O(n^2) to O(n) with some memory overhead, but
that is a different story. Any comments on that?
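To illustrate the complexity difference in isolation, here is a minimal user-space sketch (not the xfrm code). It assumes the quadratic cost comes from a resumable dump that remembers only how many entries it has emitted and re-walks the list from the head for every batch, while the linear variant keeps a cursor pointer between batches (the "memory overhead"). All names here (struct entry, dump_by_index, dump_by_cursor, link_entries) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical database entry kept in a singly linked list. */
struct entry {
	int id;
	struct entry *next;
};

/* Helper: link an array of n entries into a list and return its head. */
struct entry *link_entries(struct entry *arr, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		arr[i].id = (int)i;
		arr[i].next = (i + 1 < n) ? &arr[i + 1] : NULL;
	}
	return n ? &arr[0] : NULL;
}

/*
 * Index-based resume: every batch restarts at the head and skips the
 * entries already dumped. Returns the number of list nodes visited.
 */
unsigned long dump_by_index(struct entry *head, size_t batch)
{
	unsigned long visited = 0;
	size_t done = 0;

	assert(batch > 0);
	for (;;) {
		struct entry *e = head;
		size_t skipped, emitted;

		for (skipped = 0; e && skipped < done; skipped++, e = e->next)
			visited++;	/* re-walking entries dumped earlier */
		for (emitted = 0; e && emitted < batch; emitted++, e = e->next)
			visited++;	/* dumping new entries */
		done += emitted;
		if (!e)
			break;		/* reached the end of the list */
	}
	return visited;
}

/* Cursor-based resume: each batch continues where the last one stopped. */
unsigned long dump_by_cursor(struct entry *head, size_t batch)
{
	unsigned long visited = 0;
	struct entry *cursor = head;

	assert(batch > 0);
	while (cursor) {
		size_t emitted;

		for (emitted = 0; cursor && emitted < batch; emitted++) {
			visited++;
			cursor = cursor->next;
		}
	}
	return visited;
}
```

With n entries and a batch size of one, the index-based walk touches n(n+1)/2 nodes while the cursor-based walk touches n. A real cursor also has to stay valid if entries are deleted between batches, which this sketch ignores.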

- Timo


* Re: [Bugme-new] [Bug 9767] New: missing native u32 classifier for routing policy
From: Andrew Morton @ 2008-01-17  8:46 UTC (permalink / raw)
  To: netdev; +Cc: bugme-daemon, pupilla
In-Reply-To: <bug-9767-10286@http.bugzilla.kernel.org/>

On Thu, 17 Jan 2008 00:30:49 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9767
> 
>            Summary: missing native u32 classifier for routing policy
>            Product: Networking
>            Version: 2.5
>      KernelVersion: all since 2.2
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: low
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: pupilla@hotmail.com
> 
> 
> This is not a bug report, but a feature request.
> Routing policy database management has been supported since Linux 2.2, but it
> lacks a u32 selector (matching by IP protocol, transport ports).
> fwmark is a workaround for this missing feature, but source IP address
> selection will not work anyway: the mark value can't be used for source address
> selection because at the time source address selection is performed, there is
> no packet yet and thus no mark value.
> 


* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: David Miller @ 2008-01-17  8:49 UTC (permalink / raw)
  To: timo.teras; +Cc: herbert, hadi, netdev
In-Reply-To: <478F0DA5.2060401@iki.fi>

From: Timo_Teräs <timo.teras@iki.fi>
Date: Thu, 17 Jan 2008 10:11:17 +0200

> I thought my patch would qualify as a "life support" bug fix.
> Currently racoon fails to work if there are too many SPDs or SAs
> because the kernel cannot handle the dump request properly. And this
> is what my patch fixes for pfkey. It adds no new features or
> functionality; just makes the dumping work with large databases.

Racoon should use netlink for reasons far and beyond the
problem you are trying to address.

The dumping behavior of AF_KEY is just horrific, as one of
several examples.

> Then there's also the xfrm dumping changes which change the
> algorithm from O(n^2) to O(n) with some memory overhead, but
> that is a different story. Any comments on that?

I have no general objections to those changes although I am
backlogged and thus have not studied them in detail.  Jamal
is having what appears to be a healthy dialogue with you about
the details so I'm not concerned much :)


* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: Timo Teräs @ 2008-01-17  9:20 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, hadi, netdev
In-Reply-To: <20080117.004900.58497170.davem@davemloft.net>

David Miller wrote:
> From: Timo_Teräs <timo.teras@iki.fi>
> Date: Thu, 17 Jan 2008 10:11:17 +0200
> 
>> I thought my patch would qualify as a "life support" bug fix.
>> Currently racoon fails to work if there are too many SPDs or SAs
>> because the kernel cannot handle the dump request properly. And this
>> is what my patch fixes for pfkey. It adds no new features or
>> functionality; just makes the dumping work with large databases.
> 
> Racoon should use netlink for reasons far and beyond the
> problem you are trying to address.

Yes. But this is a fairly major thing to do. One needs to create
an API abstraction layer (pfkey is still needed on *BSD) and test it.
That is a lot of work that is not going to happen very soon.

Whereas the pfkey bug fix is non-intrusive and helps all
legacy applications still using af_key by _fixing a bug in
kernel_.

> The dumping behavior of AF_KEY is just horrific, as one of
> several examples.

If af_key is all that bad and does not qualify for maintenance
bug fixes, why not remove it completely?

That would make userland adapt faster.

>> Then there's also the xfrm dumping changes which change the
>> algorithm from O(n^2) to O(n) with some memory overhead, but
>> that is a different story. Any comments on that?
> 
> I have no general objections to those changes although I am
> backlogged and thus have not studied them in detail.  Jamal
> is having what appears to be a healthy dialogue with you about
> the details so I'm not concerned much :)

Ok. I hope someone can also give feedback on the naming
conventions, and on the API changes to xfrm policy/state
walking.

- Timo



* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: David Miller @ 2008-01-17  9:31 UTC (permalink / raw)
  To: timo.teras; +Cc: herbert, hadi, netdev
In-Reply-To: <478F1DEA.5070903@iki.fi>

From: Timo_Teräs <timo.teras@iki.fi>
Date: Thu, 17 Jan 2008 11:20:42 +0200

> Whereas the pfkey bug fix is non-intrusive and helps all
> legacy applications still using af_key by _fixing a bug in
> kernel_.

It's not a bug.  You're fixing a speed issue, not a crash
or a case where AF_KEY is providing incorrect data.

That is what I mean when I say "life support": we fix crashes and
data corruption.  We don't make performance tweaks.


* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: Timo Teräs @ 2008-01-17  9:38 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, hadi, netdev
In-Reply-To: <20080117.013107.241902256.davem@davemloft.net>

David Miller wrote:
> From: Timo_Teräs <timo.teras@iki.fi>
> Date: Thu, 17 Jan 2008 11:20:42 +0200
> 
>> Whereas the pfkey bug fix is non-intrusive and helps all
>> legacy applications still using af_key by _fixing a bug in
>> kernel_.
> 
> It's not a bug.  You're fixing a speed issue, not a crash
> or a case where AF_KEY is providing incorrect data.
> 
> That is what I mean when I say "life support": we fix crashes and
> data corruption.  We don't make performance tweaks.

No. The speed issue is completely handled in the xfrm_state
and xfrm_user changes.

The af_key issue is that in big dumps you get only the first X
entries. The rest of the entries are dropped because the
socket receive buffer fills up. You get data corruption:
missing entries.
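The failure mode can be modelled in a few lines of user-space C. This is only a sketch under simplifying assumptions: the receive buffer is counted in whole messages rather than bytes, and struct rcvbuf, dump_one_shot and dump_resumable are hypothetical names, not kernel interfaces. The one-shot variant emits every entry before the reader runs, so whatever does not fit is lost. The resumable variant is one possible remedy; whether the patch takes this exact shape is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket receive buffer, counted in messages. */
struct rcvbuf {
	size_t capacity;	/* how many messages fit at once */
	size_t queued;		/* messages waiting for the reader */
	size_t dropped;		/* messages lost because the buffer was full */
};

/* Try to queue one message; returns 1 on success, 0 if it was dropped. */
int rcvbuf_push(struct rcvbuf *rb)
{
	if (rb->queued == rb->capacity) {
		rb->dropped++;
		return 0;
	}
	rb->queued++;
	return 1;
}

/* The reader drains everything that is queued; returns how many it got. */
size_t rcvbuf_drain(struct rcvbuf *rb)
{
	size_t n = rb->queued;

	rb->queued = 0;
	return n;
}

/* One-shot dump: emit every entry before the reader gets a chance to run. */
size_t dump_one_shot(struct rcvbuf *rb, size_t entries)
{
	size_t i;

	for (i = 0; i < entries; i++)
		rcvbuf_push(rb);
	return rcvbuf_drain(rb);
}

/* Resumable dump: pause when the buffer is full, continue after a drain. */
size_t dump_resumable(struct rcvbuf *rb, size_t entries)
{
	size_t next = 0, received = 0;

	assert(rb->capacity > 0);
	while (next < entries) {
		while (next < entries && rb->queued < rb->capacity) {
			rcvbuf_push(rb);
			next++;
		}
		received += rcvbuf_drain(rb);
	}
	return received;
}
```

With room for 8 messages and 20 entries to dump, the one-shot variant delivers 8 and drops 12, while the resumable variant delivers all 20.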

- Timo



* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: Arnaldo Carvalho de Melo @ 2008-01-17  9:40 UTC (permalink / raw)
  To: David Miller; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel
In-Reply-To: <20080117.000002.37027317.davem@davemloft.net>

Em Thu, Jan 17, 2008 at 12:00:02AM -0800, David Miller escreveu:
> From: Frans Pop <elendil@planet.nl>
> Date: Thu, 17 Jan 2008 08:51:55 +0100
> 
> > On Thursday 17 January 2008, David Miller wrote:
> > > From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
> > >
> > > > We spent Wednesday trying to reproduce (without the patch) these issues
> > > > without much luck, and have applied the patch cleanly and will continue
> > > > testing it.  Given the simplicity of the changes, and the community
> > > > testing, I'll give my ack and we will continue testing.
> > >
> > > You need a slow CPU, and you need to make sure you do actually
> > > trigger the TX limiting code there.
> > 
> > Hmmm. Is a dual core Pentium D 3.20GHz considered slow these days?
> 
> No, of course not :-)  I guess it therefore depends upon the load
> as well.

I saw it just once, yesterday:

[root@doppio ~]# uname -r
2.6.24-rc5
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <58>
  TDT                  <8f>
  next_to_use          <8f>
  next_to_clean        <55>
buffer_info[next_to_clean]
  time_stamp           <105e973a9>
  next_to_watch        <56>
  jiffies              <105e97992>
  next_to_watch.status <1>
[root@doppio ~]#

This was on a Lenovo T60W, a Core 2 Duo machine (2GHz), while using it to
stress test another machine with netperf TCP_STREAM, ranging from 1 to 8
streams, plus a ping -f using various packet sizes.

I'll update this machine today to 2.6.24-rc8-git + net-2.6 and try again
to reproduce.

I also applied David's patch while trying some RT experiments on
another, 8-way machine used as a server, but on this machine I didn't
experience the Tx Unit Hang message with or without the patch.

- Arnaldo


* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: David Miller @ 2008-01-17  9:44 UTC (permalink / raw)
  To: timo.teras; +Cc: herbert, hadi, netdev
In-Reply-To: <478F2205.80403@iki.fi>

From: Timo_Teräs <timo.teras@iki.fi>
Date: Thu, 17 Jan 2008 11:38:13 +0200

> The af_key issue is that in big dumps you get only the first X
> entries. The rest of the entries are dropped because the
> socket receive buffer fills up. You get data corruption:
> missing entries.

This is an inherent aspect of AF_KEY (and what it was
derived from, BSD routing sockets).

It has to provide dumps atomically, and if there is no
space there is no way to provide those entries which
would require more rcvbuf space.


* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: David Miller @ 2008-01-17  9:45 UTC (permalink / raw)
  To: acme; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel
In-Reply-To: <20080117094007.GF321@ghostprotocols.net>

From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Thu, 17 Jan 2008 07:40:07 -0200

> I'll update this machine today to 2.6.24-rc8-git + net-2.6 and try again
> to reproduce.

Thanks for the datapoints and testing.


* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: Timo Teräs @ 2008-01-17 10:01 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, hadi, netdev
In-Reply-To: <20080117.014458.16733544.davem@davemloft.net>

David Miller wrote:
> From: Timo_Teräs <timo.teras@iki.fi>
> Date: Thu, 17 Jan 2008 11:38:13 +0200
> 
>> The af_key issue is that in big dumps you get only the first X
>> entries. The rest of the entries are dropped because the
>> socket receive buffer fills up. You get data corruption:
>> missing entries.
> 
> This is an inherent aspect of AF_KEY (and what it was
> derived from, BSD routing sockets).

Yes, this is the way BSD does it.
 
> It has to provide dumps atomically, and if there is no
> space there is no way to provide those entries which
> would require more rcvbuf space.

The RFC does not say it has to be atomic.

It does say that the dump is terminated with an SADB_DUMP
message having the sadb_msg_seq field set to zero. Currently
that is dropped too when the problem occurs. Thus the
socket is left in a bad state: the dump never ends. This
can cause applications without any workarounds to hang.
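A small user-space model shows why a lost terminator is worse than a lost entry. It is a sketch only: struct dump_msg is a hypothetical, stripped-down message that carries nothing but the sequence field, the count-down numbering is an assumption, and running out of input stands in for a recv() that would block forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down dump message: only the sequence field. */
struct dump_msg {
	uint32_t seq;	/* assumed to count down; 0 marks the final message */
};

enum dump_status { DUMP_COMPLETE, DUMP_TRUNCATED };

/*
 * Consume messages until the terminator (seq == 0) arrives and report
 * how many messages were read. DUMP_TRUNCATED means the input ran out
 * first, which for a real client is an endless wait in recv().
 */
enum dump_status read_dump(const struct dump_msg *msgs, size_t n,
			   size_t *count)
{
	size_t i;

	assert(count != NULL);
	*count = 0;
	for (i = 0; i < n; i++) {
		(*count)++;
		if (msgs[i].seq == 0)
			return DUMP_COMPLETE;
	}
	return DUMP_TRUNCATED;
}
```

A client that relies only on the terminator cannot tell a truncated dump from a slow one, so it needs a timeout or a similar workaround to avoid hanging.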

- Timo
 


* Broken "Make ip6_frags per namespace" patch
From: Alexey Dobriyan @ 2008-01-17 10:05 UTC (permalink / raw)
  To: dlezcano, davem; +Cc: den, netdev, devel

> commit c064c4811b3e87ff8202f5a966ff4eea0bc54575
> Author: Daniel Lezcano <dlezcano@fr.ibm.com>
> Date:   Thu Jan 10 02:56:03 2008 -0800
> 
>     [NETNS][IPV6]: Make ip6_frags per namespace.
>     
>     The ip6_frags is moved to the network namespace structure.  Because
>     there can be multiple instances of the network namespaces, and the
>     ip6_frags is no longer a global static variable, a helper function has
>     been added to facilitate the initialization of the variables.
>     
>     Until the ipv6 protocol is not per namespace, the variables are
>     accessed relatively from the initial network namespace.

> --- a/include/net/netns/ipv6.h
> +++ b/include/net/netns/ipv6.h

> @@ -11,6 +13,7 @@ struct netns_sysctl_ipv6 {
>  #ifdef CONFIG_SYSCTL
>  	struct ctl_table_header *table;
>  #endif
> +	struct inet_frags_ctl frags;

> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c

> @@ -632,6 +625,11 @@ static struct inet6_protocol frag_protocol =
>  	.flags		=	INET6_PROTO_NOPOLICY,
>  };
>  
> +void ipv6_frag_sysctl_init(struct net *net)
> +{
> +	ip6_frags.ctl = &net->ipv6.sysctl.frags;
> +}

_This_ can't work. There is only one ip6_frags, and its ->ctl pointer is
flipped onto per-netns data. The changelog is also misleading:
ip6_frags_ctl is moved to netns, not all of ip6_frags.
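The problem can be reduced to a few lines of user-space C. This is a sketch with hypothetical miniature types (struct frags_ctl, struct frags, struct net_ns); only the shape of the init function mirrors the hunk quoted above. Every namespace that runs the init function overwrites the single shared pointer, so the last caller wins. If that namespace is later freed, the pointer presumably dangles, which would fit the paging fault under DEBUG_PAGEALLOC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the structures named in the patch. */
struct frags_ctl { int timeout; };
struct frags { struct frags_ctl *ctl; };
struct net_ns { struct frags_ctl frags; };

/* There is exactly one of these, shared by every namespace. */
struct frags global_frags;

/* Same shape as ipv6_frag_sysctl_init(): the last caller wins. */
void frag_sysctl_init(struct net_ns *net)
{
	assert(net != NULL);
	global_frags.ctl = &net->frags;
}
```

After a second namespace calls frag_sysctl_init(), code running on behalf of the first namespace silently reads the second namespace's settings through global_frags.ctl.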

The oops is below: an f->ctl dereference in preparation for the mod_timer() call.



BUG: unable to handle kernel paging request at virtual address f5da8fc8
printing eip: c11d868a *pdpt = 0000000000003001 *pde = 0000000001728067 *pte = 0000000035da8000 
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: ebt_ip ebt_dnat ebt_arpreply ebt_arp ebt_among ebtable_nat ip6t_REJECT ip6table_filter ip6_tables ebtable_filter ebtable_broute ebt_802_3 ebtables des_generic nf_conntrack_netbios_ns nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables deflate zlib_deflate zlib_inflate cryptomgr crypto_hash cpufreq_stats cpufreq_ondemand cdrom cbc bridge llc blkcipher crypto_algapi arpt_mangle arptable_filter arp_tables x_tables ah6 af_packet ipv6

Pid: 0, comm: swapper Not tainted (2.6.24-rc7-net-2.6.25-nf-sysfs-n #30)
EIP: 0060:[<c11d868a>] EFLAGS: 00010246 CPU: 1
EIP is at inet_frag_secret_rebuild+0xaa/0xd0
EAX: f5da8fbc EBX: 00000000 ECX: c1310000 EDX: 00000100
ESI: f7cba000 EDI: f898f7a0 EBP: 00000040 ESP: c1310f90
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c1310000 task=f7c9a580 task.ti=f7c9b000)
Stack: f898f7a8 f898f8a8 000ddcbd f898f7a0 f7cba000 c1310fc4 00000100 c1026d60 
       00000002 00000001 c1191183 c4779ddc c11d85e0 f898c860 f898c860 c12c4a88 
       00000001 c1308da0 0000000a c1023477 00000001 c130b640 c130b640 f7c9bf34 
Call Trace:
 [<c1026d60>] run_timer_softirq+0x120/0x190
 [<c1191183>] net_rx_action+0x53/0x220
 [<c11d85e0>] inet_frag_secret_rebuild+0x0/0xd0
 [<c1023477>] __do_softirq+0x87/0x100
 [<c10059cf>] do_softirq+0xaf/0x110
 [<c10233e3>] irq_exit+0x83/0x90
 [<c1010ce7>] smp_apic_timer_interrupt+0x57/0x90
 [<c10036e1>] apic_timer_interrupt+0x29/0x38
 [<c10036eb>] apic_timer_interrupt+0x33/0x38
 [<c1001460>] default_idle+0x0/0x60
 [<c10014a0>] default_idle+0x40/0x60
 [<c1000ea3>] cpu_idle+0x73/0xb0
=======================
Code: 8b 10 85 d2 89 13 74 03 89 5a 04 89 18 89 43 04 85 f6 89 f3 75 bb 45 83 fd 40 75 a5 8b 44 24 04 e8 4c 3f 01 00 8b 87 50 01 00 00 <8b> 50 0c 01 54 24 08 8d 87 38 01 00 00 8b 54 24 08 83 c4 0c 5b 
EIP: [<c11d868a>] inet_frag_secret_rebuild+0xaa/0xd0 SS:ESP 0068:c1310f90
Kernel panic - not syncing: Fatal exception in interrupt



* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: David Miller @ 2008-01-17 10:06 UTC (permalink / raw)
  To: timo.teras; +Cc: herbert, hadi, netdev
In-Reply-To: <478F276D.8080407@iki.fi>

From: Timo_Teräs <timo.teras@iki.fi>
Date: Thu, 17 Jan 2008 12:01:17 +0200

> David Miller wrote:
> > This is an inherent aspect of AF_KEY (and what it was
> > derived from, BSD routing sockets).
> 
> Yes, this is the way BSD does it.
>  
> > It has to provide dumps atomically, and if there is no
> > space there is no way to provide those entries which
> > would require more rcvbuf space.
> 
> The RFC does not say it has to be atomic.

Every application out there in the universe expects BSD socket
semantics, and therefore atomic dumps.  You cannot "fix" things
without breaking applications.


* [PATCH 0/3 net-2.6.25] call FIB rule->action in the correct namespace
From: Denis V. Lunev @ 2008-01-17 10:08 UTC (permalink / raw)
  To: David Miller; +Cc: Daniel Lezcano, netdev, Linux Containers, devel

FIB rule->action should operate in the same namespace as fib_lookup.
This is definitely missing right now.

There are two ways to implement this: pass struct net into another rules
API call (2 levels) or place netns into rule struct directly. The second
approach seems better as the code will grow less.

Additionally, the patchset cleans up struct net from
fib_rules_register/unregister to have the network namespace context at
the time of default rules creation.
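The two approaches can be compared with a small user-space sketch. The types and functions below are hypothetical stand-ins, not the kernel API; only the field names fro_net and fr_net are taken from the patches in this series, and the returned number is a pretend lookup key.

```c
#include <assert.h>
#include <stddef.h>

struct net_ns { int id; };			/* stand-in for struct net */
struct rules_ops { struct net_ns *fro_net; };	/* back-pointer in the ops */
struct rule {
	struct net_ns *fr_net;	/* namespace saved at rule creation */
	int table;
};

/* Option 1: pass the namespace through every call level. */
int rule_action_explicit(const struct net_ns *net, const struct rule *r)
{
	assert(net && r);
	return net->id * 1000 + r->table;	/* pretend lookup key */
}

/* Option 2: the rule remembers its namespace, the signature stays put. */
int rule_action_stored(const struct rule *r)
{
	assert(r && r->fr_net);
	return r->fr_net->id * 1000 + r->table;
}

/* Rule creation copies the namespace from the ops it belongs to. */
void rule_init(struct rule *r, const struct rules_ops *ops, int table)
{
	assert(r && ops);
	r->fr_net = ops->fro_net;
	r->table = table;
}
```

Both variants resolve the same table, but the second one costs a pointer per rule and leaves the existing callback signatures untouched.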

Signed-off-by: Denis V. Lunev <den@openvz.org>


* [PATCH 2/3 net-2.6.25] [NETNS] FIB rules API cleanup.
From: Denis V. Lunev @ 2008-01-17 10:09 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, dlezcano, containers, Denis V. Lunev
In-Reply-To: <478F2933.1000007@openvz.org>

Remove struct net from the fib_rules_register(unregister)/notify_change paths
and slim the code size a bit.

add/remove: 0/0 grow/shrink: 10/12 up/down: 35/-100 (-65)
function                                     old     new   delta
notify_rule_change                           273     280      +7
trie_show_stats                              471     475      +4
fn_trie_delete                               473     477      +4
fib_rules_unregister                         144     148      +4
fib4_rule_compare                            119     123      +4
resize                                      2842    2845      +3
fn_trie_select_default                       515     518      +3
inet_sk_rebuild_header                       836     838      +2
fib_trie_seq_show                            764     766      +2
__devinet_sysctl_register                    276     278      +2
fn_trie_lookup                              1124    1123      -1
ip_fib_check_default                         133     131      -2
devinet_conf_sysctl                          223     221      -2
snmp_fold_field                              126     123      -3
fn_trie_insert                              2091    2086      -5
inet_create                                  876     870      -6
fib4_rules_init                              197     191      -6
fib_sync_down                                452     444      -8
inet_gso_send_check                          334     325      -9
fib_create_info                             3003    2991     -12
fib_nl_delrule                               568     553     -15
fib_nl_newrule                               883     852     -31

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/fib_rules.h |    4 ++--
 net/core/fib_rules.c    |   20 +++++++++++++-------
 net/decnet/dn_rules.c   |    4 ++--
 net/ipv4/fib_rules.c    |    6 +++---
 net/ipv6/fib6_rules.c   |    4 ++--
 5 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 6910e01..7f9f4ae 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -102,8 +102,8 @@ static inline u32 frh_get_table(struct fib_rule_hdr *frh, struct nlattr **nla)
 	return frh->table;
 }
 
-extern int fib_rules_register(struct net *, struct fib_rules_ops *);
-extern void fib_rules_unregister(struct net *, struct fib_rules_ops *);
+extern int fib_rules_register(struct fib_rules_ops *);
+extern void fib_rules_unregister(struct fib_rules_ops *);
 extern void                     fib_rules_cleanup_ops(struct fib_rules_ops *);
 
 extern int			fib_rules_lookup(struct fib_rules_ops *,
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 541728a..3cd4f13 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -37,8 +37,7 @@ int fib_default_rule_add(struct fib_rules_ops *ops,
 }
 EXPORT_SYMBOL(fib_default_rule_add);
 
-static void notify_rule_change(struct net *net, int event,
-			       struct fib_rule *rule,
+static void notify_rule_change(int event, struct fib_rule *rule,
 			       struct fib_rules_ops *ops, struct nlmsghdr *nlh,
 			       u32 pid);
 
@@ -72,10 +71,13 @@ static void flush_route_cache(struct fib_rules_ops *ops)
 		ops->flush_cache();
 }
 
-int fib_rules_register(struct net *net, struct fib_rules_ops *ops)
+int fib_rules_register(struct fib_rules_ops *ops)
 {
 	int err = -EEXIST;
 	struct fib_rules_ops *o;
+	struct net *net;
+
+	net = ops->fro_net;
 
 	if (ops->rule_size < sizeof(struct fib_rule))
 		return -EINVAL;
@@ -112,8 +114,9 @@ void fib_rules_cleanup_ops(struct fib_rules_ops *ops)
 }
 EXPORT_SYMBOL_GPL(fib_rules_cleanup_ops);
 
-void fib_rules_unregister(struct net *net, struct fib_rules_ops *ops)
+void fib_rules_unregister(struct fib_rules_ops *ops)
 {
+	struct net *net = ops->fro_net;
 
 	spin_lock(&net->rules_mod_lock);
 	list_del_rcu(&ops->list);
@@ -333,7 +336,7 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	else
 		list_add_rcu(&rule->list, &ops->rules_list);
 
-	notify_rule_change(net, RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).pid);
+	notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).pid);
 	flush_route_cache(ops);
 	rules_ops_put(ops);
 	return 0;
@@ -423,7 +426,7 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 		}
 
 		synchronize_rcu();
-		notify_rule_change(net, RTM_DELRULE, rule, ops, nlh,
+		notify_rule_change(RTM_DELRULE, rule, ops, nlh,
 				   NETLINK_CB(skb).pid);
 		fib_rule_put(rule);
 		flush_route_cache(ops);
@@ -561,13 +564,15 @@ static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
-static void notify_rule_change(struct net *net, int event, struct fib_rule *rule,
+static void notify_rule_change(int event, struct fib_rule *rule,
 			       struct fib_rules_ops *ops, struct nlmsghdr *nlh,
 			       u32 pid)
 {
+	struct net *net;
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 
+	net = ops->fro_net;
 	skb = nlmsg_new(fib_rule_nlmsg_size(ops, rule), GFP_KERNEL);
 	if (skb == NULL)
 		goto errout;
@@ -579,6 +584,7 @@ static void notify_rule_change(struct net *net, int event, struct fib_rule *rule
 		kfree_skb(skb);
 		goto errout;
 	}
+
 	err = rtnl_notify(skb, net, pid, ops->nlgroup, nlh, GFP_KERNEL);
 errout:
 	if (err < 0)
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 964e658..5b7539b 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -256,12 +256,12 @@ void __init dn_fib_rules_init(void)
 {
 	BUG_ON(fib_default_rule_add(&dn_fib_rules_ops, 0x7fff,
 			            RT_TABLE_MAIN, 0));
-	fib_rules_register(&init_net, &dn_fib_rules_ops);
+	fib_rules_register(&dn_fib_rules_ops);
 }
 
 void __exit dn_fib_rules_cleanup(void)
 {
-	fib_rules_unregister(&init_net, &dn_fib_rules_ops);
+	fib_rules_unregister(&dn_fib_rules_ops);
 }
 
 
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 8d0ebe7..3b7affd 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -317,7 +317,7 @@ int __net_init fib4_rules_init(struct net *net)
 	INIT_LIST_HEAD(&ops->rules_list);
 	ops->fro_net = net;
 
-	fib_rules_register(net, ops);
+	fib_rules_register(ops);
 
 	err = fib_default_rules_init(ops);
 	if (err < 0)
@@ -327,13 +327,13 @@ int __net_init fib4_rules_init(struct net *net)
 
 fail:
 	/* also cleans all rules already added */
-	fib_rules_unregister(net, ops);
+	fib_rules_unregister(ops);
 	kfree(ops);
 	return err;
 }
 
 void __net_exit fib4_rules_exit(struct net *net)
 {
-	fib_rules_unregister(net, net->ipv4.rules_ops);
+	fib_rules_unregister(net->ipv4.rules_ops);
 	kfree(net->ipv4.rules_ops);
 }
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index ead5ab2..695c0ca 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -274,7 +274,7 @@ int __init fib6_rules_init(void)
 	if (ret)
 		goto out;
 
-	ret = fib_rules_register(&init_net, &fib6_rules_ops);
+	ret = fib_rules_register(&fib6_rules_ops);
 	if (ret)
 		goto out_default_rules_init;
 out:
@@ -287,5 +287,5 @@ out_default_rules_init:
 
 void fib6_rules_cleanup(void)
 {
-	fib_rules_unregister(&init_net, &fib6_rules_ops);
+	fib_rules_unregister(&fib6_rules_ops);
 }
-- 
1.5.3.rc5



* [PATCH 1/3 net-2.6.25] Add netns to fib_rules_ops.
From: Denis V. Lunev @ 2008-01-17 10:09 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, dlezcano, containers, Denis V. Lunev
In-Reply-To: <478F2933.1000007@openvz.org>

The backward link from FIB rules operations to the network namespace will
allow the API to be simplified a bit.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/fib_rules.h |    1 +
 net/decnet/dn_rules.c   |    1 +
 net/ipv4/fib_rules.c    |    2 ++
 net/ipv6/fib6_rules.c   |    1 +
 4 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 4f47250..6910e01 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -67,6 +67,7 @@ struct fib_rules_ops
 	const struct nla_policy	*policy;
 	struct list_head	rules_list;
 	struct module		*owner;
+	struct net		*fro_net;
 };
 
 #define FRA_GENERIC_POLICY \
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index c1fae23..964e658 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -249,6 +249,7 @@ static struct fib_rules_ops dn_fib_rules_ops = {
 	.policy		= dn_fib_rule_policy,
 	.rules_list	= LIST_HEAD_INIT(dn_fib_rules_ops.rules_list),
 	.owner		= THIS_MODULE,
+	.fro_net	= &init_net,
 };
 
 void __init dn_fib_rules_init(void)
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 72232ab..8d0ebe7 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -315,6 +315,8 @@ int __net_init fib4_rules_init(struct net *net)
 	if (ops == NULL)
 		return -ENOMEM;
 	INIT_LIST_HEAD(&ops->rules_list);
+	ops->fro_net = net;
+
 	fib_rules_register(net, ops);
 
 	err = fib_default_rules_init(ops);
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index 76437a1..ead5ab2 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -249,6 +249,7 @@ static struct fib_rules_ops fib6_rules_ops = {
 	.policy			= fib6_rule_policy,
 	.rules_list		= LIST_HEAD_INIT(fib6_rules_ops.rules_list),
 	.owner			= THIS_MODULE,
+	.fro_net		= &init_net,
 };
 
 static int __init fib6_default_rules_init(void)
-- 
1.5.3.rc5



* [PATCH 3/3 net-2.6.25] Process FIB rule action in the context of the namespace.
From: Denis V. Lunev @ 2008-01-17 10:09 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, dlezcano, containers, Denis V. Lunev
In-Reply-To: <478F2933.1000007@openvz.org>

Save the namespace context in the FIB rule at rule creation time and perform
the routing lookup in the correct namespace.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/fib_rules.h |    1 +
 net/core/fib_rules.c    |    2 ++
 net/ipv4/fib_rules.c    |    2 +-
 3 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 7f9f4ae..34349f9 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -22,6 +22,7 @@ struct fib_rule
 	u32			target;
 	struct fib_rule *	ctarget;
 	struct rcu_head		rcu;
+	struct net *		fr_net;
 };
 
 struct fib_lookup_arg
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 3cd4f13..42ccaf5 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -29,6 +29,7 @@ int fib_default_rule_add(struct fib_rules_ops *ops,
 	r->pref = pref;
 	r->table = table;
 	r->flags = flags;
+	r->fr_net = ops->fro_net;
 
 	/* The lock is not required here, the list in unreacheable
 	 * at the moment this function is called */
@@ -242,6 +243,7 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 		err = -ENOMEM;
 		goto errout;
 	}
+	rule->fr_net = net;
 
 	if (tb[FRA_PRIORITY])
 		rule->pref = nla_get_u32(tb[FRA_PRIORITY]);
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 3b7affd..d2001f1 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -91,7 +91,7 @@ static int fib4_rule_action(struct fib_rule *rule, struct flowi *flp,
 		goto errout;
 	}
 
-	if ((tbl = fib_get_table(&init_net, rule->table)) == NULL)
+	if ((tbl = fib_get_table(rule->fr_net, rule->table)) == NULL)
 		goto errout;
 
 	err = tbl->tb_lookup(tbl, flp, (struct fib_result *) arg->result);
-- 
1.5.3.rc5



* Re: [PATCH 1/5] spidernet: add missing initialization
From: Ishizaki Kou @ 2008-01-17 10:22 UTC (permalink / raw)
  To: jens; +Cc: netdev, cbe-oss-dev
In-Reply-To: <200801111344.35652.jens@de.ibm.com>

Jens-san,

> Hi Ishizaki,
>
> Linas has left the company and is no longer doing kernel related stuff,
> so I suggest, given Jeff is ok with that, that the two of us take over
> spidernet maintainership.
 (snip)
> Change maintainership for spidernet.
>
> Signed-off-by: Jens Osterkamp <jens@de.ibm.com>

I apologize for my late reply.

I would like to accept your suggestion, but I have to get authorization
from my company to take on maintainership. I have started discussing
it with my boss.


I can't check that the spidernet driver works on Cell Blade, because I
don't have one.  So I hope you will check that the spidernet driver still
works on Cell Blade whenever it changes.

Also, could you review our latest patches?

Best regards,
Kou Ishizaki


* Re: Broken "Make ip6_frags per namespace" patch
From: Daniel Lezcano @ 2008-01-17 10:40 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: davem, den, netdev, devel
In-Reply-To: <20080117100524.GF6217@localhost.sw.ru>

Alexey Dobriyan wrote:
>> commit c064c4811b3e87ff8202f5a966ff4eea0bc54575
>> Author: Daniel Lezcano <dlezcano@fr.ibm.com>
>> Date:   Thu Jan 10 02:56:03 2008 -0800
>>
>>     [NETNS][IPV6]: Make ip6_frags per namespace.
>>     
>>     The ip6_frags is moved to the network namespace structure.  Because
>>     there can be multiple instances of the network namespaces, and the
>>     ip6_frags is no longer a global static variable, a helper function has
>>     been added to facilitate the initialization of the variables.
>>     
>>     Until the ipv6 protocol is not per namespace, the variables are
>>     accessed relatively from the initial network namespace.
> 
>> --- a/include/net/netns/ipv6.h
>> +++ b/include/net/netns/ipv6.h
> 
>> @@ -11,6 +13,7 @@ struct netns_sysctl_ipv6 {
>>  #ifdef CONFIG_SYSCTL
>>  	struct ctl_table_header *table;
>>  #endif
>> +	struct inet_frags_ctl frags;
> 
>> --- a/net/ipv6/reassembly.c
>> +++ b/net/ipv6/reassembly.c
> 
>> @@ -632,6 +625,11 @@ static struct inet6_protocol frag_protocol =
>>  	.flags		=	INET6_PROTO_NOPOLICY,
>>  };
>>  
>> +void ipv6_frag_sysctl_init(struct net *net)
>> +{
>> +	ip6_frags.ctl = &net->ipv6.sysctl.frags;
>> +}
> 
> _This_ can't work. There is only one ip6_frags, and its ->ctl pointer is
> flipped onto per-netns data. The changelog is also misleading:
> ip6_frags_ctl is moved to netns, not all of ip6_frags.
> 
> The oops is below: an f->ctl dereference in preparation for the mod_timer() call.
> 
> 
> 
> BUG: unable to handle kernel paging request at virtual address f5da8fc8
> printing eip: c11d868a *pdpt = 0000000000003001 *pde = 0000000001728067 *pte = 0000000035da8000 
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Modules linked in: ebt_ip ebt_dnat ebt_arpreply ebt_arp ebt_among ebtable_nat ip6t_REJECT ip6table_filter ip6_tables ebtable_filter ebtable_broute ebt_802_3 ebtables des_generic nf_conntrack_netbios_ns nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables deflate zlib_deflate zlib_inflate cryptomgr crypto_hash cpufreq_stats cpufreq_ondemand cdrom cbc bridge llc blkcipher crypto_algapi arpt_mangle arptable_filter arp_tables x_tables ah6 af_packet ipv6
> 
> Pid: 0, comm: swapper Not tainted (2.6.24-rc7-net-2.6.25-nf-sysfs-n #30)
> EIP: 0060:[<c11d868a>] EFLAGS: 00010246 CPU: 1
> EIP is at inet_frag_secret_rebuild+0xaa/0xd0
> EAX: f5da8fbc EBX: 00000000 ECX: c1310000 EDX: 00000100
> ESI: f7cba000 EDI: f898f7a0 EBP: 00000040 ESP: c1310f90
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=c1310000 task=f7c9a580 task.ti=f7c9b000)
> Stack: f898f7a8 f898f8a8 000ddcbd f898f7a0 f7cba000 c1310fc4 00000100 c1026d60 
>        00000002 00000001 c1191183 c4779ddc c11d85e0 f898c860 f898c860 c12c4a88 
>        00000001 c1308da0 0000000a c1023477 00000001 c130b640 c130b640 f7c9bf34 
> Call Trace:
>  [<c1026d60>] run_timer_softirq+0x120/0x190
>  [<c1191183>] net_rx_action+0x53/0x220
>  [<c11d85e0>] inet_frag_secret_rebuild+0x0/0xd0
>  [<c1023477>] __do_softirq+0x87/0x100
>  [<c10059cf>] do_softirq+0xaf/0x110
>  [<c10233e3>] irq_exit+0x83/0x90
>  [<c1010ce7>] smp_apic_timer_interrupt+0x57/0x90
>  [<c10036e1>] apic_timer_interrupt+0x29/0x38
>  [<c10036eb>] apic_timer_interrupt+0x33/0x38
>  [<c1001460>] default_idle+0x0/0x60
>  [<c10014a0>] default_idle+0x40/0x60
>  [<c1000ea3>] cpu_idle+0x73/0xb0
> =======================
> Code: 8b 10 85 d2 89 13 74 03 89 5a 04 89 18 89 43 04 85 f6 89 f3 75 bb 45 83 fd 40 75 a5 8b 44 24 04 e8 4c 3f 01 00 8b 87 50 01 00 00 <8b> 50 0c 01 54 24 08 8d 87 38 01 00 00 8b 54 24 08 83 c4 0c 5b 
> EIP: [<c11d868a>] inet_frag_secret_rebuild+0xaa/0xd0 SS:ESP 0068:c1310f90
> Kernel panic - not syncing: Fatal exception in interrupt

Hi Alexey,

Does it happen after unsharing the network namespace?

^ permalink raw reply

* Re: [PATCH 0/3 net-2.6.25] call FIB rule->action in the correct namespace
From: Daniel Lezcano @ 2008-01-17 10:41 UTC (permalink / raw)
  To: Denis V. Lunev; +Cc: David Miller, netdev, Linux Containers, devel
In-Reply-To: <478F2933.1000007@openvz.org>

Denis V. Lunev wrote:
> FIB rule->action should operate in the same namespace as fib_lookup.
> This is definitely missing right now.
> 
> There are two ways to implement this: pass struct net into another rules
> API call (2 levels) or place netns into rule struct directly. The second
> approach seems better as the code will grow less.
> 
> Additionally, the patchset cleans up the struct net handling in
> fib_rules_register/unregister so that the network namespace context is
> available at the time of default rules creation.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>

Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
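
The two options described in the cover letter can be contrasted with a small user-space sketch. It is not taken from the patchset: toy_net, rule_a, rule_b, action_a and action_b are hypothetical stand-ins for struct net, the FIB rule struct and rule->action, and the return value is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

struct toy_net { int id; };	/* hypothetical stand-in for struct net */

/* Option 1: the namespace is passed as an extra argument on every call. */
struct rule_a { int table; };

static int action_a(const struct rule_a *rule, const struct toy_net *net)
{
	assert(rule != NULL && net != NULL);
	return net->id * 1000 + rule->table;
}

/* Option 2: the rule records its namespace once, when it is created. */
struct rule_b {
	int table;
	const struct toy_net *net;
};

static int action_b(const struct rule_b *rule)
{
	assert(rule != NULL && rule->net != NULL);
	return rule->net->id * 1000 + rule->table;
}
```

With the second shape the call signatures stay unchanged, which matches the cover letter's remark that the code grows less.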

^ permalink raw reply

* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: Timo Teräs @ 2008-01-17 11:00 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, hadi, netdev
In-Reply-To: <20080117.020616.136852595.davem@davemloft.net>

David Miller wrote:
> From: Timo_Teräs <timo.teras@iki.fi>
> Date: Thu, 17 Jan 2008 12:01:17 +0200
> 
>> David Miller wrote:
>>> This is an inherent aspect of AF_KEY (and what it was
>>> derived from, BSD routing sockets).
>> Yes, this is the way BSD does it.
>>  
>>> It has to provide dumps atomically, and if there is no
>>> space there is no way to provide those entries which
>>> would require more rcvbuf space.
>> RFC does not say it has to be atomic.
> 
> Every application out there in the universe expects BSD socket
> semantics, and therefore atomic dumps.  You cannot "fix" things
> without breaking applications.

IMHO, it's a lot better than losing >50% of entries and the end
of sequence message on big dumps. SPD and SADB are not that
volatile; in most of the cases the dump would be as good as an
atomic one.

Even if it did change during an ongoing dump you still get a usable
dump. All the entries reflect real data and there is no dependency
between different entries.

I'm not sure if there are other major applications that we should
be concerned about, but at least ipsec-tools racoon does not
expect to get atomic dumps (which btw, comes originally from BSD).

Cheers,
  Timo
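
As a rough illustration of the trade-off discussed above, a non-atomic dump can be modelled as a cursor that is advanced one buffer-full at a time. This is a user-space sketch only, not code from the patch; dump_chunk, dump_done and the integer "entries" are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copy as many entries as fit into out[], starting at *cursor.
 * Returns the number copied and advances *cursor, so the caller can
 * resume later once the receiver has drained its buffer. */
static size_t dump_chunk(const int *db, size_t db_len, size_t *cursor,
			 int *out, size_t out_cap)
{
	size_t n = 0;

	assert(*cursor <= db_len);
	while (*cursor < db_len && n < out_cap)
		out[n++] = db[(*cursor)++];
	return n;
}

/* The end-of-sequence message can be sent once the cursor has reached
 * the end of the database. */
static int dump_done(size_t cursor, size_t db_len)
{
	return cursor == db_len;
}
```

Nothing is dropped when the output buffer is small, but entries added or removed between two calls may or may not appear in the result, which is the loss of atomicity being debated.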


^ permalink raw reply

* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: David Miller @ 2008-01-17 11:08 UTC (permalink / raw)
  To: timo.teras; +Cc: herbert, hadi, netdev
In-Reply-To: <478F3539.5060903@iki.fi>

From: Timo_Teräs <timo.teras@iki.fi>
Date: Thu, 17 Jan 2008 13:00:09 +0200

> IMHO, it's a lot better than losing >50% of entries and the end
> of sequence message on big dumps. SPD and SADB are not that
> volatile; in most of the cases the dump would be as good as an
> atomic one.

I humbly disagree with you.  Interface behavior stability
is more important.

> I'm not sure if there are other major applications that we should
> be concerned about, but at least ipsec-tools racoon does not
> expect to get atomic dumps (which btw, comes originally from BSD).

Racoon was written as an addon to the BSD stack by an IPV6/IPSEC
project in Japan named KAME, it did not "come from BSD".  It was
added to BSD.

There are also other BSD based IPSEC daemons such as the one written
by the OpenBSD folks.

I don't think this is arguable at all.  We're not changing semantics
over what we've done for 4+ years and applications might depend upon.
It's for a deprecated interface, which makes any semantic changes that
much less inviting.

You can argue all you want, but it will not change the invariants in
the previous paragraph.

All of the time you've spent arguing is time not spent on adding
netlink support to the daemons that do not do so already.  And that
would be 2 steps forwards compared to the 1 step backwards your
desired change would be.

I've stated my position as well as I can at this point so
respectfully, since I have tons of other things to do, I'm stepping
out of this specific discussion for now.

Thank you.


^ permalink raw reply

* Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key
From: Herbert Xu @ 2008-01-17 11:11 UTC (permalink / raw)
  To: Timo Teräs; +Cc: jamal, netdev
In-Reply-To: <478EED98.6080603@iki.fi>

On Thu, Jan 17, 2008 at 07:54:32AM +0200, Timo Teräs wrote:
>
> > Racoon doesn't use pfkey dumping as far as I know.
> 
> ipsec-tools racoon uses pfkey and only pfkey. And it's non-trivial to
> make it use netlink; the code relies heavily on pfkey structs
> throughout. It also runs on BSD so we cannot rip pfkey away; adding a
> layer to work with both pfkey and netlink would be doable, but just a
> lot of work.

Sure racoon uses pfkey but the question is does it use pfkey dumping?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 3/4] bonding: Fix work rearming
From: Jarek Poplawski @ 2008-01-17 11:18 UTC (permalink / raw)
  To: Makito SHIOKAWA; +Cc: netdev
In-Reply-To: <478EE7FC.4040301@miraclelinux.com>

On Thu, Jan 17, 2008 at 02:30:36PM +0900, Makito SHIOKAWA wrote:
>> But, since during this change from sysfs cancel_delayed_work_sync()
>> could be probably used, and it's rather efficient with killing
>> rearming works, it seems this check could be unnecessary yet.
> What is going to be cancelled in bonding_store_miimon() when setting miimon
> to 0 is the arp monitor, not the mii monitor. So this check is needed to stop
> the mii monitor from rearming when miimon is set to 0.

Hmm... I'm not sure I understand your point, but it seems both
bonding_store_arp_interval() and bonding_store_miimon() where this
field could be changed, currently use cancel_delayed_work() with
flush_workqueue(), so I presume, there is no rtnl_lock() nor
write_lock(&bond->lock) held, so cancel_delayed_work_sync() could
be used, which doesn't require this additional check.

...Unless you mean that, although the miimon value is changed there,
mii_work for some reason can't be cancelled at the same time?

Of course, if there is such a reason for doing this check each time
a work runs instead of controlling where the value changes, then OK!

Regards,
Jarek P.

^ permalink raw reply

* [PATCH net-2.6.25] net: Improve cache line coherency of ingress qdisc
From: Neil Turton @ 2008-01-17 11:04 UTC (permalink / raw)
  To: netdev; +Cc: linux-net-drivers

Move the ingress qdisc members of struct net_device from the transmit
cache line to the receive cache line to avoid cache line ping-pong.
These members are only used on the receive path.

Signed-off-by: Neil Turton <nturton@solarflare.com>
---

--- net-2.6.25.git-orig/include/linux/netdevice.h	2008-01-15 17:43:08.000000000 +0000
+++ net-2.6.25.git-ndt1/include/linux/netdevice.h	2008-01-16 09:46:19.000000000 +0000
@@ -597,37 +597,37 @@ struct net_device
 /*
  * Cache line mostly used on receive path (including eth_type_trans())
  */
 	unsigned long		last_rx;	/* Time of last Rx	*/
 	/* Interface address info used in eth_type_trans() */
 	unsigned char		dev_addr[MAX_ADDR_LEN];	/* hw address, (before bcast 
 							because most packets are unicast) */
 
 	unsigned char		broadcast[MAX_ADDR_LEN];	/* hw bcast add	*/
 
+	/* ingress path synchronizer */
+	spinlock_t		ingress_lock;
+	struct Qdisc		*qdisc_ingress;
+
 /*
  * Cache line mostly used on queue transmit path (qdisc)
  */
 	/* device queue lock */
 	spinlock_t		queue_lock ____cacheline_aligned_in_smp;
 	struct Qdisc		*qdisc;
 	struct Qdisc		*qdisc_sleeping;
 	struct list_head	qdisc_list;
 	unsigned long		tx_queue_len;	/* Max frames per queue allowed */
 
 	/* Partially transmitted GSO packet. */
 	struct sk_buff		*gso_skb;
 
-	/* ingress path synchronizer */
-	spinlock_t		ingress_lock;
-	struct Qdisc		*qdisc_ingress;
-
 /*
  * One part is mostly used on xmit path (device)
  */
 	/* hard_start_xmit synchronizer */
 	spinlock_t		_xmit_lock ____cacheline_aligned_in_smp;
 	/* cpu id of processor entered to hard_start_xmit or -1,
 	   if nobody entered there.
 	 */
 	int			xmit_lock_owner;
 	void			*priv;	/* pointer to private data	*/
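
The effect of the regrouping can be checked in user space with offsetof. The sketch below is only a model: toy_dev is a hypothetical, heavily trimmed stand-in for struct net_device, the 64-byte line size is an assumption, and C11 _Alignas plays the role of ____cacheline_aligned_in_smp.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed line size for this sketch */

/* Toy stand-in for struct net_device; only the grouping matters. */
struct toy_dev {
	/* receive-path group */
	unsigned long last_rx;
	unsigned char dev_addr[32];
	int ingress_lock;		/* stand-in for spinlock_t */
	void *qdisc_ingress;

	/* transmit-path group, forced onto a fresh cache line */
	_Alignas(CACHE_LINE) int queue_lock;
	void *qdisc;
	unsigned long tx_queue_len;
};

/* Index of the cache line, relative to the struct start, holding a byte. */
static size_t line_of(size_t offset)
{
	return offset / CACHE_LINE;
}

/* Returns 1 when the ingress members end before the line of queue_lock. */
static int ingress_off_tx_line(void)
{
	size_t first = line_of(offsetof(struct toy_dev, ingress_lock));
	size_t last = line_of(offsetof(struct toy_dev, qdisc_ingress)
			      + sizeof(void *) - 1);
	size_t tx = line_of(offsetof(struct toy_dev, queue_lock));

	assert(first <= last);
	return last < tx;
}
```

In this toy layout the ingress members sit ahead of the aligned queue_lock, so writes to the transmit-side lock do not invalidate the line that holds them.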


^ permalink raw reply

* Re: Broken "Make ip6_frags per namespace" patch
From: Alexey Dobriyan @ 2008-01-17 11:30 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: davem, den, netdev, devel
In-Reply-To: <478F30AA.7080704@fr.ibm.com>

On Thu, Jan 17, 2008 at 11:40:42AM +0100, Daniel Lezcano wrote:
> Alexey Dobriyan wrote:
> >>commit c064c4811b3e87ff8202f5a966ff4eea0bc54575
> >>Author: Daniel Lezcano <dlezcano@fr.ibm.com>
> >>Date:   Thu Jan 10 02:56:03 2008 -0800
> >>
> >>    [NETNS][IPV6]: Make ip6_frags per namespace.
> >>    
> >>    The ip6_frags is moved to the network namespace structure.  Because
> >>    there can be multiple instances of the network namespaces, and the
> >>    ip6_frags is no longer a global static variable, a helper function has
> >>    been added to facilitate the initialization of the variables.
> >>    
> >>    Until the ipv6 protocol is not per namespace, the variables are
> >>    accessed relatively from the initial network namespace.
> >
> >>--- a/include/net/netns/ipv6.h
> >>+++ b/include/net/netns/ipv6.h
> >
> >>@@ -11,6 +13,7 @@ struct netns_sysctl_ipv6 {
> >> #ifdef CONFIG_SYSCTL
> >> 	struct ctl_table_header *table;
> >> #endif
> >>+	struct inet_frags_ctl frags;
> >
> >>--- a/net/ipv6/reassembly.c
> >>+++ b/net/ipv6/reassembly.c
> >
> >>@@ -632,6 +625,11 @@ static struct inet6_protocol frag_protocol =
> >> 	.flags		=	INET6_PROTO_NOPOLICY,
> >> };
> >> 
> >>+void ipv6_frag_sysctl_init(struct net *net)
> >>+{
> >>+	ip6_frags.ctl = &net->ipv6.sysctl.frags;
> >>+}
> >
> >_This_ can't work. There is only one ip6_frags, and its ->ctl pointer is
> >flipped onto per-netns data. The changelog is also misleading: ip6_frags_ctl
> >is moved to netns, not all of ip6_frags.
> >
> >Oopsing place below -- f->ctl dereference in preparation of mod_timer() call.
> >
> >
> >
> >BUG: unable to handle kernel paging request at virtual address f5da8fc8
> >printing eip: c11d868a *pdpt = 0000000000003001 *pde = 0000000001728067 *pte = 0000000035da8000
> >Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >Modules linked in: ebt_ip ebt_dnat ebt_arpreply ebt_arp ebt_among ebtable_nat ip6t_REJECT ip6table_filter ip6_tables ebtable_filter ebtable_broute ebt_802_3 ebtables des_generic nf_conntrack_netbios_ns nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables deflate zlib_deflate zlib_inflate cryptomgr crypto_hash cpufreq_stats cpufreq_ondemand cdrom cbc bridge llc blkcipher crypto_algapi arpt_mangle arptable_filter arp_tables x_tables ah6 af_packet ipv6
> >
> >Pid: 0, comm: swapper Not tainted (2.6.24-rc7-net-2.6.25-nf-sysfs-n #30)
> >EIP: 0060:[<c11d868a>] EFLAGS: 00010246 CPU: 1
> >EIP is at inet_frag_secret_rebuild+0xaa/0xd0
> >EAX: f5da8fbc EBX: 00000000 ECX: c1310000 EDX: 00000100
> >ESI: f7cba000 EDI: f898f7a0 EBP: 00000040 ESP: c1310f90
> > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> >Process swapper (pid: 0, ti=c1310000 task=f7c9a580 task.ti=f7c9b000)
> >Stack: f898f7a8 f898f8a8 000ddcbd f898f7a0 f7cba000 c1310fc4 00000100 c1026d60
> >       00000002 00000001 c1191183 c4779ddc c11d85e0 f898c860 f898c860 c12c4a88
> >       00000001 c1308da0 0000000a c1023477 00000001 c130b640 c130b640 f7c9bf34
> >Call Trace:
> > [<c1026d60>] run_timer_softirq+0x120/0x190
> > [<c1191183>] net_rx_action+0x53/0x220
> > [<c11d85e0>] inet_frag_secret_rebuild+0x0/0xd0
> > [<c1023477>] __do_softirq+0x87/0x100
> > [<c10059cf>] do_softirq+0xaf/0x110
> > [<c10233e3>] irq_exit+0x83/0x90
> > [<c1010ce7>] smp_apic_timer_interrupt+0x57/0x90
> > [<c10036e1>] apic_timer_interrupt+0x29/0x38
> > [<c10036eb>] apic_timer_interrupt+0x33/0x38
> > [<c1001460>] default_idle+0x0/0x60
> > [<c10014a0>] default_idle+0x40/0x60
> > [<c1000ea3>] cpu_idle+0x73/0xb0
> >=======================
> >Code: 8b 10 85 d2 89 13 74 03 89 5a 04 89 18 89 43 04 85 f6 89 f3 75 bb 45 83 fd 40 75 a5 8b 44 24 04 e8 4c 3f 01 00 8b 87 50 01 00 00 <8b> 50 0c 01 54 24 08 8d 87 38 01 00 00 8b 54 24 08 83 c4 0c 5b
> >EIP: [<c11d868a>] inet_frag_secret_rebuild+0xaa/0xd0 SS:ESP 0068:c1310f90
> >Kernel panic - not syncing: Fatal exception in interrupt
> 
> Hi Alexey,
> 
> Does it happen after unsharing the network namespace?

Yep. clone(CLONE_NEWNET) in a loop and sooner or later you'll see this.
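
The failure mode described above can be modelled in user space. This is a sketch of the pointer flip only, not kernel code: frags_ctl, toy_net, toy_frags, ip6_frags_model and the two helpers are hypothetical stand-ins for inet_frags_ctl, struct net, inet_frags, ip6_frags and ipv6_frag_sysctl_init().

```c
#include <assert.h>
#include <stddef.h>

struct frags_ctl { int secret_interval; };	/* stand-in for inet_frags_ctl */
struct toy_net   { struct frags_ctl frags; };	/* stand-in for per-netns data */
struct toy_frags { struct frags_ctl *ctl; };	/* stand-in for inet_frags */

static struct toy_frags ip6_frags_model;	/* one global, as in the patch */

/* Same shape as ipv6_frag_sysctl_init(): every namespace that is
 * initialised overwrites the single shared ->ctl pointer. */
static void frag_sysctl_init_model(struct toy_net *net)
{
	assert(net != NULL);
	ip6_frags_model.ctl = &net->frags;
}

/* Returns 1 if the global currently points into the given namespace. */
static int ctl_points_into(const struct toy_net *net)
{
	return ip6_frags_model.ctl == &net->frags;
}
```

After a second namespace is initialised, the global no longer points at the initial namespace's data. If that second namespace is then freed, any later f->ctl dereference reads freed memory, which would be consistent with the DEBUG_PAGEALLOC oops in inet_frag_secret_rebuild quoted above.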


^ permalink raw reply

