Netdev List


* [patch net-2.6.25 08/10][NETNS][IPV6] make mld_max_msf readonly in other namespaces
From: Daniel Lezcano @ 2008-01-09 16:45 UTC (permalink / raw)
  To: davem; +Cc: netdev, benjamin.thery
In-Reply-To: <20080109164533.695191040@localhost.localdomain>

[-- Attachment #1: make-mld_max_msf-readonly.patch --]
[-- Type: text/plain, Size: 1366 bytes --]

The mld_max_msf sysctl protects the system by capping the number of
multicast source filters. Making this variable per namespace is
potentially a problem: if someone inside a namespace sets it to a big
value, that will impact the whole system, including other namespaces.

I don't see any benefit in having it per namespace for now, so in order
to keep a directory entry in a newly created namespace, I make it
read-only when we are not in the initial network namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 net/ipv6/sysctl_net_ipv6.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -122,6 +122,12 @@ static int ipv6_sysctl_net_init(struct n
 	ipv6_table[5].data = &net->ipv6.sysctl.frags.timeout;
 	ipv6_table[6].data = &net->ipv6.sysctl.frags.secret_interval;
 
+	/* We don't want this value to be per namespace, it should be global
+	   to all namespaces, so make it read-only when we are not in the
+	   init network namespace */
+	if (net != &init_net)
+		ipv6_table[7].mode = 0444;
+
 	net->ipv6.sysctl.table = register_net_sysctl_table(net, net_ipv6_ctl_path,
 							   ipv6_table);
 	if (!net->ipv6.sysctl.table)
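
For readers who want to see the idea outside the kernel, here is a minimal
userspace sketch, not the kernel code itself. It assumes a per-namespace copy
of a sysctl table template; every demo_* name is hypothetical, and the table
has two entries instead of the real layout where mld_max_msf sits at index 7.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's struct ctl_table and struct net. */
struct demo_ctl_entry {
	const char *name;
	unsigned int mode;	/* permission bits, e.g. 0644 */
};

struct demo_net {
	bool is_init;		/* true only for the initial namespace */
	struct demo_ctl_entry table[2];
};

static const struct demo_ctl_entry demo_template[2] = {
	{ "ip6frag_time", 0644 },
	{ "mld_max_msf",  0644 },
};

/* Copy the template into the namespace, then drop the write bits on
 * mld_max_msf unless this is the initial namespace. */
static void demo_sysctl_init(struct demo_net *net)
{
	memcpy(net->table, demo_template, sizeof(demo_template));
	if (!net->is_init)
		net->table[1].mode = 0444;
}
```

Calling demo_sysctl_init() on a namespace with is_init set leaves the entry
writable; on any other namespace the entry stays visible but read-only.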

-- 


* [patch net-2.6.25 02/10][NETNS][IPV6] make a subsystem for af_inet6
From: Daniel Lezcano @ 2008-01-09 16:45 UTC (permalink / raw)
  To: davem; +Cc: netdev, benjamin.thery
In-Reply-To: <20080109164533.695191040@localhost.localdomain>

[-- Attachment #1: make-af-inet6-a-subsystem.patch --]
[-- Type: text/plain, Size: 1965 bytes --]

This patch adds a network namespace subsystem for the af_inet6 module.
It does nothing right now, but one of its purposes is to receive the
different sysctl variables in order to initialize them.

Once the sysctl variables are moved to the network namespace structure,
they will no longer be initialized as global static variables, so we must
find a place to initialize them. Because sysctl support can be disabled, it
makes no sense to do that in the sysctl_net_ipv6 file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 net/ipv6/af_inet6.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Index: net-2.6.25/net/ipv6/af_inet6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/af_inet6.c
+++ net-2.6.25/net/ipv6/af_inet6.c
@@ -719,6 +719,21 @@ static void cleanup_ipv6_mibs(void)
 	snmp_mib_free((void **)udplite_stats_in6);
 }
 
+static int inet6_net_init(struct net *net)
+{
+	return 0;
+}
+
+static void inet6_net_exit(struct net *net)
+{
+	return;
+}
+
+static struct pernet_operations inet6_net_ops = {
+	.init = inet6_net_init,
+	.exit = inet6_net_exit,
+};
+
 static int __init inet6_init(void)
 {
 	struct sk_buff *dummy_skb;
@@ -782,6 +797,10 @@ static int __init inet6_init(void)
 	 *	able to communicate via both network protocols.
 	 */
 
+	err = register_pernet_subsys(&inet6_net_ops);
+	if (err)
+		goto register_pernet_fail;
+
 #ifdef CONFIG_SYSCTL
 	err = ipv6_sysctl_register();
 	if (err)
@@ -901,6 +920,8 @@ icmp_fail:
 	ipv6_sysctl_unregister();
 sysctl_fail:
 #endif
+	unregister_pernet_subsys(&inet6_net_ops);
+register_pernet_fail:
 	cleanup_ipv6_mibs();
 out_unregister_sock:
 	sock_unregister(PF_INET6);
@@ -956,6 +977,7 @@ static void __exit inet6_exit(void)
 #ifdef CONFIG_SYSCTL
 	ipv6_sysctl_unregister();
 #endif
+	unregister_pernet_subsys(&inet6_net_ops);
 	cleanup_ipv6_mibs();
 	proto_unregister(&rawv6_prot);
 	proto_unregister(&udplitev6_prot);
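
The error path in this patch follows the usual goto-unwind ordering: each
label undoes only what was set up before the failing step. A small userspace
sketch of that ordering follows; the demo_* helpers and the call log are
hypothetical and only mimic the register/unregister calls named in the diff.

```c
#include <string.h>

/* Hypothetical log of calls so the ordering can be inspected. */
static char demo_log[64];

static void demo_note(const char *s)
{
	strcat(demo_log, s);
}

static int demo_register_pernet(void)
{
	demo_note("P+");
	return 0;
}

static void demo_unregister_pernet(void)
{
	demo_note("P-");
}

static int demo_register_sysctl(int fail)
{
	if (fail)
		return -1;
	demo_note("S+");
	return 0;
}

static void demo_cleanup_mibs(void)
{
	demo_note("M-");
}

/* Mirrors the label order in the patch: a sysctl failure unwinds the
 * pernet registration first, then falls through to the MIB cleanup. */
static int demo_init(int sysctl_fails)
{
	int err;

	demo_log[0] = '\0';
	err = demo_register_pernet();
	if (err)
		goto register_pernet_fail;
	err = demo_register_sysctl(sysctl_fails);
	if (err)
		goto sysctl_fail;
	return 0;

sysctl_fail:
	demo_unregister_pernet();
register_pernet_fail:
	demo_cleanup_mibs();
	return err;
}
```

With sysctl_fails set, the log reads "P+P-M-": the pernet subsystem is
registered, then unregistered, then the MIBs are cleaned up.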

-- 


* [patch net-2.6.25 00/10][NETNS][IPV6] make sysctl per namespace - V3
From: Daniel Lezcano @ 2008-01-09 16:45 UTC (permalink / raw)
  To: davem; +Cc: netdev, benjamin.thery

The following patchset makes the ipv6 sysctl handle multiple
network namespaces. Each instance of a network namespace has its own
set of sysctl values, which means the behavior of the ipv6 stack can
differ depending on the sysctl values set up in the different
network namespaces.

Changelog:
	V3 : fixed compilation error when CONFIG_SYSCTL=n,
	     fixed missing initialization when CONFIG_SYSCTL=n

	V2 : make the mld_max_msf variable readonly when we are
	     not in the initial network namespace

	V1 : initial post

-- 


* Re: Linux IPv6 DAD not full conform to RFC 4862 ?
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2008-01-09 16:40 UTC (permalink / raw)
  To: kkeil; +Cc: netdev, yoshfuji
In-Reply-To: <20080110.013857.37616214.yoshfuji@linux-ipv6.org>

In article <20080110.013857.37616214.yoshfuji@linux-ipv6.org> (at Thu, 10 Jan 2008 01:38:57 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> says:

> - we could have "dad_reaction" interface variable and
>  > 1: disable interface
>  = 1: disable IPv6
>  < 0: ignore (as we do now)

Argh, >0, 0 and <0, maybe.
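
Roughly, with the corrected ranges, the variable could map to actions like
this. It is only a sketch of the proposal; "dad_reaction" is not an existing
sysctl, and the enum and function names are made up for illustration.

```c
/* Hypothetical actions for the proposed per-interface variable. */
enum demo_dad_action {
	DEMO_DAD_IGNORE,		/* < 0: ignore (current behavior) */
	DEMO_DAD_DISABLE_IPV6,		/* = 0: disable IPv6 */
	DEMO_DAD_DISABLE_INTERFACE,	/* > 0: disable interface */
};

/* Map the tri-state value to the action to take when DAD fails. */
static enum demo_dad_action demo_dad_reaction(int dad_reaction)
{
	if (dad_reaction > 0)
		return DEMO_DAD_DISABLE_INTERFACE;
	if (dad_reaction == 0)
		return DEMO_DAD_DISABLE_IPV6;
	return DEMO_DAD_IGNORE;
}
```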

--yoshfuji


* Re: Linux IPv6 DAD not full conform to RFC 4862 ?
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2008-01-09 16:38 UTC (permalink / raw)
  To: kkeil; +Cc: netdev, yoshfuji
In-Reply-To: <20080109153656.GA16962@pingi.kke.suse.de>

In article <20080109153656.GA16962@pingi.kke.suse.de> (at Wed, 9 Jan 2008 16:36:56 +0100), Karsten Keil <kkeil@suse.de> says:

> So I think we should disable the interface now, if DAD fails on a
> hardware based LLA.

I don't want to do this, at least, unconditionally.

Options (not exclusive):

- we could have "enable_ipv6" interface flag and check it in
  input/output paths
- we could have "dad_reaction" interface variable and
 > 1: disable interface
 = 1: disable IPv6
 < 0: ignore (as we do now)

--yoshfuji


* Re: Linux IPv6 DAD not full conform to RFC 4862 ?
From: Neil Horman @ 2008-01-09 16:17 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20080109153656.GA16962@pingi.kke.suse.de>

On Wed, Jan 09, 2008 at 04:36:56PM +0100, Karsten Keil wrote:
> Hi,
> 
> I tried to run the 1.5.0 Beta2  TAHI Selftest on recent Linux kernel.
> It fails in the Stateless Address Autoconfiguration section with
> 6 tests.
> These tests are for Duplicate Address Detection (DAD).
> They are detect for the Link Local Address a duplicate address on the
> network. It seems that our current behavior is to log an message and
> do not assign this address.
> 
> But the RFC 4862 says:
> 
> 5.4.5.  When Duplicate Address Detection Fails
> 
>    A tentative address that is determined to be a duplicate as described
>    above MUST NOT be assigned to an interface, and the node SHOULD log a
>    system management error.
> 
>    If the address is a link-local address formed from an interface
>    identifier based on the hardware address, which is supposed to be
>    uniquely assigned (e.g., EUI-64 for an Ethernet interface), IP
>    operation on the interface SHOULD be disabled.  By disabling IP
>    operation, the node will then:
> 
>    -  not send any IP packets from the interface,
> 
>    -  silently drop any IP packets received on the interface, and
> 
>    -  not forward any IP packets to the interface (when acting as a
>       router or processing a packet with a Routing header).
> 
>    In this case, the IP address duplication probably means duplicate
>    hardware addresses are in use, and trying to recover from it by
>    configuring another IP address will not result in a usable network.
>    In fact, it probably makes things worse by creating problems that are
>    harder to diagnose than just disabling network operation on the
>    interface; the user will see a partially working network where some
>    things work, and other things do not.
> 
>    On the other hand, if the duplicate link-local address is not formed
>    from an interface identifier based on the hardware address, which is
>    supposed to be uniquely assigned, IP operation on the interface MAY
>    be continued.
> 
> 
> So I think we should disable the interface now, if DAD fails on a
> hardware based LLA.
> 

Not sure I agree with that.  I assume that by disable, you mean that we should
clear the IFF_UP flag?  If we do that, and another IP address is assigned to
that interface, then your proposal would discontinue the functionality of those
already established addresses, which would be bad.  I could see a DoS scenario
coming out of that as well.  Simply send ndisc NAs for a recently advertised
address, and you could prevent network communication for an entire system.

Reading the section you reference, we do follow all the MUST requirements, and
we log an error.  Given that the disable section is a SHOULD, I think we can at
least be somewhat more restrictive in our implementation.  Perhaps we should
just disable the interface iff the failed address is link-local AND there are no
other functional addresses assigned to the interface.
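
As a rough sketch of that condition (plain userspace C, with hypothetical
demo_* types rather than the kernel's inet6_ifaddr structures):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-address state; not a kernel structure. */
struct demo_addr {
	bool link_local;
	bool usable;	/* passed DAD and is in service */
};

/* Disable the interface only if the address that failed DAD is link-local
 * and no other address on the interface is still usable. */
static bool demo_should_disable(const struct demo_addr *failed,
				const struct demo_addr *addrs, size_t n)
{
	size_t i;

	if (!failed->link_local)
		return false;
	for (i = 0; i < n; i++)
		if (&addrs[i] != failed && addrs[i].usable)
			return false;
	return true;
}
```

This keeps an interface with a working global address up even when its
link-local address fails DAD, which avoids the DoS case described above.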

Neil

> -- 
> Karsten Keil
> SuSE Labs
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 0/0]: Cassini bug fixes.
From: Laszlo Attila Toth @ 2008-01-09 16:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, bazsi, hidden
In-Reply-To: <20080104.003231.127196736.davem@davemloft.net>

David Miller wrote:
> Over the past day I've put together the following set of bug fixes for
> the Cassini driver.
> 
> At least with my setup it appears to basically work fine, not leak
> memory, and the SKB BUG messages go away too.
> 
> I'll be honest and say that I've devoted a couple days to this work,
> and therefore I have to turn my attention back to other tasks.  As a
> result, it means it will be some time before I can look seriously into
> any feedback folks provide.  And for that I apologize, but this
> already consumed too much of my time.
> 
> I'll be pushing these to Linus and -stable shortly.
> 
> Thanks.
> 

We tested the card and it works well; all previous bugs are gone (truesize
bug messages and memory consumption).

Thank you again.

--
Attila


* Linux IPv6 DAD not full conform to RFC 4862 ?
From: Karsten Keil @ 2008-01-09 15:36 UTC (permalink / raw)
  To: netdev

Hi,

I tried to run the 1.5.0 Beta2 TAHI Selftest on a recent Linux kernel.
It fails 6 tests in the Stateless Address Autoconfiguration section.
These tests are for Duplicate Address Detection (DAD).
They detect a duplicate address on the network for the Link Local
Address. It seems that our current behavior is to log a message and
not assign this address.

But the RFC 4862 says:

5.4.5.  When Duplicate Address Detection Fails

   A tentative address that is determined to be a duplicate as described
   above MUST NOT be assigned to an interface, and the node SHOULD log a
   system management error.

   If the address is a link-local address formed from an interface
   identifier based on the hardware address, which is supposed to be
   uniquely assigned (e.g., EUI-64 for an Ethernet interface), IP
   operation on the interface SHOULD be disabled.  By disabling IP
   operation, the node will then:

   -  not send any IP packets from the interface,

   -  silently drop any IP packets received on the interface, and

   -  not forward any IP packets to the interface (when acting as a
      router or processing a packet with a Routing header).

   In this case, the IP address duplication probably means duplicate
   hardware addresses are in use, and trying to recover from it by
   configuring another IP address will not result in a usable network.
   In fact, it probably makes things worse by creating problems that are
   harder to diagnose than just disabling network operation on the
   interface; the user will see a partially working network where some
   things work, and other things do not.

   On the other hand, if the duplicate link-local address is not formed
   from an interface identifier based on the hardware address, which is
   supposed to be uniquely assigned, IP operation on the interface MAY
   be continued.

So I think we should disable the interface now, if DAD fails on a
hardware based LLA.

-- 
Karsten Keil
SuSE Labs


* Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24
From: Andy Gospodarek @ 2008-01-09 15:27 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Krzysztof Oledzki, netdev, Jeff Garzik, David Miller,
	Andy Gospodarek, Herbert Xu
In-Reply-To: <17850.1199865514@death>

On Tue, Jan 08, 2008 at 11:58:34PM -0800, Jay Vosburgh wrote:
> Krzysztof Oledzki <olel@ans.pl> wrote:
> 
> >Fine. Just let you know that someone test your patches and everything
> >works, except mentioned problem.
> 
> 	And I appreciate it; I just wanted to make sure our many fans
> following along at home didn't misunderstand.
> 
> 	Could you let me know if the patch below make the lockdep
> warning go away?  This applies on top of the previous three, although it
> should be trivial to do by hand.
> 
> 	I'm still checking to make sure this is safe with regard to
> mutexing the bonding structures, but it would be good to know if it
> eliminates the warning.
> 
> 	-J
> 

Jay,

My initial concern was that a slave device could disappear out from
under us, but it seems like this certainly isn't the case since all
calls to bond_release are protected by rtnl-locks, so I think you are
correct that we are safe.  I'll test this on my setup here and let you
know if I see any problems.

-andy





* Re: Top 10 kernel oopses for the week ending January 5th, 2008
From: Arjan van de Ven @ 2008-01-09 15:28 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Linux Kernel Mailing List, Linus Torvalds, Andrew Morton, NetDev
In-Reply-To: <1199887950.6762.26.camel@johannes.berg>

Johannes Berg wrote:
>> Rank 1: __ieee80211_rx
>> 	Warning at net/mac80211/rx.c:1672
>> 	Reported 6 times (11 total reports)
>> 	Same issue that was ranked 2nd last week
>> 	Johannes has diagnosed this as a driver bug in the iwlwifi drivers
>> 	More info: http://www.kerneloops.org/search.php?search=__ieee80211_rx
> 
> Note that because we don't get the module list for WARN_ON, we don't
> actually know whether all of these instances are from the iwlwifi
> drivers. A few other drivers suffer from the same problem. In one of
> these cases, iwlwifi was contained in the stack trace, but in the common
> case that isn't happening because packet processing is delayed to a
> tasklet.
> 

And fwiw, a patch to get this added to WARN_ON was posted by me last week to fix this;
once this goes into 2.6.25-rc this annoyance/hindrance in debugging will be fixed.


* Re: SACK scoreboard
From: John Heffner @ 2008-01-09 14:56 UTC (permalink / raw)
  To: David Miller; +Cc: andi, ilpo.jarvinen, lachlan.andrew, netdev, quetchen
In-Reply-To: <20080108.224144.234253941.davem@davemloft.net>

David Miller wrote:
> From: John Heffner <jheffner@psc.edu>
> Date: Tue, 08 Jan 2008 23:27:08 -0500
> 
>> I also wonder how much of a problem this is (for now, with window sizes 
>> of order 10000 packets.  My understanding is that the biggest problems 
>> arise from O(N^2) time for recovery because every ack was expensive. 
>> Have current tests shown the final ack to be a major source of problems?
> 
> Yes, several people have reported this.

I may have missed some of this.  Does anyone have a link to some recent 
data?

   -John


* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Paul E. McKenney @ 2008-01-09 14:43 UTC (permalink / raw)
  To: David Miller; +Cc: dada1, herbert, dipankar, netdev, josh
In-Reply-To: <20080109.063126.68241252.davem@davemloft.net>

On Wed, Jan 09, 2008 at 06:31:26AM -0800, David Miller wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Date: Wed, 9 Jan 2008 06:22:58 -0800
> 
> > On Wed, Jan 09, 2008 at 11:37:27AM +0100, Eric Dumazet wrote:
> > > On Wed, 9 Jan 2008 20:46:37 +1100
> > > Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > > 
> > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > > index d337706..28484f3 100644
> > > --- a/net/ipv4/route.c
> > > +++ b/net/ipv4/route.c
> > > @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
> > >  			break;
> > >  		rcu_read_unlock_bh();
> > >  	}
> > > -	return r;
> > > +	return rcu_dereference(r);
> > >  }
> > 
> > Would it be possible to tag rt_cache_get_first() with an __acquires(RCU)
> > to help out sparse?
> 
> Sparse can't handle conditional locking very well, as is done here.
> There is a seperate thread where Eric reworks how all of this
> locking is done in order to pacify sparse and be able to add the
> __acquires() etc. tags and some of us found it too ugly to
> swallow :-)

Ah!  ;-)

							Thanx, Paul


* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: David Miller @ 2008-01-09 14:31 UTC (permalink / raw)
  To: paulmck; +Cc: dada1, herbert, dipankar, netdev
In-Reply-To: <20080109142258.GC13714@linux.vnet.ibm.com>

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Wed, 9 Jan 2008 06:22:58 -0800

> On Wed, Jan 09, 2008 at 11:37:27AM +0100, Eric Dumazet wrote:
> > On Wed, 9 Jan 2008 20:46:37 +1100
> > Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index d337706..28484f3 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
> >  			break;
> >  		rcu_read_unlock_bh();
> >  	}
> > -	return r;
> > +	return rcu_dereference(r);
> >  }
> 
> Would it be possible to tag rt_cache_get_first() with an __acquires(RCU)
> to help out sparse?

Sparse can't handle conditional locking very well, as is done here.
There is a separate thread where Eric reworks how all of this
locking is done in order to pacify sparse and be able to add the
__acquires() etc. tags and some of us found it too ugly to
swallow :-)


* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Paul E. McKenney @ 2008-01-09 14:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Herbert Xu, davem, dipankar, netdev
In-Reply-To: <20080109113727.50eae500.dada1@cosmosbay.com>

On Wed, Jan 09, 2008 at 11:37:27AM +0100, Eric Dumazet wrote:
> On Wed, 9 Jan 2008 20:46:37 +1100
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > On Wed, Jan 09, 2008 at 08:38:56AM +0100, Eric Dumazet wrote:
> > > 
> > > I am not sure this is valid, since it will do this :
> > > 
> > > r = rt_hash_table[st->bucket].chain;
> > > if (r)
> > >     return rcu_dereference(r);
> > > 
> > > So compiler might be dumb enough do dereference 
> > > &rt_hash_table[st->bucket].chain two times.
> > 
> > That wouldn't be a problem at all.  The key is to add a barrier between
> > reading the pointer:
> > 
> > 	r = rt_hash_table[st->bucket].chain
> > 
> > and dereferencing it later, e.g.,
> > 
> > 	r->u.dst.rt_next
> > 
> > The barrier is there so that when we dereference r we don't read
> > stale cache that was there before the memory at r was initialised.
> > How many times you read the pointer value before the barrier is
> > irrelevant to the effectiveness of the barrier preceding the
> > dereference.

Agreed -- as long as you don't try to dereference the pointer before
passing it through rcu_dereference(), and as long as the initial
fetch of the pointer, the rcu_dereference(), and the actual dereferencing
of the pointer are all within the same RCU read-side critical section.
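
To make the ordering concrete, here is a userspace approximation using C11
atomics. It is a sketch under assumptions, not kernel code: the release store
plays the role of rcu_assign_pointer(), the acquire load stands in for
rcu_dereference() (acquire is stronger than the dependency ordering the kernel
relies on), and the read-side critical section itself is not modeled. The
demo_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_node {
	int value;
	struct demo_node *next;
};

/* Shared head pointer, standing in for rt_hash_table[bucket].chain. */
static _Atomic(struct demo_node *) demo_head;

/* Writer: initialise the node fully, then publish it with release order
 * so a reader that sees the pointer also sees the initialised fields. */
static void demo_publish(struct demo_node *n, int value)
{
	n->value = value;
	n->next = NULL;
	atomic_store_explicit(&demo_head, n, memory_order_release);
}

/* Reader: fetch the pointer with acquire order, and only then
 * dereference it. Returns -1 when the list is empty. */
static int demo_read_first(void)
{
	struct demo_node *r;

	r = atomic_load_explicit(&demo_head, memory_order_acquire);
	return r ? r->value : -1;
}
```

How many times the reader loads the pointer does not matter; what matters is
that the ordered load comes before the first dereference.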

> You are absolutely right Herbert, so I changed the patch to :
> 
> [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
> 
> In rt_cache_get_next(), no need to guard seq->private by a rcu_dereference()
> since seq is private to the thread running this function. Reading seq.private
> once (as guaranteed by rcu_dereference()) or several times if the compiler
> really is dumb enough won't change the result.
> 
> But we miss real spots where rcu_dereference() are needed, both in 
> rt_cache_get_first() and rt_cache_get_next()
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index d337706..28484f3 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
>  			break;
>  		rcu_read_unlock_bh();
>  	}
> -	return r;
> +	return rcu_dereference(r);
>  }

Would it be possible to tag rt_cache_get_first() with an __acquires(RCU)
to help out sparse?

>  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>  {
> -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
> +	struct rt_cache_iter_state *st = seq->private;
> 
>  	r = r->u.dst.rt_next;
>  	while (!r) {
> @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>  		rcu_read_lock_bh();
>  		r = rt_hash_table[st->bucket].chain;
>  	}
> -	return r;
> +	return rcu_dereference(r);
>  }

Ditto for rt_cache_get_next()?

>  static struct rtable *rt_cache_get_idx(struct seq_file *seq, loff_t pos)

There would need to be a __releases(RCU) somewhere -- possibly
in rt_cache_seq_stop(), but need to defer to you guys on this one.

						Thanx, Paul


* Re: FW:  ccid2/ccid3 oopses
From: Gerrit Renker @ 2008-01-09 14:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, devzero, dccp, netdev
In-Reply-To: <20080109140211.GA9857@ghostprotocols.net>

| > >> the easiest way to reproduce is:
| > >> 
| > >> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
| > >> after short time, the kernel oopses (messages below)
| > >> 
<snip>
| 
| Gerrit, the control socket isn't attached to any CCID module, so the
| CCID modules should be safe to remove, and IIRC they were safe to
| unload.
| 
Ah, right. I have misread the email. And can confirm the above: running
the for-loop at the top of the message (60 seconds uninterrupted for
CCID2,3 each) brought no oopses.
So maybe the cause triggering this oops is somewhere else.


* Re: Top 10 kernel oopses for the week ending January 5th, 2008
From: Johannes Berg @ 2008-01-09 14:12 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linux Kernel Mailing List, Linus Torvalds, Andrew Morton, NetDev
In-Reply-To: <477FF149.4070609@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 670 bytes --]

> Rank 1: __ieee80211_rx
> 	Warning at net/mac80211/rx.c:1672
> 	Reported 6 times (11 total reports)
> 	Same issue that was ranked 2nd last week
> 	Johannes has diagnosed this as a driver bug in the iwlwifi drivers
> 	More info: http://www.kerneloops.org/search.php?search=__ieee80211_rx

Note that because we don't get the module list for WARN_ON, we don't
actually know whether all of these instances are from the iwlwifi
drivers. A few other drivers suffer from the same problem. In one of
these cases, iwlwifi was contained in the stack trace, but in the common
case that isn't happening because packet processing is delayed to a
tasklet.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]


* Re: FW:  ccid2/ccid3 oopses
From: Arnaldo Carvalho de Melo @ 2008-01-09 14:02 UTC (permalink / raw)
  To: Gerrit Renker, devzero, dccp, netdev
In-Reply-To: <20080109122827.GC4461@gerrit.erg.abdn.ac.uk>

Em Wed, Jan 09, 2008 at 12:28:27PM +0000, Gerrit Renker escreveu:
> Roland, -
> 
> >> apparently, i got crashes when loading/unloading other driver modules just
> >> after ccid2 or ccid3 had been loaded/unloaded _once_ (have not used them at
> >> all, just modprobe module;modprobe -r module) >
> >> 
> <snip>
> >> the easiest way to reproduce is:
> >> 
> >> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
> >> after short time, the kernel oopses (messages below)
> >> 
> >> i`m not sure if this is worth to be filed at kernel bugzilla, so i`m contacting
> >> you personally first.
> >>
> The issue is known: once loaded, the DCCP modules can not be unloaded
> without causing a crash as the one you have observed. This is due to the
> fact that dccp_ipv{4,6} use control sockets which need to be released
> before the module can be unloaded.
> When the control sockets are not released then crashes will always
> result.
> In earlier versions of DCCP there was a kernel option known as "unload hack",
> which conditionally inserted 
> 	sock_release(dccp_v{4,6}_ctl_socket);
> in 
> 	dccp_v{4,6}_exit()
> 
> However, as the name says, it is a hack since there are other issues to 
> be considered:
> 	* sockets in timewait state
> 	* other wait states (e.g. half-open connections)
> 	* memory which has not been released
> 	* module dependencies
> 
> With regard to the latter, I am normally using the Unload Hack and
> release modules in the following order:
> 
> 	dccp_probe => dccp_ccid2 => dccp_ccid3 => dccp_tfrc_lib =>
>         dccp_ipv6  => dccp_ipv4  => dccp_diag  => dccp
> 
> Long story short
>  * the CCID/DCCP modules can currently not safely be unloaded
>  * maybe we should disable module unloading for the mainline kernel
>  * if anyone is interested to use the unload hack, here is the old patch
>    http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/Unload_Hack.diff

Gerrit, the control socket isn't attached to any CCID module, so the
CCID modules should be safe to remove, and IIRC they were safe to
unload.

The unload hack was for something else, for the core DCCP modules. We
can't unload because there are refcounts held by the control sock, so
the unload hack would just destroy the control sock and thus the module
refcount would reach zero and it could then be unloaded.
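
In other words, something like the following toy model. It is a userspace
sketch only; the demo_* structures are hypothetical and do not correspond to
the kernel's struct module or struct socket.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module with a simple reference count. */
struct demo_module {
	int refcount;
};

/* Hypothetical control socket that pins its owning module. */
struct demo_ctl_sock {
	struct demo_module *owner;
};

/* Creating the control socket takes a reference on the module. */
static void demo_ctl_sock_create(struct demo_ctl_sock *s,
				 struct demo_module *m)
{
	s->owner = m;
	m->refcount++;
}

/* Releasing it drops that reference (what the unload hack forces). */
static void demo_ctl_sock_release(struct demo_ctl_sock *s)
{
	if (s->owner) {
		s->owner->refcount--;
		s->owner = NULL;
	}
}

/* The module can only go away once nothing holds a reference. */
static bool demo_can_unload(const struct demo_module *m)
{
	return m->refcount == 0;
}
```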

I've consistently been sidetracked by work (huh :-)) and
couldn't look at this issue, but the CCID modules should be safe to
unload.

- Arnaldo


* Re: SACK scoreboard
From: Andi Kleen @ 2008-01-09 14:02 UTC (permalink / raw)
  To: Evgeniy Polyakov
  Cc: Andi Kleen, David Miller, jheffner, ilpo.jarvinen, lachlan.andrew,
	netdev, quetchen
In-Reply-To: <20080109094725.GA22140@2ka.mipt.ru>

> Postponing freeing of the skb has major drawbacks. Some time ago I

Yes, the trick would be to make sure that it also does not tie up
too much memory. e.g. it would need some throttling at least.

Also the fast path of kmem_cache_free() is actually not that
much different from just putting something on a list so perhaps
it would not make that much difference.
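
A throttled deferred-free list could look roughly like this. It is a
userspace sketch with malloc/free standing in for the slab allocator; the
demo_* names and the cap are hypothetical, not an existing kernel interface.

```c
#include <stdlib.h>

struct demo_buf {
	struct demo_buf *next;
};

struct demo_defer_list {
	struct demo_buf *head;
	size_t len;
	size_t limit;	/* hypothetical cap on deferred buffers */
};

/* Queue buf for later release; if the list is already at its cap, free
 * immediately instead. Returns 1 if deferred, 0 if freed right away. */
static int demo_defer_free(struct demo_defer_list *l, struct demo_buf *buf)
{
	if (l->len >= l->limit) {
		free(buf);
		return 0;
	}
	buf->next = l->head;
	l->head = buf;
	l->len++;
	return 1;
}

/* Release up to budget deferred buffers; returns how many were freed. */
static size_t demo_drain(struct demo_defer_list *l, size_t budget)
{
	size_t done = 0;

	while (l->head && done < budget) {
		struct demo_buf *b = l->head;

		l->head = b->next;
		free(b);
		l->len--;
		done++;
	}
	return done;
}
```

The cap bounds how much memory stays tied up, and the budget bounds how much
work any single drain call does.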

-Andi


* Re: SACK scoreboard
From: Ilpo Järvinen @ 2008-01-09 12:55 UTC (permalink / raw)
  To: John Heffner; +Cc: Andi Kleen, David Miller, lachlan.andrew, Netdev, quetchen
In-Reply-To: <47844D1C.1060706@psc.edu>

On Tue, 8 Jan 2008, John Heffner wrote:

> Andi Kleen wrote:
> > David Miller <davem@davemloft.net> writes:
> > > The big problem is that recovery from even a single packet loss in a
> > > window makes us run kfree_skb() for a all the packets in a full
> > > window's worth of data when recovery completes.
> > 
> > Why exactly is it a problem to free them all at once? Are you worried
> > about kernel preemption latencies?
> 
> I also wonder how much of a problem this is (for now, with window sizes of
> order 10000 packets.  My understanding is that the biggest problems arise from
> O(N^2) time for recovery because every ack was expensive. Have current 
> tests shown the final ack to be a major source of problems?

This thread got started because I tried to solve the other latencies but
realized that it helps very little because this latency spike would
have remained unsolved and it happens in one of the most common cases.

-- 
 i.


* Re: FW:  ccid2/ccid3 oopses
From: Gerrit Renker @ 2008-01-09 12:28 UTC (permalink / raw)
  To: devzero; +Cc: dccp, netdev
In-Reply-To: <93680347@web.de>

Roland, -

>> apparently, i got crashes when loading/unloading other driver modules just
>> after ccid2 or ccid3 had been loaded/unloaded _once_ (have not used them at
>> all, just modprobe module;modprobe -r module) >
>> 
<snip>
>> the easiest way to reproduce is:
>> 
>> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
>> after short time, the kernel oopses (messages below)
>> 
>> i`m not sure if this is worth to be filed at kernel bugzilla, so i`m contacting
>> you personally first.
>>
The issue is known: once loaded, the DCCP modules cannot be unloaded
without causing a crash like the one you have observed. This is due to the
fact that dccp_ipv{4,6} use control sockets which need to be released
before the module can be unloaded.
When the control sockets are not released, crashes will always
result.
In earlier versions of DCCP there was a kernel option known as "unload hack",
which conditionally inserted 
	sock_release(dccp_v{4,6}_ctl_socket);
in 
	dccp_v{4,6}_exit()

However, as the name says, it is a hack since there are other issues to 
be considered:
	* sockets in timewait state
	* other wait states (e.g. half-open connections)
	* memory which has not been released
	* module dependencies

With regard to the latter, I am normally using the Unload Hack and
release modules in the following order:

	dccp_probe => dccp_ccid2 => dccp_ccid3 => dccp_tfrc_lib =>
        dccp_ipv6  => dccp_ipv4  => dccp_diag  => dccp

Long story short
 * the CCID/DCCP modules can currently not safely be unloaded
 * maybe we should disable module unloading for the mainline kernel
 * if anyone is interested to use the unload hack, here is the old patch
   http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/Unload_Hack.diff

Please feel free to come back on this issue
Gerrit


* Re: [PATCH net-2.6.25] [IPVS] Added include for ip_vs.h for ctl_path (build was broken)
From: David Miller @ 2008-01-09 11:57 UTC (permalink / raw)
  To: ramirose; +Cc: netdev
In-Reply-To: <eb3ff54b0801090333r4c1770dakd65f61e356aa0304@mail.gmail.com>

From: "Rami Rosen" <ramirose@gmail.com>
Date: Wed, 9 Jan 2008 13:33:49 +0200

> Hi,
>    The build was broken with this error:
> 	
>   In file included from net/ipv4/ipvs/ip_vs_rr.c:27:
>   include/net/ip_vs.h:857: error: array type has incomplete element type
>   make[3]: *** [net/ipv4/ipvs/ip_vs_rr.o] Error 1
> 
> 	This was due to missing include to the header file for ctl_path.
> 	
> 	This patch added #include <linux/sysctl.h> to ip_vs_.h to avoid it
> 
> Signed-off-by: Rami Rosen <ramirose@gmail.com>

Applied, thanks.


* Re: [ATM]: Check IP header validity in mpc_send_packet
From: David Miller @ 2008-01-09 11:52 UTC (permalink / raw)
  To: herbert; +Cc: netdev, viro, chas
In-Reply-To: <20080109102745.GA29297@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 9 Jan 2008 21:27:45 +1100

> [ATM]: Check IP header validity in mpc_send_packet
> 
> Al went through the ip_fast_csum callers and found this piece of code
> that did not validate the IP header.  While root crashing the machine
> by sending bogus packets through raw or AF_PACKET sockets isn't that
> serious, it is still nice to react gracefully.
> 
> This patch ensures that the skb has enough data for an IP header and
> that the header length field is valid.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks for following up on this.
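
For reference, the kind of check described in the changelog can be sketched
in userspace C as below. This is not the code from the patch; it works on a
raw byte buffer instead of an skb, and demo_ipv4_header_ok is a hypothetical
name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if buf holds enough data for an IPv4 header and the header
 * length field is valid: at least 20 bytes present, IHL of at least
 * 5 words, and the full header (IHL * 4 bytes) within len. */
static bool demo_ipv4_header_ok(const uint8_t *buf, size_t len)
{
	size_t ihl;

	if (len < 20)
		return false;
	ihl = buf[0] & 0x0f;	/* low nibble of the first byte is IHL */
	if (ihl < 5)
		return false;
	return ihl * 4 <= len;
}
```

Doing this before checksumming matters because the checksum is computed over
the number of words the IHL field claims.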


* Re: make ipv6_sysctl_register to return a value
From: David Miller @ 2008-01-09 11:50 UTC (permalink / raw)
  To: dlezcano; +Cc: netdev
In-Reply-To: <47849A60.1020909@fr.ibm.com>

From: Daniel Lezcano <dlezcano@fr.ibm.com>
Date: Wed, 09 Jan 2008 10:56:48 +0100

> To clear out any confusion, please can you just ignore all my previous 
> patches, I will resend a new serie rebased on the work done by Pavel.

Ok, thanks.


* Re: [PATCH net-2.6.25 0/4] [NET]: Bloat, bloat and more bloat
From: Ilpo Järvinen @ 2008-01-09 11:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Netdev
In-Reply-To: <20080105144608.GD12379@ghostprotocols.net>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1247 bytes --]

On Sat, 5 Jan 2008, Arnaldo Carvalho de Melo wrote:

> Em Sat, Jan 05, 2008 at 03:39:04PM +0200, Ilpo Järvinen escreveu:
> > Hi Dave,
> > 
> > After Arnaldo got codiff's inline instrumentation bugs fixed
> > (thanks! :-)), I got my .c-inline-bloat-o-meter to power up
> > reliably after some tweaking and bug fixing on my behalf...
> > It shows some very high readings every now and then in the
> > code under net/.
> 
> Thank you for the reports and for showing how these tools can be put to
> good use! 
> 
> If you have any further suggestions on how to make codiff and the
> dwarves to be of more help or find any other bug, please let me know.

It could use a bit less memory because my header inline checking attempt 
on a machine with 2G+2G ends up like this:

$ codiff vmlinux.o.base vmlinux.o
libclasses: out of memory(inline_expansion__new)
$ ls -al vmlinux.o{,.base}
-rw-r----- 1 ijjarvin tkol 633132586 Jan  9 13:11 vmlinux.o
-rw-r----- 1 ijjarvin tkol 633132572 Jan  9 00:58 vmlinux.o.base
$

Considering that's only 0.6G+0.6G I've a problem in understanding why 
codiff's number crunching eats up so much memory.

I just hope there isn't any O(n^2) or worse algos in it either once
the memory consumption gets resolved. :-)


-- 
 i.


* [PATCH net-2.6.25] [IPVS] Added include for ip_vs.h for ctl_path (build was broken)
From: Rami Rosen @ 2008-01-09 11:33 UTC (permalink / raw)
  To: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 430 bytes --]

Hi,
   The build was broken with this error:
	
  In file included from net/ipv4/ipvs/ip_vs_rr.c:27:
  include/net/ip_vs.h:857: error: array type has incomplete element type
  make[3]: *** [net/ipv4/ipvs/ip_vs_rr.o] Error 1

	This was due to a missing include of the header file for ctl_path.
	
	This patch adds #include <linux/sysctl.h> to ip_vs.h to fix it.

Regards,
Rami Rosen


Signed-off-by: Rami Rosen <ramirose@gmail.com>

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 409 bytes --]

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 02ab7ca..56f3c94 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -9,6 +9,8 @@
 #include <asm/types.h>		/* For __uXX types */
 #include <linux/types.h>	/* For __beXX types in userland */
 
+#include <linux/sysctl.h>	/* For ctl_path */
+
 #define IP_VS_VERSION_CODE	0x010201
 #define NVERSION(version)			\
 	(version >> 16) & 0xFF,			\
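
The compiler error comes from declaring an array whose element type has not
been defined yet. A small standalone illustration follows; struct
demo_ctl_path is a hypothetical stand-in for struct ctl_path and the array
name is made up.

```c
#include <stddef.h>

/* The element type must be complete before an array of it is declared.
 * Without this definition in scope, the extern declaration below fails
 * with "array type has incomplete element type". */
struct demo_ctl_path {
	const char *procname;
};

/* Accepted because struct demo_ctl_path is fully defined above. */
extern const struct demo_ctl_path demo_vs_ctl_path[];

const struct demo_ctl_path demo_vs_ctl_path[] = {
	{ "net" }, { "ipv4" }, { "vs" }, { NULL },
};
```

Including the header that defines the structure before the array declaration
has the same effect as the inline definition here.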
