Netdev List
* Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24
From: Herbert Xu @ 2008-01-09 22:05 UTC
  To: Andy Gospodarek
  Cc: Jay Vosburgh, Krzysztof Oledzki, netdev, Jeff Garzik,
	David Miller
In-Reply-To: <20080109201709.GF8728@gospo.usersys.redhat.com>

On Wed, Jan 09, 2008 at 03:17:09PM -0500, Andy Gospodarek wrote:
>
> Agreed.  And despite Herbert's opinion that this isn't the correct fix,
> I think this will work fine.  This is one of the cases where we can take
> a write_lock(bond->lock) in softirq context, so we need to drop that (or
> make sure all the read_lock's are read_lock_bh's).  The latter isn't
> really an option since having a majority of the bonding code run in
> softirq context was what we are trying to avoid with the workqueue
> conversion.

No that's not the point.  The point is to move the majority of the code
into process context so that you can take the RTNL.  Once you have taken
the RTNL you can disable BH all you want and I don't care one bit.

In any case, fixing a known dead-lock is important.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re : Re : Re : Bonding : Monitoring of 4965 wireless card
From: patnel972-linux @ 2008-01-09 21:52 UTC
  To: Jay Vosburgh; +Cc: netdev

I mean that instead of having ARP monitoring test an IP address on the LAN or elsewhere, I want it to test 127.0.0.1; but for that to work, the probe would have to go out through wlan0 and come back in.


----- Original Message ----
From: Jay Vosburgh <fubar@us.ibm.com>
To: patnel972-linux@yahoo.fr
Cc: John W. Linville <linville@tuxdriver.com>; netdev@vger.kernel.org
Sent: Wednesday, January 9, 2008, 22:36:00
Subject: Re: Re : Re : Bonding : Monitoring of 4965 wireless card

patnel972-linux@yahoo.fr wrote:

>I ignore it, but it seems to prevent bonding from detecting the link state of
>wlan0. I enslave wlan0 and I already use use_carrier=1.

    The default for bonding is use_carrier=1, which makes bonding
use the device driver's netif_carrier_on/off state for link detection.
Bonding only checks via ethtool/mii if use_carrier=0.

>I'll try ARP monitoring, but it's annoying that I can't test localhost.
>Is there a way to test localhost with ARP without passing through lo?

    What do you mean by "test localhost with arp, without pass
through lo"?  ARP monitoring issues probes (ARPs) to a remote
destination to confirm that there is connectivity; I'm not sure what
localhost has to do with it.

    In general, though, I have not tested bonding with wireless
adapters, so I'm unfamiliar with how well it does or does not work.

    -J

---
    -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

* Re: Re : Re : Bonding : Monitoring of 4965 wireless card
From: Jay Vosburgh @ 2008-01-09 21:36 UTC
  To: patnel972-linux; +Cc: John W. Linville, netdev
In-Reply-To: <168648.20508.qm@web25705.mail.ukl.yahoo.com>

patnel972-linux@yahoo.fr wrote:

>I ignore it, but it seems to prevent bonding from detecting the link state of wlan0. I enslave wlan0 and I already use use_carrier=1.

	The default for bonding is use_carrier=1, which makes bonding
use the device driver's netif_carrier_on/off state for link detection.
Bonding only checks via ethtool/mii if use_carrier=0.
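A minimal userspace sketch of the state that use_carrier=1 relies on: the driver's netif_carrier_on/off flag is also exported through sysfs as /sys/class/net/<iface>/carrier, so reading that file shows what bonding should see for wlan0. read_carrier() is a hypothetical helper written for illustration, not part of the bonding driver, and it assumes sysfs is mounted at /sys.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a netdev carrier flag from sysfs, e.g.
 * "/sys/class/net/wlan0/carrier". Returns 1 (carrier on), 0 (carrier
 * off), or -1 on error. The kernel refuses this read while the
 * interface is administratively down, which also yields -1 here.
 */
int read_carrier(const char *path)
{
    FILE *f = fopen(path, "r");
    int val = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &val) != 1)
        val = -1;                 /* empty file or read error */
    fclose(f);
    return (val == 0 || val == 1) ? val : -1;
}
```

If read_carrier("/sys/class/net/wlan0/carrier") keeps returning 1 after the wireless link is lost, the driver is presumably not calling netif_carrier_off(), and use_carrier=1 cannot detect the failure.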

>I'll try ARP monitoring, but it's annoying that I can't test localhost. Is there a way to test localhost with ARP without passing through lo?

	What do you mean by "test localhost with arp, without pass
through lo"?  ARP monitoring issues probes (ARPs) to a remote
destination to confirm that there is connectivity; I'm not sure what
localhost has to do with it.
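To make the point about remote targets concrete: the ARP monitor (arp_interval/arp_ip_target) only proves connectivity if its probes leave through a slave and are answered from the network. Loopback addresses are answered locally via lo, so they can never exercise wlan0. The check below is a hypothetical helper sketching that rule; it is not taken from the bonding driver.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Hypothetical check: is this IPv4 address a sensible ARP monitor
 * target? Loopback (127.0.0.0/8) never leaves the host, and 0.0.0.0
 * and the limited broadcast address are not useful targets either.
 */
bool usable_arp_target(const char *ip)
{
    struct in_addr a;
    uint32_t h;

    if (inet_pton(AF_INET, ip, &a) != 1)
        return false;             /* not a dotted-quad IPv4 address */
    h = ntohl(a.s_addr);
    if ((h >> 24) == 127)
        return false;             /* loopback: answered locally via lo */
    if (h == 0 || h == 0xFFFFFFFFu)
        return false;             /* unspecified or limited broadcast */
    return true;
}
```

For example, usable_arp_target("127.0.0.1") returns false, while a LAN gateway address such as 192.168.1.1 (an assumed example) passes.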

	In general, though, I have not tested bonding with wireless
adapters, so I'm unfamiliar with how well it does or does not work.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

* Re: Re : Re : Bonding : Monitoring of 4965 wireless card
From: Andy Gospodarek @ 2008-01-09 21:31 UTC
  To: patnel972-linux; +Cc: John W. Linville, netdev
In-Reply-To: <168648.20508.qm@web25705.mail.ukl.yahoo.com>

On Wed, Jan 09, 2008 at 09:17:06PM +0000, patnel972-linux@yahoo.fr wrote:
> I ignore it, but it seems to prevent bonding from detecting the link state of wlan0. I enslave wlan0 and I already use use_carrier=1.
> I use bonding so that I keep my Ethernet IP address on Wi-Fi at the office; otherwise the wireless connection only gives temporary access and you then have to go through a proxy.
> I'll try ARP monitoring, but it's annoying that I can't test localhost. Is there a way to test localhost with ARP without passing through lo?
> 
> 
> 
> ----- Original Message ----
> From: John W. Linville <linville@tuxdriver.com>
> To: patnel972-linux@yahoo.fr
> Sent: Wednesday, January 9, 2008, 21:24:10
> Subject: Re: Re : Bonding : Monitoring of 4965 wireless card
> 
> On Wed, Jan 09, 2008 at 07:31:37PM +0000, patnel972-linux@yahoo.fr wrote:
> > I'm bonding my eth0 (e1000 driver) and my wlan
> > card (iwl4965). It works as I want: when I'm on Wi-Fi, DHCP gives
> > me my Ethernet address. When I unplug the cable, my wlan card takes
> > over the network. My problem is that when I disconnect the wlan card,
> > bonding does not detect it correctly, and ifplugstatus shows me
> > wlan0 not connected and wmaster0 connected! The bonding module does
> > not report that there is no active interface; it behaves as if wlan0 is up.
> > 
> > Am I clear?
> 
> Yes, that is much more clear to me.
> 
> What (if anything) are you doing to wmaster0?  You should just
> ignore it.
> 
> FWIW, miimon is not going to work with a mac80211-based device at
> this time.  The miimon option relies on support for either mii-tool
> or ethtool, and mac80211 devices support neither of those.
> 
> Hmmm...it looks like there is a use_carrier option for miimon.
> Based on its description I would think it would work.  Of course,
> I think it is supposed to be the default and you don't seem to be
> disabling it.  So, I'm not sure what is happening.
> 
> Are you enslaving wlan0?  Or wmaster0?  Make sure it is wlan0.
> Also, please add use_carrier=1 to your bonding module options.
> Does this change the behaviour?  If not, please open a bug at either
> bugzilla.redhat.com (if you are a Fedora, RHEL, or even CentOS user)
> or bugzilla.kernel.org (otherwise).
> 
> In the meantime, you might try using NetworkManager.  Or you
> might consider using ARP monitoring.  The former probably is the
> best solution if you are mobile (e.g.  at a cafe or other hotspot)
> while the latter might be appropriate if you are just plugging and
> un-plugging within the same network (like at home or office).
> 
> Hth!
> 

John's suggestion to consider using ARP monitoring is a good one.  It is
the preferred method to check for failover when link checking is not an
option (which is the case with your current setup).


* Re: FW: ccid2/ccid3 oopses
From: devzero @ 2008-01-09 21:28 UTC
  To: gerrit; +Cc: Arnaldo Carvalho de Melo, dccp, devzero, netdev

[-- Attachment #1: Type: text/plain, Size: 1527 bytes --]

> So maybe the cause triggering this oops is somewhere else.
Yes, probably.
Sorry, I didn't mention this (or maybe didn't know it yet) when I wrote my first mail to the module authors, and I forgot to add it before forwarding it here.

For me, the problem does not happen with the SUSE kernel of the day (2.6.24-rc6-git7-20080102160500-default, .config attached), but it does happen with vanilla 2.6.24-rc6 (mostly allmodconfig, also attached).

regards
roland


----- Original Message ----- 
From: "Gerrit Renker" <gerrit@erg.abdn.ac.uk>
To: "Arnaldo Carvalho de Melo" <acme@redhat.com>; <devzero@web.de>; <dccp@vger.kernel.org>; <netdev@vger.kernel.org>
Sent: Wednesday, January 09, 2008 3:17 PM
Subject: Re: FW: ccid2/ccid3 oopses


> | > >> the easiest way to reproduce is:
> | > >> 
> | > >> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
> | > >> after short time, the kernel oopses (messages below)
> | > >> 
> <snip>
> | 
> | Gerrit, the control socket isn't attached to any CCID module, so the
> | CCID modules should be safe to remove, and IIRC they were safe to
> | unload.
> | 
> Ah, right. I have misread the email. And can confirm the above: running
> the for-loop at the top of the message (60 seconds uninterrupted for
> CCID2,3 each) brought no oopses.
> So maybe the cause triggering this oops is somewhere else.


[-- Attachment #2: working_config.gz --]
[-- Type: application/x-gzip, Size: 22013 bytes --]

[-- Attachment #3: config_where_problem_exists.gz --]
[-- Type: application/x-gzip, Size: 22143 bytes --]

* Re : Re : Bonding : Monitoring of 4965 wireless card
From: patnel972-linux @ 2008-01-09 21:17 UTC
  To: John W. Linville; +Cc: netdev

I ignore it, but it seems to prevent bonding from detecting the link state of wlan0. I enslave wlan0 and I already use use_carrier=1.
I use bonding so that I keep my Ethernet IP address on Wi-Fi at the office; otherwise the wireless connection only gives temporary access and you then have to go through a proxy.
I'll try ARP monitoring, but it's annoying that I can't test localhost. Is there a way to test localhost with ARP without passing through lo?



----- Original Message ----
From: John W. Linville <linville@tuxdriver.com>
To: patnel972-linux@yahoo.fr
Sent: Wednesday, January 9, 2008, 21:24:10
Subject: Re: Re : Bonding : Monitoring of 4965 wireless card

On Wed, Jan 09, 2008 at 07:31:37PM +0000, patnel972-linux@yahoo.fr wrote:
> I'm bonding my eth0 (e1000 driver) and my wlan
> card (iwl4965). It works as I want: when I'm on Wi-Fi, DHCP gives
> me my Ethernet address. When I unplug the cable, my wlan card takes
> over the network. My problem is that when I disconnect the wlan card,
> bonding does not detect it correctly, and ifplugstatus shows me
> wlan0 not connected and wmaster0 connected! The bonding module does
> not report that there is no active interface; it behaves as if wlan0 is up.
> 
> Am I clear?

Yes, that is much more clear to me.

What (if anything) are you doing to wmaster0?  You should just
ignore it.

FWIW, miimon is not going to work with a mac80211-based device at
this time.  The miimon option relies on support for either mii-tool
or ethtool, and mac80211 devices support neither of those.

Hmmm...it looks like there is a use_carrier option for miimon.
Based on its description I would think it would work.  Of course,
I think it is supposed to be the default and you don't seem to be
disabling it.  So, I'm not sure what is happening.

Are you enslaving wlan0?  Or wmaster0?  Make sure it is wlan0.
Also, please add use_carrier=1 to your bonding module options.
Does this change the behaviour?  If not, please open a bug at either
bugzilla.redhat.com (if you are a Fedora, RHEL, or even CentOS user)
or bugzilla.kernel.org (otherwise).

In the meantime, you might try using NetworkManager.  Or you
might consider using ARP monitoring.  The former probably is the
best solution if you are mobile (e.g.  at a cafe or other hotspot)
while the latter might be appropriate if you are just plugging and
un-plugging within the same network (like at home or office).

Hth!

John
-- 
John W. Linville
linville@tuxdriver.com

* Re: Linux IPv6 DAD not full conform to RFC 4862 ?
From: Vlad Yasevich @ 2008-01-09 21:09 UTC
  To: Neil Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, kkeil,
	netdev
In-Reply-To: <20080109185744.GB25106@hmsreliant.think-freely.org>

Neil Horman wrote:
> On Thu, Jan 10, 2008 at 01:38:57AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
>> In article <20080109153656.GA16962@pingi.kke.suse.de> (at Wed, 9 Jan 2008 16:36:56 +0100), Karsten Keil <kkeil@suse.de> says:
>>
>>> So I think we should disable the interface now, if DAD fails on a
>>> hardware based LLA.
>> I don't want to do this, at least, unconditionally.
>>
>> Options (not exclusive):
>>
>> - we could have "dad_reaction" interface variable and
>>  > 1: disable interface
>>  = 1: disable IPv6
>>  < 0: ignore (as we do now)
>>
> I like the flexibility of this solution, but given that the only part of the RFC
> that we're missing on at the moment is that we SHOULD disable the interface on
> DAD failure for a link-local address, I would think this scheme would be good:
> 
>   < 0 : ignore, and del address from interface (current behavior) 
>   = 0 : disable interface for dad failure for a link-local address 
>   > 0 : disable interface for dad failure for any address 
> 
> Regards
> Neil
>  

Just a friendly reminder that such a scheme should only be
applied to autoconfigured addresses.  A manually configured
duplicated address should not bring down the whole interface.

-vlad
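A small C sketch of the policy as proposed above, combining Neil's three dad_reaction values with Vlad's caveat that manually configured addresses must never take the interface down. dad_reaction is only the knob name floated in this thread, and dad_failure_action() is hypothetical; this is not existing kernel code.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a DAD failure under the proposed scheme. */
enum dad_action {
    DAD_DROP_ADDRESS,       /* current behaviour: log, do not assign */
    DAD_DISABLE_INTERFACE,  /* disable the interface */
};

/*
 * dad_reaction < 0 : only drop the duplicate address
 * dad_reaction == 0: disable the interface on a link-local failure
 * dad_reaction > 0 : disable the interface on any failure
 * Manually configured addresses never take the interface down.
 */
enum dad_action dad_failure_action(int dad_reaction, bool link_local,
                                   bool autoconfigured)
{
    if (!autoconfigured || dad_reaction < 0)
        return DAD_DROP_ADDRESS;
    if (dad_reaction == 0)
        return link_local ? DAD_DISABLE_INTERFACE : DAD_DROP_ADDRESS;
    return DAD_DISABLE_INTERFACE;
}
```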

* Re : Bonding : Monitoring of 4965 wireless card
From: patnel972-linux @ 2008-01-09 20:52 UTC
  To: John W. Linville; +Cc: netdev

I'm bonding my eth0 (e1000 driver) and my wlan card (iwl4965). It works
as I want: when I'm on Wi-Fi, DHCP gives me my Ethernet address. When I
unplug the cable, my wlan card takes over the network. My problem is that
when I disconnect the wlan card, bonding does not detect it correctly, and
ifplugstatus shows me wlan0 not connected and wmaster0 connected! The
bonding module does not report that there is no active interface; it
behaves as if wlan0 is up.

Am I clear?

PS: (Sorry, I'm having trouble with my mail.)
----- Original Message ----
From: John W. Linville <linville@tuxdriver.com>
To: patnel972-linux@yahoo.fr
Cc: netdev@vger.kernel.org
Sent: Wednesday, January 9, 2008, 18:02:05
Subject: Re: Bonding : Monitoring of 4965 wireless card

On Wed, Jan 09, 2008 at 09:00:05AM +0000, patnel972-linux@yahoo.fr wrote:
> Hi,
> 
> I want to make a bond with my wireless card. The ipw driver creates two
> interfaces (wlan0 and wmaster0). When I switch the rf_kill button,
> ifplug detects wlan0 as unplugged but not wmaster0. If I down wlan0
> (while rf_kill is on), bonding detects the inactivity when I bring the
> interface up.
> 
> Do you have any idea where the problem is: the driver, or the module's
> miimon?
> 
> My module parameters: mode=1 miimon=100 primary eth0

I'm not sure I understand your description...what are you trying to do?
How exactly is it failing?

John
-- 
John W. Linville
linville@tuxdriver.com

* Re : Bonding : Monitoring of 4965 wireless card
From: linel patrice @ 2008-01-09 20:44 UTC
  To: John W. Linville; +Cc: netdev

I'm bonding my eth0 (e1000 driver) and my wlan card (iwl4965). It works
as I want: when I'm on Wi-Fi, DHCP gives me my Ethernet address. When I
unplug the cable, my wlan card takes over the network. My problem is that
when I disconnect the wlan card, bonding does not detect it correctly, and
ifplugstatus shows me wlan0 not connected and wmaster0 connected! The
bonding module does not report that there is no active interface; it
behaves as if wlan0 is up.

Am I clear?


----- Original Message ----
From: John W. Linville <linville@tuxdriver.com>
To: patnel972-linux@yahoo.fr
Cc: netdev@vger.kernel.org
Sent: Wednesday, January 9, 2008, 18:02:05
Subject: Re: Bonding : Monitoring of 4965 wireless card

On Wed, Jan 09, 2008 at 09:00:05AM +0000, patnel972-linux@yahoo.fr wrote:
> Hi,
> 
> I want to make a bond with my wireless card. The ipw driver creates two
> interfaces (wlan0 and wmaster0). When I switch the rf_kill button,
> ifplug detects wlan0 as unplugged but not wmaster0. If I down wlan0
> (while rf_kill is on), bonding detects the inactivity when I bring the
> interface up.
> 
> Do you have any idea where the problem is: the driver, or the module's
> miimon?
> 
> My module parameters: mode=1 miimon=100 primary eth0

I'm not sure I understand your description...what are you trying to do?
How exactly is it failing?

John
-- 
John W. Linville
linville@tuxdriver.com

* Re: Linux IPv6 DAD not full conform to RFC 4862 ?
From: Karsten Keil @ 2008-01-09 20:32 UTC
  To: netdev
In-Reply-To: <20080110.014051.121940744.yoshfuji@linux-ipv6.org>

On Thu, Jan 10, 2008 at 01:40:51AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
> In article <20080110.013857.37616214.yoshfuji@linux-ipv6.org> (at Thu, 10 Jan 2008 01:38:57 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> says:
> 
> > - we could have "dad_reaction" interface variable and
> >  > 1: disable interface
> >  = 1: disable IPv6
> >  < 0: ignore (as we do now)
> 
> Argh, >0, 0 and <0, maybe.
> 

I like this solution; I had something similar in mind after I
discovered the issue.

-- 
Karsten Keil
SuSE Labs
SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg)

* Re: Linux IPv6 DAD not full conform to RFC 4862 ?
From: Karsten Keil @ 2008-01-09 20:26 UTC
  To: netdev
In-Reply-To: <20080109161748.GA25106@hmsreliant.think-freely.org>

On Wed, Jan 09, 2008 at 11:17:48AM -0500, Neil Horman wrote:
> On Wed, Jan 09, 2008 at 04:36:56PM +0100, Karsten Keil wrote:
> > Hi,
> > 
> > I tried to run the 1.5.0 Beta2 TAHI Selftest on a recent Linux kernel.
> > It fails 6 tests in the Stateless Address Autoconfiguration section.
> > These tests are for Duplicate Address Detection (DAD).
> > They detect a duplicate of the link-local address on the
> > network. It seems that our current behavior is to log a message and
> > not assign this address.
> > 
> > But the RFC 4862 says:
> > 
> > 5.4.5.  When Duplicate Address Detection Fails
> > 
> >    A tentative address that is determined to be a duplicate as described
> >    above MUST NOT be assigned to an interface, and the node SHOULD log a
> >    system management error.
> > 
> >    If the address is a link-local address formed from an interface
> >    identifier based on the hardware address, which is supposed to be
> >    uniquely assigned (e.g., EUI-64 for an Ethernet interface), IP
> >    operation on the interface SHOULD be disabled.  By disabling IP
> >    operation, the node will then:
> > 
> >    -  not send any IP packets from the interface,
> > 
> >    -  silently drop any IP packets received on the interface, and
> > 
> >    -  not forward any IP packets to the interface (when acting as a
> >       router or processing a packet with a Routing header).
> > 
> >    In this case, the IP address duplication probably means duplicate
> >    hardware addresses are in use, and trying to recover from it by
> >    configuring another IP address will not result in a usable network.
> >    In fact, it probably makes things worse by creating problems that are
> >    harder to diagnose than just disabling network operation on the
> >    interface; the user will see a partially working network where some
> >    things work, and other things do not.
> > 
> >    On the other hand, if the duplicate link-local address is not formed
> >    from an interface identifier based on the hardware address, which is
> >    supposed to be uniquely assigned, IP operation on the interface MAY
> >    be continued.
> > 
> > 
> > So I think we should disable the interface now, if DAD fails on a
> > hardware based LLA.
> > 
> 
> Not sure I agree with that.  I assume that by disable, you mean that we should
> clear the IFF_UP flag?  If we do that, and another ip address is assigned to
> that interface, then your proposal would discontinue the functionality of those
> already established addresses, which would be bad.  I could see a DoS scenario
> coming out of that as well.  Simply send ndisc NAs for a recently advertised
> address, and you could prevent network communication for an entire system.
> 
> Reading the section you reference, we do follow all the MUST requirements, and
> we log an error.  Given that the disable section is a SHOULD, I think we can at
> least be somewhat more restrictive in our implementation.  Perhaps we should
> just disable the interface iff the failed address is link-local AND there are no
> other functional addresses assigned to the interface.

I agree here, but it seems that the IPv6 Logo Committee currently thinks
the interface has to be disabled to get the IPv6 Ready Logo in the
future. I already raised this in a discussion on the TAHI users list.

As far as I remember, a SHOULD in an RFC has to be interpreted as "You should
implement it this way; exceptions are only acceptable for a good
reason". Maybe the DoS scenario is such a good reason.
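RFC 4862 only asks for IP operation to be disabled when the duplicate link-local address was formed from the hardware address. A hypothetical check for that condition, assuming a 48-bit MAC and the modified EUI-64 interface identifier described in RFC 4291 (flip the universal/local bit, insert ff:fe in the middle):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: is addr a link-local address (fe80::/10) whose
 * interface identifier is the modified EUI-64 built from this MAC?
 * Modified EUI-64: flip the universal/local bit (0x02) of the first MAC
 * octet and insert 0xff, 0xfe between the third and fourth octets.
 */
bool lla_from_hw_address(const uint8_t addr[16], const uint8_t mac[6])
{
    const uint8_t iid[8] = {
        (uint8_t)(mac[0] ^ 0x02), mac[1], mac[2], 0xff, 0xfe,
        mac[3], mac[4], mac[5],
    };

    /* fe80::/10: first byte 0xfe, top two bits of the second byte 10. */
    if (addr[0] != 0xfe || (addr[1] & 0xc0) != 0x80)
        return false;

    /* Compare the low 64 bits (the interface identifier). */
    for (int i = 0; i < 8; i++)
        if (addr[8 + i] != iid[i])
            return false;
    return true;
}
```

For MAC 00:11:22:33:44:55 the matching address is fe80::211:22ff:fe33:4455; a duplicate of any other link-local address would fall under the RFC's "MAY be continued" case.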

-- 
Karsten Keil
SuSE Labs

* Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24
From: Andy Gospodarek @ 2008-01-09 20:17 UTC
  To: Jay Vosburgh
  Cc: Andy Gospodarek, Krzysztof Oledzki, netdev, Jeff Garzik,
	David Miller, Herbert Xu
In-Reply-To: <32361.1199901296@death>

On Wed, Jan 09, 2008 at 09:54:56AM -0800, Jay Vosburgh wrote:
> Andy Gospodarek <andy@greyhouse.net> wrote:
> [...]
> >My initial concern was that a slave device could disappear out from
> >under us, but it seems like this certainly isn't the case since all
> >calls to bond_release are protected by rtnl-locks, so I think you are
> >correct that we are safe.  I'll test this on my setup here and let you
> >know if I see any problems.
> 
> 	Yep, all entries into enslave or remove come in with RTNL, so if
> we have RTNL there then slaves can't vanish.
> 
> 	On further inspection, I don't think it's safe to simply drop
> the locks in bond_set_multicast_list, I'm seeing a couple of cases that
> could be troublesome:
> 
> 	bond_set_promiscuity and bond_set_allmulti both reference
> curr_active_slave, which isn't protected from change by RTNL, so that
> could conflict with a change_active_slave calling bond_mc_swap (which is
> also holding the wrong locks for dev_set_promisc/allmulti).
> 
> 	It also looks like there are paths (igmp6 for one) into
> dev_mc_add that just hold a bunch of regular locks, and not RTNL, so
> those wouldn't be safe from having slaves vanish due to concurrent
> deslavement.

Eeeek!  I didn't realize that rtnl wasn't held for all those calls.  If
that's the case we can't drop all the locks.

> 	Looks like read_lock_bh for bond->lock and curr_slave_lock is
> needed in bond_set_multicast_list, and some dropping of locks is needed
> inside bond_set_promisc/allmulti.  Methinks that without any locks,
> bond_mc_add/delete could race with either a change of active slave or a
> de-enslavement of the active slave.
 
Agreed.  And despite Herbert's opinion that this isn't the correct fix,
I think this will work fine.  This is one of the cases where we can take
a write_lock(bond->lock) in softirq context, so we need to drop that (or
make sure all the read_lock's are read_lock_bh's).  The latter isn't
really an option since having a majority of the bonding code run in
softirq context was what we are trying to avoid with the workqueue
conversion.

> 	I'm wondering if this is worth trying to make perfect for 2.6.24
> (and maybe making things worse), and, instead, just do this:
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 77d004d..8b9e33a 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3937,7 +3937,7 @@ static void bond_set_multicast_list(struct net_device *bond_dev)
>  	struct bonding *bond = bond_dev->priv;
>  	struct dev_mc_list *dmi;
>  
> -	write_lock_bh(&bond->lock);
> +	read_lock_bh(&bond->lock);
>  
>  	/*
>  	 * Do promisc before checking multicast_mode
> @@ -3979,7 +3979,7 @@ static void bond_set_multicast_list(struct net_device *bond_dev)
>  	bond_mc_list_destroy(bond);
>  	bond_mc_list_copy(bond_dev->mc_list, bond, GFP_ATOMIC);
>  
> -	write_unlock_bh(&bond->lock);
> +	read_unlock_bh(&bond->lock);
>  }
>  
>  /*
> 
> 
> 	This should silence the lockdep (if I'm understanding what
> everybody's saying), and keep the change set to a minimum.  This might

The lockdep problem is easy to trigger.  The lockdep code does a good
job of noticing problems quickly regardless of how easy the deadlocks
are to create.

> not even be worth pushing for 2.6.24; I'm not exactly sure how difficult
> the lockdep problem would be to trigger.
> 

I'd like to see it go in there (for correctness) and to avoid hearing
about these lockdep issues for the next few months until it makes it
into 2.6.25.

> 	The other stuff I mention above can be dealt with later; they're
> very low-probability races that would be pretty difficult to hit even on
> purpose.
> 
> 	Thoughts?
> 
> 	-J
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

* Re: [PATCH for 2.6.24][NET] fs_enet: check for phydev existence in the ethtool handlers
From: Matvejchikov Ilya @ 2008-01-09 20:20 UTC
  To: linuxppc-embedded, netdev

Hi folks!

I had the same problem too. The solution was the following:
http://www.mail-archive.com/netdev@vger.kernel.org/msg37951.html

Also have a look at the potential multicasting recovery problem in
fs_enet driver:
http://patchwork.ozlabs.org/linuxppc/patch?id=10700

Best regards,
Matvejchikov Ilya.

* Re: [PATCH for 2.6.24][NET] fs_enet: check for phydev existence in the ethtool handlers
From: Matvejchikov Ilya @ 2008-01-09 20:14 UTC
  To: linuxppc-embedded, netdev

Hi folks!

I had the same problem too. The solution was the following:
http://www.mail-archive.com/netdev@vger.kernel.org/msg37951.html

Also have a look at the potential multicasting recovery problem in
fs_enet driver:
http://patchwork.ozlabs.org/linuxppc/patch?id=10700

Best regards,
Matvejchikov Ilya.

* Re: [PATCH for 2.6.24][NET] fs_enet: check for phydev existence in the ethtool handlers
From: Matvejchikov Ilya @ 2008-01-09 20:10 UTC
  To: linuxppc-embedded, netdev


[-- Attachment #1.1: Type: text/plain, Size: 306 bytes --]

Hi folks!

I had the same problem too. The solution was the following:
http://www.mail-archive.com/netdev@vger.kernel.org/msg37951.html

Also have a look at the potential multicasting recovery problem in fs_enet
driver:
http://patchwork.ozlabs.org/linuxppc/patch?id=10700

Best regards,
Matvejchikov Ilya.

[-- Attachment #1.2: Type: text/html, Size: 488 bytes --]

* Re: Linux IPv6 DAD not full conform to RFC 4862 ?
From: Neil Horman @ 2008-01-09 18:57 UTC
  To: YOSHIFUJI Hideaki / 吉藤英明; +Cc: kkeil, netdev
In-Reply-To: <20080110.013857.37616214.yoshfuji@linux-ipv6.org>

On Thu, Jan 10, 2008 at 01:38:57AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
> In article <20080109153656.GA16962@pingi.kke.suse.de> (at Wed, 9 Jan 2008 16:36:56 +0100), Karsten Keil <kkeil@suse.de> says:
> 
> > So I think we should disable the interface now, if DAD fails on a
> > hardware based LLA.
> 
> I don't want to do this, at least, unconditionally.
> 
> Options (not exclusive):
> 
> - we could have "dad_reaction" interface variable and
>  > 1: disable interface
>  = 1: disable IPv6
>  < 0: ignore (as we do now)
> 
I like the flexibility of this solution, but given that the only part of the RFC
that we're missing on at the moment is that we SHOULD disable the interface on
DAD failure for a link-local address, I would think this scheme would be good:

  < 0 : ignore, and del address from interface (current behavior) 
  = 0 : disable interface for dad failure for a link-local address 
  > 0 : disable interface for dad failure for any address 

Regards
Neil
 
> --yoshfuji

* Re: [PATCH][VLAN] Merge tree equal tails in vlan_skb_recv
From: Patrick McHardy @ 2008-01-09 18:42 UTC
  To: Pavel Emelyanov; +Cc: Linux Netdev List
In-Reply-To: <478482A7.1040505@openvz.org>

Pavel Emelyanov wrote:
> Hi, Patrick.
> 
>> Pavel Emelyanov wrote:
>>> There are three paths in it that set skb->protocol and then
>>> perform common receive manipulations (basically call netif_rx()).
>>>
>>> I think, that we can make this code flow easier to understand
>>> by introducing the vlan_set_encap_proto() function (I hope the 
>>> name is good) to setup the skb proto and merge the paths calling 
>>> netif_rx() together.
>>>
>>> Surprisingly, gcc detects this and merges these paths
>>> by itself, so this patch doesn't make the vlan module smaller.
>>
>> I already have something similar queued, but your patch is a nice
>> cleanup on top. I'll merge it into my tree and send it out after
>> some testing, hopefully today.
>>
> 
> What are your plans about this patch? Should I resubmit this one?


Sorry, I'm pretty backlogged from Christmas, but I have your patch
queued and hope to catch up by the end of this week.
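The three receive paths Pavel describes differ only in how the inner protocol is chosen before netif_rx(). A userspace sketch of that decision, assuming it follows the usual Ethernet rules (a type/length field of 1536 or more is an EtherType; a payload starting with 0xFFFF is raw 802.3; anything else is 802.2 LLC). vlan_encap_proto() is a hypothetical stand-in, not the vlan_set_encap_proto() from the patch; the constant values match those in current <linux/if_ether.h>.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_802_3      0x0001  /* raw (Novell) 802.3 frames */
#define ETH_P_802_2      0x0004  /* 802.2 LLC frames */
#define ETH_P_802_3_MIN  0x0600  /* values >= 1536 are EtherTypes */

/*
 * Classify the inner protocol of a VLAN frame from the encapsulated
 * type/length field (host byte order) and the first payload bytes.
 */
uint16_t vlan_encap_proto(uint16_t encap, const uint8_t *payload, size_t len)
{
    if (encap >= ETH_P_802_3_MIN)
        return encap;            /* Ethernet II: field is an EtherType */
    if (len >= 2 && payload[0] == 0xFF && payload[1] == 0xFF)
        return ETH_P_802_3;      /* raw 802.3 (IPX) starts with 0xFFFF */
    return ETH_P_802_2;          /* otherwise assume an LLC header */
}
```

Folding the three cases into one helper leaves a single tail that sets skb->protocol and hands the frame on, which is the code-flow simplification the patch is after.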

* Re: SACK scoreboard
From: John Heffner @ 2008-01-09 18:23 UTC
  To: SANGTAE HA
  Cc: David Miller, andi, ilpo.jarvinen, lachlan.andrew, netdev,
	quetchen
In-Reply-To: <649aecc70801091014u50d1be26ndf5e59e0492ce9cd@mail.gmail.com>

SANGTAE HA wrote:
> On Jan 9, 2008 9:56 AM, John Heffner <jheffner@psc.edu> wrote:
>>>> I also wonder how much of a problem this is (for now, with window sizes
>>>> of order 10000 packets).  My understanding is that the biggest problems
>>>> arise from O(N^2) time for recovery because every ack was expensive.
>>>> Have current tests shown the final ack to be a major source of problems?
>>> Yes, several people have reported this.
>> I may have missed some of this.  Does anyone have a link to some recent
>> data?
> 
> I had some testing on this a month ago.
> A small set of recent results with Linux 2.6.23.9 is at
> http://netsrv.csc.ncsu.edu/net-2.6.23.9/sack_efficiency
> One of the serious cases, with a large number of packet losses (initial
> loss is around 8000 packets) is at
> http://netsrv.csc.ncsu.edu/net-2.6.23.9/sack_efficiency/600--TCP-TCP-NONE--400-3-1.0--1000-120-0-0-1-1-5-500--1.0-0.5-133000-73-3000000-0.93-150--3/
> 
> Also, there is a comparison among three Linux kernels (2.6.13,
> 2.6.18-rc4, 2.6.20.3) at
> http://netsrv.csc.ncsu.edu/wiki/index.php/Efficiency_of_SACK_processing


If I'm reading this right, all these tests occur with large amounts of 
loss and tons of sack processing.  What would be most pertinent to this 
discussion would be a test with a large window, with delayed ack and 
sack disabled, and a single loss repaired by fast retransmit.  This 
would isolate the "single big ack" processing from other factors such as 
doubling the ack rate and sack processing.

I could probably set up such a test, but I don't want to duplicate 
effort if someone else already has done something similar.

Thanks,
   -John

^ permalink raw reply

* Re: [Bridge] Re: [ANNOUNCE] bridge-utils 1.4
From: Alon Bar-Lev @ 2008-01-09 18:16 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Stephen Hemminger, bridge, netdev
In-Reply-To: <20080109063654.M87339@visp.net.lb>

On 1/9/08, Denys Fedoryshchenko <denys@visp.net.lb> wrote:
>
> As mentioned in
> http://marc.info/?l=linux-bridge&m=113105949718826&w=2
>
> Released package doesn't contain ./configure script
>
> For people who know their way around make, it is easy to run autoconf, but
> some only know how to use ./configure :-)

Also, using automake would be great!
If you like, I can provide a new autoconf/automake build for this package.

Alon.

^ permalink raw reply

* Re: SACK scoreboard
From: SANGTAE HA @ 2008-01-09 18:14 UTC (permalink / raw)
  To: John Heffner
  Cc: David Miller, andi, ilpo.jarvinen, lachlan.andrew, netdev,
	quetchen
In-Reply-To: <4784E090.8080303@psc.edu>

On Jan 9, 2008 9:56 AM, John Heffner <jheffner@psc.edu> wrote:
> >> I also wonder how much of a problem this is (for now, with window sizes
> >> of order 10000 packets).  My understanding is that the biggest problems
> >> arise from O(N^2) time for recovery because every ack was expensive.
> >> Have current tests shown the final ack to be a major source of problems?
> >
> > Yes, several people have reported this.
>
> I may have missed some of this.  Does anyone have a link to some recent
> data?

I had some testing on this a month ago.
A small set of recent results with Linux 2.6.23.9 is at
http://netsrv.csc.ncsu.edu/net-2.6.23.9/sack_efficiency
One of the serious cases, with a large number of packet losses (initial
loss is around 8000 packets) is at
http://netsrv.csc.ncsu.edu/net-2.6.23.9/sack_efficiency/600--TCP-TCP-NONE--400-3-1.0--1000-120-0-0-1-1-5-500--1.0-0.5-133000-73-3000000-0.93-150--3/

Also, there is a comparison among three Linux kernels (2.6.13,
2.6.18-rc4, 2.6.20.3) at
http://netsrv.csc.ncsu.edu/wiki/index.php/Efficiency_of_SACK_processing

Sangtae

^ permalink raw reply

* [PATCH net-2.6.25 19/19] [NETNS] Enable routing configuration in non-initial namespace.
From: Denis V. Lunev @ 2008-01-09 18:04 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A, benjamin.thery-6ktuUTfB/bM
In-Reply-To: <47850C57.60907-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>

I.e. remove the net != &init_net checks from the places that can now
handle network namespaces other than init_net.

Acked-by: Benjamin Thery <benjamin.thery-6ktuUTfB/bM@public.gmane.org>
Acked-by: Daniel Lezcano <dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
 net/ipv4/fib_frontend.c |   16 ----------------
 1 files changed, 0 insertions(+), 16 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index a5c47fc..a5e2fb3 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -572,9 +572,6 @@ static int inet_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *ar
 	struct fib_table *tb;
 	int err;
 
-	if (net != &init_net)
-		return -EINVAL;
-
 	err = rtm_to_fib_config(net, skb, nlh, &cfg);
 	if (err < 0)
 		goto errout;
@@ -597,9 +594,6 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *ar
 	struct fib_table *tb;
 	int err;
 
-	if (net != &init_net)
-		return -EINVAL;
-
 	err = rtm_to_fib_config(net, skb, nlh, &cfg);
 	if (err < 0)
 		goto errout;
@@ -625,9 +619,6 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	struct hlist_head *head;
 	int dumped = 0;
 
-	if (net != &init_net)
-		return 0;
-
 	if (nlmsg_len(cb->nlh) >= sizeof(struct rtmsg) &&
 	    ((struct rtmsg *) nlmsg_data(cb->nlh))->rtm_flags & RTM_F_CLONED)
 		return ip_rt_dump(skb, cb);
@@ -931,9 +922,6 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 	struct net_device *dev = ptr;
 	struct in_device *in_dev = __in_dev_get_rtnl(dev);
 
-	if (dev->nd_net != &init_net)
-		return NOTIFY_DONE;
-
 	if (event == NETDEV_UNREGISTER) {
 		fib_disable_ip(dev, 2);
 		return NOTIFY_DONE;
@@ -1013,10 +1001,6 @@ static int __net_init fib_net_init(struct net *net)
 {
 	int error;
 
-	error = 0;
-	if (net != &init_net)
-		goto out;
-
 	error = ip_fib_net_init(net);
 	if (error < 0)
 		goto out;
-- 
1.5.3.rc5

^ permalink raw reply related

* [PATCH net-2.6.25 18/19] [NETNS] Replace init_net with the correct context in fib_frontend.c
From: Denis V. Lunev @ 2008-01-09 18:04 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A, benjamin.thery-6ktuUTfB/bM
In-Reply-To: <47850C57.60907-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>

Acked-by: Benjamin Thery <benjamin.thery-6ktuUTfB/bM@public.gmane.org>
Acked-by: Daniel Lezcano <dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
 net/ipv4/fib_frontend.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index a5c47fc..a5e2fb3 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -597,7 +594,7 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *ar
 	if (err < 0)
 		goto errout;
 
-	tb = fib_new_table(&init_net, cfg.fc_table);
+	tb = fib_new_table(net, cfg.fc_table);
 	if (tb == NULL) {
 		err = -ENOBUFS;
 		goto errout;
@@ -794,7 +785,7 @@ static void fib_del_ifaddr(struct in_ifaddr *ifa)
 		fib_magic(RTM_DELROUTE, RTN_LOCAL, ifa->ifa_local, 32, prim);
 
 		/* Check, that this local address finally disappeared. */
-		if (inet_addr_type(&init_net, ifa->ifa_local) != RTN_LOCAL) {
+		if (inet_addr_type(dev->nd_net, ifa->ifa_local) != RTN_LOCAL) {
 			/* And the last, but not the least thing.
 			   We must flush stray FIB entries.
 
@@ -802,7 +793,7 @@ static void fib_del_ifaddr(struct in_ifaddr *ifa)
 			   for stray nexthop entries, then ignite fib_flush.
 			*/
 			if (fib_sync_down(ifa->ifa_local, NULL, 0))
-				fib_flush(&init_net);
+				fib_flush(dev->nd_net);
 		}
 	}
 #undef LOCAL_OK
@@ -894,7 +885,7 @@ static void nl_fib_lookup_exit(struct net *net)
 static void fib_disable_ip(struct net_device *dev, int force)
 {
 	if (fib_sync_down(0, dev, force))
-		fib_flush(&init_net);
+		fib_flush(dev->nd_net);
 	rt_cache_flush(0);
 	arp_ifdown(dev);
 }
-- 
1.5.3.rc5

^ permalink raw reply related

* [PATCH net-2.6.25 17/19] [NETNS] Pass namespace through ip_rt_ioctl.
From: Denis V. Lunev @ 2008-01-09 18:04 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A, benjamin.thery-6ktuUTfB/bM
In-Reply-To: <47850C57.60907-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>

... up to rtentry_to_fib_config

Acked-by: Benjamin Thery <benjamin.thery-6ktuUTfB/bM@public.gmane.org>
Acked-by: Daniel Lezcano <dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
 include/net/route.h     |    2 +-
 net/ipv4/af_inet.c      |    2 +-
 net/ipv4/fib_frontend.c |    8 ++++----
 net/ipv4/ipconfig.c     |    2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index b777000..5847e6f 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -120,7 +120,7 @@ extern void		ip_rt_send_redirect(struct sk_buff *skb);
 extern unsigned		inet_addr_type(struct net *net, __be32 addr);
 extern unsigned		inet_dev_addr_type(struct net *net, const struct net_device *dev, __be32 addr);
 extern void		ip_rt_multicast_event(struct in_device *);
-extern int		ip_rt_ioctl(unsigned int cmd, void __user *arg);
+extern int		ip_rt_ioctl(struct net *, unsigned int cmd, void __user *arg);
 extern void		ip_rt_get_source(u8 *src, struct rtable *rt);
 extern int		ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6fac905..f0968a1 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -793,7 +793,7 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 		case SIOCADDRT:
 		case SIOCDELRT:
 		case SIOCRTMSG:
-			err = ip_rt_ioctl(cmd, (void __user *)arg);
+			err = ip_rt_ioctl(sk->sk_net, cmd, (void __user *)arg);
 			break;
 		case SIOCDARP:
 		case SIOCGARP:
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index cecb660..a5c47fc 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -437,7 +437,7 @@ static int rtentry_to_fib_config(struct net *net, int cmd, struct rtentry *rt,
  *	Handle IP routing ioctl calls. These are used to manipulate the routing tables
  */
 
-int ip_rt_ioctl(unsigned int cmd, void __user *arg)
+int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 {
 	struct fib_config cfg;
 	struct rtentry rt;
@@ -453,18 +453,18 @@ int ip_rt_ioctl(unsigned int cmd, void __user *arg)
 			return -EFAULT;
 
 		rtnl_lock();
-		err = rtentry_to_fib_config(&init_net, cmd, &rt, &cfg);
+		err = rtentry_to_fib_config(net, cmd, &rt, &cfg);
 		if (err == 0) {
 			struct fib_table *tb;
 
 			if (cmd == SIOCDELRT) {
-				tb = fib_get_table(&init_net, cfg.fc_table);
+				tb = fib_get_table(net, cfg.fc_table);
 				if (tb)
 					err = tb->tb_delete(tb, &cfg);
 				else
 					err = -ESRCH;
 			} else {
-				tb = fib_new_table(&init_net, cfg.fc_table);
+				tb = fib_new_table(net, cfg.fc_table);
 				if (tb)
 					err = tb->tb_insert(tb, &cfg);
 				else
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 7288adb..7a89247 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -302,7 +302,7 @@ static int __init ic_route_ioctl(unsigned int cmd, struct rtentry *arg)
 
 	mm_segment_t oldfs = get_fs();
 	set_fs(get_ds());
-	res = ip_rt_ioctl(cmd, (void __user *) arg);
+	res = ip_rt_ioctl(&init_net, cmd, (void __user *) arg);
 	set_fs(oldfs);
 	return res;
 }
-- 
1.5.3.rc5
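
A user-space sketch of the parameter threading this patch performs; struct net, struct sock and the model_* functions are hypothetical stand-ins. The socket ioctl path takes the namespace from the calling socket (sk->sk_net in the patch), while the boot-time ipconfig path has no socket and passes init_net explicitly, so no layer below has to reach for a global.

```c
#include <assert.h>
#include <stddef.h>

struct net { int routes; };
struct sock { struct net *sk_net; };

static struct net init_net;

/* Leaf: acts on the tables of the namespace it is handed. */
static int model_fib_insert(struct net *net)
{
	net->routes++;
	return 0;
}

/* Middle layer: formerly hard-coded init_net, now takes net as a parameter. */
static int model_ip_rt_ioctl(struct net *net, unsigned int cmd)
{
	assert(net != NULL);
	(void)cmd;
	return model_fib_insert(net);
}

/* Socket path: the namespace comes from the socket issuing the ioctl. */
static int model_inet_ioctl(struct sock *sk, unsigned int cmd)
{
	return model_ip_rt_ioctl(sk->sk_net, cmd);
}

/* Boot-time path (like ic_route_ioctl): no socket, so use init_net. */
static int model_ic_route_ioctl(unsigned int cmd)
{
	return model_ip_rt_ioctl(&init_net, cmd);
}
```

Keeping the choice of namespace at the caller, the one place that knows it, is what lets the callees be reused unchanged by every namespace.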

^ permalink raw reply related

* [PATCH net-2.6.25 16/19] [NETNS] Correctly fill fib_config data.
From: Denis V. Lunev @ 2008-01-09 18:04 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A, benjamin.thery-6ktuUTfB/bM
In-Reply-To: <47850C57.60907-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>

Acked-by: Benjamin Thery <benjamin.thery-6ktuUTfB/bM@public.gmane.org>
Acked-by: Daniel Lezcano <dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 
Signed-off-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
 net/ipv4/fib_frontend.c |   27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index abe9f43..cecb660 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -305,14 +305,14 @@ static int put_rtax(struct nlattr *mx, int len, int type, u32 value)
 	return len + nla_total_size(4);
 }
 
-static int rtentry_to_fib_config(int cmd, struct rtentry *rt,
+static int rtentry_to_fib_config(struct net *net, int cmd, struct rtentry *rt,
 				 struct fib_config *cfg)
 {
 	__be32 addr;
 	int plen;
 
 	memset(cfg, 0, sizeof(*cfg));
-	cfg->fc_nlinfo.nl_net = &init_net;
+	cfg->fc_nlinfo.nl_net = net;
 
 	if (rt->rt_dst.sa_family != AF_INET)
 		return -EAFNOSUPPORT;
@@ -373,7 +373,7 @@ static int rtentry_to_fib_config(int cmd, struct rtentry *rt,
 		colon = strchr(devname, ':');
 		if (colon)
 			*colon = 0;
-		dev = __dev_get_by_name(&init_net, devname);
+		dev = __dev_get_by_name(net, devname);
 		if (!dev)
 			return -ENODEV;
 		cfg->fc_oif = dev->ifindex;
@@ -396,7 +396,7 @@ static int rtentry_to_fib_config(int cmd, struct rtentry *rt,
 	if (rt->rt_gateway.sa_family == AF_INET && addr) {
 		cfg->fc_gw = addr;
 		if (rt->rt_flags & RTF_GATEWAY &&
-		    inet_addr_type(&init_net, addr) == RTN_UNICAST)
+		    inet_addr_type(net, addr) == RTN_UNICAST)
 			cfg->fc_scope = RT_SCOPE_UNIVERSE;
 	}
 
@@ -453,7 +453,7 @@ int ip_rt_ioctl(unsigned int cmd, void __user *arg)
 			return -EFAULT;
 
 		rtnl_lock();
-		err = rtentry_to_fib_config(cmd, &rt, &cfg);
+		err = rtentry_to_fib_config(&init_net, cmd, &rt, &cfg);
 		if (err == 0) {
 			struct fib_table *tb;
 
@@ -494,8 +494,8 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX+1] = {
 	[RTA_FLOW]		= { .type = NLA_U32 },
 };
 
-static int rtm_to_fib_config(struct sk_buff *skb, struct nlmsghdr *nlh,
-			     struct fib_config *cfg)
+static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
+			    struct nlmsghdr *nlh, struct fib_config *cfg)
 {
 	struct nlattr *attr;
 	int err, remaining;
@@ -519,7 +519,7 @@ static int rtm_to_fib_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	cfg->fc_nlinfo.pid = NETLINK_CB(skb).pid;
 	cfg->fc_nlinfo.nlh = nlh;
-	cfg->fc_nlinfo.nl_net = &init_net;
+	cfg->fc_nlinfo.nl_net = net;
 
 	if (cfg->fc_type > RTN_MAX) {
 		err = -EINVAL;
@@ -575,7 +575,7 @@ static int inet_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *ar
 	if (net != &init_net)
 		return -EINVAL;
 
-	err = rtm_to_fib_config(skb, nlh, &cfg);
+	err = rtm_to_fib_config(net, skb, nlh, &cfg);
 	if (err < 0)
 		goto errout;
 
@@ -600,7 +600,7 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *ar
 	if (net != &init_net)
 		return -EINVAL;
 
-	err = rtm_to_fib_config(skb, nlh, &cfg);
+	err = rtm_to_fib_config(net, skb, nlh, &cfg);
 	if (err < 0)
 		goto errout;
 
@@ -667,6 +667,7 @@ out:
 
 static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifaddr *ifa)
 {
+	struct net *net = ifa->ifa_dev->dev->nd_net;
 	struct fib_table *tb;
 	struct fib_config cfg = {
 		.fc_protocol = RTPROT_KERNEL,
@@ -677,14 +678,14 @@ static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifad
 		.fc_oif = ifa->ifa_dev->dev->ifindex,
 		.fc_nlflags = NLM_F_CREATE | NLM_F_APPEND,
 		.fc_nlinfo = {
-			.nl_net = &init_net,
+			.nl_net = net,
 		},
 	};
 
 	if (type == RTN_UNICAST)
-		tb = fib_new_table(&init_net, RT_TABLE_MAIN);
+		tb = fib_new_table(net, RT_TABLE_MAIN);
 	else
-		tb = fib_new_table(&init_net, RT_TABLE_LOCAL);
+		tb = fib_new_table(net, RT_TABLE_LOCAL);
 
 	if (tb == NULL)
 		return;
-- 
1.5.3.rc5

^ permalink raw reply related

* [PATCH net-2.6.25 15/19] [NETNS] Provide correct namespace for fibnl netlink socket.
From: Denis V. Lunev @ 2008-01-09 18:04 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, containers-qjLDD68F18O7TbgM5vRIOg,
	xemul-GEFAQzZX7r8dnm+yROfE0A, benjamin.thery-6ktuUTfB/bM
In-Reply-To: <47850C57.60907-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>

This patch makes the fib lookup netlink socket per namespace, giving
each namespace its own socket for routing queries.

Acked-by: Benjamin Thery <benjamin.thery-6ktuUTfB/bM@public.gmane.org>
Acked-by: Daniel Lezcano <dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
---
 include/net/netns/ipv4.h |    2 ++
 net/ipv4/fib_frontend.c  |   24 ++++++++++++++++--------
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 629ec6c..031d761 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -9,6 +9,7 @@ struct ctl_table_header;
 struct ipv4_devconf;
 struct fib_rules_ops;
 struct hlist_head;
+struct sock;
 
 struct netns_ipv4 {
 #ifdef CONFIG_SYSCTL
@@ -18,5 +19,6 @@ struct netns_ipv4 {
 	struct fib_rules_ops	*rules_ops;
 #endif
 	struct hlist_head	*fib_table_hash;
+	struct sock		*fibnl;
 };
 #endif
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 7fe54a3..a5e8167 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -49,8 +49,6 @@
 
 #define FFprint(a...) printk(KERN_DEBUG a)
 
-static struct sock *fibnl;
-
 #ifndef CONFIG_IP_MULTIPLE_TABLES
 
 static int __net_init fib4_rules_init(struct net *net)
@@ -845,11 +843,13 @@ static void nl_fib_lookup(struct fib_result_nl *frn, struct fib_table *tb )
 
 static void nl_fib_input(struct sk_buff *skb)
 {
+	struct net *net;
 	struct fib_result_nl *frn;
 	struct nlmsghdr *nlh;
 	struct fib_table *tb;
 	u32 pid;
 
+	net = skb->sk->sk_net;
 	nlh = nlmsg_hdr(skb);
 	if (skb->len < NLMSG_SPACE(0) || skb->len < nlh->nlmsg_len ||
 	    nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*frn)))
@@ -861,28 +861,36 @@ static void nl_fib_input(struct sk_buff *skb)
 	nlh = nlmsg_hdr(skb);
 
 	frn = (struct fib_result_nl *) NLMSG_DATA(nlh);
-	tb = fib_get_table(&init_net, frn->tb_id_in);
+	tb = fib_get_table(net, frn->tb_id_in);
 
 	nl_fib_lookup(frn, tb);
 
 	pid = NETLINK_CB(skb).pid;       /* pid of sending process */
 	NETLINK_CB(skb).pid = 0;         /* from kernel */
 	NETLINK_CB(skb).dst_group = 0;  /* unicast */
-	netlink_unicast(fibnl, skb, pid, MSG_DONTWAIT);
+	netlink_unicast(net->ipv4.fibnl, skb, pid, MSG_DONTWAIT);
 }
 
 static int nl_fib_lookup_init(struct net *net)
 {
-	fibnl = netlink_kernel_create(net, NETLINK_FIB_LOOKUP, 0,
-				      nl_fib_input, NULL, THIS_MODULE);
-	if (fibnl == NULL)
+	struct sock *sk;
+	sk = netlink_kernel_create(net, NETLINK_FIB_LOOKUP, 0,
+				   nl_fib_input, NULL, THIS_MODULE);
+	if (sk == NULL)
 		return -EAFNOSUPPORT;
+	/* Don't hold an extra reference on the namespace */
+	put_net(sk->sk_net);
+	net->ipv4.fibnl = sk;
 	return 0;
 }
 
 static void nl_fib_lookup_exit(struct net *net)
 {
-	sock_put(fibnl);
+	/* At the last minute lie and say this is a socket for the
+	 * initial network namespace. So the socket will  be safe to free.
+	 */
+	net->ipv4.fibnl->sk_net = get_net(&init_net);
+	sock_put(net->ipv4.fibnl);
 }
 
 static void fib_disable_ip(struct net_device *dev, int force)
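
The reference-count trick in nl_fib_lookup_init()/nl_fib_lookup_exit() can be modelled in user space. As the comments in the patch suggest, a kernel socket pins its namespace; if the namespace's own fibnl socket kept that reference, the count could presumably never reach zero and the exit hook that frees the socket would never run. The sketch drops the reference right after creation, then at exit re-homes the socket onto init_net with a fresh reference so the release on free stays balanced. get_net()/put_net() here are simplified stand-ins, and sock_create_in(), sock_free(), lookup_init() and lookup_exit() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct net { int count; };
struct sock { struct net *sk_net; };

static struct net init_net = { 1 }; /* init_net is never freed */

static struct net *get_net(struct net *net) { net->count++; return net; }
static void put_net(struct net *net) { assert(net->count > 0); net->count--; }

/* Creating a socket takes a reference on its namespace. */
static struct sock *sock_create_in(struct net *net)
{
	struct sock *sk = malloc(sizeof(*sk));

	if (sk == NULL)
		return NULL;
	sk->sk_net = get_net(net);
	return sk;
}

/* Freeing a socket drops the reference it holds. */
static void sock_free(struct sock *sk)
{
	put_net(sk->sk_net);
	free(sk);
}

static int lookup_init(struct net *net, struct sock **out)
{
	struct sock *sk = sock_create_in(net);

	if (sk == NULL)
		return -1;
	put_net(sk->sk_net); /* our own socket must not keep net alive */
	*out = sk;
	return 0;
}

static void lookup_exit(struct sock *sk)
{
	sk->sk_net = get_net(&init_net); /* re-home onto init_net ... */
	sock_free(sk);                    /* ... whose ref the free drops */
}
```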

^ permalink raw reply related

