Netdev List
 help / color / mirror / Atom feed
* Re: net-next: warnings from sysctl_net_exit
From: Stephen Hemminger @ 2011-02-27 18:45 UTC (permalink / raw)
  To: David Miller; +Cc: adobriyan, netdev
In-Reply-To: <20110226.222333.59680338.davem@davemloft.net>

On Sat, 26 Feb 2011 22:23:33 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Sat, 26 Feb 2011 16:56:01 -0800
> 
> > Seeing lots of these messages in dmesg. Something is broken
> > recently in net-next.
> 
> Did you by change pull plain net-2.6 into that tree?  Because one
> commit which is in net-2.6 but not in net-next-2.6 catches my eye:
> 
> commit c486da34390846b430896a407b47f0cea3a4189c
> Author: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
> Date:   Thu Feb 24 19:48:03 2011 +0000
> 
>     sysctl: ipv6: use correct net in ipv6_sysctl_rtcache_flush
>     
>     Before this patch issuing these commands:
>     
>       fd = open("/proc/sys/net/ipv6/route/flush")
>       unshare(CLONE_NEWNET)
>       write(fd, "stuff")
>     
>     would flush the newly created net, not the original one.
>     
>     The equivalent ipv4 code is correct (stores the net inside ->extra1).
>     Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
>     
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 

I am building against pure net-next tree. Just checked by recloning.

-- 

^ permalink raw reply

* Re: [PATCH] Bluetooth: Fix BT_L2CAP and BT_SCO in Kconfig
From: Gustavo F. Padovan @ 2011-02-27 19:19 UTC (permalink / raw)
  To: Vitaly Wool
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linville-2XuSBdqkA4R54TAoqtyWWQ,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <AANLkTimuyx4ixPq5BexRHozebVpjry2k-2xskqVf79g1-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hi Vitaly,

* Vitaly Wool <vitalywool-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> [2011-02-26 18:52:44 +0100]:

> Hi Gustavo,
> 
> On Sat, Feb 26, 2011 at 2:41 AM, Gustavo F. Padovan
> <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org> wrote:
> > If we want something "bool" built-in in something "tristate" it can't
> > "depend on" the tristate config option.
> >
> > Report by DaveM:
> >
> >   I give it 'y' just to make it happen, for both, and afterways no
> >   matter how many times I rerun "make oldconfig" I keep seeing things
> >   like this in my build:
> >
> > scripts/kconfig/conf --silentoldconfig Kconfig
> > include/config/auto.conf:986:warning: symbol value 'm' invalid for BT_SCO
> > include/config/auto.conf:3156:warning: symbol value 'm' invalid for BT_L2CAP
> >
> > Reported-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> > Signed-off-by: Gustavo F. Padovan <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org>
> > ---
> >  net/bluetooth/Kconfig |    6 ++++--
> >  1 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/bluetooth/Kconfig b/net/bluetooth/Kconfig
> > index c6f9c2f..6ae5ec5 100644
> > --- a/net/bluetooth/Kconfig
> > +++ b/net/bluetooth/Kconfig
> > @@ -31,9 +31,10 @@ menuconfig BT
> >          to Bluetooth kernel modules are provided in the BlueZ packages.  For
> >          more information, see <http://www.bluez.org/>.
> >
> > +if BT != n
> > +
> >  config BT_L2CAP
> >        bool "L2CAP protocol support"
> > -       depends on BT
> >        select CRC16
> >        help
> >          L2CAP (Logical Link Control and Adaptation Protocol) provides
> > @@ -42,11 +43,12 @@ config BT_L2CAP
> >
> >  config BT_SCO
> >        bool "SCO links support"
> > -       depends on BT
> >        help
> >          SCO link provides voice transport over Bluetooth.  SCO support is
> >          required for voice applications like Headset and Audio.
> >
> > +endif
> > +
> 
> Ugh, isn't it far cleaner to change initial dependencies to "depends
> on BT != n" ?

I just followed the same approach as net/ipv6/Kconfig and
net/mac80211/Kconfig. 

Dave, how do you prefer this?

-- 
Gustavo F. Padovan
http://profusion.mobi

^ permalink raw reply

* Re: [Lxc-users] Bad checksums and lost packets with macvlan on dummy
From: Eric Dumazet @ 2011-02-27 19:50 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Andrian Nord, lxc-users, Patrick McHardy, Linux Netdev List
In-Reply-To: <4D6A6A5F.4030707@free.fr>

Le dimanche 27 février 2011 à 16:14 +0100, Daniel Lezcano a écrit :
> On 02/23/2011 06:13 PM, Andrian Nord wrote:
> > On Mon, Feb 21, 2011 at 05:07:31PM +0100, Daniel Lezcano wrote:
> >> I Cc'ed the netdev mailing list and Patrick in case my analysis is wrong
> >> or incomplete.
> > I'm confirming, that this happens only when macvlan's are onto dummy net
> > device. In case of some physical interface under macvlan there is no lost
> > packages and no broken checksums.
> 
> I did some tests with a 2.6.35 kernel version and it seems the checksum 
> errors do not appear.
> I noticed there are some changes in the dummy setup function:
> 
>    dev->features   |= NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_TSO;
>    dev->features   |= NETIF_F_NO_CSUM | NETIF_F_HIGHDMA | NETIF_F_LLTX;
> 
> 
> May be that was introduced by commit:
> 
> commit 6d81f41c58c69ddde497e9e640ba5805aa26e78c
> Author: Eric Dumazet <eric.dumazet@gmail.com>
> Date:   Mon Sep 27 20:50:33 2010 +0000
> 
>      dummy: percpu stats and lockless xmit
> 
>      Converts dummy network device driver to :
> 
>      - percpu stats
> 
>      - 64bit stats
> 
>      - lockless xmit (NETIF_F_LLTX)
> 
>      - performance features added (NETIF_F_SG | NETIF_F_FRAGLIST |
>      NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA)
> 
>      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>      Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 
> Eric,
> 
> Andrian is observing, with a couple of macvlan (in bridge mode) on top 
> of a dummy interface, a lot of checksums error and packets drop.
> Each macvlan is in a different network namespace and the dummy interface 
> is in the init_net.
> 
> Any ideas ?

Not sure I understand... I thought dummy was dropping all frames
anyway ?

static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats);

        u64_stats_update_begin(&dstats->syncp);
        dstats->tx_packets++;
        dstats->tx_bytes += skb->len;
        u64_stats_update_end(&dstats->syncp);

        dev_kfree_skb(skb);
        return NETDEV_TX_OK;
}


Maybe you could describe the setup ?



^ permalink raw reply

* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-27 20:06 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: David Miller, kaber, eric.dumazet, netdev, shemminger, fubar,
	andy
In-Reply-To: <4D6A5CDD.4020009@gmail.com>

Sun, Feb 27, 2011 at 03:17:01PM CET, nicolas.2p.debian@gmail.com wrote:
>Le 23/02/2011 20:05, Jiri Pirko a écrit :
>>This patch converts bonding to use rx_handler. Results in cleaner
>>__netif_receive_skb() with much less exceptions needed. Also
>>bond-specific work is moved into bond code.
>>
>>Did performance test using pktgen and counting incoming packets by
>>iptables. No regression noted.
>>
>>Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>
>>v1->v2:
>>         using skb_iif instead of new input_dev to remember original
>>	device
>>
>>v2->v3:
>>	do another loop in case skb->dev is changed. That way orig_dev
>>	core can be left untouched.
>>
>>Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>---
>
>[snip]
>
>>+static struct sk_buff *bond_handle_frame(struct sk_buff *skb)
>>+{
>>+	struct net_device *slave_dev;
>>+	struct net_device *bond_dev;
>>+
>>+	skb = skb_share_check(skb, GFP_ATOMIC);
>>+	if (unlikely(!skb))
>>+		return NULL;
>>+	slave_dev = skb->dev;
>>+	bond_dev = ACCESS_ONCE(slave_dev->master);
>>+	if (unlikely(!bond_dev))
>>+		return skb;
>>+
>>+	if (bond_dev->priv_flags&  IFF_MASTER_ARPMON)
>>+		slave_dev->last_rx = jiffies;
>>+
>>+	if (bond_should_deliver_exact_match(skb, slave_dev, bond_dev)) {
>>+		skb->deliver_no_wcard = 1;
>>+		return skb;
>
>Shouldn't we return NULL here ?

No we shouldn't. We need sbk to be delivered to exact match.

>
>>+	}
>>+
>>+	skb->dev = bond_dev;
>>+
>>+	if (bond_dev->priv_flags&  IFF_MASTER_ALB&&
>>+	    bond_dev->priv_flags&  IFF_BRIDGE_PORT&&
>>+	    skb->pkt_type == PACKET_HOST) {
>>+		u16 *dest = (u16 *) eth_hdr(skb)->h_dest;
>>+
>>+		memcpy(dest, bond_dev->dev_addr, ETH_ALEN);
>>+	}
>>+
>>+	return skb;
>>+}
>>+
>
>[snip]
>
>>+static void vlan_on_bond_hook(struct sk_buff *skb)
>>  {
>>-	if (skb->pkt_type == PACKET_HOST) {
>>-		u16 *dest = (u16 *) eth_hdr(skb)->h_dest;
>>+	/*
>>+	 * Make sure ARP frames received on VLAN interfaces stacked on
>>+	 * bonding interfaces still make their way to any base bonding
>>+	 * device that may have registered for a specific ptype.
>>+	 */
>>+	if (skb->dev->priv_flags&  IFF_802_1Q_VLAN&&
>>+	    vlan_dev_real_dev(skb->dev)->priv_flags&  IFF_BONDING&&
>>+	    skb->protocol == htons(ETH_P_ARP)) {
>
>The vlan_on_bond case used to be cost effective. Now, we clone the skb and call netif_rx...

This should not cost too much overhead considering only few packets are
going thru this. This hook shouldn't have exited in the fisrt place. I
think introducing this functionality was a big mistake.
>
>>+		struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
>>
>>-		memcpy(dest, master->dev_addr, ETH_ALEN);
>>+		if (!skb2)
>>+			return;
>>+		skb2->dev = vlan_dev_real_dev(skb->dev);
>>+		netif_rx(skb2);
>>  	}
>>  }
>
>[snip]
>
>>  	if (rx_handler) {
>>+		struct net_device *prev_dev;
>>+
>>  		if (pt_prev) {
>>  			ret = deliver_skb(skb, pt_prev, orig_dev);
>>  			pt_prev = NULL;
>>  		}
>>+		prev_dev = skb->dev;
>>  		skb = rx_handler(skb);
>>  		if (!skb)
>>  			goto out;
>
>I would instead consider NULL as meaning exact-match-delivery-only.
>(The same effect as dev_bond_should_drop() returning true).

we can change the behaviour later on.

>
>>+		if (skb->dev != prev_dev)
>>+			goto another_round;
>>  	}
>
>Anyway, all my comments can't be postponed to follow-up patchs. Thanks Jiri.
>
>Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>

^ permalink raw reply

* Re: txqueuelen has wrong units; should be time
From: Eric Dumazet @ 2011-02-27 20:07 UTC (permalink / raw)
  To: Jussi Kivilinna; +Cc: Albert Cahalan, Mikael Abrahamsson, linux-kernel, netdev
In-Reply-To: <20110227125540.40754c5y78j9u2m8@hayate.sektori.org>

Le dimanche 27 février 2011 à 12:55 +0200, Jussi Kivilinna a écrit :
> Quoting Albert Cahalan <acahalan@gmail.com>:
> 
> > On Sun, Feb 27, 2011 at 2:54 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> Le dimanche 27 février 2011 à 08:02 +0100, Mikael Abrahamsson a écrit :
> >>> On Sun, 27 Feb 2011, Albert Cahalan wrote:
> >>>
> >>> > Nanoseconds seems fine; it's unlikely you'd ever want
> >>> > more than 4.2 seconds (32-bit unsigned) of queue.
> > ...
> >> Problem is some machines have slow High Resolution timing services.
> >>
> >> _If_ we have a time limit, it will probably use the low resolution (aka
> >> jiffies), unless high resolution services are cheap.
> >
> > As long as that is totally internal to the kernel and never
> > getting exposed by some API for setting the amount, sure.
> >
> >> I was thinking not having an absolute hard limit, but an EWMA based one.
> >
> > The whole point is to prevent stale packets, especially to prevent
> > them from messing with TCP, so I really don't think so. I suppose
> > you do get this to some extent via early drop.
> 
> I made simple hack on sch_fifo with per packet time limits  
> (attachment) this weekend and have been doing limited testing on  
> wireless link. I think hardlimit is fine, it's simple and does  
> somewhat same as what packet(-hard)limited buffer does, drops packets  
> when buffer is 'full'. My hack checks for timed out packets on  
> enqueue, might be wrong approach (on other hand might allow some more  
> burstiness).
> 


Qdisc should return to caller a good indication packet is queued or
dropped at enqueue() time... not later (aka : never)

Accepting a packet at t0, and dropping it later at t0+limit without
giving any indication to caller is a problem.

This is why I suggested using an EWMA plus a probabilist drop or
congestion indication (NET_XMIT_CN) to caller at enqueue() time.

The absolute time limit you are trying to implement should be checked at
dequeue time, to cope with enqueue bursts or pauses on wire.

^ permalink raw reply

* Re: Bad checksums and lost packets with macvlan on dummy
From: Daniel Lezcano @ 2011-02-27 20:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrian Nord, Linux Netdev List,
	lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Patrick McHardy
In-Reply-To: <1298836236.8726.109.camel@edumazet-laptop>

On 02/27/2011 08:50 PM, Eric Dumazet wrote:
> Le dimanche 27 février 2011 à 16:14 +0100, Daniel Lezcano a écrit :
>> On 02/23/2011 06:13 PM, Andrian Nord wrote:
>>> On Mon, Feb 21, 2011 at 05:07:31PM +0100, Daniel Lezcano wrote:
>>>> I Cc'ed the netdev mailing list and Patrick in case my analysis is wrong
>>>> or incomplete.
>>> I'm confirming, that this happens only when macvlan's are onto dummy net
>>> device. In case of some physical interface under macvlan there is no lost
>>> packages and no broken checksums.
>> I did some tests with a 2.6.35 kernel version and it seems the checksum
>> errors do not appear.
>> I noticed there are some changes in the dummy setup function:
>>
>>     dev->features   |= NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_TSO;
>>     dev->features   |= NETIF_F_NO_CSUM | NETIF_F_HIGHDMA | NETIF_F_LLTX;
>>
>>
>> May be that was introduced by commit:
>>
>> commit 6d81f41c58c69ddde497e9e640ba5805aa26e78c
>> Author: Eric Dumazet<eric.dumazet@gmail.com>
>> Date:   Mon Sep 27 20:50:33 2010 +0000
>>
>>       dummy: percpu stats and lockless xmit
>>
>>       Converts dummy network device driver to :
>>
>>       - percpu stats
>>
>>       - 64bit stats
>>
>>       - lockless xmit (NETIF_F_LLTX)
>>
>>       - performance features added (NETIF_F_SG | NETIF_F_FRAGLIST |
>>       NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA)
>>
>>       Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
>>       Signed-off-by: David S. Miller<davem@davemloft.net>
>>
>>
>> Eric,
>>
>> Andrian is observing, with a couple of macvlan (in bridge mode) on top
>> of a dummy interface, a lot of checksums error and packets drop.
>> Each macvlan is in a different network namespace and the dummy interface
>> is in the init_net.
>>
>> Any ideas ?
> Not sure I understand... I thought dummy was dropping all frames
> anyway ?
>
> static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev)
> {
>          struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats);
>
>          u64_stats_update_begin(&dstats->syncp);
>          dstats->tx_packets++;
>          dstats->tx_bytes += skb->len;
>          u64_stats_update_end(&dstats->syncp);
>
>          dev_kfree_skb(skb);
>          return NETDEV_TX_OK;
> }
>
>
> Maybe you could describe the setup ?

Yes, it is very simple.

There are two network namespaces.

macvlan1 is in network namespace 1
macvlan2 is in network namespace 2

Both are in "bridge" mode, so they can communicate together.
The lower device is dummy0 in the init network namespace.

IMO the problem is coming from the macvlan driver:

dev->features           = lowerdev->features & MACVLAN_FEATURES

As dummy0 has the offloading capabilities set on, the macvlan driver 
inherit these features.

In the normal case, dummy0 is supposed to drop the packets. But with 
macvlan these packets are broadcasted to the other macvlan ports, so no 
checksum is computed when the packets are transmitted between macvlan1 
and macvlan2.

------------------------------------------------------------------------------
Free Software Download: Index, Search & Analyze Logs and other IT data in 
Real-Time with Splunk. Collect, index and harness all the fast moving IT data 
generated by your applications, servers and devices whether physical, virtual
or in the cloud. Deliver compliance at lower cost and gain new business 
insights. http://p.sf.net/sfu/splunk-dev2dev 
_______________________________________________
Lxc-users mailing list
Lxc-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-users

^ permalink raw reply

* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Nicolas de Pesloüan @ 2011-02-27 20:44 UTC (permalink / raw)
  To: Jiri Pirko, Jay Vosburgh
  Cc: David Miller, kaber, eric.dumazet, netdev, shemminger, andy,
	Fischer, Anna
In-Reply-To: <20110227125816.GB2814@psychotron.redhat.com>

Le 27/02/2011 13:58, Jiri Pirko a écrit :
> Sat, Feb 26, 2011 at 08:42:57PM CET, fubar@us.ibm.com wrote:
>> Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com>  wrote:
>>> Hi Jay,
>>>
>>> Still thinking about this orig_dev stuff, I wonder why the protocol
>>> handlers used in bonding (bond_3ad_lacpdu_recv() and bond_arp_rcv()) are
>>> registered at the master level instead of at the slave level ?
>>>
>>> If they were registered at the slave level, they would simply receive
>>> skb->dev as the ingress interface and use this value instead of needing
>>> the orig_dev value given to them when they are registered at the master
>>> level.
>>>
>>> As orig_dev is only used by bonding and by af_packet, but they disagree on
>>> the exact meaning of orig_dev, one way to fix this discrepancy would be to
>>> remove one of the usage. As the af_packet usage is exposed to user space,
>>> bonding seems the right place to stop using orig_dev, even if orig_dev was
>>> introduced for bonding :-)
>>>
>>> I understand that this would add one entry per slave device to the
>>> ptype_base list, but this seems to be the only bad effect of registering
>>> at the slave level. Can you confirm that this was the reason to register
>>> at the master level instead?
>>
>> 	My recollection is that it was done the way it is because there
>> was no "orig_dev" delivery logic at the time.  A handler registered to a
>> slave dev would receive no packets at all because assignment of skb->dev
>> to the master happened first, and the "orig_dev" knowledge was lost.
>>
>> 	When 802.3ad was added, a skb->real_dev field was created, but
>> it wasn't used for delivery.  802.3ad used real_dev to figure out which
>> slave a LACPDU arrived on.  The skb->real_dev was eventually replaced
>> with the orig_dev business that's there now.
>>
>> 	Later, I did the arp_validate stuff the same way as 802.3ad
>> because it worked and was easier than registering a handler per slave.
>>
>>> If you think registering at the slave level would cause too much impact on
>>> ptype_base, then we might have another way to stop using orig_dev for
>>> bonding:
>>>
>>> In __skb_bond_should_drop(), we already test for the two interesting protocols:
>>>
>>> if ((dev->priv_flags&  IFF_SLAVE_NEEDARP)&&  skb->protocol == __cpu_to_be16(ETH_P_ARP))
>>> 	return 0;
>>>
>>> if (master->priv_flags&  IFF_MASTER_8023AD&&  skb->protocol == __cpu_to_be16(ETH_P_SLOW))
>>> 	return 0;
>>>
>>> Would it be possible to call the right handlers directly from inside
>>> __skb_bond_should_drop() then let __skb_bond_should_drop() return 1
>>> ("should drop") after processing the frames that are only of interest for
>>> bonding?
>>
>> 	Isn't one purpose of switching to rx_handler that there won't
>> need to be any skb_bond_should_drop logic in __netif_receive_skb at all?
>
> Yes, that (hopefully most)  would be eventually removed.

The skb_bond_should_drop logic was simply moved from dev.c to 
bond_should_deliver_exact_match@bond_main.c by Jiri's patch.

But the logic remain and is necessary to decide whether we do normal delivery or only exact match 
delivery.

>> 	Still, if you're just trying to simplify __netif_receive_skb
>> first, I don't see any reason not to register the packet handlers at the
>> slave level.  Looking at the ptype_base hash, I don't think that the
>> protocols bonding is registering (ARP and SLOW) will hash collide with
>> IP or IPv6, so I suspect there won't be much impact.
>>
>> 	Once an rx_handler is used, then I suspect there's no need for
>> the packet handlers at all, since the rx_handler is within bonding and
>> can just deal with the ARP or LACPDU directly.
>
> That is very true. And given that af_packet uses orig_dev to obtain
> ifindex, it can be replaced by skb->skb_iif. That way we can get rid of
> orig_dev parameter for good.

Unfortunately, after doing some more research, I'm afraid we won't be able to suppress at least the 
ARP packet handler:

In commit 1f3c8804acba841b5573b953f5560d2683d2db0d (bonding: allow arp_ip_targets on separate vlans 
to use arp validation), Andy solved the problem of vlan on top of bonding, when the arp_ip_target is 
on one of the vlans:

eth0/eth1 -> bond0 -> bond0.100

At the time the frame is inspected by bonding, the frame is still tagged. This is true for the new 
rx_handler proposed by Jiri, and is also true for the former __skb_bond_should_drop() handling).

To receive the untagged frame, we would have to wait until the vlan code remove the tag. The current 
protocol handler seems to be the best way to catch the frame that late.

This is probably specific to ARP. I don't think SLOW frames can be tagged.

Anyway, Jay, thanks for you clarification.

> So I suggest to take V3 of my patch now and do multiple follow-on
> patches to get us where we want to get.

Agreed.

	Nicolas.

^ permalink raw reply

* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Nicolas de Pesloüan @ 2011-02-27 20:59 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Miller, kaber, eric.dumazet, netdev, shemminger, fubar,
	andy
In-Reply-To: <20110227200628.GA2984@psychotron.redhat.com>

Le 27/02/2011 21:06, Jiri Pirko a écrit :
> Sun, Feb 27, 2011 at 03:17:01PM CET, nicolas.2p.debian@gmail.com wrote:

>>> +	if (bond_should_deliver_exact_match(skb, slave_dev, bond_dev)) {
>>> +		skb->deliver_no_wcard = 1;
>>> +		return skb;
>>
>> Shouldn't we return NULL here ?
>
> No we shouldn't. We need sbk to be delivered to exact match.

So, if I understand properly:

- If skb->dev changed, loop,
- else, if skb->deliver_no_wcard, do exact match delivery only,
- Else, if !skb, drop the frame, without ever exact match delivery,
- Else, do normal delivery.

Right?

>> The vlan_on_bond case used to be cost effective. Now, we clone the skb and call netif_rx...
>
> This should not cost too much overhead considering only few packets are
> going thru this. This hook shouldn't have exited in the fisrt place. I
> think introducing this functionality was a big mistake.

What would you have proposed instead?

Anyway, I think the feature is broken, because it wouldn't provide the expected effect on the 
following configuration:

eth0/eth1 -> bond0 -> br0 -> br0.100.

We probably need a more general way to fix this, after your patch have been accepted.

[snip]

>> I would instead consider NULL as meaning exact-match-delivery-only.
>> (The same effect as dev_bond_should_drop() returning true).
>
> we can change the behaviour later on.

Agreed.

	Nicolas.

^ permalink raw reply

* Re: txqueuelen has wrong units; should be time
From: Jussi Kivilinna @ 2011-02-27 21:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Albert Cahalan, Mikael Abrahamsson, linux-kernel, netdev
In-Reply-To: <1298837273.8726.128.camel@edumazet-laptop>

Quoting Eric Dumazet <eric.dumazet@gmail.com>:

> Le dimanche 27 février 2011 à 12:55 +0200, Jussi Kivilinna a écrit :
>> Quoting Albert Cahalan <acahalan@gmail.com>:
>>
>> > On Sun, Feb 27, 2011 at 2:54 AM, Eric Dumazet  
>> <eric.dumazet@gmail.com> wrote:
>> >> Le dimanche 27 février 2011 à 08:02 +0100, Mikael Abrahamsson a écrit :
>> >>> On Sun, 27 Feb 2011, Albert Cahalan wrote:
>> >>>
>> >>> > Nanoseconds seems fine; it's unlikely you'd ever want
>> >>> > more than 4.2 seconds (32-bit unsigned) of queue.
>> > ...
>> >> Problem is some machines have slow High Resolution timing services.
>> >>
>> >> _If_ we have a time limit, it will probably use the low resolution (aka
>> >> jiffies), unless high resolution services are cheap.
>> >
>> > As long as that is totally internal to the kernel and never
>> > getting exposed by some API for setting the amount, sure.
>> >
>> >> I was thinking not having an absolute hard limit, but an EWMA based one.
>> >
>> > The whole point is to prevent stale packets, especially to prevent
>> > them from messing with TCP, so I really don't think so. I suppose
>> > you do get this to some extent via early drop.
>>
>> I made simple hack on sch_fifo with per packet time limits
>> (attachment) this weekend and have been doing limited testing on
>> wireless link. I think hardlimit is fine, it's simple and does
>> somewhat same as what packet(-hard)limited buffer does, drops packets
>> when buffer is 'full'. My hack checks for timed out packets on
>> enqueue, might be wrong approach (on other hand might allow some more
>> burstiness).
>>
>
>
> Qdisc should return to caller a good indication packet is queued or
> dropped at enqueue() time... not later (aka : never)

Ok, it is ugly hack ;) I got idea of dropping head from pfifo_head_drop.

>
> Accepting a packet at t0, and dropping it later at t0+limit without
> giving any indication to caller is a problem.

Ok.

>
> This is why I suggested using an EWMA plus a probabilist drop or
> congestion indication (NET_XMIT_CN) to caller at enqueue() time.
>
> The absolute time limit you are trying to implement should be checked at
> dequeue time, to cope with enqueue bursts or pauses on wire.
>

Ok.



^ permalink raw reply

* Re: net-next: warnings from sysctl_net_exit
From: Lucian Adrian Grijincu @ 2011-02-27 22:37 UTC (permalink / raw)
  To: Stephen Hemminger, David S. Miller, netdev

Stephen Hemminger <shemminger <at> vyatta.com> writes:
> [26207.669740]  [<ffffffff814154ad>] ? sysctl_net_exit+0x2a/0x2c
> [26207.669742]  [<ffffffff8136144e>] ? ops_exit_list+0x2a/0x5b
> [26207.669745]  [<ffffffff813618f0>] ? cleanup_net+0xfa/0x19a


David: it looks like someone registered a /proc/sys table with
register_net_sysctl_table but forgot to remove it (or someone wrote
something in the 'struct net*' and buffer overflowed into
&net->sysctls.list).

Stephen, can you post a `ls -R /proc/sys/net/` from before the dmesg
message appeared?

The check is triggered at network namespace deletion, so a moment
before deleting the netns should be fine.


-- 
 .
..: Lucian

^ permalink raw reply

* Re: [PATCH] don't allow CAP_NET_ADMIN to load non-netdev kernel modules
From: David Miller @ 2011-02-27 23:18 UTC (permalink / raw)
  To: segoon
  Cc: bhutchings, netdev, linux-kernel, kuznet, pekkas, jmorris,
	yoshfuji, kaber, eric.dumazet, therbert, xiaosuo, jesse,
	kees.cook, eugene, dan.j.rosenberg, akpm
In-Reply-To: <20110227114438.GA4317@albatros>

From: Vasiliy Kulikov <segoon@openwall.com>
Date: Sun, 27 Feb 2011 14:44:38 +0300

>    d) run modprobe with CAP_NET_ADMIN only

This is not part of my scheme.

The module loading will run with existing module loading privileges,
the "allowed capability" bits will be passed along back into the
kernel at module load time (via modprobe arguments or similar)
and validated by the kernel as it walks the ELF sections anyways
to perform relocations and whatnot.

^ permalink raw reply

* Re: [PATCH] don't allow CAP_NET_ADMIN to load non-netdev kernel modules
From: David Miller @ 2011-02-27 23:19 UTC (permalink / raw)
  To: segoon
  Cc: bhutchings, netdev, linux-kernel, kuznet, pekkas, jmorris,
	yoshfuji, kaber, eric.dumazet, therbert, xiaosuo, jesse,
	kees.cook, eugene, dan.j.rosenberg, akpm
In-Reply-To: <20110227114438.GA4317@albatros>

From: Vasiliy Kulikov <segoon@openwall.com>
Date: Sun, 27 Feb 2011 14:44:38 +0300

> Then the things are still broken - a user has to update modprobe
> together with the kernel, otherwise the updated kernel would call
> "modprobe" with unsupported argument and even "sit0" wouldn't work.

The capability bits get passed on the modprobe command line.

The module loading system call in the kernel inspects the command
line looking for the argument, and uses it to validate the module
load by comparing the mask with the special ELF section.

^ permalink raw reply

* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: David Miller @ 2011-02-27 23:22 UTC (permalink / raw)
  To: jpirko
  Cc: fubar, nicolas.2p.debian, kaber, eric.dumazet, netdev, shemminger,
	andy, anna.fischer
In-Reply-To: <20110227125816.GB2814@psychotron.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Sun, 27 Feb 2011 13:58:17 +0100

> That is very true. And given that af_packet uses orig_dev to obtain
> ifindex, it can be replaced by skb->skb_iif. That way we can get rid of
> orig_dev parameter for good.

I would rather see a complete patch set submitting at a unit, thanks.

I've already marked your V3 last night as "changes requested" in
patchwork for this reason.

^ permalink raw reply

* Re: txqueuelen has wrong units; should be time
From: Albert Cahalan @ 2011-02-27 23:33 UTC (permalink / raw)
  To: Jussi Kivilinna; +Cc: Eric Dumazet, Mikael Abrahamsson, linux-kernel, netdev
In-Reply-To: <20110227125540.40754c5y78j9u2m8@hayate.sektori.org>

On Sun, Feb 27, 2011 at 5:55 AM, Jussi Kivilinna
<jussi.kivilinna@mbnet.fi> wrote:

> I made simple hack on sch_fifo with per packet time limits (attachment) this
> weekend and have been doing limited testing on wireless link. I think
> hardlimit is fine, it's simple and does somewhat same as what
> packet(-hard)limited buffer does, drops packets when buffer is 'full'. My
> hack checks for timed out packets on enqueue, might be wrong approach (on
> other hand might allow some more burstiness).

Thanks!

I think the default is too high. 1 ms may even be a bit high.

I suppose there is a need to allow at least 2 packets despite any
time limits, so that it remains possible to use a traditional modem
even if a huge packet takes several seconds to send.

^ permalink raw reply

* Re: net-next: warnings from sysctl_net_exit
From: David Miller @ 2011-02-27 23:34 UTC (permalink / raw)
  To: lucian.grijincu; +Cc: shemminger, netdev, ebiederm
In-Reply-To: <AANLkTi=mhj3Ftq8hFPPzZYJvp8mgOeV89oX_vHZVxwzY@mail.gmail.com>

From: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Date: Mon, 28 Feb 2011 00:37:09 +0200

> Stephen Hemminger <shemminger <at> vyatta.com> writes:
>> [26207.669740]  [<ffffffff814154ad>] ? sysctl_net_exit+0x2a/0x2c
>> [26207.669742]  [<ffffffff8136144e>] ? ops_exit_list+0x2a/0x5b
>> [26207.669745]  [<ffffffff813618f0>] ? cleanup_net+0xfa/0x19a
> 
> 
> David: it looks like someone registered a /proc/sys table with
> register_net_sysctl_table but forgot to remove it (or someone wrote
> something in the 'struct net*' and buffer overflowed into
> &net->sysctls.list).

Hmmm, we might therefore want to inspect this commit carefully:

commit bf36076a67db6d7423d09d861a072337866f0dd9
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Mon Jan 31 20:54:17 2011 -0800

    net: Fix ipv6 neighbour unregister_sysctl_table warning
    
    In my testing of 2.6.37 I was occassionally getting a warning about
    sysctl table entries being unregistered in the wrong order.  Digging
    in it turns out this dates back to the last great sysctl reorg done
    where Al Viro introduced the requirement that sysctl directories
    needed to be created before and destroyed after the files in them.
    
    It turns out that in that great reorg /proc/sys/net/ipv6/neigh was
    overlooked.  So this patch fixes that oversight and makes an annoying
    warning message go away.
    
    >------------[ cut here ]------------
    >WARNING: at kernel/sysctl.c:1992 unregister_sysctl_table+0x134/0x164()
    >Pid: 23951, comm: kworker/u:3 Not tainted 2.6.37-350888.2010AroraKernelBeta.fc14.x86_64 #1
    >Call Trace:
    > [<ffffffff8103e034>] warn_slowpath_common+0x80/0x98
    > [<ffffffff8103e061>] warn_slowpath_null+0x15/0x17
    > [<ffffffff810452f8>] unregister_sysctl_table+0x134/0x164
    > [<ffffffff810e7834>] ? kfree+0xc4/0xd1
    > [<ffffffff813439b2>] neigh_sysctl_unregister+0x22/0x3a
    > [<ffffffffa02cd14e>] addrconf_ifdown+0x33f/0x37b [ipv6]
    > [<ffffffff81331ec2>] ? skb_dequeue+0x5f/0x6b
    > [<ffffffffa02ce4a5>] addrconf_notify+0x69b/0x75c [ipv6]
    > [<ffffffffa02eb953>] ? ip6mr_device_event+0x98/0xa9 [ipv6]
    > [<ffffffff813d2413>] notifier_call_chain+0x32/0x5e
    > [<ffffffff8105bdea>] raw_notifier_call_chain+0xf/0x11
    > [<ffffffff8133cdac>] call_netdevice_notifiers+0x45/0x4a
    > [<ffffffff8133d2b0>] rollback_registered_many+0x118/0x201
    > [<ffffffff8133d3af>] unregister_netdevice_many+0x16/0x6d
    > [<ffffffff8133d571>] default_device_exit_batch+0xa4/0xb8
    > [<ffffffff81337c42>] ? cleanup_net+0x0/0x194
    > [<ffffffff81337a2a>] ops_exit_list+0x4e/0x56
    > [<ffffffff81337d36>] cleanup_net+0xf4/0x194
    > [<ffffffff81053318>] process_one_work+0x187/0x280
    > [<ffffffff8105441b>] worker_thread+0xff/0x19f
    > [<ffffffff8105431c>] ? worker_thread+0x0/0x19f
    > [<ffffffff8105776d>] kthread+0x7d/0x85
    > [<ffffffff81003824>] kernel_thread_helper+0x4/0x10
    > [<ffffffff810576f0>] ? kthread+0x0/0x85
    > [<ffffffff81003820>] ? kernel_thread_helper+0x0/0x10
    >---[ end trace 8a7e9310b35e9486 ]---
    
    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


^ permalink raw reply

* Re: [PATCH] don't allow CAP_NET_ADMIN to load non-netdev kernel modules
From: Arnd Bergmann @ 2011-02-27 20:22 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: Ben Hutchings, David Miller, segoon, netdev, linux-kernel, kuznet,
	pekkas, jmorris, yoshfuji, kaber, eric.dumazet, therbert, xiaosuo,
	jesse, kees.cook, eugene, dan.j.rosenberg, akpm
In-Reply-To: <AANLkTikmtKksP6o1hecf0=ZnRv0j4+_Q326UEG8ss16k@mail.gmail.com>

On Friday 25 February 2011, Michał Mirosław wrote:
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 54aaca6..0d09baa 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -1120,8 +1120,20 @@ void dev_load(struct net *net, const char *name)
> >        dev = dev_get_by_name_rcu(net, name);
> >        rcu_read_unlock();
> >
> > -       if (!dev && capable(CAP_NET_ADMIN))
> > -               request_module("%s", name);
> > +       if (!dev && capable(CAP_NET_ADMIN)) {
> > +               /* Check whether the name looks like one that a net
> > +                * driver will generate initially.  If not, require a
> > +                * module alias with a suitable prefix, so that this
> > +                * can't be used to load arbitrary modules.
> > +                */
> > +               if ((strncmp(name, "eth", 3) == 0 &&
> > +                    isdigit((unsigned char)name[3])) ||
> > +                   (strncmp(name, "wlan", 4) == 0 &&
> > +                    isdigit((unsigned char)name[4])))
> > +                       request_module("%s", name);
> > +               else
> > +                       request_module("netdev-%s", name);
> > +       }
> >  }
> >  EXPORT_SYMBOL(dev_load);
> >
> 
> This might be better as:
> 
> if (request_module("netdev-%s", name))
>     ... fallback
> 
> Then after some years the fallback could be removed if announced properly.

The backwards compatibility should mostly be for systems that today don't
use split capabilities, right?

The fallback could therefore rely on CAP_SYS_MODULE as well:

	if (request_module("netdev-%s", name)) {
		if (capable(CAP_SYS_MODULE))
			request_module("%s", name);
	}

Not 100% solution, but should solve the capability escalation nicely without
causing much pain.

	Arnd


^ permalink raw reply

* Re: [PATCH net-2.6] bnx2x: Add a missing bit for PXP parity register of 57712.
From: David Miller @ 2011-02-28  0:06 UTC (permalink / raw)
  To: vladz; +Cc: netdev, eilong
In-Reply-To: <201102201627.05803.vladz@broadcom.com>

From: "Vlad Zolotarov" <vladz@broadcom.com>
Date: Sun, 20 Feb 2011 16:27:05 +0200

> Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-2.6 2/2] be2net: remove netif_stop_queue being called before register_netdev.
From: David Miller @ 2011-02-28  0:07 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ajit.khaparde, netdev
In-Reply-To: <1298560543.2814.4.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 24 Feb 2011 16:15:43 +0100

> Le mardi 01 février 2011 à 15:42 -0800, David Miller a écrit :
>> From: Ajit Khaparde <ajit.khaparde@emulex.com>
>> Date: Mon, 31 Jan 2011 17:27:55 -0600
>> 
>> > It is illegal to call netif_stop_queue before register_netdev.
>> > 
>> > Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
>> 
>> Applied, thanks.
> 
> Not sure if this patch is queued for stable, I hit the bug (a Warning
> actually) on 2.6.37.1

I've added this to my -stable queue, thanks.

^ permalink raw reply

* Re: [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode
From: David Miller @ 2011-02-28  0:08 UTC (permalink / raw)
  To: olegu; +Cc: netdev, fubar
In-Reply-To: <20110216191341.GA14920@yandex-team.ru>

From: "Oleg V. Ukhno" <olegu@yandex-team.ru>
Date: Wed, 16 Feb 2011 22:13:41 +0300

> Patch introduces two new (related) features to bonding module.
> First feature is round-robin hashing policy, which is primarily
> intended for use with 802.3ad mode, and puts every next IPv4 and
> IPv6 packet into  next availables slave without taling into account
> which layer3 and above protocol is used.
> Second feature makes possible choosing which MAC-address will be set
> in the transmitted packet - when set to src-mac it will force setting
> slave's interface real MAC address as source MAC address in every
> packet, sent via this slave interface.

Can we get some feedback on this patch from bonding folks?

I'm not applying it blinding without at least one bonding developer
saying it at least looks ok.

Thanks.

^ permalink raw reply

* Re: net-next: warnings from sysctl_net_exit
From: Stephen Hemminger @ 2011-02-28  0:08 UTC (permalink / raw)
  To: Lucian Adrian Grijincu; +Cc: David S. Miller, netdev
In-Reply-To: <AANLkTi=mhj3Ftq8hFPPzZYJvp8mgOeV89oX_vHZVxwzY@mail.gmail.com>

On Mon, 28 Feb 2011 00:37:09 +0200
Lucian Adrian Grijincu <lucian.grijincu@gmail.com> wrote:

> Stephen Hemminger <shemminger <at> vyatta.com> writes:
> > [26207.669740]  [<ffffffff814154ad>] ? sysctl_net_exit+0x2a/0x2c
> > [26207.669742]  [<ffffffff8136144e>] ? ops_exit_list+0x2a/0x5b
> > [26207.669745]  [<ffffffff813618f0>] ? cleanup_net+0xfa/0x19a
> 
> 
> David: it looks like someone registered a /proc/sys table with
> register_net_sysctl_table but forgot to remove it (or someone wrote
> something in the 'struct net*' and buffer overflowed into
> &net->sysctls.list).
> 
> Stephen, can you post a `ls -R /proc/sys/net/` from before the dmesg
> message appeared?
> 
> The check is triggered at network namespace deletion, so a moment
> before deleting the netns should be fine.

Although the kernel was compiled with netns, I never use net namespaces.

^ permalink raw reply

* Re: [V4 PATCH 1/3] bonding: sync netpoll code with bridge
From: David Miller @ 2011-02-28  0:12 UTC (permalink / raw)
  To: amwang; +Cc: linux-kernel, nhorman, herbert, fubar, netdev
In-Reply-To: <1298022215-21059-1-git-send-email-amwang@redhat.com>

From: Amerigo Wang <amwang@redhat.com>
Date: Fri, 18 Feb 2011 17:43:32 +0800

> V4: rebase to net-next-2.6
> V3: remove an useless #ifdef.
> 
> This patch unifies the netpoll code in bonding with netpoll code in bridge,
> thanks to Herbert that code is much cleaner now.
> 
> Signed-off-by: WANG Cong <amwang@redhat.com>

Applied.

^ permalink raw reply

* Re: [V4 PATCH 2/3] netpoll: remove IFF_IN_NETPOLL flag
From: David Miller @ 2011-02-28  0:12 UTC (permalink / raw)
  To: amwang
  Cc: linux-kernel, nhorman, herbert, fubar, horms, jpirko, kaber,
	nhorman, eric.dumazet, netdev
In-Reply-To: <1298022215-21059-2-git-send-email-amwang@redhat.com>

From: Amerigo Wang <amwang@redhat.com>
Date: Fri, 18 Feb 2011 17:43:33 +0800

> V4: rebase to net-next-2.6
> 
> This patch removes the flag IFF_IN_NETPOLL, we don't need it any more since
> we have netpoll_tx_running() now.
> 
> Signed-off-by: WANG Cong <amwang@redhat.com>

Applied.

^ permalink raw reply

* Re: [V4 PATCH 3/3] bond: service netpoll arp queue on master device
From: David Miller @ 2011-02-28  0:12 UTC (permalink / raw)
  To: amwang; +Cc: linux-kernel, nhorman, herbert, nhorman, eric.dumazet, netdev
In-Reply-To: <1298022215-21059-3-git-send-email-amwang@redhat.com>

From: Amerigo Wang <amwang@redhat.com>
Date: Fri, 18 Feb 2011 17:43:34 +0800

> Neil pointed out that we can't send ARP reply on behalf of slaves,
> we need to move the arp queue to their bond device.
> 
> Signed-off-by: WANG Cong <amwang@redhat.com>

Applied.

^ permalink raw reply

* Re: net-next: warnings from sysctl_net_exit
From: Lucian Adrian Grijincu @ 2011-02-28  0:49 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, netdev, ebiederm
In-Reply-To: <20110227.153418.245395393.davem@davemloft.net>

On Mon, Feb 28, 2011 at 1:34 AM, David Miller <davem@davemloft.net> wrote:
> From: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
>> David: it looks like someone registered a /proc/sys table with
>> register_net_sysctl_table but forgot to remove it (or someone wrote
>> something in the 'struct net*' and buffer overflowed into
>> &net->sysctls.list).
>
> Hmmm, we might therefore want to inspect this commit carefully:
>
> commit bf36076a67db6d7423d09d861a072337866f0dd9
> Author: Eric W. Biederman <ebiederm@xmission.com>
> Date:   Mon Jan 31 20:54:17 2011 -0800
>
>    net: Fix ipv6 neighbour unregister_sysctl_table warning


I'm not sure why we need that empty entry in the sysctl table anyway.

I just ran a net-next with the next patch and
register_net_sysctl_table managed to add /proc/sys/net/ipv6/neigh/
because we add 'default' and 'all' entries, followed by 'lo' for every
netns.


diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 7cb65ef..8e8b107 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -37,12 +37,6 @@ static ctl_table ipv6_table_template[] = {
                .mode           = 0644,
                .proc_handler   = proc_dointvec
        },
-       {
-               .procname       = "neigh",
-               .maxlen         = 0,
-               .mode           = 0555,
-               .child          = empty,
-       },
        { }
 };



What are the benefits of this empty entry?

Is there an assumption that we add this empty
'/proc/sys/net/ipv6/neigh/' entry *before* adding entries for
"/proc/sys/net/ipv6/neigh/default/" or
"/proc/sys/net/ipv6/neigh/all/"?

Because we register the empty '/proc/sys/net/ipv6/neigh/' entry
*after* registering these two of it's children, and I'm not sure how
this will affect the attachment of ctl_table_header corresponding to
these ctl_table trees.


With the following patch I get this ordering (for both init_net and a new net):
  CALL addrconf_init_net (create default/all)
  CALL ipv6_sysctl_net_init (create empty entry).


diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3daaf3c..5fe402e 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4564,6 +4564,8 @@ static int __net_init addrconf_init_net(struct net *net)
 	int err;
 	struct ipv6_devconf *all, *dflt;

+	printk(KERN_ALERT "CALL addrconf_init_net\n");
+
 	err = -ENOMEM;
 	all = &ipv6_devconf;
 	dflt = &ipv6_devconf_dflt;
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 7cb65ef..880d64a 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -71,6 +65,8 @@ static int __net_init ipv6_sysctl_net_init(struct net *net)
 	struct ctl_table *ipv6_icmp_table;
 	int err;

+	printk(KERN_ALERT "CALL ipv6_sysctl_net_init\n");
+
 	err = -ENOMEM;
 	ipv6_table = kmemdup(ipv6_table_template, sizeof(ipv6_table_template),
 			     GFP_KERNEL);


-- 
 .
..: Lucian

^ permalink raw reply related

* Re: net-next: warnings from sysctl_net_exit
From: Lucian Adrian Grijincu @ 2011-02-28  1:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20110227160852.4f748c1f@nehalam>

On Mon, Feb 28, 2011 at 2:08 AM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
>> The check is triggered at network namespace deletion, so a moment
>> before deleting the netns should be fine.
>
> Although the kernel was compiled with netns, I never use net namespaces.


This can be triggered at shutdown when the init_net network namespace
is dismantled.

Are you getting this at shutdown time? If not, is it easily reproducible?
Can you post a .config?

-- 
 .
..: Lucian

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox