Netdev List


* Re: [PATCH] xps-mq: Transmit Packet Steering for multiqueue
From: David Miller @ 2011-02-26  7:09 UTC (permalink / raw)
  To: bhutchings; +Cc: therbert, eric.dumazet, shemminger, netdev
In-Reply-To: <1298312395.2608.65.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 21 Feb 2011 18:19:55 +0000

> On Wed, 2010-09-01 at 18:32 -0700, David Miller wrote:
>> 2) TX queue datastructures in the driver get reallocated using
>>    memory in that NUMA domain.
> 
> I've previously sent patches to add an ethtool API for NUMA control,
> which include the option to allocate on the same node where IRQs are
> handled.  However, there is currently no function to allocate
> DMA-coherent memory on a specified NUMA node (rather than the device's
> node).  This is likely to be beneficial for event rings and might be
> good for descriptor rings for some devices.  (The implementation I sent
> for sfc mistakenly switched it to allocating non-coherent memory, for
> which it *is* possible to specify the node.)

The thing to do is to work with someone like FUJITA Tomonori on this.

It's simply a matter of making new APIs that take the node specifier,
have the implementations either make use of or completely ignore the node,
and have the existing APIs pass in "-1" for the node or whatever the
CPP macro is for this :-)
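
A minimal userspace sketch of the API shape described above, not kernel code. Every name here (`sketch_alloc_node`, `sketch_alloc`, `SKETCH_NO_NODE`) is hypothetical; the "no node" value is assumed to be -1, which I believe matches the kernel's `NUMA_NO_NODE`. The mock only records the requested node, since an implementation may either honour or ignore it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's "no preference" node value (assumed to be -1). */
#define SKETCH_NO_NODE (-1)

/* Hypothetical record of an allocation; not a real kernel structure. */
struct sketch_buf {
	void *mem;
	int node;	/* node requested by the caller, or SKETCH_NO_NODE */
};

/* New-style entry point: takes an explicit node specifier. */
static int sketch_alloc_node(struct sketch_buf *buf, size_t size, int node)
{
	assert(buf != NULL);
	/* An implementation may honour or ignore 'node'; this mock only records it. */
	buf->mem = malloc(size);
	buf->node = node;
	return buf->mem ? 0 : -1;
}

/* Existing-style entry point: same signature as before, forwards "no node". */
static int sketch_alloc(struct sketch_buf *buf, size_t size)
{
	return sketch_alloc_node(buf, size, SKETCH_NO_NODE);
}

static void sketch_free(struct sketch_buf *buf)
{
	free(buf->mem);
	buf->mem = NULL;
}
```

Existing callers keep calling `sketch_alloc()` unchanged, while a driver that knows where its IRQs are handled could call `sketch_alloc_node()` with that node.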


* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-26  7:14 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: David Miller, kaber, eric.dumazet, netdev, shemminger, fubar,
	andy
In-Reply-To: <4D683F6D.1030208@gmail.com>

Sat, Feb 26, 2011 at 12:46:53AM CET, nicolas.2p.debian@gmail.com wrote:
>On 23/02/2011 20:05, Jiri Pirko wrote:
>>This patch converts bonding to use rx_handler. Results in cleaner
>>__netif_receive_skb() with much less exceptions needed. Also
>>bond-specific work is moved into bond code.
>>
>>Did performance test using pktgen and counting incoming packets by
>>iptables. No regression noted.
>>
>>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>
>>v1->v2:
>>         using skb_iif instead of new input_dev to remember original
>>	device
>>
>>v2->v3:
>>	do another loop in case skb->dev is changed. That way orig_dev
>>	core can be left untouched.
>
>Hi Jiri,
>
>Finally taking enough time for a review.
>
>I think we should split this change :
>
>1/ Change __netif_receive_skb() to call rx_handler for diverted net_device, until rx_handler is NULL.
>
>2/ Convert currently existing rx_handlers (bridge and macvlan) to use
>this new "loop" feature, removing the need to call netif_rx() inside
>their respective rx_handler and also removing the associated
>overhead.

This might not be possible. Macvlan uses the return value of netif_rx() for
its counters, and bridge calls netif_receive_skb() via NF_HOOK. Nevertheless,
this can be handled later, not as part of this patch.
>
>3/ Convert bonding to use rx_handlers.
>
>Also, on step 1, we definitely need to clarify what orig_dev should be.
>
>I now think that orig_dev should be "the device one level below the
>current one" or NULL if current device was not diverted from another
>one. It means that we should keep an array of crossed (diverted)
>devices and the associated orig_dev. This array would be used to pass
>the right orig_dev to protocol handlers, depending on the device they
>register on :

I constructed the patch so that orig_dev is the same in all situations
as it was before the patch. I think this decision can be deferred for
now.

>
>eth0 -> bond0 -> br0
>
>A protocol handler registered on bond0 would receive eth0 as orig_dev.
>A protocol handler registered on br0 would receive bond0 as orig_dev.
>
>[snip]
>
>>@@ -3167,32 +3135,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>
>[snip]
>
>>+another_round:
>>+
>>+	__this_cpu_inc(softnet_data.processed);
>>+
>>  #ifdef CONFIG_NET_CLS_ACT
>>  	if (skb->tc_verd & TC_NCLS) {
>>  		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
>>@@ -3209,8 +3157,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>  #endif
>>
>>  	list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>-		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
>>-		    ptype->dev == orig_dev) {
>>+		if (!ptype->dev || ptype->dev == skb->dev) {
>>  			if (pt_prev)
>>  				ret = deliver_skb(skb, pt_prev, orig_dev);
>>  			pt_prev = ptype;
>>@@ -3224,16 +3171,20 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>  ncls:
>>  #endif
>>
>
>Why do you loop to ptype_all before calling rx_handler ?
>
>I don't understand why ptype_all and ptype_base are not handled at
>the same place in current __netif_receive_skb() but I think we should
>take the opportunity to change that, unless someone know of a good
>reason not to do so.

Again, the patch tries to make as few changes as it can, so this stays
the same as before. If you want to change it, feel free to submit a
follow-on patch.

>
>>-	/* Handle special case of bridge or macvlan */
>>  	rx_handler = rcu_dereference(skb->dev->rx_handler);
>>  	if (rx_handler) {
>
>	Nicolas.


* Re: SO_REUSEPORT - can it be done in kernel?
From: Eric Dumazet @ 2011-02-26  7:31 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, rick.jones2, tgraf, therbert, wsommerfeld,
	daniel.baluta, netdev
In-Reply-To: <20110226031118.GA21270@gondor.apana.org.au>

On Saturday 26 February 2011 at 11:11 +0800, Herbert Xu wrote:
> On Fri, Feb 25, 2011 at 07:07:23PM -0800, David Miller wrote:
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Date: Sat, 26 Feb 2011 10:48:48 +0800
> > 
> > > I'm looking at redoing this and the bulk of the work is going to
> > > be restructuring ip_append_data/ip_push_pending_frames so that it
> > > doesn't store the states in sk/inet_sk.
> > 
> > I suppose you're going to replace that stuff with an on-stack
> > control structure that gets passed around by reference or
> > similar?
> 
> Either that or have ip_append_data do ip_push_pending_frames
> directly.
> 
> That function's signature is a mess already and I need to think
> about this a bit more :)
> 
> Cheers,


UDP CORK is a problem indeed. I wonder who really uses it ?





* Re: SO_REUSEPORT - can it be done in kernel?
From: David Miller @ 2011-02-26  7:46 UTC (permalink / raw)
  To: eric.dumazet
  Cc: herbert, rick.jones2, tgraf, therbert, wsommerfeld, daniel.baluta,
	netdev
In-Reply-To: <1298705484.2659.126.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 26 Feb 2011 08:31:24 +0100

> UDP CORK is a problem indeed. I wonder who really uses it ?

git grep MSG_MORE -- net/sunrpc
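
For readers unfamiliar with UDP corking, a small userspace sketch of what MSG_MORE does on a UDP socket: data sent with the flag is held back and joined with later data into a single datagram. The helper name `cork_demo` and the loopback setup are assumptions made for this illustration; it is not taken from net/sunrpc.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send two chunks over loopback UDP, the first with MSG_MORE, and return the
 * length of the first datagram the receiver gets (or -1 on error).
 */
static ssize_t cork_demo(char *out, size_t outlen)
{
	struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	ssize_t n = -1;
	int rx, tx;

	assert(out != NULL && outlen > 0);

	rx = socket(AF_INET, SOCK_DGRAM, 0);
	tx = socket(AF_INET, SOCK_DGRAM, 0);
	if (rx < 0 || tx < 0)
		goto out;

	/* Do not block forever if the datagram never shows up. */
	if (setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		goto out;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */
	if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;
	if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
		goto out;
	if (connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	/* MSG_MORE: hold this data back, more is coming for the same datagram. */
	if (send(tx, "hello ", 6, MSG_MORE) != 6)
		goto out;
	/* No MSG_MORE: the pending data is pushed out as one datagram. */
	if (send(tx, "world", 5, 0) != 5)
		goto out;

	n = recv(rx, out, outlen, 0);
out:
	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return n;
}
```

On Linux the receiver should see one datagram carrying both chunks. The pending data held between the two send() calls is the per-socket state that makes restructuring ip_append_data/ip_push_pending_frames awkward.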



* Re: [PATCH net-next 0/6] Phonet: small pipe protocol fixes
From: Rémi Denis-Courmont @ 2011-02-26  9:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA, ofono-bdc2hr5oBkPYtjvyW6yDsg
In-Reply-To: <20110225.112406.246526410.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On Friday 25 February 2011 21:24:06, David Miller wrote:
> > From: "Rémi Denis-Courmont" <remi.denis-courmont-xNZwKgViW5gAvxtiuMwx3w@public.gmane.org>
> > Date: Fri, 25 Feb 2011 11:13:41 +0200
> > 
> >> This patch series cleans up and fixes a number of small bits in the
> >> Phonet pipe code, especially the experimental pipe controller. Once
> >> this small bits are sorted out, I will try to fix the controller
> >> protocol implementation proper so that we do not need the
> >> compile-time (experimental) flag anymore.
> > 
> > All applied thanks.
> > 
> > If you want to start using GIT to push phonet changes to me, frankly I
> > would welcome that :-)

No problem in principle. I need to figure out where to put linux-phonet.git
though.

> BTW, I had to add the following patch to fix a build warning:

Hmm, right. I am planning to kill this config option and reunify the cough
cough chal-len-ged *ahem* ST-Ericsson code with the Nokia code... So I confess
I did not bother to eliminate that ST-Ericsson-only mode warning.

-- 
Rémi Denis-Courmont
http://www.remlab.info/
http://fi.linkedin.com/in/remidenis


* Re: [RFC] be2net: add rxhash support
From: Eric Dumazet @ 2011-02-26 10:30 UTC (permalink / raw)
  To: Ajit Khaparde; +Cc: netdev
In-Reply-To: <20110225213542.GA11773@akhaparde-VBox>

On Friday 25 February 2011 at 15:35 -0600, Ajit Khaparde wrote:

> I asked that because, if a switch is part of the configuration,
> the ASIC can receive packets other than the tcp flow.
> 
> And if hashing is enabled for IP packets, we can see this behavior.
> The other values indicate that hashing has been enabled for IPv4 packets.

To make sure RSS (and rxhash) was OK, I added the following debugging aid:

diff --git a/include/net/sock.h b/include/net/sock.h
index da0534d..e9b1180 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -688,6 +688,7 @@ static inline void sock_rps_save_rxhash(struct sock *sk, u32 rxhash)
 {
 #ifdef CONFIG_RPS
 	if (unlikely(sk->sk_rxhash != rxhash)) {
+		pr_err("rxhash change from %x to %x\n", sk->sk_rxhash, rxhash);
 		sock_rps_reset_flow(sk);
 		sk->sk_rxhash = rxhash;
 	}


And got the following traces:

[  201.170297] change rxhash from 0 to be0b5a87
[  232.607474] bonding: bond1: Setting eth3 as active slave.
[  232.607478] bonding: bond1: making interface eth3 the new active one.
[  232.710848] change rxhash from be0b5a87 to e56a3c1e
[  300.047500] bonding: bond1: Setting eth1 as active slave.
[  300.047504] bonding: bond1: making interface eth1 the new active one.
[  300.159162] change rxhash from e56a3c1e to be0b5a87

The flip occurred when I changed my active slave (bonding mode=1).

eth1 is a bnx2 NIC, while eth3 is a be2net one, so it's OK for the rxhash to change in this case
(different firmware/algo).

So as far as be2net is concerned, everything seems OK: all packets for
a given flow get one consistent RSS hash and can feed skb->rxhash.





* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Nicolas de Pesloüan @ 2011-02-26 11:25 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Miller, kaber, eric.dumazet, netdev, shemminger, fubar,
	andy
In-Reply-To: <20110226071433.GA2783@psychotron.redhat.com>

On 26/02/2011 08:14, Jiri Pirko wrote:
> Sat, Feb 26, 2011 at 12:46:53AM CET, nicolas.2p.debian@gmail.com wrote:
>> On 23/02/2011 20:05, Jiri Pirko wrote:
>>> This patch converts bonding to use rx_handler. Results in cleaner
>>> __netif_receive_skb() with much less exceptions needed. Also
>>> bond-specific work is moved into bond code.
>>>
>>> Did performance test using pktgen and counting incoming packets by
>>> iptables. No regression noted.
>>>
>>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>>
>>> v1->v2:
>>>          using skb_iif instead of new input_dev to remember original
>>> 	device
>>>
>>> v2->v3:
>>> 	do another loop in case skb->dev is changed. That way orig_dev
>>> 	core can be left untouched.
>>
>> Hi Jiri,
>>
>> Finally taking enough time for a review.
>>
>> I think we should split this change :
>>
>> 1/ Change __netif_receive_skb() to call rx_handler for diverted net_device, until rx_handler is NULL.
>>
>> 2/ Convert currently existing rx_handlers (bridge and macvlan) to use
>> this new "loop" feature, removing the need to call netif_rx() inside
>> their respective rx_handler and also removing the associated
>> overhead.
>
> This might not be possible. Macvlan uses the return value of netif_rx() for
> its counters, and bridge calls netif_receive_skb() via NF_HOOK. Nevertheless,
> this can be handled later, not as part of this patch.

Yes, I agree. Step 2 and step 3 can be swapped.

Anyway, we need to describe the options given to an rx_handler:

- Return skb unchanged. This would cause normal delivery (ptype->dev == NULL or ptype->dev == skb->dev).
- Return skb with skb->dev changed. __netif_receive_skb() will loop to the new device. This would cause
exact match delivery only (ptype->dev != NULL and ptype->dev == one of the orig_dev).
- Manage the skb another way and return NULL. This would prevent any protocol handler from receiving the
skb, unless the rx_handler arranges to re-inject the skb somewhere.
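
The three options can be sketched as a small userspace mock of the receive loop. All types and names here (`sk_dev`, `sk_pkt`, `sk_receive`, and the three handlers) are hypothetical stand-ins, not the kernel's structures, and protocol delivery itself is left out; the sketch only shows how the return value of a handler drives the loop.

```c
#include <assert.h>
#include <stddef.h>

struct sk_dev;
struct sk_pkt { struct sk_dev *dev; };

/* Handler returns the packet (possibly with pkt->dev changed) or NULL if consumed. */
typedef struct sk_pkt *(*sk_rx_handler_t)(struct sk_pkt *pkt);

struct sk_dev {
	const char *name;
	struct sk_dev *master;		/* upper device, if any */
	sk_rx_handler_t rx_handler;	/* NULL when nothing is attached */
};

/* Option 1: leave the packet alone, so normal delivery happens on this device. */
static struct sk_pkt *sk_pass(struct sk_pkt *pkt)
{
	return pkt;
}

/* Option 2: divert to the upper device by changing pkt->dev. */
static struct sk_pkt *sk_divert_to_master(struct sk_pkt *pkt)
{
	pkt->dev = pkt->dev->master;
	return pkt;
}

/* Option 3: take ownership of the packet; no protocol handler sees it. */
static struct sk_pkt *sk_consume(struct sk_pkt *pkt)
{
	(void)pkt;
	return NULL;
}

/*
 * Walk handlers until one keeps pkt->dev unchanged or consumes the packet.
 * Returns the final device for normal delivery, or NULL if consumed.
 * *rounds counts how many devices were visited.
 */
static struct sk_dev *sk_receive(struct sk_pkt *pkt, int *rounds)
{
	assert(pkt && pkt->dev && rounds);
	*rounds = 0;
	for (;;) {
		struct sk_dev *dev = pkt->dev;

		(*rounds)++;
		if (!dev->rx_handler)
			return dev;		/* nothing attached: deliver here */
		pkt = dev->rx_handler(pkt);
		if (!pkt)
			return NULL;		/* option 3: consumed */
		if (pkt->dev == dev)
			return dev;		/* option 1: unchanged */
		/* option 2: pkt->dev changed, go for another round */
	}
}
```

With eth0 -> bond0 -> br0 and a diverting handler on eth0 and bond0, the mock visits three devices and ends on br0.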

>> 3/ Convert bonding to use rx_handlers.
>>
>> Also, on step 1, we definitely need to clarify what orig_dev should be.
>>
>> I now think that orig_dev should be "the device one level below the
>> current one" or NULL if current device was not diverted from another
>> one. It means that we should keep an array of crossed (diverted)
>> devices and the associated orig_dev. This array would be used to pass
>> the right orig_dev to protocol handlers, depending on the device they
>> register on :
>
> I constructed the patch so that orig_dev is the same in all situations
> as it was before the patch. I think this decision can be deferred for
> now.

Agreed, even if the current handling of orig_dev is far from bulletproof and needs to be clarified
at some point.

>> eth0 ->  bond0 ->  br0
>>
>> A protocol handler registered on bond0 would receive eth0 as orig_dev.
>> A protocol handler registered on br0 would receive bond0 as orig_dev.
>>
>> [snip]
>>
>>> @@ -3167,32 +3135,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>
>> [snip]
>>
>>> +another_round:
>>> +
>>> +	__this_cpu_inc(softnet_data.processed);
>>> +
>>>   #ifdef CONFIG_NET_CLS_ACT
>>>   	if (skb->tc_verd & TC_NCLS) {
>>>   		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
>>> @@ -3209,8 +3157,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>   #endif
>>>
>>>   	list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>> -		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
>>> -		    ptype->dev == orig_dev) {
>>> +		if (!ptype->dev || ptype->dev == skb->dev) {
>>>   			if (pt_prev)
>>>   				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>   			pt_prev = ptype;
>>> @@ -3224,16 +3171,20 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>   ncls:
>>>   #endif
>>>
>>
>> Why do you loop to ptype_all before calling rx_handler ?
>>
>> I don't understand why ptype_all and ptype_base are not handled at
>> the same place in current __netif_receive_skb() but I think we should
>> take the opportunity to change that, unless someone know of a good
>> reason not to do so.
>
> Again, the patch tries to make as few changes as it can, so this stays
> the same as before. If you want to change it, feel free to submit a
> follow-on patch.

The point here is that bridge and macvlan handling used to be after the ptype_all loop (hence the 
place you inserted the call to rx_handler last summer), but the bonding part is currently before the 
ptype_all loop.

Moving bonding handling after the ptype_all loop will cause the ptype_all loop to be run twice:
- first time, with skb->dev == eth0 and orig_dev == eth0.
- second time, with skb->dev == bond0 and orig_dev == eth0.

The first time currently does not exist. And because bonding wasn't given a chance yet to decide
that the frame should be dropped, the packet will always be delivered to eth0, causing duplicate
deliveries. Note that this is probably true for bridge and macvlan too, and that those duplicate
deliveries probably already exist.
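
The duplicate delivery can be illustrated with a tiny userspace mock. The types (`tap`, `tap_dev`) and the function names are hypothetical, and the mock assumes every device on the path diverts to its master; it only counts how often each ptype_all-style tap would match under the "wildcard or exact device" rule from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TAP_MAX_ROUNDS 8	/* arbitrary guard against a looping device chain */

struct tap_dev {
	const char *name;
	struct tap_dev *master;	/* device the frame is diverted to, if any */
};

struct tap {
	struct tap_dev *dev;	/* NULL means "any device" */
	int hits;		/* how many times the frame was delivered */
};

/* Match rule from the patch: wildcard, or exact match on the current device. */
static int tap_matches(const struct tap *t, const struct tap_dev *cur)
{
	return !t->dev || t->dev == cur;
}

/* Run the tap loop once per device from 'ingress' upwards; returns the rounds. */
static int tap_deliver_all_rounds(struct tap *taps, size_t ntaps,
				  struct tap_dev *ingress)
{
	struct tap_dev *cur;
	int rounds = 0;

	assert(taps && ingress);
	for (cur = ingress; cur && rounds < TAP_MAX_ROUNDS; cur = cur->master) {
		size_t i;

		for (i = 0; i < ntaps; i++)
			if (tap_matches(&taps[i], cur))
				taps[i].hits++;
		rounds++;
	}
	return rounds;
}
```

For eth0 -> bond0, a wildcard tap is hit in both rounds, and a tap bound to eth0 is hit in the first round before bonding has had any say in whether the frame should be dropped.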

Also, delivering skb inside a loop that may change the skb (skb->dev at least) is guaranteed to 
produce strange behaviors.

Can someone, knowing the history of ptype_all/ptype_base/bridge/macvlan/bonding/vlan handling in 
__netif_receive_skb(), comment on this?

Are there any reasons not to process ptype_all and ptype_base at the same location, at the end of 
__netif_receive_skb(), and to manage all divert features (bridge/macvlan/bonding/vlan) before?

	Nicolas.


* Re: 2.6.37 regression: adding main interface to a bridge breaks vlan interface RX
From: chriss @ 2011-02-26 11:51 UTC (permalink / raw)
  To: netdev
In-Reply-To: <AANLkTikSvs7jF9BZzbsYkLAawpCH2h1Z0r09ft219uaa@mail.gmail.com>

Jesse Gross <jesse <at> nicira.com> writes:

> 
> Can you confirm this by running tcpdump -eni br0?  I would expect that
> you see the correct packets but without vlan tags.
> 

That's correct. I see the packets in br0 without tags and tagged in eth1. That's
why I added the brouting rule in ebtables to drop it at eth1, and then it appears
in eth1.3 (untagged)...


* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Nicolas de Pesloüan @ 2011-02-26 14:24 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Jiri Pirko, David Miller, kaber, eric.dumazet, netdev, shemminger,
	andy, Fischer, Anna
In-Reply-To: <4D62F324.6020301@gmail.com>

On 22/02/2011 00:20, Nicolas de Pesloüan wrote:

> After checking every protocol handler installed by dev_add_pack(), it
> appears that only 4 of them really use the orig_dev parameter given by
> __netif_receive_skb():
>
> - bond_3ad_lacpdu_recv() @ drivers/net/bonding/bond_3ad.c
> - bond_arp_rcv() @ drivers/net/bonding/bond_main.c
> - packet_rcv() @ net/packet/af_packet.c
> - tpacket_rcv() @ net/packet/af_packet.c
>
>  From the bonding point of view, the meaning of orig_dev is obviously
> "the device one layer below the bonding device, through which the packet
> reached the bonding device". It is used by bond_3ad_lacpdu_recv() and
> bond_arp_rcv(), to find the underlying slave device through which the
> LACPDU or ARP was received. (The protocol handler is registered at the
> bonding device level).
>
>  From the af_packet point of view, the meaning is documented (in commit
> "[AF_PACKET]: Add option to return orig_dev to userspace") as the
> "physical device [that] actually received the traffic, instead of having
> the encapsulating device hide that information."
>
> When the bonding device is just one level above the physical device, the
> two meanings happen to match the same device, by chance.
>
> So, currently, a bonding device cannot stack properly on top of anything
> but physical devices. It might not be a problem today, but may change in
> the future...

Hi Jay,

Still thinking about this orig_dev stuff, I wonder why the protocol handlers used in bonding
(bond_3ad_lacpdu_recv() and bond_arp_rcv()) are registered at the master level instead of at the
slave level?

If they were registered at the slave level, they would simply receive skb->dev as the ingress 
interface and use this value instead of needing the orig_dev value given to them when they are 
registered at the master level.

As orig_dev is only used by bonding and by af_packet, but they disagree on the exact meaning of
orig_dev, one way to fix this discrepancy would be to remove one of the usages. As the af_packet
usage is exposed to user space, bonding seems the right place to stop using orig_dev, even if
orig_dev was introduced for bonding :-)

I understand that this would add one entry per slave device to the ptype_base list, but this seems 
to be the only bad effect of registering at the slave level. Can you confirm that this was the 
reason to register at the master level instead?

If you think registering at the slave level would cause too much impact on ptype_base, then we might 
have another way to stop using orig_dev for bonding:

In __skb_bond_should_drop(), we already test for the two interesting protocols:

if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && skb->protocol == __cpu_to_be16(ETH_P_ARP))
	return 0;

if (master->priv_flags & IFF_MASTER_8023AD && skb->protocol == __cpu_to_be16(ETH_P_SLOW))
	return 0;

Would it be possible to call the right handlers directly from inside __skb_bond_should_drop() then 
let __skb_bond_should_drop() return 1 ("should drop") after processing the frames that are only of 
interest for bonding?
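
A rough userspace sketch of that idea, with loud caveats: the flag values, the struct layouts and the name `sk_bond_should_drop` are all hypothetical, and counters stand in for calling the real ARP and LACPDU handlers. Only the two EtherType values are real. Falling back to the slave's active/inactive state for every other frame is an assumption of this mock, not something the existing code is claimed to do.

```c
#include <assert.h>
#include <stdint.h>

#define SK_ETH_P_ARP	0x0806	/* ARP EtherType */
#define SK_ETH_P_SLOW	0x8809	/* Slow Protocols (LACP) EtherType */

/* Hypothetical flag bits; the real IFF_* values are not reproduced here. */
#define SK_SLAVE_NEEDARP	0x1u
#define SK_MASTER_8023AD	0x2u

struct sk_bond_frame {
	uint16_t protocol;	/* EtherType, host byte order in this mock */
};

struct sk_bond_stats {
	int arp_seen;		/* frames handed to the mock ARP monitor */
	int lacpdu_seen;	/* frames handed to the mock LACPDU handler */
};

/*
 * Returns 1 when the frame must not reach the protocol handlers.
 * Bonding-only frames are processed here first, then reported as "drop".
 */
static int sk_bond_should_drop(const struct sk_bond_frame *f,
			       unsigned int slave_flags,
			       unsigned int master_flags,
			       int slave_inactive,
			       struct sk_bond_stats *st)
{
	assert(f && st);

	if ((slave_flags & SK_SLAVE_NEEDARP) && f->protocol == SK_ETH_P_ARP) {
		st->arp_seen++;		/* stands in for calling the ARP handler */
		return 1;
	}
	if ((master_flags & SK_MASTER_8023AD) && f->protocol == SK_ETH_P_SLOW) {
		st->lacpdu_seen++;	/* stands in for calling the LACPDU handler */
		return 1;
	}
	/* Assumption: everything else follows the slave's active/inactive state. */
	return slave_inactive ? 1 : 0;
}
```

With this shape, neither handler needs orig_dev: the slave is simply the device the frame arrived on.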

	Nicolas.


* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-26 14:58 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: David Miller, kaber, eric.dumazet, netdev, shemminger, fubar,
	andy
In-Reply-To: <4D68E31E.7060807@gmail.com>

Sat, Feb 26, 2011 at 12:25:18PM CET, nicolas.2p.debian@gmail.com wrote:
>On 26/02/2011 08:14, Jiri Pirko wrote:
>>Sat, Feb 26, 2011 at 12:46:53AM CET, nicolas.2p.debian@gmail.com wrote:
>>>On 23/02/2011 20:05, Jiri Pirko wrote:
>>>>This patch converts bonding to use rx_handler. Results in cleaner
>>>>__netif_receive_skb() with much less exceptions needed. Also
>>>>bond-specific work is moved into bond code.
>>>>
>>>>Did performance test using pktgen and counting incoming packets by
>>>>iptables. No regression noted.
>>>>
>>>>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>>>
>>>>v1->v2:
>>>>         using skb_iif instead of new input_dev to remember original
>>>>	device
>>>>
>>>>v2->v3:
>>>>	do another loop in case skb->dev is changed. That way orig_dev
>>>>	core can be left untouched.
>>>
>>>Hi Jiri,
>>>
>>>Finally taking enough time for a review.
>>>
>>>I think we should split this change :
>>>
>>>1/ Change __netif_receive_skb() to call rx_handler for diverted net_device, until rx_handler is NULL.
>>>
>>>2/ Convert currently existing rx_handlers (bridge and macvlan) to use
>>>this new "loop" feature, removing the need to call netif_rx() inside
>>>their respective rx_handler and also removing the associated
>>>overhead.
>>
>>This might not be possible. Macvlan uses the return value of netif_rx() for
>>its counters, and bridge calls netif_receive_skb() via NF_HOOK. Nevertheless,
>>this can be handled later, not as part of this patch.
>
>Yes, I agree. Step 2 and step 3 can be swapped.
>
>Anyway, we need to describe the options given to an rx_handler:
>
>- Return skb unchanged. This would cause normal delivery (ptype->dev == NULL or ptype->dev == skb->dev).
>- Return skb with skb->dev changed. __netif_receive_skb() will loop to the new
>device. This would cause exact match delivery only (ptype->dev !=
>NULL and ptype->dev == one of the orig_dev).
>- Manage the skb another way and return NULL. This would prevent any
>protocol handler from receiving the skb, unless the rx_handler
>arranges to re-inject the skb somewhere.
>
>>>3/ Convert bonding to use rx_handlers.
>>>
>>>Also, on step 1, we definitely need to clarify what orig_dev should be.
>>>
>>>I now think that orig_dev should be "the device one level below the
>>>current one" or NULL if current device was not diverted from another
>>>one. It means that we should keep an array of crossed (diverted)
>>>devices and the associated orig_dev. This array would be used to pass
>>>the right orig_dev to protocol handlers, depending on the device they
>>>register on :
>>
>>I constructed the patch so that orig_dev is the same in all situations
>>as it was before the patch. I think this decision can be deferred for
>>now.
>
>Agreed, even if the current handling of orig_dev is far from bulletproof
>and needs to be clarified at some point.
>
>>>eth0 ->  bond0 ->  br0
>>>
>>>A protocol handler registered on bond0 would receive eth0 as orig_dev.
>>>A protocol handler registered on br0 would receive bond0 as orig_dev.
>>>
>>>[snip]
>>>
>>>>@@ -3167,32 +3135,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>
>>>[snip]
>>>
>>>>+another_round:
>>>>+
>>>>+	__this_cpu_inc(softnet_data.processed);
>>>>+
>>>>  #ifdef CONFIG_NET_CLS_ACT
>>>>  	if (skb->tc_verd & TC_NCLS) {
>>>>  		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
>>>>@@ -3209,8 +3157,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>  #endif
>>>>
>>>>  	list_for_each_entry_rcu(ptype, &ptype_all, list) {
>>>>-		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
>>>>-		    ptype->dev == orig_dev) {
>>>>+		if (!ptype->dev || ptype->dev == skb->dev) {
>>>>  			if (pt_prev)
>>>>  				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>  			pt_prev = ptype;
>>>>@@ -3224,16 +3171,20 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>  ncls:
>>>>  #endif
>>>>
>>>
>>>Why do you loop to ptype_all before calling rx_handler ?
>>>
>>>I don't understand why ptype_all and ptype_base are not handled at
>>>the same place in current __netif_receive_skb() but I think we should
>>>take the opportunity to change that, unless someone know of a good
>>>reason not to do so.
>>
>>Again, the patch tries to make as few changes as it can, so this stays
>>the same as before. If you want to change it, feel free to submit a
>>follow-on patch.
>
>The point here is that bridge and macvlan handling used to be after
>the ptype_all loop (hence the place you inserted the call to
>rx_handler last summer), but the bonding part is currently before the
>ptype_all loop.
>
>Moving bonding handling after the ptype_all loop will cause the ptype_all loop to be run twice:
>- first time, with skb->dev == eth0 and orig_dev == eth0.
>- second time, with skb->dev == bond0 and orig_dev == eth0.
>
>The first time currently does not exist. And because bonding wasn't
>given a chance yet to decide that the frame should be dropped, the
>packet will always be delivered to eth0, causing duplicate
>deliveries. Note that this is probably true for bridge and macvlan
>too, and that those duplicate deliveries probably already exist.

Yes, and in fact that is what I like about this patch: deliveries then
become similar to bridge.

>
>Also, delivering skb inside a loop that may change the skb (skb->dev
>at least) is guaranteed to produce strange behaviors.
>
>Can someone, knowing the history of
>ptype_all/ptype_base/bridge/macvlan/bonding/vlan handling in
>__netif_receive_skb(), comment on this?
>
>Are there any reasons not to process ptype_all and ptype_base at the
>same location, at the end of __netif_receive_skb(), and to manage all
>divert features (bridge/macvlan/bonding/vlan) before?

That is a very good set of questions. I would like to hear the answers too.

>
>	Nicolas.



* dccp: null-pointer dereference on close
From: Johan Hovold @ 2011-02-26 17:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: David S. Miller, dccp, netdev

Hi,

I triggered the null-pointer dereference below when closing a dccp
socket on 2.6.37 the other day. The receive path is hit during
close, and the socket has already been unhashed in dccp_set_state from
dccp_close.

Thanks,
Johan


root@overo:~# [84140.128631] ------------[ cut here ]------------
[84140.133575] WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128()
[84140.142517] Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah
[84140.151794] [<c0038850>] (unwind_backtrace+0x0/0xec) from [<c0055364>] (warn_slowpath_common)
[84140.161743] [<c0055364>] (warn_slowpath_common+0x4c/0x64) from [<c0055398>] (warn_slowpath_n)
[84140.171966] [<c0055398>] (warn_slowpath_null+0x1c/0x24) from [<c02b72d0>] (__inet_twsk_hashd)
[84140.182373] [<c02b72d0>] (__inet_twsk_hashdance+0x48/0x128) from [<c031caa0>] (dccp_time_wai)
[84140.192413] [<c031caa0>] (dccp_time_wait+0x40/0xc8) from [<c031c15c>] (dccp_rcv_state_proces)
[84140.202636] [<c031c15c>] (dccp_rcv_state_process+0x120/0x538) from [<c032609c>] (dccp_v4_do_)
[84140.213043] [<c032609c>] (dccp_v4_do_rcv+0x11c/0x14c) from [<c0286594>] (release_sock+0xac/0)
[84140.222442] [<c0286594>] (release_sock+0xac/0x110) from [<c031fd34>] (dccp_close+0x28c/0x380)
[84140.231475] [<c031fd34>] (dccp_close+0x28c/0x380) from [<c02d9a78>] (inet_release+0x64/0x70)
[84140.240386] [<c02d9a78>] (inet_release+0x64/0x70) from [<c0284ddc>] (sock_release+0x24/0xb8)
[84140.249328] [<c0284ddc>] (sock_release+0x24/0xb8) from [<c0284e94>] (sock_close+0x24/0x34)
[84140.258087] [<c0284e94>] (sock_close+0x24/0x34) from [<c00c2e4c>] (fput+0x108/0x1f4)
[84140.266296] [<c00c2e4c>] (fput+0x108/0x1f4) from [<c00c0104>] (filp_close+0x70/0x7c)
[84140.274505] [<c00c0104>] (filp_close+0x70/0x7c) from [<c00c01c4>] (sys_close+0xb4/0x10c)
[84140.283081] [<c00c01c4>] (sys_close+0xb4/0x10c) from [<c0033a80>] (ret_fast_syscall+0x0/0x30)
[84140.292114] ---[ end trace b8877ec9d542c32e ]---
[84140.296997] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[84140.305541] pgd = cedb0000
[84140.308410] [00000010] *pgd=8ed22031, *pte=00000000, *ppte=00000000
[84140.315032] Internal error: Oops: 17 [#1] PREEMPT
[84140.320007] last sysfs file: /sys/kernel/uevent_seqnum
[84140.325408] Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah
[84140.334533] CPU: 0    Tainted: G        WC   (2.6.37+ #47)
[84140.340332] PC is at __inet_twsk_hashdance+0x4c/0x128
[84140.345642] LR is at warn_slowpath_null+0x1c/0x24
[84140.350616] pc : [<c02b72d4>]    lr : [<c0055398>]    psr: 60000013
[84140.350616] sp : ce975e68  ip : ce975db8  fp : cfbc5c00
[84140.362701] r10: cfa3e400  r9 : cfbc5c18  r8 : 00000000
[84140.368225] r7 : 00000006  r6 : cfa96110  r5 : cfa3e400  r4 : cfb54000
[84140.375091] r3 : 00000002  r2 : 00000006  r1 : 00000000  r0 : 00000000
[84140.381988] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[84140.389495] Control: 10c5387d  Table: 8edb0019  DAC: 00000015
[84140.395538] Process be2p_ctrl (pid: 2207, stack limit = 0xce9742f0)
[84140.402160] Stack: (0xce975e68 to 0xce976000)
[84140.406738] 5e60:                   cfb54000 00000180 cfa3e400 c031caa0 00000007 cfbc5c00
[84140.415374] 5e80: cfbc9824 00000020 00000007 c031c15c 00000000 00000022 00000000 00000008
[84140.424011] 5ea0: 00000001 cfbc5c00 cfbc5c00 cfa3e400 cfbc9824 00000000 00000001 c04c11b8
[84140.432617] 5ec0: be8ffc1c c032609c fa200000 c0033608 cfa3e400 cfa3e7b0 be8ffc1c ce975ee8
[84140.441253] 5ee0: be8ffc1c cfbc5c00 cfa3e400 ce974000 00000000 c0286594 cfa3e474 cfa3e400
[84140.449859] 5f00: cfa3e408 00000007 cf487c20 cf805840 cf60ca00 c031fd34 00000000 00000000
[84140.458496] 5f20: cfb20288 cfa3e400 cf487c00 00000008 00000000 c02d9a78 00000003 00000000
[84140.467102] 5f40: cf487c00 c0284ddc 00000000 cfb20288 cfb20280 c0284e94 00000000 c00c2e4c
[84140.475738] 5f60: 00000000 00000000 cfb20280 00000000 cfbc50c0 00000006 c0033c04 ce974000
[84140.484375] 5f80: 00000000 c00c0104 00000004 cfbc50c0 cfb20280 c00c01c4 400a1000 00000000
[84140.492980] 5fa0: 0000891c c0033a80 400a1000 00000000 00000004 00000000 403d3014 00000000
[84140.501617] 5fc0: 400a1000 00000000 0000891c 00000006 00000000 00000000 400a9000 be8ffc1c
[84140.510223] 5fe0: 00000000 be8ffbe0 00009584 4036320c 60000010 00000004 00005153 bf0fa7d0
[84140.518859] [<c02b72d4>] (__inet_twsk_hashdance+0x4c/0x128) from [<c031caa0>] (dccp_time_wai)
[84140.528869] [<c031caa0>] (dccp_time_wait+0x40/0xc8) from [<c031c15c>] (dccp_rcv_state_proces)
[84140.539062] [<c031c15c>] (dccp_rcv_state_process+0x120/0x538) from [<c032609c>] (dccp_v4_do_)
[84140.549407] [<c032609c>] (dccp_v4_do_rcv+0x11c/0x14c) from [<c0286594>] (release_sock+0xac/0)
[84140.558776] [<c0286594>] (release_sock+0xac/0x110) from [<c031fd34>] (dccp_close+0x28c/0x380)
[84140.567779] [<c031fd34>] (dccp_close+0x28c/0x380) from [<c02d9a78>] (inet_release+0x64/0x70)
[84140.576660] [<c02d9a78>] (inet_release+0x64/0x70) from [<c0284ddc>] (sock_release+0x24/0xb8)
[84140.585571] [<c0284ddc>] (sock_release+0x24/0xb8) from [<c0284e94>] (sock_close+0x24/0x34)
[84140.594299] [<c0284e94>] (sock_close+0x24/0x34) from [<c00c2e4c>] (fput+0x108/0x1f4)
[84140.602447] [<c00c2e4c>] (fput+0x108/0x1f4) from [<c00c0104>] (filp_close+0x70/0x7c)
[84140.610626] [<c00c0104>] (filp_close+0x70/0x7c) from [<c00c01c4>] (sys_close+0xb4/0x10c)
[84140.619171] [<c00c01c4>] (sys_close+0xb4/0x10c) from [<c0033a80>] (ret_fast_syscall+0x0/0x30)
[84140.628143] Code: e59f00dc e3a0108d ebf6782a e5941044 (e5912010) 
[84140.634643] ---[ end trace b8877ec9d542c32f ]---
[84140.639526] Kernel panic - not syncing: Fatal exception in interrupt


^ permalink raw reply

* Re: [PATCH] Bluetooth: Fix BT_L2CAP and BT_SCO in Kconfig
From: Vitaly Wool @ 2011-02-26 17:52 UTC (permalink / raw)
  To: Gustavo F. Padovan; +Cc: davem, linville, linux-bluetooth, netdev
In-Reply-To: <1298684485-3081-1-git-send-email-padovan@profusion.mobi>

Hi Gustavo,

On Sat, Feb 26, 2011 at 2:41 AM, Gustavo F. Padovan
<padovan@profusion.mobi> wrote:
> If we want something "bool" built-in in something "tristate" it can't
> "depend on" the tristate config option.
>
> Report by DaveM:
>
>   I give it 'y' just to make it happen, for both, and afterwards no
>   matter how many times I rerun "make oldconfig" I keep seeing things
>   like this in my build:
>
> scripts/kconfig/conf --silentoldconfig Kconfig
> include/config/auto.conf:986:warning: symbol value 'm' invalid for BT_SCO
> include/config/auto.conf:3156:warning: symbol value 'm' invalid for BT_L2CAP
>
> Reported-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
> ---
>  net/bluetooth/Kconfig |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/bluetooth/Kconfig b/net/bluetooth/Kconfig
> index c6f9c2f..6ae5ec5 100644
> --- a/net/bluetooth/Kconfig
> +++ b/net/bluetooth/Kconfig
> @@ -31,9 +31,10 @@ menuconfig BT
>          to Bluetooth kernel modules are provided in the BlueZ packages.  For
>          more information, see <http://www.bluez.org/>.
>
> +if BT != n
> +
>  config BT_L2CAP
>        bool "L2CAP protocol support"
> -       depends on BT
>        select CRC16
>        help
>          L2CAP (Logical Link Control and Adaptation Protocol) provides
> @@ -42,11 +43,12 @@ config BT_L2CAP
>
>  config BT_SCO
>        bool "SCO links support"
> -       depends on BT
>        help
>          SCO link provides voice transport over Bluetooth.  SCO support is
>          required for voice applications like Headset and Audio.
>
> +endif
> +

Ugh, isn't it far cleaner to change the initial dependencies to "depends
on BT != n"?

Thanks,
   Vitaly

^ permalink raw reply

* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Jay Vosburgh @ 2011-02-26 19:42 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Jiri Pirko, David Miller, kaber, eric.dumazet, netdev, shemminger,
	andy, Fischer, Anna
In-Reply-To: <4D690D16.8020503@gmail.com>

Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com> wrote:

>Le 22/02/2011 00:20, Nicolas de Pesloüan a écrit :
>
>> After checking every protocol handler installed by dev_add_pack(), it
>> appears that only 4 of them really use the orig_dev parameter given by
>> __netif_receive_skb():
>>
>> - bond_3ad_lacpdu_recv() @ drivers/net/bonding/bond_3ad.c
>> - bond_arp_rcv() @ drivers/net/bonding/bond_main.c
>> - packet_rcv() @ net/packet/af_packet.c
>> - tpacket_rcv() @ net/packet/af_packet.c
>>
>> From the bonding point of view, the meaning of orig_dev is obviously
>> "the device one layer below the bonding device, through which the packet
>> reached the bonding device". It is used by bond_3ad_lacpdu_recv() and
>> bond_arp_rcv(), to find the underlying slave device through which the
>> LACPDU or ARP was received. (The protocol handler is registered at the
>> bonding device level).
>>
>> From the af_packet point of view, the meaning is documented (in commit
>> "[AF_PACKET]: Add option to return orig_dev to userspace") as the
>> "physical device [that] actually received the traffic, instead of having
>> the encapsulating device hide that information."
>>
>> When the bonding device is just one level above the physical device, the
>> two meanings happen to match the same device, by chance.
>>
>> So, currently, a bonding device cannot stack properly on top of anything
>> but physical devices. It might not be a problem today, but may change in
>> the future...
>
>Hi Jay,
>
>Still thinking about this orig_dev stuff, I wonder why the protocol
>handlers used in bonding (bond_3ad_lacpdu_recv() and bond_arp_rcv()) are
>registered at the master level instead of at the slave level ?
>
>If they were registered at the slave level, they would simply receive
>skb->dev as the ingress interface and use this value instead of needing
>the orig_dev value given to them when they are registered at the master
>level.
>
>As orig_dev is only used by bonding and by af_packet, but they disagree on
>the exact meaning of orig_dev, one way to fix this discrepancy would be to
>remove one of the usages. As the af_packet usage is exposed to user space,
>bonding seems the right place to stop using orig_dev, even if orig_dev was
>introduced for bonding :-)
>
>I understand that this would add one entry per slave device to the
>ptype_base list, but this seems to be the only bad effect of registering
>at the slave level. Can you confirm that this was the reason to register
>at the master level instead?

	My recollection is that it was done the way it is because there
was no "orig_dev" delivery logic at the time.  A handler registered to a
slave dev would receive no packets at all because assignment of skb->dev
to the master happened first, and the "orig_dev" knowledge was lost.

	When 802.3ad was added, a skb->real_dev field was created, but
it wasn't used for delivery.  802.3ad used real_dev to figure out which
slave a LACPDU arrived on.  The skb->real_dev was eventually replaced
with the orig_dev business that's there now.

	Later, I did the arp_validate stuff the same way as 802.3ad
because it worked and was easier than registering a handler per slave.

>If you think registering at the slave level would cause too much impact on
>ptype_base, then we might have another way to stop using orig_dev for
>bonding:
>
>In __skb_bond_should_drop(), we already test for the two interesting protocols:
>
>if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && skb->protocol == __cpu_to_be16(ETH_P_ARP))
>	return 0;
>
>if (master->priv_flags & IFF_MASTER_8023AD && skb->protocol == __cpu_to_be16(ETH_P_SLOW))
>	return 0;
>
>Would it be possible to call the right handlers directly from inside
>__skb_bond_should_drop() then let __skb_bond_should_drop() return 1
>("should drop") after processing the frames that are only of interest for
>bonding?

	Isn't one purpose of switching to rx_handler that there won't
need to be any skb_bond_should_drop logic in __netif_receive_skb at all?

	Still, if you're just trying to simplify __netif_receive_skb
first, I don't see any reason not to register the packet handlers at the
slave level.  Looking at the ptype_base hash, I don't think that the
protocols bonding is registering (ARP and SLOW) will hash collide with
IP or IPv6, so I suspect there won't be much impact.
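
The collision remark can be checked by hand. A minimal sketch in C,
assuming ptype_base is the 16-bucket table indexed by the low bits of the
host-order EtherType (as net/core/dev.c did at the time); ptype_bucket()
and buckets_collide() are hypothetical helper names, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 buckets, selected by the low bits of the host-order
 * EtherType, mirroring the ptype_base hash of this kernel era. */
#define PTYPE_HASH_SIZE 16
#define PTYPE_HASH_MASK (PTYPE_HASH_SIZE - 1)

static_assert((PTYPE_HASH_SIZE & PTYPE_HASH_MASK) == 0,
              "bucket count must be a power of two");

/* EtherType values as listed in <linux/if_ether.h>, host byte order. */
#define ETH_P_IP   0x0800
#define ETH_P_ARP  0x0806
#define ETH_P_IPV6 0x86DD
#define ETH_P_SLOW 0x8809

/* Hypothetical helper: bucket index for a host-order EtherType. */
static unsigned int ptype_bucket(uint16_t ethertype)
{
    return ethertype & PTYPE_HASH_MASK;
}

/* Hypothetical helper: nonzero if two EtherTypes share a bucket. */
static int buckets_collide(uint16_t a, uint16_t b)
{
    return ptype_bucket(a) == ptype_bucket(b);
}
```

Under that assumption ARP lands in bucket 6 and SLOW in bucket 9, while IP
and IPv6 land in buckets 0 and 13, so none of the four share a chain.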

	Once an rx_handler is used, then I suspect there's no need for
the packet handlers at all, since the rx_handler is within bonding and
can just deal with the ARP or LACPDU directly.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [Lxc-users] Bad checksums and lost packets with macvlan on dummy
From: Andrian Nord @ 2011-02-26 20:38 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: lxc-users, Patrick McHardy, Linux Netdev List
In-Reply-To: <4D6630D9.2050400@free.fr>

On Thu, Feb 24, 2011 at 11:20:09AM +0100, Daniel Lezcano wrote:
> I saw you were using the command 'nc6', do you use netcat with ipv6 ?

Well, yes and no. I've tried both ipv4 and ipv6 and my notebook has no
ipv6 address assigned, so the most terrible connection was through ipv4 =).

At another server there is no ipv6 at all.

^ permalink raw reply

* Re: [RFC] be2net: add rxhash support
From: Ajit Khaparde @ 2011-02-26 21:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev


> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Saturday, February 26, 2011 4:31 AM
> To: Khaparde, Ajit
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC] be2net: add rxhash support
> 
> Le vendredi 25 février 2011 à 15:35 -0600, Ajit Khaparde a écrit :
> 
> > I asked that because, if a switch is a part of the configuration,
> > the ASIC can receive packets other than the tcp flow.
> >
> > And if hashing is enabled for IP packets, we can see this behavior.
> > The other values indicate that hashing has been enabled for IPv4 packets.
> 
> To make sure RSS (and rxhash) was OK, I added following debugging aid :
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index da0534d..e9b1180 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -688,6 +688,7 @@ static inline void sock_rps_save_rxhash(struct sock *sk, u32 rxhash)
>  {
>  #ifdef CONFIG_RPS
>  	if (unlikely(sk->sk_rxhash != rxhash)) {
> +		pr_err("rxhash change from %x to %x\n", sk->sk_rxhash, rxhash);
>  		sock_rps_reset_flow(sk);
>  		sk->sk_rxhash = rxhash;
>  	}
> 
> 
> And got following traces :
> 
> [  201.170297] change rxhash from 0 to be0b5a87
> [  232.607474] bonding: bond1: Setting eth3 as active slave.
> [  232.607478] bonding: bond1: making interface eth3 the new active one.
> [  232.710848] change rxhash from be0b5a87 to e56a3c1e
> [  300.047500] bonding: bond1: Setting eth1 as active slave.
> [  300.047504] bonding: bond1: making interface eth1 the new active one.
> [  300.159162] change rxhash from e56a3c1e to be0b5a87
> 
> The flip occurred when I changed my active slave (bonding mode=1).
> 
> eth1 is a bnx2 NIC, while eth3 is a be2net one, so it's OK to change the
> rxhash in this case
> (different firmware/algo)
> 
> So as far as be2net is concerned, everything seems OK : all packets for
> a given flow get a unique RSS hash and can feed skb->rxhash
> 
Fair enough. Thanks.
I guess a fresh patch with the ethtool support included will be ideal,
instead of the previous patch?

-Ajit

^ permalink raw reply

* IPv6 source address selection and privacy extensions
From: Bruno Prémont @ 2011-02-26 22:16 UTC (permalink / raw)
  To: netdev

From Documentation/networking/ip-sysctl.txt:

use_tempaddr - INTEGER
        Preference for Privacy Extensions (RFC3041).
          <= 0 : disable Privacy Extensions
          == 1 : enable Privacy Extensions, but prefer public
                 addresses over temporary addresses.
          >  1 : enable Privacy Extensions and prefer temporary
                 addresses over public addresses.
        Default:  0 (for most devices)
                 -1 (for point-to-point devices and loopback devices)

Is it possible with the current kernel to have >1 make temporary addresses
used by default, but have a manual or dynamic (e.g. MAC-based) address used
for some destination addresses/subnets?
If it's possible, how can this be done? (Adding a hint to ip-sysctl.txt
would then make it easy for others to find.)

With IPv4 this can be done via `ip route add $subnet/$prefix src $addr`
though the same does not work for IPv6.

Thanks,
Bruno

^ permalink raw reply

* Re: [net-next-2.6 PATCH 02/10] ethtool: add ntuple flow specifier to network flow classifier
From: David Miller @ 2011-02-27  0:05 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: jeffrey.t.kirsher, bhutchings, netdev
In-Reply-To: <20110225233249.7920.70334.stgit@gitlad.jf.intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Fri, 25 Feb 2011 15:32:49 -0800

> @@ -396,8 +411,10 @@ struct ethtool_rx_flow_spec {
>  		struct ethtool_ah_espip4_spec		esp_ip4_spec;
>  		struct ethtool_usrip4_spec		usr_ip4_spec;
>  		struct ethhdr				ether_spec;
> +		struct ethtool_ntuple_spec_ext		ntuple_spec;
>  		__u8					hdata[72];
>  	} h_u, m_u;
> +	__u32		flow_type_ext;
>  	__u64		ring_cookie;
>  	__u32		location;
>  };

How can you add this flow_type_ext member to this user visible structure
without utterly breaking userspace?  It changes the offsets of the
ring_cookie and location members.
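
The offset concern can be illustrated with offsetof(). The sketch below is
a simplified model under stated assumptions, not the real ethtool header:
union flow_union stands in for the h_u/m_u unions (72 bytes, 4-byte
aligned), the struct names are hypothetical, and u64_align4 uses a
GCC/Clang type attribute to emulate ABIs such as i386 where a 64-bit
integer is only 4-byte aligned. On such an ABI there is no padding in
front of ring_cookie, so a new 32-bit member moves both later fields by
4 bytes. With 8-byte alignment the new member may instead fall into
padding that already exists, which is why the "natural" variant is
included for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer with 4-byte alignment, emulating i386-style layout
 * (GCC/Clang extension; an assumption of this sketch). */
typedef uint64_t __attribute__((aligned(4))) u64_align4;

/* Simplified stand-in for the h_u/m_u union: 72 bytes, 4-byte aligned. */
union flow_union {
    uint32_t words[18];
    uint8_t  hdata[72];
};

static_assert(sizeof(union flow_union) == 72, "model union must be 72 bytes");

/* Generate an "old" and a "new" layout for a given 64-bit type. */
#define DEFINE_SPECS(suffix, u64_type)              \
    struct spec_old_##suffix {                      \
        uint32_t flow_type;                         \
        union flow_union h_u, m_u;                  \
        u64_type ring_cookie;                       \
        uint32_t location;                          \
    };                                              \
    struct spec_new_##suffix {                      \
        uint32_t flow_type;                         \
        union flow_union h_u, m_u;                  \
        uint32_t flow_type_ext; /* added member */  \
        u64_type ring_cookie;                       \
        uint32_t location;                          \
    }

DEFINE_SPECS(align4, u64_align4);  /* i386-like alignment */
DEFINE_SPECS(natural, uint64_t);   /* whatever the build ABI uses */
```

In the 4-byte-aligned model, ring_cookie moves from offset 148 to 152 and
location from 156 to 160, so a binary built against the old layout would
read the wrong bytes. Comparing offsetof() on the "natural" pair shows
what happens on the ABI the sketch is compiled for.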


^ permalink raw reply

* Re: [net-next-2.6 PATCH 01/10] ethtool: prevent null pointer dereference with NTUPLE set but no set_rx_ntuple
From: David Miller @ 2011-02-27  0:07 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: bhutchings, jeffrey.t.kirsher, netdev
In-Reply-To: <4D684BED.20805@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Fri, 25 Feb 2011 16:40:13 -0800

> It cannot occur with any of the in-kernel drivers since they all set
> the NETIF_F_NTUPLE flag and have the function defined.  However going
> forward I would like to have the option of using the network flow
> classifier interface instead of the set_rx_ntuple interface due to the
> fact that it supports many of the features I needed.

This still doesn't explain to me why a driver would set the feature
flag, but not actually implement the feature.

I'm not applying this patch.

When you create the situation that causes the potentially NULL
dereference, then you can use that patch to show why this seemingly
illogical situation can indeed occur.

Until then no driver causes this issue, therefore the problem does
not exist.

^ permalink raw reply

* net-next: warnings from sysctl_net_exit
From: Stephen Hemminger @ 2011-02-27  0:56 UTC (permalink / raw)
  To: Alexey Dobriyan, David Miller; +Cc: netdev

Seeing lots of these messages in dmesg. Something is broken
recently in net-next.


[26207.669668] ------------[ cut here ]------------
[26207.669673] WARNING: at net/sysctl_net.c:84 sysctl_net_exit+0x2a/0x2c()
[26207.669675] Hardware name: System Product Name
[26207.669676] Modules linked in: ip6table_filter ip6_tables nfs lockd fscache nfs_acl auth_rpcgss sunrpc binfmt_misc ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc kvm_intel kvm snd_hda_codec_analog lm63 snd_hda_intel snd_hda_codec snd_hwdep radeon pl2303 snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event ttm drm_kms_helper snd_seq drm usbserial i7core_edac snd_timer snd_seq_device snd sky2 e1000e edac_core psmouse igb serio_raw soundcore snd_page_alloc i2c_algo_bit asus_atk0110 hid_belkin usbhid hid pata_marvell ahci libahci dca floppy btrfs lzo_compress zlib_deflate crc32c libcrc32c
[26207.669725] Pid: 67, comm: kworker/u:5 Tainted: G        W   2.6.38-rc5-net-next+ #38
[26207.669726] Call Trace:
[26207.669731]  [<ffffffff81040dd2>] ? warn_slowpath_common+0x85/0x9d
[26207.669735]  [<ffffffff813617f6>] ? cleanup_net+0x0/0x19a
[26207.669738]  [<ffffffff81040e04>] ? warn_slowpath_null+0x1a/0x1c
[26207.669740]  [<ffffffff814154ad>] ? sysctl_net_exit+0x2a/0x2c
[26207.669742]  [<ffffffff8136144e>] ? ops_exit_list+0x2a/0x5b
[26207.669745]  [<ffffffff813618f0>] ? cleanup_net+0xfa/0x19a
[26207.669749]  [<ffffffff810575c1>] ? process_one_work+0x233/0x3aa
[26207.669752]  [<ffffffff81057528>] ? process_one_work+0x19a/0x3aa
[26207.669755]  [<ffffffff810599c2>] ? worker_thread+0x13b/0x25a
[26207.669757]  [<ffffffff81059887>] ? worker_thread+0x0/0x25a
[26207.669760]  [<ffffffff8105d0f5>] ? kthread+0x9d/0xa5
[26207.669763]  [<ffffffff8106d618>] ? trace_hardirqs_on_caller+0x10c/0x130
[26207.669766]  [<ffffffff810030d4>] ? kernel_thread_helper+0x4/0x10
[26207.669770]  [<ffffffff8142f300>] ? restore_args+0x0/0x30
[26207.669772]  [<ffffffff8105d058>] ? kthread+0x0/0xa5
[26207.669774]  [<ffffffff810030d0>] ? kernel_thread_helper+0x0/0x10
[26207.669776] ---[ end trace 0cd6e119ada0eab1 ]---

^ permalink raw reply

* Re: [net-next-2.6 PATCH 01/10] ethtool: prevent null pointer dereference with NTUPLE set but no set_rx_ntuple
From: Alexander Duyck @ 2011-02-27  2:16 UTC (permalink / raw)
  To: David Miller; +Cc: alexander.h.duyck, bhutchings, jeffrey.t.kirsher, netdev
In-Reply-To: <20110226.160747.226765885.davem@davemloft.net>

On Sat, Feb 26, 2011 at 4:07 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> Date: Fri, 25 Feb 2011 16:40:13 -0800
>
>> It cannot occur with any of the in-kernel drivers since they all set
>> the NETIF_F_NTUPLE flag and have the function defined.  However going
>> forward I would like to have the option of using the network flow
>> classifier interface instead of the set_rx_ntuple interface due to the
>> fact that it supports many of the features I needed.
>
> This still doesn't explain to me why a driver would set the feature
> flag, but not actually implement the feature.

Actually the reason I ran into this is because of the patches in the
RFC set.  Basically I was looking at moving the ntuple support in
ixgbe over to network flow classifier rules.  As such I was leaving
the ntuple flag set, but using set_rxnfc via the filter rules instead.
 If you recommend adding a new flag to do that I am fine with that.
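
The case described here, a feature flag that is set while the matching
callback is absent, can be sketched in user space as follows. Every name
in the block (FEAT_NTUPLE, struct demo_dev, struct demo_ops,
set_rx_ntuple_checked, demo_set_rx_ntuple) is a hypothetical stand-in
rather than the kernel's ethtool code, and the choice of error codes is
an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FEAT_NTUPLE (1u << 0)  /* hypothetical stand-in for NETIF_F_NTUPLE */

struct demo_dev;

/* Hypothetical ops table; the callback may legitimately be NULL. */
struct demo_ops {
    int (*set_rx_ntuple)(struct demo_dev *dev, int filter_id);
};

struct demo_dev {
    unsigned int features;
    const struct demo_ops *ops;
};

/* Check both the flag and the callback before dereferencing it. */
static int set_rx_ntuple_checked(struct demo_dev *dev, int filter_id)
{
    assert(dev != NULL && dev->ops != NULL);

    if (!(dev->features & FEAT_NTUPLE))
        return -EINVAL;          /* feature not advertised */
    if (dev->ops->set_rx_ntuple == NULL)
        return -EOPNOTSUPP;      /* flag set, but no implementation */
    return dev->ops->set_rx_ntuple(dev, filter_id);
}

/* Sample callback used only to exercise the guard. */
static int demo_set_rx_ntuple(struct demo_dev *dev, int filter_id)
{
    (void)dev;
    return filter_id >= 0 ? 0 : -EINVAL;
}
```

A device that sets FEAT_NTUPLE but leaves set_rx_ntuple NULL then gets an
error back instead of a NULL call.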

> I'm not applying this patch.
>
> When you create the situation that causes the potentially NULL
> dereference, then you can use that patch to show why this seemingly
> illogical situation can indeed occur.
>
> Until then no driver causes this issue, therefore the problem does
> not exist.

I'll do some digging late next week to see if there are any other
means of encountering the issue and will get back to you if I find
anything.

Thanks,

Alex

^ permalink raw reply

* dccp: Change maintainer
From: Arnaldo Carvalho de Melo @ 2011-02-27  2:28 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Networking Development Mailing List

Today was as good as any other day, but I felt I had to do things I love
to when paying homage to somebody I love, so please apply this one,
something he would be proud of, even if so geekly.

    Way past it was/is deserved.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/MAINTAINERS b/MAINTAINERS
index 5dd6c75..1752436 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2026,7 +2026,7 @@ F:	Documentation/scsi/dc395x.txt
 F:	drivers/scsi/dc395x.*

 DCCP PROTOCOL
-M:	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
+M:	Gerrit Renker <gerrit@erg.abdn.ac.uk>
 L:	dccp@vger.kernel.org
 W:	http://www.linuxfoundation.org/collaborate/workgroups/networking/dccp
 S:	Maintained

^ permalink raw reply related

* txqueuelen has wrong units; should be time
From: Albert Cahalan @ 2011-02-27  5:44 UTC (permalink / raw)
  To: linux-kernel, netdev

(thinking about the bufferbloat problem here)

Setting txqueuelen to some fixed number of packets
seems pretty broken if:

1. a link can vary in speed (802.11 especially)

2. a packet can vary in size (9 KiB jumbo frames, etc.)

3. there is other weirdness (PPP compression, etc.)

It really needs to be set to some amount of time,
with the OS accounting for packets in terms of the
time it will take to transmit them. This would need
to account for physical-layer packet headers and
minimum spacing requirements.

I think it could also account for estimated congestion
on the local link, because that affects the rate at which
the queue can empty. An OS can directly observe this
on some types of hardware.

Nanoseconds seems fine; it's unlikely you'd ever want
more than about 4.29 seconds (32-bit unsigned) of queue.

I guess there are at least 2 queues of interest, with the
second one being under control of the hardware driver.
Having the kernel split the max time as appropriate for
the hardware seems nicest.
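
The time-based accounting proposed above can be sketched in a few lines
of C. This is an illustration under stated assumptions, not an existing
kernel interface: tx_time_ns(), struct timed_queue and
timed_queue_admit() are hypothetical names, and the 38-byte overhead
figure is the usual Ethernet preamble, SFD, MAC header, FCS and
inter-frame gap.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Ethernet bytes on the wire besides the payload:
 * preamble 7 + SFD 1 + MAC header 14 + FCS 4 + inter-frame gap 12. */
#define ETH_WIRE_OVERHEAD 38u

/* Time to serialize one packet at link_bps bits per second.
 * No overflow for packets under roughly 2 GB. */
static uint64_t tx_time_ns(uint32_t payload_bytes, uint32_t overhead_bytes,
                           uint64_t link_bps)
{
    assert(link_bps > 0);
    uint64_t bits = ((uint64_t)payload_bytes + overhead_bytes) * 8;
    return bits * NSEC_PER_SEC / link_bps;
}

/* Hypothetical queue whose limit is a time budget, not a packet count.
 * A 32-bit nanosecond budget tops out a little under 4.3 seconds. */
struct timed_queue {
    uint32_t budget_ns;  /* maximum queued transmit time */
    uint64_t queued_ns;  /* transmit time currently queued */
};

/* Admit a packet only if it still fits in the time budget. */
static int timed_queue_admit(struct timed_queue *q, uint64_t pkt_ns)
{
    if (q->queued_ns + pkt_ns > q->budget_ns)
        return 0;
    q->queued_ns += pkt_ns;
    return 1;
}
```

Under these assumptions a 1500-byte packet costs 12304 ns at 1 Gbit/s and
ten times that at 100 Mbit/s, so the same time budget admits roughly ten
times fewer packets on the slower link.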

^ permalink raw reply

* Re: dccp: Change maintainer
From: David Miller @ 2011-02-27  5:47 UTC (permalink / raw)
  To: acme; +Cc: netdev
In-Reply-To: <20110227022854.GB19108@ghostprotocols.net>

From: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Date: Sat, 26 Feb 2011 23:28:54 -0300

> Today was as good as any other day, but I felt I had to do things I love
> to when paying homage to somebody I love, so please apply this one,
> something he would be proud of, even if so geekly.

I think you're trying to say "I wish people would stop sending me DCCP bug
reports, damn..." :-)

>     Way past it was/is deserved.
>     
>     Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Applied, thanks.

^ permalink raw reply

* Re: net-next: warnings from sysctl_net_exit
From: David Miller @ 2011-02-27  6:23 UTC (permalink / raw)
  To: shemminger; +Cc: adobriyan, netdev
In-Reply-To: <20110226165601.48858003@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Sat, 26 Feb 2011 16:56:01 -0800

> Seeing lots of these messages in dmesg. Something is broken
> recently in net-next.

Did you by chance pull plain net-2.6 into that tree?  Because one
commit which is in net-2.6 but not in net-next-2.6 catches my eye:

commit c486da34390846b430896a407b47f0cea3a4189c
Author: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Date:   Thu Feb 24 19:48:03 2011 +0000

    sysctl: ipv6: use correct net in ipv6_sysctl_rtcache_flush
    
    Before this patch issuing these commands:
    
      fd = open("/proc/sys/net/ipv6/route/flush")
      unshare(CLONE_NEWNET)
      write(fd, "stuff")
    
    would flush the newly created net, not the original one.
    
    The equivalent ipv4 code is correct (stores the net inside ->extra1).
    Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
    
    Signed-off-by: David S. Miller <davem@davemloft.net>


^ permalink raw reply
