Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC PATCH] network: return errors if we know tcp_connect failed
From: Patrick McHardy @ 2010-11-15 10:32 UTC (permalink / raw)
  To: Hua Zhong
  Cc: 'Eric Paris', netdev, linux-kernel, davem, kuznet, pekkas,
	jmorris, yoshfuji
In-Reply-To: <017301cb82bf$54540cf0$fcfc26d0$@com>

On 13.11.2010 00:14, Hua Zhong wrote:
>> On 11.11.2010 22:58, Hua Zhong wrote:
>>>> Yes, I realize this is little different than if the
>>>> SYN was dropped in the first network device, but it is different
>>>> because we know what happened!  We know that connect() call failed
>>>> and that there isn't anything coming back.
>>>
>>> I would argue that -j DROP should behave exactly as the packet is
>> dropped in the network, while -j REJECT should signal the failure to
>> the application as soon as possible (which it doesn't seem to do).
>>
>> It sends an ICMP error or TCP reset. Interpretation is up to TCP.
> 
> Huh? It's the OUTPUT chain we are talking about. There is no ICMP error or
> TCP reset.

Of course there is.

ICMP (default):

iptables -A OUTPUT -p tcp -j REJECT

TCP reset:

iptables -A OUTPUT -p tcp -j REJECT --reject-with tcp-reset

The second one will cause a hard error for the connection.

^ permalink raw reply

* Re: [PATCH] netfilter: fix the wrong alloc_size
From: Patrick McHardy @ 2010-11-15 10:48 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1289716548-30767-1-git-send-email-xiaosuo@gmail.com>

On 14.11.2010 07:35, Changli Gao wrote:
> In function update_alloc_size(), sizeof(struct nf_ct_ext) is added twice
> wrongly.

Nice catch, applied.

^ permalink raw reply

* Re: [PATCH] netfilter: define ct_*_info as needed
From: Patrick McHardy @ 2010-11-15 10:51 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1289716813-30822-1-git-send-email-xiaosuo@gmail.com>

On 14.11.2010 07:40, Changli Gao wrote:
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Applied.

^ permalink raw reply

* Re: Fwd: a Great Idea - include Kademlia networking protocol in kernel -- REVISITED
From: Chris Snook @ 2010-11-15 10:55 UTC (permalink / raw)
  To: Marcos; +Cc: Eric Dumazet, netdev, Stephen Guerin
In-Reply-To: <AANLkTimkPpWob8ANySeWBvDE+Pq2wy4SQWJOSYzbS7QG@mail.gmail.com>

On Sun, Nov 14, 2010 at 4:14 AM, Marcos <stalkingtime@gmail.com> wrote:
>> I have no idea why and how kademlia would be added to "linux kernel"
>>
>> Its a protocol based on UDP, and probably already done on userland.
>>
>> What am I missing ?
>
> The idea is to tightly couple it to the operating system to create a
> sort of "super operating system" that is seamless to the application
> layers above.  Just like memory stores are tightly integrated as to be
> unnoticeable....

According to Google, you're not an experienced kernel developer, so
let me explain what happens when you take userspace code and implement
it in the kernel:

1) You restrict development of the feature to a specialized subset of C only.
2) You lose the simple filesystem interface enjoyed by userspace.
3) You lose the simple memory allocation enjoyed by userspace.
4) You lose the BSD network socket interface enjoyed by userspace.
5) You lose the ability to call recursive functions such as quicksort.
6) You lose the ability to call nearly every library ever written.
7) You freeze your ABI, making it much more difficult to fix your
mistakes without breaking huge amounts of userspace code, which pisses
off users, though they'll also be pissed if you *don't* fix your
mistakes.
8) You make debugging 10x harder.
9) You piss off your users 100x more when it breaks.
10) You make any security holes in your code 1000x worse.

There are plenty more drawbacks, but I think I've made the point.  We
don't put things in the kernel unless userspace is inadequate.  In
fact, we've done a lot of work over the years to provide kernel
infrastructure to allow us to move things *out* of the kernel and into
userspace.  If you think this feature would benefit from
filesystem-level integration, try implementing it as a FUSE module.
At least then you'll be able to use libraries and the language of your
choice.  None of the drawbacks of FUSE relative to the core kernel
apply in this case, so if it doesn't work with FUSE, it certainly
won't work in the kernel.

-- Chris

^ permalink raw reply

* Re: [PATCH] netfilter: don't use atomic bit operation
From: Patrick McHardy @ 2010-11-15 10:59 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1289725534-8439-1-git-send-email-xiaosuo@gmail.com>

On 14.11.2010 10:05, Changli Gao wrote:
> As we own ct, and the others can't see it until we confirm it, we don't
> need to use atomic bit operation on ct->status.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ---
>  include/net/netfilter/nf_nat_core.h |    4 ++--
>  net/ipv4/netfilter/nf_nat_core.c    |    4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> diff --git a/include/net/netfilter/nf_nat_core.h b/include/net/netfilter/nf_nat_core.h
> index 33602ab..52ac1d8 100644
> --- a/include/net/netfilter/nf_nat_core.h
> +++ b/include/net/netfilter/nf_nat_core.h
> @@ -21,9 +21,9 @@ static inline int nf_nat_initialized(struct nf_conn *ct,
>  				     enum nf_nat_manip_type manip)
>  {
>  	if (manip == IP_NAT_MANIP_SRC)
> -		return test_bit(IPS_SRC_NAT_DONE_BIT, &ct->status);
> +		return IPS_SRC_NAT_DONE_BIT & ct->status;
>  	else
> -		return test_bit(IPS_DST_NAT_DONE_BIT, &ct->status);
> +		return IPS_DST_NAT_DONE_BIT & ct->status;
>  }

Looks fine, but I changed the order to ct->status & ...

Applied.

^ permalink raw reply

* Re: [PATCH] netfilter: place in source hash after SNAT is done
From: Patrick McHardy @ 2010-11-15 11:07 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1289796483-2970-1-git-send-email-xiaosuo@gmail.com>

On 15.11.2010 05:48, Changli Gao wrote:
> If SNAT isn't done, the wrong info maybe got by the other cts.
> 
> As the filter table is after DNAT table, the packets dropped in filter
> table also bother bysource hash table.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ---
>  net/ipv4/netfilter/nf_nat_core.c |   18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> diff --git a/net/ipv4/netfilter/nf_nat_core.c b/net/ipv4/netfilter/nf_nat_core.c
> index c04787c..51ce55a 100644
> --- a/net/ipv4/netfilter/nf_nat_core.c
> +++ b/net/ipv4/netfilter/nf_nat_core.c
> @@ -545,11 +550,10 @@ static void nf_nat_move_storage(void *new, void *old)
>  	struct nf_conn_nat *old_nat = old;
>  	struct nf_conn *ct = old_nat->ct;
>  
> -	if (!ct || !(ct->status & IPS_NAT_DONE_MASK))
> +	if (!ct || !(ct->status & IPS_SRC_NAT_DONE))
>  		return;
>  
>  	spin_lock_bh(&nf_nat_lock);
> -	new_nat->ct = ct;

Why are you removing this?

^ permalink raw reply

* Re: [PATCH] netfilter: guard the size of the nf_ct_ext
From: Patrick McHardy @ 2010-11-15 11:10 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1289801749-8993-1-git-send-email-xiaosuo@gmail.com>

On 15.11.2010 07:15, Changli Gao wrote:
> We'd better guard the size of the nf_ct_ext, as the nf_ct_ext.len is u8.
> If the size is bigger than 255, a warning will be printed.

Why are you checking this in basically every possible spot?
Just checking once during registration (assuming the worst
case of a conntrack using every possible extension) should
be enough.

^ permalink raw reply

* Re: [PATCH] netfilter: define NF_CT_EXT_* as needed
From: Patrick McHardy @ 2010-11-15 11:13 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1289813149-22897-1-git-send-email-xiaosuo@gmail.com>

On 15.11.2010 10:25, Changli Gao wrote:
>  #ifdef CONFIG_NF_CONNTRACK_EVENTS
> diff --git a/include/net/netfilter/nf_conntrack_extend.h b/include/net/netfilter/nf_conntrack_extend.h
> index 0772d29..1a9f96d 100644
> --- a/include/net/netfilter/nf_conntrack_extend.h
> +++ b/include/net/netfilter/nf_conntrack_extend.h
> @@ -7,10 +7,16 @@
>  
>  enum nf_ct_ext_id {
>  	NF_CT_EXT_HELPER,
> +#if defined(CONFIG_NF_NAT) || defined(CONFIG_NF_NAT_MODULE)
>  	NF_CT_EXT_NAT,
> +#endif
>  	NF_CT_EXT_ACCT,
> +#ifdef CONFIG_NF_CONNTRACK_EVENTS
>  	NF_CT_EXT_ECACHE,
> +#endif
> +#ifdef CONFIG_NF_CONNTRACK_ZONES
>  	NF_CT_EXT_ZONE,
> +#endif
>  	NF_CT_EXT_NUM,

What is the purpose of #ifdef'ing the extension IDs?

^ permalink raw reply

* Re: [PATCH] netfilter: place in source hash after SNAT is done
From: Changli Gao @ 2010-11-15 11:16 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <4CE11475.30905@trash.net>

On Mon, Nov 15, 2010 at 7:07 PM, Patrick McHardy <kaber@trash.net> wrote:
> On 15.11.2010 05:48, Changli Gao wrote:
>> If SNAT isn't done, the wrong info maybe got by the other cts.
>>
>> As the filter table is after DNAT table, the packets dropped in filter
>> table also bother bysource hash table.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>> ---
>>  net/ipv4/netfilter/nf_nat_core.c |   18 +++++++++++-------
>>  1 file changed, 11 insertions(+), 7 deletions(-)
>> diff --git a/net/ipv4/netfilter/nf_nat_core.c b/net/ipv4/netfilter/nf_nat_core.c
>> index c04787c..51ce55a 100644
>> --- a/net/ipv4/netfilter/nf_nat_core.c
>> +++ b/net/ipv4/netfilter/nf_nat_core.c
>> @@ -545,11 +550,10 @@ static void nf_nat_move_storage(void *new, void *old)
>>       struct nf_conn_nat *old_nat = old;
>>       struct nf_conn *ct = old_nat->ct;
>>
>> -     if (!ct || !(ct->status & IPS_NAT_DONE_MASK))
>> +     if (!ct || !(ct->status & IPS_SRC_NAT_DONE))
>>               return;
>>
>>       spin_lock_bh(&nf_nat_lock);
>> -     new_nat->ct = ct;
>
> Why are you removing this?
>

nf_ct_ext uses __krealloc() to enlarge memory, so the content has been
copied already.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: Kernel rwlock design, Multicore and IGMP
From: Cypher Wu @ 2010-11-15 11:18 UTC (permalink / raw)
  To: Chris Metcalf; +Cc: Américo Wang, Eric Dumazet, linux-kernel, netdev
In-Reply-To: <AANLkTind92uwcigzmDn8yn9a22exDy7zcreGQ5-6NLV-@mail.gmail.com>

In that post I want to confirm another thing: if we join/leave on
different cores that every call will start the timer for IGMP message
using the same in_dev->mc_list, could that be optimized?

-- 
Cyberman Wu
http://www.meganovo.com

^ permalink raw reply

* Re: [PATCH] netfilter: define NF_CT_EXT_* as needed
From: Changli Gao @ 2010-11-15 11:19 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <4CE115C5.20302@trash.net>

On Mon, Nov 15, 2010 at 7:13 PM, Patrick McHardy <kaber@trash.net> wrote:
> On 15.11.2010 10:25, Changli Gao wrote:
>>  #ifdef CONFIG_NF_CONNTRACK_EVENTS
>> diff --git a/include/net/netfilter/nf_conntrack_extend.h b/include/net/netfilter/nf_conntrack_extend.h
>> index 0772d29..1a9f96d 100644
>> --- a/include/net/netfilter/nf_conntrack_extend.h
>> +++ b/include/net/netfilter/nf_conntrack_extend.h
>> @@ -7,10 +7,16 @@
>>
>>  enum nf_ct_ext_id {
>>       NF_CT_EXT_HELPER,
>> +#if defined(CONFIG_NF_NAT) || defined(CONFIG_NF_NAT_MODULE)
>>       NF_CT_EXT_NAT,
>> +#endif
>>       NF_CT_EXT_ACCT,
>> +#ifdef CONFIG_NF_CONNTRACK_EVENTS
>>       NF_CT_EXT_ECACHE,
>> +#endif
>> +#ifdef CONFIG_NF_CONNTRACK_ZONES
>>       NF_CT_EXT_ZONE,
>> +#endif
>>       NF_CT_EXT_NUM,
>
> What is the purpose of #ifdef'ing the extension IDs?
>

struct nf_ct_ext {
        struct rcu_head rcu;
        u8 offset[NF_CT_EXT_NUM];
        u8 len;
        char data[0];
};

Less IDs make nf_ct_ext smaller.
-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] netfilter: define NF_CT_EXT_* as needed
From: Patrick McHardy @ 2010-11-15 11:20 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <AANLkTikxaZ0p4idy5qQmoi9R6Obz1UKECFHS-8LV_vKT@mail.gmail.com>

On 15.11.2010 12:19, Changli Gao wrote:
> On Mon, Nov 15, 2010 at 7:13 PM, Patrick McHardy <kaber@trash.net> wrote:
>> On 15.11.2010 10:25, Changli Gao wrote:
>>>  #ifdef CONFIG_NF_CONNTRACK_EVENTS
>>> diff --git a/include/net/netfilter/nf_conntrack_extend.h b/include/net/netfilter/nf_conntrack_extend.h
>>> index 0772d29..1a9f96d 100644
>>> --- a/include/net/netfilter/nf_conntrack_extend.h
>>> +++ b/include/net/netfilter/nf_conntrack_extend.h
>>> @@ -7,10 +7,16 @@
>>>
>>>  enum nf_ct_ext_id {
>>>       NF_CT_EXT_HELPER,
>>> +#if defined(CONFIG_NF_NAT) || defined(CONFIG_NF_NAT_MODULE)
>>>       NF_CT_EXT_NAT,
>>> +#endif
>>>       NF_CT_EXT_ACCT,
>>> +#ifdef CONFIG_NF_CONNTRACK_EVENTS
>>>       NF_CT_EXT_ECACHE,
>>> +#endif
>>> +#ifdef CONFIG_NF_CONNTRACK_ZONES
>>>       NF_CT_EXT_ZONE,
>>> +#endif
>>>       NF_CT_EXT_NUM,
>>
>> What is the purpose of #ifdef'ing the extension IDs?
>>
> 
> struct nf_ct_ext {
>         struct rcu_head rcu;
>         u8 offset[NF_CT_EXT_NUM];
>         u8 len;
>         char data[0];
> };
> 
> Less IDs make nf_ct_ext smaller.

Right, thanks for the explanation.

^ permalink raw reply

* Re: [PATCH] netfilter: define NF_CT_EXT_* as needed
From: Patrick McHardy @ 2010-11-15 11:24 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1289813149-22897-1-git-send-email-xiaosuo@gmail.com>

On 15.11.2010 10:25, Changli Gao wrote:
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Applied. Please provide the intention of the patch in the changelog
next time. I've now added your explanation.

^ permalink raw reply

* Re: [PATCH] netfilter: define nat_pptp_info as needed
From: Patrick McHardy @ 2010-11-15 11:27 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1289735268-13791-1-git-send-email-xiaosuo@gmail.com>

On 14.11.2010 12:47, Changli Gao wrote:
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Applied.

^ permalink raw reply

* Re: Kernel rwlock design, Multicore and IGMP
From: Eric Dumazet @ 2010-11-15 11:31 UTC (permalink / raw)
  To: Cypher Wu; +Cc: Chris Metcalf, Américo Wang, linux-kernel, netdev
In-Reply-To: <AANLkTinzRm_5WW5HwxzXqZZD4255GDpR4yzELqz66Dfn@mail.gmail.com>

Le lundi 15 novembre 2010 à 19:18 +0800, Cypher Wu a écrit :
> In that post I want to confirm another thing: if we join/leave on
> different cores that every call will start the timer for IGMP message
> using the same in_dev->mc_list, could that be optimized?
> 

Which timer exactly ? Is it a real scalability problem ?

I believe RTNL would be the blocking point actually...

^ permalink raw reply

* Re: [PATCH] netfilter: guard the size of the nf_ct_ext
From: Changli Gao @ 2010-11-15 11:35 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <4CE1150D.1080302@trash.net>

On Mon, Nov 15, 2010 at 7:10 PM, Patrick McHardy <kaber@trash.net> wrote:
> On 15.11.2010 07:15, Changli Gao wrote:
>> We'd better guard the size of the nf_ct_ext, as the nf_ct_ext.len is u8.
>> If the size is bigger than 255, a warning will be printed.
>
> Why are you checking this in basically every possible spot?
> Just checking once during registration (assuming the worst
> case of a conntrack using every possible extension) should
> be enough.
>

Yes. It is enough, if we check every patch carefully. Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: Richard Hartmann @ 2010-11-15 12:07 UTC (permalink / raw)
  To: Ben McKeegan
  Cc: Paul Mackerras, netdev, linux-ppp, Alan Cox,
	Alexander E. Patrakov, linux-kernel, gabriele.paoloni
In-Reply-To: <AANLkTi=GHmqeb=8rMZAnpqdKGbTEf2mRYhygX1aC2hyD@mail.gmail.com>

On Mon, Nov 8, 2010 at 15:05, Richard Hartmann
<richih.mailinglist@gmail.com> wrote:

> Is there any update on this? It's been quite some time since you last updated
> on this issue.

As it's been a week without any reply and as I know how stuff can
drown in more important work & projects, I am tentatively poking again
:)


RIchard

^ permalink raw reply

* Re: [PATCH 0/5] bridge: RCU annotation and cleanup
From: Tetsuo Handa @ 2010-11-15 12:23 UTC (permalink / raw)
  To: shemminger, davem, eric.dumazet; +Cc: netdev, bridge
In-Reply-To: <20101114211201.678755903@vyatta.com>

Stephen Hemminger wrote:
> This is a split up of what Eric did with a couple of small changes and additions.
Something seems to be wrong with this patchset.

--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
> @@ -173,8 +177,8 @@ forward:
>  	switch (p->state) {
>  	case BR_STATE_FORWARDING:
>  		rhook = rcu_dereference(br_should_route_hook);
> -		if (rhook != NULL) {
> -			if (rhook(skb))
> +		if (rhook) {
> +			if ((*rhook)(skb))

Is *rhook != NULL guaranteed when rhook != NULL?

>  				return skb;
>  			dest = eth_hdr(skb)->h_dest;
>  		}

--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
> @@ -242,7 +242,7 @@ static void br_multicast_flood(struct ne
>  		if ((unsigned long)lport >= (unsigned long)port)
>  			p = rcu_dereference(p->next);
>  		if ((unsigned long)rport >= (unsigned long)port)
> -			rp = rcu_dereference(rp->next);
> +			rp = rcu_dereference(hlist_next_rcu(rp->next));

I think this one is hlist_next_rcu(rp).

>  	}
>  
>  	if (!prev)

--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
> @@ -475,11 +475,8 @@ int br_del_if(struct net_bridge *br, str
>  {
>  	struct net_bridge_port *p;
>  
> -	if (!br_port_exists(dev))
> -		return -EINVAL;
> -
>  	p = br_port_get(dev);

Don't you need to use br_port_get_rtnl()?  (I don't know.)

> -	if (p->br != br)
> +	if (!p || p->br != br)
>  		return -EINVAL;
>  
>  	del_nbp(p);

--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
> @@ -169,9 +171,9 @@ static int br_rtm_setlink(struct sk_buff
>  	if (!dev)
>  		return -ENODEV;
>  
> -	if (!br_port_exists(dev))
> -		return -EINVAL;
>  	p = br_port_get(dev);

Don't you need to use br_port_get_rtnl()?  (I don't know.)

> +	if (!p)
> +		return -EINVAL;
>  
>  	/* if kernel STP is running, don't allow changes */
>  	if (p->br->stp_enabled == BR_KERNEL_STP)

--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
> @@ -151,11 +151,21 @@ struct net_bridge_port
>  #endif
>  };
>  
> -#define br_port_get_rcu(dev) \
> -	((struct net_bridge_port *) rcu_dereference(dev->rx_handler_data))
> -#define br_port_get(dev) ((struct net_bridge_port *) dev->rx_handler_data)
>  #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
>  
> +static inline struct net_bridge_port *br_port_get_rcu(const struct net_device *dev)
> +{
> +	return br_port_exists(dev) ?
> +		rcu_dereference(dev->rx_handler_data) : NULL;
> +}
> +
> +static inline struct net_bridge_port *br_port_get(struct net_device *dev)
> +{
> +	return br_port_exists(dev) ? dev->rx_handler_data : NULL;
> +}
> +
> +#define br_port_get(dev) ((struct net_bridge_port *) dev->rx_handler_data)

Why are you defining br_port_get() twice, once as macro and once as inlined
function?

^ permalink raw reply

* Re: [PATCH 0/5] bridge: RCU annotation and cleanup
From: Eric Dumazet @ 2010-11-15 13:33 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: shemminger, davem, netdev, bridge
In-Reply-To: <201011152123.HHB21896.HFOOVSMFOtLJQF@I-love.SAKURA.ne.jp>

Le lundi 15 novembre 2010 à 21:23 +0900, Tetsuo Handa a écrit :
> Stephen Hemminger wrote:
> > This is a split up of what Eric did with a couple of small changes and additions.
> Something seems to be wrong with this patchset.
> 
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> > @@ -173,8 +177,8 @@ forward:
> >  	switch (p->state) {
> >  	case BR_STATE_FORWARDING:
> >  		rhook = rcu_dereference(br_should_route_hook);
> > -		if (rhook != NULL) {
> > -			if (rhook(skb))
> > +		if (rhook) {
> > +			if ((*rhook)(skb))
> 
> Is *rhook != NULL guaranteed when rhook != NULL?

Its the C standard convention, we call function pointed by rhook, not
*rhook.

$ cat func.c
typedef int (*hook_t)(int a1, int a2);

hook_t *hook;

int foo(int a1, int a2)
{
hook_t *handler = hook;

	if (handler)
		return handler(a1, a2);
	return 0;
}
$ gcc -O2 -c func.c
func.c: In function ‘foo’:
func.c:10:17: error: called object ‘handler’ is not a function


Now, if we use (*handler), it works :

$ cat func.c
typedef int (*hook_t)(int a1, int a2);

hook_t *hook;

int foo(int a1, int a2)
{
hook_t *handler = hook;

	if (handler)
		return (*handler)(a1, a2);
	return 0;
}
$ gcc -O2 -c func.c
$




^ permalink raw reply

* NETPOLL on bond interfaces
From: sergey belov @ 2010-11-15 13:49 UTC (permalink / raw)
  To: Netdev

Before 2.6.36 came out I was using this patch to enable netpoll on bond
interfaces (see below)

When I tried to apply this patch to 2.6.36 sources some chunks was failed.

[root@kernel-x32 experimental]# patch -p0 < netpoll.patch
patching file a/drivers/net/bonding/bond_main.c
Hunk #1 FAILED at 75.
Hunk #2 FAILED at 416.
Hunk #3 succeeded at 1334 with fuzz 2 (offset 23 lines).
Hunk #4 succeeded at 1729 with fuzz 1 (offset 15 lines).
Hunk #5 succeeded at 1837 (offset 42 lines).
Hunk #6 succeeded at 2026 (offset 19 lines).
Hunk #7 succeeded at 4490 with fuzz 1 (offset 11 lines).
Hunk #8 succeeded at 4674 (offset 68 lines).
2 out of 8 hunks FAILED -- saving rejects to file
a/drivers/net/bonding/bond_main.c.rej
patching file a/drivers/net/bonding/bonding.h
Hunk #2 succeeded at 227 (offset 4 lines).
patching file a/include/linux/netdevice.h
Hunk #1 FAILED at 52.
Hunk #2 FAILED at 625.
2 out of 2 hunks FAILED -- saving rejects to file
a/include/linux/netdevice.h.rej
patching file a/include/linux/netpoll.h
Hunk #1 succeeded at 43 with fuzz 2 (offset 9 lines).
patching file a/net/core/netpoll.c
Reversed (or previously applied) patch detected!  Assume -R? [n]


Inspecting changelog at
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.36 I found there
are a lot of changes was done inside of the bonding and netpoll subsystem.

Could please someone provide me new patch to enable netpoll on bond ifaces
or send me a workaround how to enable it in any other way.

We need to use it only to enable netconsole on our servers with bond.


   1. diff --git a/drivers/net/bonding/bond_main.c
   b/drivers/net/bonding/bond_main.c
   2. index d927f71..6304720 100644
   3. --- a/drivers/net/bonding/bond_main.c
   4. +++ b/drivers/net/bonding/bond_main.c
   5. @@ -75,6 +75,7 @@
   6.  #include <linux/jiffies.h>
   7.  #include <net/route.h>
   8.  #include <net/net_namespace.h>
   9. +#include <linux/netpoll.h>
   10.  #include "bonding.h"
   11.  #include "bond_3ad.h"
   12.  #include "bond_alb.h"
   13. @@ -415,7 +416,12 @@ int bond_dev_queue_xmit(struct bonding *bond,
   struct sk_buff *skb,
   14.         }
   15.
   16.         skb->priority = 1;
   17. -       dev_queue_xmit(skb);
   18. +#ifdef CONFIG_NET_POLL_CONTROLLER
   19. +       if (bond->netpoll)
   20. +               netpoll_send_skb(bond->netpoll, skb);
   21. +       else
   22. +#endif
   23. +               dev_queue_xmit(skb);
   24.
   25.         return 0;
   26.  }
   27. @@ -1305,6 +1311,44 @@ static void bond_detach_slave(struct bonding
   *bond, struct slave *slave)
   28.         bond->slave_cnt--;
   29.  }
   30.
   31. +#ifdef CONFIG_NET_POLL_CONTROLLER
   32. +static int slaves_support_netpoll(struct net_device *bond_dev)
   33. +{
   34. +       struct bonding *bond = netdev_priv(bond_dev);
   35. +       struct slave *slave;
   36. +       int i;
   37. +
   38. +       bond_for_each_slave(bond, slave, i)
   39. +               if (!slave->dev->netdev_ops->ndo_poll_controller)
   40. +                       return 0;
   41. +       return 1;
   42. +}
   43. +
   44. +static void bond_poll_controller(struct net_device *bond_dev)
   45. +{
   46. +       struct bonding *bond = netdev_priv(bond_dev);
   47. +       struct slave *slave;
   48. +       int i;
   49. +
   50. +       if (slaves_support_netpoll(bond_dev))
   51. +               bond_for_each_slave(bond, slave, i)
   52. +                       netpoll_poll_dev(slave->dev);
   53. +}
   54. +
   55. +static int bond_netpoll_setup(struct net_device *bond_dev,
   56. +                             struct netpoll_info *npinfo)
   57. +{
   58. +       struct bonding *bond = netdev_priv(bond_dev);
   59. +       struct slave *slave;
   60. +       int i;
   61. +
   62. +       bond_for_each_slave(bond, slave, i)
   63. +               if (slave->dev)
   64. +                       slave->dev->npinfo = npinfo;
   65. +       return 0;
   66. +}
   67. +#endif
   68. +
   69.  /*---------------------------------- IOCTL
   ----------------------------------*/
   70.
   71.  static int bond_sethwaddr(struct net_device *bond_dev,
   72. @@ -1670,6 +1714,16 @@ int bond_enslave(struct net_device *bond_dev,
   struct net_device *slave_dev)
   73.                         bond->primary_slave = new_slave;
   74.         }
   75.
   76. +#ifdef CONFIG_NET_POLL_CONTROLLER
   77. +       if (slaves_support_netpoll(bond_dev))
   78. +               slave_dev->npinfo = bond_dev->npinfo;
   79. +       else {
   80. +               pr_err(DRV_NAME "New slave device %s does not support
   netpoll.\n",
   81. +                      slave_dev->name);
   82. +               pr_err(DRV_NAME "netpoll disabled for %s.\n",
   bond_dev->name);
   83. +       }
   84. +#endif
   85. +
   86.         write_lock_bh(&bond->curr_slave_lock);
   87.
   88.         switch (bond->params.mode) {
   89. @@ -1741,6 +1795,10 @@ int bond_enslave(struct net_device *bond_dev,
   struct net_device *slave_dev)
   90.
   91.  /* Undo stages on error */
   92.  err_close:
   93. +#ifdef CONFIG_NET_POLL_CONTROLLER
   94. +       if (slave_dev->npinfo)
   95. +               slave_dev->npinfo = NULL;
   96. +#endif
   97.         dev_close(slave_dev);
   98.
   99.  err_unset_master:
   100. @@ -1949,6 +2007,10 @@ int bond_release(struct net_device *bond_dev,
   struct net_device *slave_dev)
   101.                                    IFF_SLAVE_INACTIVE | IFF_BONDING
   |
   102.                                    IFF_SLAVE_NEEDARP);
   103.
   104. +#ifdef CONFIG_NET_POLL_CONTROLLER
   105. +       if (slave_dev->npinfo)
   106. +               slave_dev->npinfo = NULL;
   107. +#endif
   108.         kfree(slave);
   109.
   110.         return 0;  /* deletion OK */
   111. @@ -4417,6 +4479,20 @@ out:
   112.         return 0;
   113.  }
   114.
   115. +#ifdef CONFIG_NET_POLL_CONTROLLER
   116. +int bond_netpoll_start_xmit(struct netpoll *np, struct sk_buff *skb
   )
   117. +{
   118. +       struct bonding *bond = netdev_priv(skb->dev);
   119. +       int ret;
   120. +
   121. +       bond->netpoll = np;
   122. +       ret = bond->dev->netdev_ops->ndo_start_xmit(skb, bond->dev);
   123. +       bond->netpoll = NULL;
   124. +
   125. +       return ret;
   126. +}
   127. +#endif
   128. +
   129.  /*------------------------- Device initialization
   ---------------------------*/
   130.
   131.  static void bond_set_xmit_hash_policy(struct bonding *bond)
   132. @@ -4530,6 +4606,9 @@ static const struct net_device_ops
   bond_netdev_ops = {
   133.         .ndo_change_mtu         = bond_change_mtu,
   134.         .ndo_set_mac_address    = bond_set_mac_address,
   135.         .ndo_neigh_setup        = bond_neigh_setup,
   136. +       .ndo_netpoll_setup      = bond_netpoll_setup,
   137. +       .ndo_netpoll_start_xmit = bond_netpoll_start_xmit,
   138. +       .ndo_poll_controller    = bond_poll_controller,
   139.         .ndo_vlan_rx_register   = bond_vlan_rx_register,
   140.         .ndo_vlan_rx_add_vid    = bond_vlan_rx_add_vid,
   141.         .ndo_vlan_rx_kill_vid   = bond_vlan_rx_kill_vid,
   142. diff --git a/drivers/net/bonding/bonding.h
   b/drivers/net/bonding/bonding.h
   143. index 6290a50..563d28c 100644
   144. --- a/drivers/net/bonding/bonding.h
   145. +++ b/drivers/net/bonding/bonding.h
   146. @@ -18,6 +18,7 @@
   147.  #include <linux/timer.h>
   148.  #include <linux/proc_fs.h>
   149.  #include <linux/if_bonding.h>
   150. +#include <linux/netpoll.h>
   151.  #include <linux/kobject.h>
   152.  #include <linux/in6.h>
   153.  #include "bond_3ad.h"
   154. @@ -222,6 +223,10 @@ struct bonding {
   155.  #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
   156.         struct   in6_addr master_ipv6;
   157.  #endif
   158. +#ifdef CONFIG_NET_POLL_CONTROLLER
   159. +       struct   netpoll *netpoll;
   160. +#endif
   161. +
   162.  };
   163.
   164.  /**
   165. diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
   166. index d4a4d98..e1724b2 100644
   167. --- a/include/linux/netdevice.h
   168. +++ b/include/linux/netdevice.h
   169. @@ -52,6 +52,7 @@
   170.
   171.  struct vlan_group;
   172.  struct netpoll_info;
   173. +struct netpoll;
   174.  /* 802.11 specific */
   175.  struct wireless_dev;
   176.                                         /* source back-compat hooks
   */
   177. @@ -624,6 +625,10 @@ struct net_device_ops {
   178.                                                         unsigned
   short vid);
   179.  #ifdef CONFIG_NET_POLL_CONTROLLER
   180.  #define HAVE_NETDEV_POLL
   181. +       int                     (*ndo_netpoll_setup)(struct
   net_device *dev,
   182. +                                                    struct
   netpoll_info *npinfo);
   183. +       int                     (*ndo_netpoll_start_xmit)(struct
   netpoll *np,
   184. +                                                         struct
   sk_buff *skb);
   185.         void                    (*ndo_poll_controller)(struct
   net_device *dev);
   186.  #endif
   187.  #if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
   188. diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
   189. index 2524267..39d42e4 100644
   190. --- a/include/linux/netpoll.h
   191. +++ b/include/linux/netpoll.h
   192. @@ -34,7 +34,9 @@ struct netpoll_info {
   193.  };
   194.
   195.  void netpoll_poll(struct netpoll *np);
   196. +void netpoll_poll_dev(struct net_device *dev);
   197.  void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
   ;
   198. +void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
   199.  void netpoll_print_options(struct netpoll *np);
   200.  int netpoll_parse_options(struct netpoll *np, char *opt);
   201.  int netpoll_setup(struct netpoll *np);
   202. diff --git a/net/core/netpoll.c b/net/core/netpoll.c
   203. index 9675f31..3776b26 100644
   204. --- a/net/core/netpoll.c
   205. +++ b/net/core/netpoll.c
   206. @@ -174,9 +174,8 @@ static void service_arp_queue(struct
   netpoll_info *npi)
   207.         }
   208.  }
   209.
   210. -void netpoll_poll(struct netpoll *np)
   211. +void netpoll_poll_dev(struct net_device *dev)
   212.  {
   213. -       struct net_device *dev = np->dev;
   214.         const struct net_device_ops *ops;
   215.
   216.         if (!dev || !netif_running(dev))
   217. @@ -196,6 +195,11 @@ void netpoll_poll(struct netpoll *np)
   218.         zap_completion_queue();
   219.  }
   220.
   221. +void netpoll_poll(struct netpoll *np)
   222. +{
   223. +       netpoll_poll_dev(np->dev);
   224. +}
   225. +
   226.  static void refill_skbs(void)
   227.  {
   228.         struct sk_buff *skb;
   229. @@ -277,11 +281,11 @@ static int netpoll_owner_active(struct
   net_device *dev)
   230.         return 0;
   231.  }
   232.
   233. -static void netpoll_send_skb(struct netpoll *np, struct sk_buff
   *skb)
   234. +void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
   235.  {
   236.         int status = NETDEV_TX_BUSY;
   237.         unsigned long tries;
   238. -       struct net_device *dev = np->dev;
   239. +       struct net_device *dev = skb->dev;
   240.         const struct net_device_ops *ops = dev->netdev_ops;
   241.         struct netpoll_info *npinfo = np->dev->npinfo;
   242.
   243. @@ -303,7 +307,10 @@ static void netpoll_send_skb(struct netpoll
   *np, struct sk_buff *skb)
   244.                      tries > 0; --tries) {
   245.                         if (__netif_tx_trylock(txq)) {
   246.                                 if (!netif_tx_queue_stopped(txq)) {
   247. -                                       status = ops->ndo_start_xmit
   (skb, dev);
   248. +                                       if (
   ops->ndo_netpoll_start_xmit)
   249. +                                               status =
   ops->ndo_netpoll_start_xmit(np,skb);
   250. +                                       else
   251. +                                               status =
   ops->ndo_start_xmit(skb, dev);
   252.                                         if (status == NETDEV_TX_OK)
   253.                                                 txq_trans_update(txq
   );
   254.                                 }
   255. @@ -789,6 +796,9 @@ int netpoll_setup(struct netpoll *np)
   256.         /* avoid racing with NAPI reading npinfo */
   257.         synchronize_rcu();
   258.
   259. +       if (ndev->netdev_ops->ndo_netpoll_setup)
   260. +               ndev->netdev_ops->ndo_netpoll_setup(ndev, npinfo);
   261. +
   262.         return 0;
   263.
   264.   release:
   265. @@ -859,4 +869,6 @@ EXPORT_SYMBOL(netpoll_parse_options);
   266.  EXPORT_SYMBOL(netpoll_setup);
   267.  EXPORT_SYMBOL(netpoll_cleanup);
   268.  EXPORT_SYMBOL(netpoll_send_udp);
   269. +EXPORT_SYMBOL(netpoll_send_skb);
   270.  EXPORT_SYMBOL(netpoll_poll);
   271. +EXPORT_SYMBOL(netpoll_poll_dev);

^ permalink raw reply

* Re: [PATCH] atomic: add atomic_inc_not_zero_hint()
From: Christoph Lameter @ 2010-11-15 13:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Eric Dumazet, Andrew Morton, linux-kernel, David Miller, netdev,
	Arnaldo Carvalho de Melo, Ingo Molnar, Andi Kleen, Nick Piggin
In-Reply-To: <20101113222612.GD2825@linux.vnet.ibm.com>

On Sat, 13 Nov 2010, Paul E. McKenney wrote:

> On Fri, Nov 12, 2010 at 01:14:12PM -0600, Christoph Lameter wrote:
> >
> > prefetchw() would be too much overhead?
>
> No idea.  Where do you believe that prefetchw() should be added?

It is another way to get an exclusive cache line
for situations like this. No need to give a hint.


^ permalink raw reply

* Re: [PATCH] atomic: add atomic_inc_not_zero_hint()
From: Andi Kleen @ 2010-11-15 14:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul E. McKenney, Eric Dumazet, Andrew Morton, linux-kernel,
	David Miller, netdev, Arnaldo Carvalho de Melo, Ingo Molnar,
	Andi Kleen, Nick Piggin
In-Reply-To: <alpine.DEB.2.00.1011150756130.19175@router.home>

On Mon, Nov 15, 2010 at 07:57:10AM -0600, Christoph Lameter wrote:
> On Sat, 13 Nov 2010, Paul E. McKenney wrote:
> 
> > On Fri, Nov 12, 2010 at 01:14:12PM -0600, Christoph Lameter wrote:
> > >
> > > prefetchw() would be too much overhead?
> >
> > No idea.  Where do you believe that prefetchw() should be added?
> 
> It is another way to get an exclusive cache line
> for situations like this. No need to give a hint.

prefetchw doesn't work on Intel (or rather is equivalent to prefetch), 
for Intel you always need to explicitely write to get an exclusive
line.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply

* Re: [PATCH] atomic: add atomic_inc_not_zero_hint()
From: Christoph Lameter @ 2010-11-15 14:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Paul E. McKenney, Eric Dumazet, Andrew Morton, linux-kernel,
	David Miller, netdev, Arnaldo Carvalho de Melo, Ingo Molnar,
	Nick Piggin
In-Reply-To: <20101115140739.GJ7269@basil.fritz.box>

On Mon, 15 Nov 2010, Andi Kleen wrote:

> > It is another way to get an exclusive cache line
> > for situations like this. No need to give a hint.
>
> prefetchw doesn't work on Intel (or rather is equivalent to prefetch),
> for Intel you always need to explicitely write to get an exclusive
> line.

Argh. You mean x86. Itanium could do it and is also by Intel. Could you
please change that for x86 as well? Otherwise we will get more of these
weird code twisters.

^ permalink raw reply

* Re: [PATCH] atomic: add atomic_inc_not_zero_hint()
From: Eric Dumazet @ 2010-11-15 14:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul E. McKenney, Andrew Morton, linux-kernel, David Miller,
	netdev, Arnaldo Carvalho de Melo, Ingo Molnar, Andi Kleen,
	Nick Piggin
In-Reply-To: <alpine.DEB.2.00.1011150756130.19175@router.home>

Le lundi 15 novembre 2010 à 07:57 -0600, Christoph Lameter a écrit :
> On Sat, 13 Nov 2010, Paul E. McKenney wrote:
> 
> > On Fri, Nov 12, 2010 at 01:14:12PM -0600, Christoph Lameter wrote:
> > >
> > > prefetchw() would be too much overhead?
> >
> > No idea.  Where do you believe that prefetchw() should be added?
> 
> It is another way to get an exclusive cache line
> for situations like this. No need to give a hint.
> 

Exclusive access ? As soon as another cpu takes it again, you lose.

Its not really the same thing... Maybe you miss the 'hint' intention at
all. We know the probable value of the counter, we dont want to read it.

In fact, prefetchw() is useful when you can assert it many cycles before
the memory read you are going to perform [before the write]. On
contended cache lines, its a waste, because by the time your cpu is
going to read memory, then perform the atomic compare_and_exchange(), an
other cpu might have dirtied the location again. This is what we noticed
during Netfilter Workshop 2010 : A high performance cost at both
atomic_read() and atomic_cmpxchg(). We tried prefetchw() and it was a
performance drop. It was with only 16 cpus contending on neighbour
refcnt, and 5 millions frames per second (5 millions atomic increments,
5 millions atomic decrements)

prefetchw() should be used on very specific spots, when a cpu is going
to write into a private area (not potentially accessed by other cpus).
We use it for example in __alloc_skb(), a bit before memset().

By the way, atomic_inc_not_zero_hint() is less code than 
[prefetchw(), atomic_inc_not_zero()]. Using one instruction [cmpxchg]
with the memory pointer is better than three.  [prefetchw(), read(),
cmpxchg()], particularly if you have high contention on cache line.

^ permalink raw reply

* Re: [PATCH] tcp: restrict net.ipv4.tcp_adv_min_scale (#20312)
From: Ben Hutchings @ 2010-11-15 14:18 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Eric Dumazet, davem, shemminger, netdev
In-Reply-To: <20101114201458.GA28181@core2.telecom.by>

On Sun, 2010-11-14 at 22:14 +0200, Alexey Dobriyan wrote:
> On Sun, Nov 14, 2010 at 08:49:43PM +0100, Eric Dumazet wrote:
> > > +static int _minus_31 = -31;
> > > +static int _31 = 31;
> > 
> > Please use normal symbols, not starting by underscore.
> 
> static int thirty_one = 31? :-)

How about, oh, I don't know, adv_win_scale_{min,max}?

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox