* Re: linux-next: build failure after merge of the final tree (net tree related)
From: David Miller @ 2010-04-27 17:18 UTC (permalink / raw)
To: sfr; +Cc: netdev, linux-next, linux-kernel, yoshfuji
In-Reply-To: <20100427.093430.258110898.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Tue, 27 Apr 2010 09:34:30 -0700 (PDT)
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Tue, 27 Apr 2010 15:25:16 +1000
>
>> After merging the bkl-ioctl tree, today's linux-next build (powerpc
>> ppc44x_defconfig) failed like this:
>>
>> net/bridge/br_multicast.c: In function 'br_ip6_multicast_alloc_query':
>> net/bridge/br_multicast.c:469: error: implicit declaration of function 'csum_ipv6_magic'
>>
>> Introduced by commit 08b202b6726459626c73ecfa08fcdc8c3efc76c2 ("bridge
>> br_multicast: IPv6 MLD support") from the net tree.
>>
>> csum_ipv6_magic is declared in net/ip6_checksum.h ...
>
> Bummer, powerpc is one of the few platforms that doesn't get the header
> file implicitly so you always trip over this whereas we never see it in
> x86 and sparc64 builds :-)
>
> I'll fix this, thanks!
I just committed the following for this:
bridge: Fix build of ipv6 multicast code.
Based upon a report from Stephen Rothwell:
--------------------
net/bridge/br_multicast.c: In function 'br_ip6_multicast_alloc_query':
net/bridge/br_multicast.c:469: error: implicit declaration of function 'csum_ipv6_magic'
Introduced by commit 08b202b6726459626c73ecfa08fcdc8c3efc76c2 ("bridge
br_multicast: IPv6 MLD support") from the net tree.
csum_ipv6_magic is declared in net/ip6_checksum.h ...
--------------------
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/bridge/br_multicast.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index e481dbd..2048ef0 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -28,6 +28,7 @@
#include <net/ipv6.h>
#include <net/mld.h>
#include <net/addrconf.h>
+#include <net/ip6_checksum.h>
#endif
#include "br_private.h"
--
1.7.0.4
^ permalink raw reply related
* Re: [PATCH] bnx2x: add support for receive hashing
From: Eric Dumazet @ 2010-04-27 17:13 UTC (permalink / raw)
To: David Miller; +Cc: bmb, therbert, netdev, rick.jones2
In-Reply-To: <20100427.095108.68126984.davem@davemloft.net>
Le mardi 27 avril 2010 à 09:51 -0700, David Miller a écrit :
> From: Brian Bloniarz <bmb@athenacr.com>
> Date: Tue, 27 Apr 2010 09:37:11 -0400
>
> > David Miller wrote:
> >> How damn hard is it to add two 16-bit ports to the hash regardless of
> >> protocol?
> >>
> > Come to think of it, for UDP the hash must ignore
> > the srcport and srcaddr, because a single bound
> > socket is going to wildcard both those fields.
>
> For load distribution we don't care if the local socket is wildcard
> bounded on source.
>
> It's going to be fully specified in the packet, and that's enough.
>
> Sure, for full RFS some amends might be necessary in this area, but
> for RPS and adapter based hw steering, using all of the ports is
> entirely sufficient.
Well well well...
I was doing some pktgen tests, with :
pgset "src_min 192.168.0.10"
pgset "src_max 192.168.0.110"
pgset "dst_min 192.168.0.2"
pgset "dst_max 192.168.0.2"
pgset "udp_dst_min 4000"
pgset "udp_dst_max 4000"
So I simulate 100 remote IPS bombarding a single port on target machine.
pktgen injects about 930.000 pps
sofirq of my target received on cpu0, and RPS spread packets to 7 other
cpus.
And my receiver is stuck (he can read about 50 pps !!!)
As soon as I disable rps, my receiver can catch 850.000 pps
RPS OFF: perf top of cpu 0
------------------------------------------------------------------------------------------------------------------------------
PerfTop: 1001 irqs/sec kernel:100.0% [1000Hz cycles], (all, cpu: 0)
------------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ______________________ _______
385.00 10.2% __udp4_lib_lookup vmlinux
322.00 8.5% ip_route_input vmlinux
312.00 8.3% sock_queue_rcv_skb vmlinux
262.00 6.9% do_raw_spin_lock vmlinux
251.00 6.6% __alloc_skb vmlinux
239.00 6.3% sock_put vmlinux
207.00 5.5% eth_type_trans vmlinux
202.00 5.4% __slab_alloc vmlinux
159.00 4.2% __kmalloc_track_caller vmlinux
149.00 3.9% __sk_mem_schedule vmlinux
125.00 3.3% kmem_cache_alloc vmlinux
116.00 3.1% ipt_do_table vmlinux
115.00 3.0% do_raw_read_lock vmlinux
71.00 1.9% tg3_poll_work vmlinux
65.00 1.7% __netdev_alloc_skb vmlinux
64.00 1.7% skb_pull vmlinux
58.00 1.5% ip_rcv vmlinux
58.00 1.5% __slab_free vmlinux
53.00 1.4% udp_queue_rcv_skb vmlinux
47.00 1.2% nf_iterate vmlinux
44.00 1.2% __netif_receive_skb vmlinux
29.00 0.8% sock_def_readable vmlinux
28.00 0.7% do_raw_spin_unlock vmlinux
26.00 0.7% kfree vmlinux
25.00 0.7% __udp4_lib_rcv vmlinux
24.00 0.6% ip_rcv_finish vmlinux
24.00 0.6% __list_add vmlinux
RPS, on, a perf top of a slave CPU :
------------------------------------------------------------------------------------------------------------------------------
PerfTop: 1000 irqs/sec kernel:100.0% [1000Hz cycles], (all, cpu: 1)
------------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________ _______
2411.00 62.0% do_raw_spin_lock vmlinux
690.00 17.7% delay_tsc vmlinux
234.00 6.0% __udp4_lib_lookup vmlinux
174.00 4.5% sock_put vmlinux
72.00 1.9% ip_rcv vmlinux
51.00 1.3% __netif_receive_skb vmlinux
43.00 1.1% do_raw_spin_unlock vmlinux
39.00 1.0% __delay vmlinux
38.00 1.0% sock_queue_rcv_skb vmlinux
36.00 0.9% udp_queue_rcv_skb vmlinux
31.00 0.8% ip_route_input vmlinux
15.00 0.4% __slab_free vmlinux
12.00 0.3% ipt_do_table vmlinux
11.00 0.3% skb_release_data vmlinux
7.00 0.2% kfree vmlinux
5.00 0.1% nf_iterate vmlinux
So we have a BIG problem :
All cpus are fighting to get the socket lock,
and very litle progress is done.
Note this problem has nothing to do with RPS, we could have
it with multiqueue as well.
Oh well...
^ permalink raw reply
* [PATCH net-next] bridge: multicast router list manipulation
From: Stephen Hemminger @ 2010-04-27 17:13 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev
In-Reply-To: <E1NlbuX-00021m-Bq@gondolin.me.apana.org.au>
I prefer that the hlist be only accessed through the hlist macro
objects. Explicit twiddling of links (especially with RCU) exposes
the code to future bugs.
Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/bridge/br_multicast.c 2010-04-27 09:54:02.180531924 -0700
+++ b/net/bridge/br_multicast.c 2010-04-27 10:07:19.188688664 -0700
@@ -1041,21 +1041,21 @@ static int br_ip6_multicast_mld2_report(
static void br_multicast_add_router(struct net_bridge *br,
struct net_bridge_port *port)
{
- struct hlist_node *p;
- struct hlist_node **h;
+ struct net_bridge_port *p;
+ struct hlist_node *n, *last = NULL;
- for (h = &br->router_list.first;
- (p = *h) &&
- (unsigned long)container_of(p, struct net_bridge_port, rlist) >
- (unsigned long)port;
- h = &p->next)
- ;
-
- port->rlist.pprev = h;
- port->rlist.next = p;
- rcu_assign_pointer(*h, &port->rlist);
- if (p)
- p->pprev = &port->rlist.next;
+ hlist_for_each_entry(p, n, &br->router_list, rlist) {
+ if ((unsigned long) port >= (unsigned long) p) {
+ hlist_add_before_rcu(n, &port->rlist);
+ return;
+ }
+ last = n;
+ }
+
+ if (last)
+ hlist_add_after_rcu(last, &port->rlist);
+ else
+ hlist_add_head_rcu(&port->rlist, &br->router_list);
}
static void br_multicast_mark_router(struct net_bridge *br,
^ permalink raw reply
* [PATCH net-next] bridge: use is_multicast_ether_addr
From: Stephen Hemminger @ 2010-04-27 17:13 UTC (permalink / raw)
To: Herbert Xu, David S. Miller; +Cc: netdev
In-Reply-To: <E1NlbuW-00021d-8E@gondolin.me.apana.org.au>
Use existing inline function.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/net/bridge/br_device.c 2010-04-27 09:49:30.059258391 -0700
+++ b/net/bridge/br_device.c 2010-04-27 09:50:21.439878721 -0700
@@ -36,7 +36,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *
skb_reset_mac_header(skb);
skb_pull(skb, ETH_HLEN);
- if (dest[0] & 1) {
+ if (is_multicast_ether_addr(dest)) {
if (br_multicast_rcv(br, NULL, skb))
goto out;
^ permalink raw reply
* Re: [PATCH] bnx2x: add support for receive hashing
From: David Miller @ 2010-04-27 17:06 UTC (permalink / raw)
To: bmb; +Cc: therbert, eric.dumazet, netdev, rick.jones2
In-Reply-To: <4BD71890.2050606@athenacr.com>
From: Brian Bloniarz <bmb@athenacr.com>
Date: Tue, 27 Apr 2010 13:02:08 -0400
> Maybe I'm misunderstanding... won't it distribute the
> packet handling load to multiple cores, but then all
> those cores will contend trying to deliver those packets
> to the single socket?
>
> I was assuming that this'd be a net loss over just doing
> all the protocol handling on a single core. I haven't
> done any benchmarks yet.
Whether it's a new loss depends upon the application.
Also, on the non-application side f.e. a router or firewall, this is
exactly the behavior you want.
^ permalink raw reply
* Re: [PATCH 0/4] net: ipmr netlink interface for route dumping
From: David Miller @ 2010-04-27 17:03 UTC (permalink / raw)
To: Patrick, McHardy, kaber; +Cc: netdev
In-Reply-To: <1272374785-3858-1-git-send-email-kaber@trash.net>
Whoa, there are three of you now?!?!?!
:-)
^ permalink raw reply
* Re: [PATCH] bnx2x: add support for receive hashing
From: Brian Bloniarz @ 2010-04-27 17:02 UTC (permalink / raw)
To: David Miller; +Cc: therbert, eric.dumazet, netdev, rick.jones2
In-Reply-To: <20100427.095108.68126984.davem@davemloft.net>
David Miller wrote:
> From: Brian Bloniarz <bmb@athenacr.com>
> Date: Tue, 27 Apr 2010 09:37:11 -0400
>
>> David Miller wrote:
>>> How damn hard is it to add two 16-bit ports to the hash regardless of
>>> protocol?
>>>
>> Come to think of it, for UDP the hash must ignore
>> the srcport and srcaddr, because a single bound
>> socket is going to wildcard both those fields.
>
> For load distribution we don't care if the local socket is wildcard
> bounded on source.
>
> It's going to be fully specified in the packet, and that's enough.
Maybe I'm misunderstanding... won't it distribute the
packet handling load to multiple cores, but then all
those cores will contend trying to deliver those packets
to the single socket?
I was assuming that this'd be a net loss over just doing
all the protocol handling on a single core. I haven't
done any benchmarks yet.
^ permalink raw reply
* Re: [patch v2] sctp: cleanup: remove duplicate assignment
From: David Miller @ 2010-04-27 16:58 UTC (permalink / raw)
To: vladislav.yasevich
Cc: error27, sri, yjwei, cdischino, linux-sctp, netdev,
kernel-janitors
In-Reply-To: <4BD6F582.9030804@hp.com>
From: Vlad Yasevich <vladislav.yasevich@hp.com>
Date: Tue, 27 Apr 2010 10:32:34 -0400
>
>
> Dan Carpenter wrote:
>> This assignment isn't needed because we did it earlier already.
>>
>> Also another reason to delete the assignment is because it triggers a
>> Smatch warning about checking for NULL pointers after a dereference.
>>
>> Reported-by: Vlad Yasevich <vladislav.yasevich@hp.com>
>> Signed-off-by: Dan Carpenter <error27@gmail.com>
>
> Thanks. I'll take this one.
And when will I get this from you? :-)
^ permalink raw reply
* Re: [net-2.6 PATCH] ixgbevf: Fix link speed display
From: David Miller @ 2010-04-27 16:57 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, gregory.v.rose
In-Reply-To: <20100427103834.23338.22213.stgit@localhost.localdomain>
Not appropriate this late in the -RC series.
I don't see this in the regression list, and reported link speed being
incorrect is not a catastropic crash and/or failure.
^ permalink raw reply
* Re: [net-2.6 PATCH] ixgbe: cleanup ethtool autoneg input
From: David Miller @ 2010-04-27 16:56 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, donald.c.skidmore
In-Reply-To: <20100427112513.24196.70511.stgit@localhost.localdomain>
This is also not appropriate for net-2.6, it doesn't fix a
regression in the regression list and it doesn't fix a catastropic
crash or failure.
You really have to be kidding me if you thing a patch like this
is fine this late in the -RC series.
You also aren't even numbering your patches, which is quite a
transgression when a submission of a set of patches all to the same
driver and/or files.
^ permalink raw reply
* Re: [net-2.6 PATCH] ixgbe: Properly display 1 gig downshift warning for backplane
From: David Miller @ 2010-04-27 16:54 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, anjali.singhai
In-Reply-To: <20100427103724.23338.53872.stgit@localhost.localdomain>
Not fixing a regression, not fixing a catastropic error, just adding
a warning message.
Therefore not appropriate for net-2.6
^ permalink raw reply
* Re: [PATCH] bnx2x: add support for receive hashing
From: David Miller @ 2010-04-27 16:51 UTC (permalink / raw)
To: bmb; +Cc: therbert, eric.dumazet, netdev, rick.jones2
In-Reply-To: <4BD6E887.3000804@athenacr.com>
From: Brian Bloniarz <bmb@athenacr.com>
Date: Tue, 27 Apr 2010 09:37:11 -0400
> David Miller wrote:
>> How damn hard is it to add two 16-bit ports to the hash regardless of
>> protocol?
>>
> Come to think of it, for UDP the hash must ignore
> the srcport and srcaddr, because a single bound
> socket is going to wildcard both those fields.
For load distribution we don't care if the local socket is wildcard
bounded on source.
It's going to be fully specified in the packet, and that's enough.
Sure, for full RFS some amends might be necessary in this area, but
for RPS and adapter based hw steering, using all of the ports is
entirely sufficient.
^ permalink raw reply
* Re: linux-next: build failure after merge of the final tree (net tree related)
From: David Miller @ 2010-04-27 16:34 UTC (permalink / raw)
To: sfr; +Cc: netdev, linux-next, linux-kernel, yoshfuji
In-Reply-To: <20100427152516.580ee620.sfr@canb.auug.org.au>
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Tue, 27 Apr 2010 15:25:16 +1000
> After merging the bkl-ioctl tree, today's linux-next build (powerpc
> ppc44x_defconfig) failed like this:
>
> net/bridge/br_multicast.c: In function 'br_ip6_multicast_alloc_query':
> net/bridge/br_multicast.c:469: error: implicit declaration of function 'csum_ipv6_magic'
>
> Introduced by commit 08b202b6726459626c73ecfa08fcdc8c3efc76c2 ("bridge
> br_multicast: IPv6 MLD support") from the net tree.
>
> csum_ipv6_magic is declared in net/ip6_checksum.h ...
Bummer, powerpc is one of the few platforms that doesn't get the header
file implicitly so you always trip over this whereas we never see it in
x86 and sparc64 builds :-)
I'll fix this, thanks!
^ permalink raw reply
* Re: [PATCH] RCU: don't turn off lockdep when find suspicious rcu_dereference_check() usage
From: Eric Dumazet @ 2010-04-27 16:33 UTC (permalink / raw)
To: paulmck
Cc: Eric W. Biederman, Miles Lane, Vivek Goyal, Eric Paris,
Lai Jiangshan, Ingo Molnar, Peter Zijlstra, LKML, nauman, netdev,
Jens Axboe, Gui Jianfeng, Li Zefan, Johannes Berg
In-Reply-To: <20100427162201.GA5826@linux.vnet.ibm.com>
Le mardi 27 avril 2010 à 09:22 -0700, Paul E. McKenney a écrit :
> On Mon, Apr 26, 2010 at 09:27:44PM -0700, Paul E. McKenney wrote:
> > On Mon, Apr 26, 2010 at 11:35:10AM -0700, Eric W. Biederman wrote:
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> > >
> > > > Eric Dumazet traced these down to a commit from Eric Biederman.
> > > >
> > > > If I don't hear from Eric Biederman in a few days, I will attempt a
> > > > patch, but it would be more likely to be correct coming from someone
> > > > with a better understanding of the code. ;-)
> > >
> > > I already replied.
> > >
> > > http://lkml.org/lkml/2010/4/21/420
> >
> > You did indeed!!! This experience is giving me an even better appreciation
> > of the maintainers' ability to keep all their patches straight!
> >
> > I will put together something based on your suggestion.
>
> How about the following?
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 85fa42bd568ab99c375f018761ae6345249942cd
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date: Mon Apr 26 21:40:05 2010 -0700
>
> net: suppress RCU lockdep false positive in twsk_net()
>
> Calls to twsk_net() are in some cases protected by reference counting
> as an alternative to RCU protection. Cases covered by reference counts
> include __inet_twsk_kill(), inet_twsk_free(), inet_twdr_do_twkill_work(),
> inet_twdr_twcal_tick(), and tcp_timewait_state_process(). RCU is used
> by inet_twsk_purge(). Locking is used by established_get_first()
> and established_get_next(). Finally, __inet_twsk_hashdance() is an
> initialization case.
>
> It appears to be non-trivial to locate the appropriate locks and
> reference counts from within twsk_net(), so used rcu_dereference_raw().
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
> index 79f67ea..a066fdd 100644
> --- a/include/net/inet_timewait_sock.h
> +++ b/include/net/inet_timewait_sock.h
> @@ -224,7 +224,9 @@ static inline
> struct net *twsk_net(const struct inet_timewait_sock *twsk)
> {
> #ifdef CONFIG_NET_NS
> - return rcu_dereference(twsk->tw_net);
> + return rcu_dereference_raw(twsk->tw_net); /* protected by locking, */
> + /* reference counting, */
> + /* initialization, or RCU. */
> #else
> return &init_net;
> #endif
> --
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
^ permalink raw reply
* Re: [PATCH 0/3] [RFC] ptp: IEEE 1588 clock support
From: Wolfgang Grandegger @ 2010-04-27 16:20 UTC (permalink / raw)
To: Richard Cochran; +Cc: netdev
In-Reply-To: <4BD6E837.2040505@grandegger.com>
Wolfgang Grandegger wrote:
> Hi Richard,
>
> Richard Cochran wrote:
>> BTW,
>>
>> To show how the API works from the user space, I also patches For the
>> popular ptpd program to that project.
>>
>> https://sourceforge.net/tracker/?func=detail&aid=2992845&group_id=139814&atid=744634
>
> Cool stuff, I'm giving it a try on my MPC8313 PTP setup.
Do you have also a patch adding support for hardware timestamping to ptpd?
Thanks,
Wolfgang.
^ permalink raw reply
* Re: [PATCH] RCU: don't turn off lockdep when find suspicious rcu_dereference_check() usage
From: Paul E. McKenney @ 2010-04-27 16:22 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
Peter Zijlstra, LKML, nauman, eric.dumazet, netdev, Jens Axboe,
Gui Jianfeng, Li Zefan, Johannes Berg
In-Reply-To: <20100427042744.GD2510@linux.vnet.ibm.com>
On Mon, Apr 26, 2010 at 09:27:44PM -0700, Paul E. McKenney wrote:
> On Mon, Apr 26, 2010 at 11:35:10AM -0700, Eric W. Biederman wrote:
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:
> >
> > > Eric Dumazet traced these down to a commit from Eric Biederman.
> > >
> > > If I don't hear from Eric Biederman in a few days, I will attempt a
> > > patch, but it would be more likely to be correct coming from someone
> > > with a better understanding of the code. ;-)
> >
> > I already replied.
> >
> > http://lkml.org/lkml/2010/4/21/420
>
> You did indeed!!! This experience is giving me an even better appreciation
> of the maintainers' ability to keep all their patches straight!
>
> I will put together something based on your suggestion.
How about the following?
Thanx, Paul
------------------------------------------------------------------------
commit 85fa42bd568ab99c375f018761ae6345249942cd
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Mon Apr 26 21:40:05 2010 -0700
net: suppress RCU lockdep false positive in twsk_net()
Calls to twsk_net() are in some cases protected by reference counting
as an alternative to RCU protection. Cases covered by reference counts
include __inet_twsk_kill(), inet_twsk_free(), inet_twdr_do_twkill_work(),
inet_twdr_twcal_tick(), and tcp_timewait_state_process(). RCU is used
by inet_twsk_purge(). Locking is used by established_get_first()
and established_get_next(). Finally, __inet_twsk_hashdance() is an
initialization case.
It appears to be non-trivial to locate the appropriate locks and
reference counts from within twsk_net(), so used rcu_dereference_raw().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index 79f67ea..a066fdd 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -224,7 +224,9 @@ static inline
struct net *twsk_net(const struct inet_timewait_sock *twsk)
{
#ifdef CONFIG_NET_NS
- return rcu_dereference(twsk->tw_net);
+ return rcu_dereference_raw(twsk->tw_net); /* protected by locking, */
+ /* reference counting, */
+ /* initialization, or RCU. */
#else
return &init_net;
#endif
^ permalink raw reply related
* Re: [PATCH net-next-2.6] net: fix a lockdep rcu warning in __sk_dst_set()
From: Paul E. McKenney @ 2010-04-27 16:17 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1272350443.4861.9.camel@edumazet-laptop>
On Tue, Apr 27, 2010 at 08:40:43AM +0200, Eric Dumazet wrote:
> __sk_dst_set() might be called while no state can be integrated in a
> rcu_dereference_check() condition.
>
> So use rcu_dereference_raw() to shutup lockdep warnings (if
> CONFIG_PROVE_RCU is set)
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 86a8ca1..94dbdbc 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1236,8 +1236,11 @@ __sk_dst_set(struct sock *sk, struct dst_entry *dst)
> struct dst_entry *old_dst;
>
> sk_tx_queue_clear(sk);
> - old_dst = rcu_dereference_check(sk->sk_dst_cache,
> - lockdep_is_held(&sk->sk_dst_lock));
> + /*
> + * This can be called while sk is owned by the caller only,
> + * with no state that can be checked in a rcu_dereference_check() cond
> + */
> + old_dst = rcu_dereference_raw(sk->sk_dst_cache);
> rcu_assign_pointer(sk->sk_dst_cache, dst);
> dst_release(old_dst);
> }
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH RFC: linux-next 1/2] irq: Add CPU mask affinity hint callback framework
From: Peter P Waskiewicz Jr @ 2010-04-27 16:04 UTC (permalink / raw)
To: Thomas Gleixner
Cc: davem@davemloft.net, arjan@linux.jf.intel.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <alpine.LFD.2.00.1004271425140.2951@localhost.localdomain>
On Tue, 27 Apr 2010, Thomas Gleixner wrote:
> On Sun, 18 Apr 2010, Peter P Waskiewicz Jr wrote:
>> +/**
>> + * struct irqaffinityhint - per interrupt affinity helper
>> + * @callback: device driver callback function
>> + * @dev: reference for the affected device
>> + * @irq: interrupt number
>> + */
>> +struct irqaffinityhint {
>> + irq_affinity_hint_t callback;
>> + void *dev;
>> + int irq;
>> +};
>
> Why do you need that extra data structure ? The device and the irq
> number are known, so all you need is the callback itself. So no need
> for allocating memory ....
When I register the function callback with the interrupt layer, I need to
know what device structures to reference back in the driver. In other
words, if I call into an underlying driver with just an interrupt number,
then I have no way at getting at the dev structures (netdevice for me,
plus my private adapter structures), unless I declare them globally
(yuck).
I had a different approach before this one where I assumed the device from
the irq handler callback was safe to use for the device in this new
callback. I didn't feel really great about that, since it's an implicit
assumption that could cause things to go sideways really quickly.
Let me know what you think either way. I'm certainly willing to make a
change, I just don't know at this point what's the safest approach from
what I currently have.
>
>> +static ssize_t irq_affinity_hint_proc_write(struct file *file,
>> + const char __user *buffer, size_t count, loff_t *pos)
>> +{
>> + /* affinity_hint is read-only from proc */
>> + return -EOPNOTSUPP;
>> +}
>> +
>
> Why do you want a write function when the file is read only ?
It's leftover paranoia. I put it in early on, then changed the mode
later. I can remove this function. I'll re-send something once we agree
on how the code in your first comment should look.
Thanks Thomas!
-PJ
^ permalink raw reply
* Re: IPv6: race condition in __ipv6_ifa_notify() and dst_free() ?
From: Herbert Xu @ 2010-04-27 15:55 UTC (permalink / raw)
To: Jiri Bohac; +Cc: David Miller, yoshfuji, netdev, shemminger
In-Reply-To: <20100427155034.GA11157@midget.suse.cz>
On Tue, Apr 27, 2010 at 05:50:34PM +0200, Jiri Bohac wrote:
>
> I think that what Herbert is doing is only going to enforce that
> the ip6_del_rt() is never going to fail, so the dst_free()
> won't ever be called anyway, right?
Yes I think you're right. But let's fix the bigger problem first,
and then we can audit this and possibly turn it into a WARN_ON.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: IPv6: race condition in __ipv6_ifa_notify() and dst_free() ?
From: Jiri Bohac @ 2010-04-27 15:50 UTC (permalink / raw)
To: David Miller; +Cc: jbohac, herbert, yoshfuji, netdev, shemminger
In-Reply-To: <20100422.185400.71096585.davem@davemloft.net>
On Thu, Apr 22, 2010 at 06:54:00PM -0700, David Miller wrote:
> From: Jiri Bohac <jbohac@suse.cz>
> Date: Thu, 22 Apr 2010 17:49:08 +0200
>
> > I still don't see why __ipv6_ifa_notify() needs to call
> > dst_free(). Shouldn't that be dst_release() instead, to drop the
> > reference obtained by dst_hold(&ifp->rt->u.dst)?
>
> It likely wants to do both.
>
> Just doing dst_release() doesn't mark the 'dst' object as obsolete,
> and therefore it won't get force garbage collected.
Sure. So If I understand it correctly, there are two problems:
- the reference taken by dst_hold() just above the ip6_del_rt()
is never dropped if ip6_del_rt() fails; so shouldn't the code
be like this?:
dst_hold(&ifp->rt->u.dst);
if (ip6_del_rt(ifp->rt)) {
dst_release(&ifp->rt->u.dst);
dst_free(&ifp->rt->u.dst);
}
- if ip6_del_rt() fails because it races with something else
deleting the address, dst_free() will be called twice. This is
what Herbert is fixing with additional locking. However -- even
when he fixes that -- how can ip6_del_rt() fail with the
ifp->rt still needing a dst_free()?
AFAICS, it can fail by:
- __ip6_del_rt() finding that (rt ==
net->ipv6.ip6_null_entry); we don't want to call
dst_free() on net->ipv6.ip6_null_entry, do we?
- fib6_del() returning -ENOENT for multiple reasons;
but doesn't that mean that something else has removed
the route already and called dst_free on it?
In either case, the dst_free() looks like not being needed. And it only
does no harm in most cases, because these events are rare and
it usually finds (obsolete > 2) and does nothing.
I think that what Herbert is doing is only going to enforce that
the ip6_del_rt() is never going to fail, so the dst_free()
won't ever be called anyway, right?
Thanks,
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ
^ permalink raw reply
* Re: [PATCH] bnx2x: add support for receive hashing
From: Brian Bloniarz @ 2010-04-27 15:44 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, therbert, netdev, rick.jones2
In-Reply-To: <1272378659.2295.193.camel@edumazet-laptop>
Eric Dumazet wrote:
> Le mardi 27 avril 2010 à 09:37 -0400, Brian Bloniarz a écrit :
>> David Miller wrote:
>>> How damn hard is it to add two 16-bit ports to the hash regardless of
>>> protocol?
>>>
>> Come to think of it, for UDP the hash must ignore
>> the srcport and srcaddr, because a single bound
>> socket is going to wildcard both those fields.
>>
>
> For your application maybe ;)
>
> Here, I have thousand of RTP flows to big mediagateways, so the
> (srcaddr, dstaddr) is shared by all these flows.
>
> Tom Herbert also wants a threaded DNS server.
>
> For UDP, we could have a bitmap (system level ?) to say if a particular
> destination port wants multi-cpu (RPS) spreading or not, even if the NIC
> decided to use a single queue to submit frames to the host (ie mask the
> src_addr and src_port). In this case, RFS on non conected UDP sockets
> could be activated as well.
>
> Or just a global sysctl to be able to mask the src_addr and/or src_port
> in our software rxhash.
This is all good to know. Here are 3 other alternatives/suggestions:
1) Leave things as they are, if it really hurts that badly
for our workload we can just disable RPS entirely.
2) Add a global sysctl to disable RPS for datagram-based protocols.
3) Pluggable SW rxhashes :)
I haven't benchmarked anything for our workloads yet, so
I can't say for sure it's even a big issue.
Has anyone benchmarked RPS + single-threaded DNS servers? It'd
be a pessimization in that case, right? Even the multi-threaded
DNS server wouldn't be workable unless there was something like
the soreuseport patch you & Tom had been been discussing.
^ permalink raw reply
* Re: linux-next: manual merge of the tty tree with the net tree
From: Greg KH @ 2010-04-27 15:14 UTC (permalink / raw)
To: Stephen Rothwell
Cc: linux-next, linux-kernel, Sjur Braendeland, David Miller, netdev,
Pavan Savoy
In-Reply-To: <20100427135706.259fa9c9.sfr@canb.auug.org.au>
On Tue, Apr 27, 2010 at 01:57:06PM +1000, Stephen Rothwell wrote:
> Hi Greg,
>
> Today's linux-next merge of the tty tree got a conflict in
> include/linux/tty.h between commit
> 9b27105b4a44c54bf91ecd7d0315034ae75684f7 ("net-caif-driver: add CAIF
> serial driver (ldisc)") from the net tree and commit
> f3150d54db0e0caa1871f66b2424f463a34a7e7b ("serial: TTY: new ldiscs for
> staging") from the tty tree.
>
> I fixed it up (they both changed NR_LDISCS - I used the higher number) and
> can carry the fix as necessary.
Thanks, the higher number should be what is needed here.
greg k-h
^ permalink raw reply
* Re: linux-next: manual merge of the staging-next tree with the net tree
From: Greg KH @ 2010-04-27 15:13 UTC (permalink / raw)
To: Stephen Rothwell
Cc: linux-next, linux-kernel, Jiri Pirko, David Miller, netdev,
John Sheehan
In-Reply-To: <20100427142655.49673ff1.sfr@canb.auug.org.au>
On Tue, Apr 27, 2010 at 02:26:55PM +1000, Stephen Rothwell wrote:
> Hi Greg,
>
> Today's linux-next merge of the staging-next tree got a conflict in
> drivers/staging/arlan/arlan-main.c between commit
> 22bedad3ce112d5ca1eaf043d4990fa2ed698c87 ("net: convert multicast list to
> list_head") from the net tree and commit
> 8db2022b08e232bb237b2162f03ff5f6e7c0c35e ("staging: arlan: fix errors
> reported by checkpatch.pl tool") from the staging-next tree.
>
> I fixed it up (see below) and can carry the fix as necessary.
That looks fine, thanks.
greg k-h
^ permalink raw reply
* Re: [patch v2] sctp: cleanup: remove duplicate assignment
From: Vlad Yasevich @ 2010-04-27 14:32 UTC (permalink / raw)
To: Dan Carpenter
Cc: Sridhar Samudrala, David S. Miller, Wei Yongjun, Chris Dischino,
linux-sctp, netdev, kernel-janitors
In-Reply-To: <20100424171805.GO29093@bicker>
Dan Carpenter wrote:
> This assignment isn't needed because we did it earlier already.
>
> Also another reason to delete the assignment is because it triggers a
> Smatch warning about checking for NULL pointers after a dereference.
>
> Reported-by: Vlad Yasevich <vladislav.yasevich@hp.com>
> Signed-off-by: Dan Carpenter <error27@gmail.com>
Thanks. I'll take this one.
-vlad
> ---
> Thanks Vlad. I came so close to seeing that myself if only I had openned
> my eyes a tiny bit more. :P
>
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 17cb400..33aed1c 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -419,10 +419,17 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
> if (!retval)
> goto nomem_chunk;
>
> - /* Per the advice in RFC 2960 6.4, send this reply to
> - * the source of the INIT packet.
> + /* RFC 2960 6.4 Multi-homed SCTP Endpoints
> + *
> + * An endpoint SHOULD transmit reply chunks (e.g., SACK,
> + * HEARTBEAT ACK, * etc.) to the same destination transport
> + * address from which it received the DATA or control chunk
> + * to which it is replying.
> + *
> + * [INIT ACK back to where the INIT came from.]
> */
> retval->transport = chunk->transport;
> +
> retval->subh.init_hdr =
> sctp_addto_chunk(retval, sizeof(initack), &initack);
> retval->param_hdr.v = sctp_addto_chunk(retval, addrs_len, addrs.v);
> @@ -461,18 +468,6 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
> /* We need to remove the const qualifier at this point. */
> retval->asoc = (struct sctp_association *) asoc;
>
> - /* RFC 2960 6.4 Multi-homed SCTP Endpoints
> - *
> - * An endpoint SHOULD transmit reply chunks (e.g., SACK,
> - * HEARTBEAT ACK, * etc.) to the same destination transport
> - * address from which it received the DATA or control chunk
> - * to which it is replying.
> - *
> - * [INIT ACK back to where the INIT came from.]
> - */
> - if (chunk)
> - retval->transport = chunk->transport;
> -
> nomem_chunk:
> kfree(cookie);
> nomem_cookie:
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: [PATCH] bnx2x: add support for receive hashing
From: Eric Dumazet @ 2010-04-27 14:30 UTC (permalink / raw)
To: Brian Bloniarz; +Cc: David Miller, therbert, netdev, rick.jones2
In-Reply-To: <4BD6E887.3000804@athenacr.com>
Le mardi 27 avril 2010 à 09:37 -0400, Brian Bloniarz a écrit :
> David Miller wrote:
> > How damn hard is it to add two 16-bit ports to the hash regardless of
> > protocol?
> >
> Come to think of it, for UDP the hash must ignore
> the srcport and srcaddr, because a single bound
> socket is going to wildcard both those fields.
>
For your application maybe ;)
Here, I have thousand of RTP flows to big mediagateways, so the
(srcaddr, dstaddr) is shared by all these flows.
Tom Herbert also wants a threaded DNS server.
For UDP, we could have a bitmap (system level ?) to say if a particular
destination port wants multi-cpu (RPS) spreading or not, even if the NIC
decided to use a single queue to submit frames to the host (ie mask the
src_addr and src_port). In this case, RFS on non conected UDP sockets
could be activated as well.
Or just a global sysctl to be able to mask the src_addr and/or src_port
in our software rxhash.
> So the best we can hope for is for the hash to
> include destport and destaddr? From looking at
> my BCM5709's multiq behavior, I think broadcom's
> hash includes the destaddr but not the destport.
Toepliz hash for UDP is a hash(src_addr, dst_addr)
Both src port and dst port are ignored.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox