Netdev List
* Re: [PATCH net-next-2.6] net: sk_sleep() helper
From: David Miller @ 2010-04-26 18:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1272270006.2346.13.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 26 Apr 2010 10:20:06 +0200

> Here is a followup to this patch, I missed three files in this
> conversion.
> 
> Thanks !
> 
> [PATCH net-next-2.6] net: use sk_sleep()
> 
> Commit aa395145 (net: sk_sleep() helper) missed three files in the
> conversion.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.


* Re: [PATCH net-next-2.6] pppoe: use pppoe_pernet instead of directly net_generic
From: David Miller @ 2010-04-26 18:17 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, mostrows
In-Reply-To: <20100426114611.GC2941@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Mon, 26 Apr 2010 13:46:12 +0200

> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied.


* Re: [PATCH net-next-2.6] pppoe: use phonet_pernet instead of directly net_generic
From: David Miller @ 2010-04-26 18:17 UTC (permalink / raw)
  To: jpirko; +Cc: netdev
In-Reply-To: <20100426134100.GF2941@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Mon, 26 Apr 2010 15:41:00 +0200

> As is done in pppoe, for example, introduce phonet_pernet and use it instead
> of calling net_generic directly.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied, and I modified your commit header line slightly.  This isn't
a change to "pppoe: " but rather "phonet: " :-)

Thanks.


* Re: [PATCH] bnx2x: add support for receive hashing
From: Tom Herbert @ 2010-04-26 18:19 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20100426.110432.104061817.davem@davemloft.net>

> I'm pretty sure there isn't at this point.
>
> We'll need to elide setting ->rxhash for non-TCP packets.  I bet that
> the ETH_FAST_PATH_RX_CQE_RSS_HASH_TYPE field might be usable for making
> this decision, but if not in the worst case we'll need to parse the
> VLAN/ETH and IP4/IP6 headers to figure out the protocol.
>
> Damn, I'm so pissed off about this.  This ruins everything!
>
> How damn hard is it to add two 16-bit ports to the hash regardless of
> protocol?
>
Fair question.

This also hits RSS/multiqueue. In a netperf RR test, 500 streams
between my two 16 core AMDs:  TCP 970K tps, UDP 370K tps.  I'm
surprised they didn't catch that in some benchmarks...


* Re: [PATCH] ieee802154: Fix oops during ieee802154_sock_ioctl
From: David Miller @ 2010-04-26 18:20 UTC (permalink / raw)
  To: dbaryshkov; +Cc: netdev, linux-kernel, stefan
In-Reply-To: <1272293202-11712-1-git-send-email-dbaryshkov@gmail.com>

From: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Date: Mon, 26 Apr 2010 18:46:42 +0400

> From: Stefan Schmidt <stefan@datenfreihafen.org>
> 
> Trying to run izlisten (from lowpan-tools tests) on a device that does not
> exist, I got the oops below. The problem is that we are using get_dev_by_name
> without checking if we really get a device back. We don't in this case and
> writing to dev->type generates this oops.
> 
> [Oops code removed by Dmitry Eremin-Solenikov]
> 
> If possible this patch should be applied to the current -rc fixes branch.
> 
> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
> Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>

Applied, and queued up for -stable too, thanks guys.
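
For readers following along, the fix amounts to checking the lookup result
before touching the device. Below is a minimal userspace sketch, assuming a
made-up device table; struct net_device_sketch, lookup_dev() and
dev_ioctl_sketch() are hypothetical names for illustration, not the kernel's
own code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for a network device. */
struct net_device_sketch {
	const char *name;
	unsigned short type;	/* arbitrary value for the example */
};

static struct net_device_sketch known_devs[] = {
	{ "wpan0", 1 },
};

/* Stand-in for a by-name lookup: returns NULL when no device matches. */
struct net_device_sketch *lookup_dev(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(known_devs) / sizeof(known_devs[0]); i++)
		if (strcmp(known_devs[i].name, name) == 0)
			return &known_devs[i];
	return NULL;
}

/* Bail out with -ENODEV instead of dereferencing a NULL device. */
int dev_ioctl_sketch(const char *ifname, unsigned short *type_out)
{
	struct net_device_sketch *dev = lookup_dev(ifname);

	if (!dev)
		return -ENODEV;

	*type_out = dev->type;
	return 0;
}
```

Calling dev_ioctl_sketch() with a name that is not in the table returns
-ENODEV rather than crashing, which is the behaviour the patch description
asks for.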


* Re: [PATCH] bnx2x: add support for receive hashing
From: David Miller @ 2010-04-26 18:22 UTC (permalink / raw)
  To: therbert; +Cc: eric.dumazet, netdev
In-Reply-To: <u2q65634d661004261119j74042496z4f1ba570251e0c44@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Mon, 26 Apr 2010 11:19:05 -0700

> This also hits RSS/multiqueue. In a netperf RR test, 500 streams
> between my two 16 core AMDs:  TCP 970K tps, UDP 370K tps.  I'm
> surprised they didn't catch that in some benchmarks...

Meanwhile, these NIC vendors seem to have all the time in the world to
add iSCSI, RDMA and all the other stateful offload junk into their
firmware and silicon.

Yet they can't hash ports if the protocol is not TCP?  Beyond
baffling...
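
To make the complaint concrete, here is a rough sketch of a Toeplitz-style
hash that always folds the two 16-bit ports into the input, whatever the
protocol number is. It is only an illustration under stated assumptions:
struct flow4, flow4_hash() and the 12-byte input layout (source address,
destination address, source port, destination port, all big-endian) are made
up for the example, the key is supplied by the caller, and it only makes
sense for unfragmented packets of protocols that keep their ports in the
first four bytes of the transport header.

```c
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash over len input bytes; key must hold at least len + 4 bytes. */
uint32_t toeplitz_hash(const uint8_t *key, const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	/* 32-bit window onto the key, starting at its first four bytes. */
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			/* XOR in the current window for every set input bit. */
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window by one bit, pulling in the next key bit. */
			window <<= 1;
			if (key[i + 4] & (1u << bit))
				window |= 1;
		}
	}
	return hash;
}

/* Hypothetical flow description, fields in host byte order. */
struct flow4 {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;		/* deliberately NOT consulted by the hash */
};

/* Hash addresses and ports regardless of proto; key needs >= 16 bytes. */
uint32_t flow4_hash(const uint8_t *key, const struct flow4 *f)
{
	uint8_t in[12];

	in[0] = (uint8_t)(f->saddr >> 24);
	in[1] = (uint8_t)(f->saddr >> 16);
	in[2] = (uint8_t)(f->saddr >> 8);
	in[3] = (uint8_t)f->saddr;
	in[4] = (uint8_t)(f->daddr >> 24);
	in[5] = (uint8_t)(f->daddr >> 16);
	in[6] = (uint8_t)(f->daddr >> 8);
	in[7] = (uint8_t)f->daddr;
	in[8] = (uint8_t)(f->sport >> 8);
	in[9] = (uint8_t)f->sport;
	in[10] = (uint8_t)(f->dport >> 8);
	in[11] = (uint8_t)f->dport;

	return toeplitz_hash(key, in, sizeof(in));
}
```

Because the protocol number never enters the input, two UDP flows that
differ only in source port can land on different queues, exactly as two TCP
flows would.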


* Re: [PATCH v3] TCP: avoid to send keepalive probes if receiving data
From: David Miller @ 2010-04-26 18:24 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: fleitner, netdev, eric.dumazet
In-Reply-To: <alpine.DEB.2.00.1004261241510.7041@wel-95.cs.helsinki.fi>

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Mon, 26 Apr 2010 12:47:13 +0300 (EEST)

>>  			    !((1 << sk->sk_state) &
>>  			      (TCPF_CLOSE | TCPF_LISTEN))) {
>> -				__u32 elapsed = tcp_time_stamp - tp->rcv_tstamp;
>> +				u32 elapsed = min_t(u32,
>> +				      tcp_time_stamp - icsk->icsk_ack.lrcvtime,
>> +				      tcp_time_stamp - tp->rcv_tstamp);
>> +
> 
> I'd put this into a helper in include/net/tcp.h
> 
 ...
>> +	if (elapsed < keepalive_time_when(tp)) {
>> +		elapsed = keepalive_time_when(tp) - elapsed;
>> +		goto resched;
>> +	}
>> +
>>  	elapsed = tcp_time_stamp - tp->rcv_tstamp;
> 
> ...And then use it here too for setting the elapsed and doing the test 
> and what follows only once?
> 

Agreed.
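
A minimal userspace sketch of the helper being suggested, so that the
"elapsed" computation and the reschedule test happen in one place. The
struct and function names here are hypothetical stand-ins rather than the
final kernel code; lrcvtime and rcv_tstamp mirror the fields in the quoted
diff, and "now" plays the role of tcp_time_stamp.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the socket state involved. */
struct keepalive_state {
	uint32_t lrcvtime;	/* time the last data packet was received */
	uint32_t rcv_tstamp;	/* time the last ACK was received */
};

static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Ticks since the peer was last heard from, via data or ACK.
 * Unsigned subtraction keeps this correct across timestamp wraparound.
 */
uint32_t keepalive_elapsed(uint32_t now, const struct keepalive_state *ks)
{
	return min_u32(now - ks->lrcvtime, now - ks->rcv_tstamp);
}

/* Ticks to wait before probing; 0 means a probe is due now. */
uint32_t keepalive_resched_delay(uint32_t now,
				 const struct keepalive_state *ks,
				 uint32_t keepalive_time)
{
	uint32_t elapsed = keepalive_elapsed(now, ks);

	return elapsed < keepalive_time ? keepalive_time - elapsed : 0;
}
```

With a helper of this shape, both the setsockopt path and the keepalive
timer could call the same function instead of repeating the min_t()
expression.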


* Re: [PATCH net-2.6] bridge br_multicast: Ensure to initialize BR_INPUT_SKB_CB(skb)->mrouters_only.
From: David Miller @ 2010-04-26 18:26 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev
In-Reply-To: <201004251859.o3PIx7Vo012485@94.43.138.210.xn.2iij.net>

From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date: Mon, 26 Apr 2010 03:59:07 +0900

> Even with commit 32dec5dd0233ebffa9cae25ce7ba6daeb7df4467 ("bridge
> br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only
> without IGMP snooping."), BR_INPUT_SKB_CB(skb)->mrouters_only is
> not appropriately initialized if IGMP snooping support is
> compiled and disabled, so we can see garbage.
> 
> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Applied.


* Re: [PATCH net-next-2.6] bridge br_multicast: Ensure to initialize BR_INPUT_SKB_CB(skb)->mrouters_only.
From: David Miller @ 2010-04-26 18:26 UTC (permalink / raw)
  To: yoshfuji; +Cc: yoshfuji, netdev
In-Reply-To: <201004251806.o3PI6e2p008200@94.43.138.210.xn.2iij.net>

From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date: Mon, 26 Apr 2010 03:06:40 +0900

> Even with commit 32dec5dd0233ebffa9cae25ce7ba6daeb7df4467 ("bridge
> br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only
> without IGMP snooping."), BR_INPUT_SKB_CB(skb)->mrouters_only is
> not appropriately initialized if IGMP/MLD snooping support is
> compiled and disabled, so we can see garbage.
> 
> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

Applied.


* Re: [patch v2] bluetooth: handle l2cap_create_connless_pdu() errors
From: Gustavo F. Padovan @ 2010-04-26 18:27 UTC (permalink / raw)
  To: David Miller
  Cc: error27-Re5JQEeQqe8AvxtiuMwx3w, marcel-kz+m5ild9QBg9hUCZPvPmw,
	andrei.emeltchenko-xNZwKgViW5gAvxtiuMwx3w,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	kernel-janitors-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100426.111259.112594696.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

Hi David,

* David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> [2010-04-26 11:12:59 -0700]:

> From: "Gustavo F. Padovan" <gustavo-THi1TnShQwVAfugRpC6u6w@public.gmane.org>
> Date: Mon, 26 Apr 2010 12:09:19 -0300
> 
> > Hi Dan,
> > 
> > * Dan Carpenter <error27-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> [2010-04-26 13:36:27 +0200]:
> > 
> >> l2cap_create_connless_pdu() can sometimes return ERR_PTR(-ENOMEM) or
> >> ERR_PTR(-EFAULT).
> >> 
> >> Signed-off-by: Dan Carpenter <error27-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >> ---
> >> In v2 I wrote the patch on top of Gustavo Padovan's devel tree
> 
> This is the kind of bug that could cause a crash if the path actually
> executes.
> 
> Therefore it tires me that the submitter was told to regenerate this
> patch against some devel tree that is -next bound, when in fact this
> is the kind of fix that warrants inclusion right now into net-2.6

My bad here. So I think we should pick the first version of Dan's patch.
It applies against bluetooth-testing right now and then against net-2.6 too.

Marcel, is that OK with you?

> 
> Marcel, please do whatever magic you need to so I can get this into
> Linus's tree as I did the rest of the ERR_PTR() fixes from Dan already.
> No reason to treat Bluetooth special and defer these fixes to -next.
> 
> Thanks.

-- 
Gustavo F. Padovan
http://padovan.org
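
For context on why an unchecked ERR_PTR() return can crash: the error code
is encoded in the pointer value itself, so the caller has to test it before
dereferencing. The following is a userspace imitation of that convention,
not the Bluetooth code; create_pdu_sketch(), send_sketch() and the 672-byte
limit are hypothetical and chosen only for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(intptr_t error)
{
	return (void *)error;
}

static inline intptr_t PTR_ERR(const void *ptr)
{
	return (intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct pdu {
	size_t len;
};

/* Hypothetical allocator that reports failure through an error pointer. */
struct pdu *create_pdu_sketch(size_t len, size_t max_len)
{
	struct pdu *p;

	if (len > max_len)
		return ERR_PTR(-EFAULT);

	p = malloc(sizeof(*p));
	if (!p)
		return ERR_PTR(-ENOMEM);

	p->len = len;
	return p;
}

/* Caller: propagate the error instead of dereferencing an error pointer. */
int send_sketch(size_t len)
{
	struct pdu *p = create_pdu_sketch(len, 672);
	int sent;

	if (IS_ERR(p))
		return (int)PTR_ERR(p);

	sent = (int)p->len;
	free(p);
	return sent;
}
```

Without the IS_ERR() test, send_sketch() would read p->len through a
pointer such as (void *)-EFAULT, which is the kind of crash discussed above.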


* Re: [PATCH] RCU: don't turn off lockdep when find suspicious rcu_dereference_check() usage
From: Eric W. Biederman @ 2010-04-26 18:35 UTC (permalink / raw)
  To: paulmck
  Cc: Miles Lane, Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, eric.dumazet, netdev, Jens Axboe,
	Gui Jianfeng, Li Zefan, Johannes Berg
In-Reply-To: <20100426160925.GD2529@linux.vnet.ibm.com>

"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:

> Eric Dumazet traced these down to a commit from Eric Biederman.
>
> If I don't hear from Eric Biederman in a few days, I will attempt a
> patch, but it would be more likely to be correct coming from someone
> with a better understanding of the code.  ;-)

I already replied.

http://lkml.org/lkml/2010/4/21/420

Eric


* Re: [PATCH net-next-2.6] pppoe: use phonet_pernet instead of directly net_generic
From: Jiri Pirko @ 2010-04-26 18:45 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100426.111747.115930051.davem@davemloft.net>

Mon, Apr 26, 2010 at 08:17:47PM CEST, davem@davemloft.net wrote:
>From: Jiri Pirko <jpirko@redhat.com>
>Date: Mon, 26 Apr 2010 15:41:00 +0200
>
>> As is done in pppoe, for example, introduce phonet_pernet and use it instead
>> of calling net_generic directly.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>Applied, and I modified your commit header line slightly.  This isn't
>a change to "pppoe: " but rather "phonet: " :-)

Oops, copy/paste mistake. Thanks.
>
>Thanks.


* Re: [net-next-2.6 PATCH 1/2] Add ndo_set_vf_port_profile
From: Scott Feldman @ 2010-04-26 19:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, chrisw, arnd
In-Reply-To: <20100424.001934.189691704.davem@davemloft.net>

On 4/24/10 12:19 AM, "David Miller" <davem@davemloft.net> wrote:

>> int   (*ndo_set_vf_tx_rate)(struct net_device *dev,
>>      int vf, int rate);
>> + int   (*ndo_set_vf_port_profile)(
>> +     struct net_device *dev, int vf,
>> +     u8 *port_profile, u8 *mac,
>> +     u8 *host_uuid,
>> +     u8 *client_uuid,
>> +     u8 *client_name);
>> int   (*ndo_get_vf_config)(struct net_device *dev,
> 
> Just pass the "struct ifla_vf_port_profile *" instead of tons of
> arguments.

Ok
 
> Also, I think it's reasonable to fetch the current profile in use, so
> GETLINK ought to report these things.  To make it generic we can
> maintain the settings given to us in software, hung off of the netdev
> struct, and simply report that during GETLINK.

We'd need an array of struct ifla_vf_port_profile hanging off of netdev, one
element for each VF.  That seems like a lot of mem hanging off of netdev,
and we'd have to define a MAX_VF to size the array.  How about a
ndo_get_vf_port_profile() that the netdev fills in, and the netdev keeps the
data in its private area?  That's how ndo_get_vf_config() is working.
 
-scott



* Re: [net-next-2.6 PATCH 1/2] Add ndo_set_vf_port_profile
From: Scott Feldman @ 2010-04-26 19:57 UTC (permalink / raw)
  To: Scott Feldman, David Miller; +Cc: netdev, chrisw, arnd
In-Reply-To: <C7FB3747.2BAAA%scofeldm@cisco.com>

On 4/26/10 12:27 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:
> On 4/24/10 12:19 AM, "David Miller" <davem@davemloft.net> wrote:
>> Also, I think it's reasonable to fetch the current profile in use, so
>> GETLINK ought to report these things.  To make it generic we can
>> maintain the settings given to us in software, hung off of the netdev
>> struct, and simply report that during GETLINK.
> 
> We'd need an array of struct ifla_vf_port_profile hanging off of netdev, one
> element for each VF.  That seems like a lot of mem hanging off of netdev,
> and we'd have to define a MAX_VF to size the array.  How about a
> ndo_get_vf_port_profile() that the netdev fills in, and the netdev keeps the
> data in its private area?  That's how ndo_get_vf_config() is working.

Hmmm....even that isn't so nice because the port-profile info for all VFs is
going to get stuffed into the RTM_GETLINK skb, and I assume there are limits
on the skb return size.

Here's a proposal:

Currently we have RTM_GETLINK for

    ip link show [ DEVICE ]

This dumps everything for the DEVICE including info for each VF.  Let's
modify RTM_GETLINK to look like this:

    ip link show [ DEVICE [ vf NUM] ]

If you don't give the optional vf NUM you get base dump on DEVICE.  If you
give vf NUM, you get the VF-specific dump.  This scales much better with the
number of VFs.  

(Number of VFs can easily be > 128 in some designs).

Comments?

-scott



* Re: [RFC][PATCH v4 01/18] Add a new struct for device to manipulate external buffer.
From: Andy Fleming @ 2010-04-26 20:06 UTC (permalink / raw)
  To: xiaohui.xin; +Cc: netdev, kvm, linux-kernel, mst, mingo, davem, jdike
In-Reply-To: <1272187206-18534-1-git-send-email-xiaohui.xin@intel.com>

On Sun, Apr 25, 2010 at 4:19 AM,  <xiaohui.xin@intel.com> wrote:
> From: Xin Xiaohui <xiaohui.xin@intel.com>
>
> Signed-off-by: Xin Xiaohui <xiaohui.xin@intel.com>
> Signed-off-by: Zhao Yu <yzhao81@gmail.com>
> Reviewed-by: Jeff Dike <jdike@linux.intel.com>
> ---
>  include/linux/netdevice.h |   19 ++++++++++++++++++-
>  1 files changed, 18 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index c79a88b..bf79756 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -530,6 +530,22 @@ struct netdev_queue {
>        unsigned long           tx_dropped;
>  } ____cacheline_aligned_in_smp;
>
> +/* Add a structure in structure net_device, the new field is
> + * named as mp_port. It's for mediate passthru (zero-copy).
> + * It contains the capability for the net device driver,
> + * a socket, and an external buffer creator, external means
> + * skb buffer belongs to the device may not be allocated from
> + * kernel space.
> + */
> +struct mpassthru_port  {
> +       int             hdr_len;
> +       int             data_len;
> +       int             npages;
> +       unsigned        flags;
> +       struct socket   *sock;
> +       struct skb_external_page *(*ctor)(struct mpassthru_port *,
> +                               struct sk_buff *, int);
> +};


I tried searching around, but couldn't find where struct
skb_external_page is declared.  Where is it?

Andy


* Re: [PATCH] bnx2x: add support for receive hashing
From: Rick Jones @ 2010-04-26 20:19 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <20100426.112244.260086869.davem@davemloft.net>

David Miller wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Mon, 26 Apr 2010 11:19:05 -0700
> 
> 
>>This also hits RSS/multiqueue. In a netperf RR test, 500 streams
>>between my two 16 core AMDs:  TCP 970K tps, UDP 370K tps.  I'm
>>surprised they didn't catch that in some benchmarks...
> 
> 
> Meanwhile, these NIC vendors seem to have all the time in the world to
> add iSCSI, RDMA and all the other stateful offload junk into their
> firmware and silicon.
> 
> Yet they can't hash ports if the protocol is not TCP?  Beyond
> baffling...

As a networking guy I can see why it seems baffling, but stepping out of myself 
and thinking like the customers with whom I've interacted over the years, it is 
not baffling at all.

By and large, customers do not do anything "substantial" with UDP.  NFS went to 
TCP mounts 99 times out of 100 many years ago, leaving DNS as about the only 
thing left*. So, customers will not be chomping at the bit for improved UDP 
scalability/performance.  They would though, be jumping up and down demanding 
iSCSI performance and by implication all that comes along for the ride.

rick jones

* And even there, one of the biggest pushes is trying to make TCP "transaction 
friendly" to deal with DNS messages becoming larger than typical MTUs.


* Re: [net-next-2.6 PATCH 1/2] Add ndo_set_vf_port_profile
From: David Miller @ 2010-04-26 20:24 UTC (permalink / raw)
  To: scofeldm; +Cc: netdev, chrisw, arnd
In-Reply-To: <C7FB3747.2BAAA%scofeldm@cisco.com>

From: Scott Feldman <scofeldm@cisco.com>
Date: Mon, 26 Apr 2010 12:27:51 -0700

> We'd need an array of struct ifla_vf_port_profile hanging off of netdev, one
> element for each VF.  That seems like a lot of mem hanging off of netdev,
> and we'd have to define a MAX_VF to size the array.  How about a
> ndo_get_vf_port_profile() that the netdev fills in, and the netdev keeps the
> data in its private area?  That's how ndo_get_vf_config() is working.

Sure if the device can do it, but for situations where it can't we can
use a linked list and dynamic memory allocation.
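
As a rough illustration of the linked-list fallback, here is a userspace
sketch that keeps one dynamically allocated node per configured VF, so no
MAX_VF-sized array is needed. All names (struct vf_profile_node,
vf_profile_set(), vf_profile_get(), vf_profile_free(), PROFILE_NAME_LEN) are
hypothetical, and a real version would also need locking.

```c
#include <stdio.h>
#include <stdlib.h>

#define PROFILE_NAME_LEN 64	/* arbitrary size for the example */

/* One node per VF that actually has a profile set. */
struct vf_profile_node {
	int vf;
	char port_profile[PROFILE_NAME_LEN];
	struct vf_profile_node *next;
};

/* Set or update the profile for a VF; returns 0, or -1 if allocation fails. */
int vf_profile_set(struct vf_profile_node **head, int vf, const char *name)
{
	struct vf_profile_node *n;

	for (n = *head; n; n = n->next)
		if (n->vf == vf)
			break;

	if (!n) {
		n = calloc(1, sizeof(*n));
		if (!n)
			return -1;
		n->vf = vf;
		n->next = *head;
		*head = n;
	}

	/* snprintf() truncates if needed and always NUL-terminates. */
	snprintf(n->port_profile, sizeof(n->port_profile), "%s", name);
	return 0;
}

/* Look up the stored profile; returns NULL if none was set for this VF. */
const char *vf_profile_get(const struct vf_profile_node *head, int vf)
{
	for (; head; head = head->next)
		if (head->vf == vf)
			return head->port_profile;
	return NULL;
}

/* Release every node in the list. */
void vf_profile_free(struct vf_profile_node *head)
{
	while (head) {
		struct vf_profile_node *next = head->next;

		free(head);
		head = next;
	}
}
```

Memory then scales with the number of VFs that have a profile assigned
rather than with the largest VF index a device might support.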


* Re: [net-next-2.6 PATCH 1/2] Add ndo_set_vf_port_profile
From: David Miller @ 2010-04-26 20:25 UTC (permalink / raw)
  To: scofeldm; +Cc: netdev, chrisw, arnd
In-Reply-To: <C7FB3E22.2BAEE%scofeldm@cisco.com>

From: Scott Feldman <scofeldm@cisco.com>
Date: Mon, 26 Apr 2010 12:57:06 -0700

> Hmmm....even that isn't so nice because the port-profile info for all VFs is
> going to get stuffed into the RTM_GETLINK skb, and I assume there are limits
> on the skb return size.

It's probably better to use .dump() for this, which allows returning
the result in multiple SKBs.
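
The idea behind a multi-buffer dump can be sketched in userspace as a fill
function that remembers where it stopped and is called again until it
returns nothing. This is only a model of the pattern, not the rtnetlink API:
struct dump_state, vf_dump_fill() and the per-buffer capacity are
assumptions made for the example, and each "record" is just the VF index.

```c
/* Saved position between calls, playing the role of per-dump callback state. */
struct dump_state {
	int next_vf;
};

/*
 * Write up to cap VF records into buf, resuming at st->next_vf.
 * Returns the number of records written; 0 means the dump is complete.
 */
int vf_dump_fill(struct dump_state *st, int num_vfs, int *buf, int cap)
{
	int n = 0;

	while (st->next_vf < num_vfs && n < cap)
		buf[n++] = st->next_vf++;

	return n;
}
```

A caller keeps invoking vf_dump_fill() with a fresh buffer until it gets 0
back, so a device with more VFs than fit in one buffer is reported across
several buffers instead of overflowing a single one.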


* Re: [PATCH] bnx2x: add support for receive hashing
From: David Miller @ 2010-04-26 20:40 UTC (permalink / raw)
  To: rick.jones2; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <4BD5F553.6020006@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Mon, 26 Apr 2010 13:19:31 -0700

> As a networking guy I can see why it seems baffling, but stepping out
> of myself and thinking like the customers with whom I've interacted
> over the years, it is not baffling at all.

<sarcasm>
And hey nobody is using SCTP either, that's right, nobody...
</sarcasm>

Look, don't try to defend this abomination of a situation with some
"customers only use TCP" argument.  It only makes the situation look
even more absurd.  

Furthermore, people test system scalability using tools like pktgen,
which surprise surprise generates streams of UDP packets.  Most
hardware based scalability testers spew UDP too.

Everything in the world points to "this toeplitz hash situation is
stupid and inexcusable."

If UDP isn't used by anyone, then you tell me why the checksum engines
of all of these chips can handle them just fine.  Maybe the guy who
works on the checksum logic blocks doesn't talk to the guy who works
on the hashing ones?  Maybe the checksum guy can find the ports in a
UDP packet, but the hashing dude can't locate them?

What the heck do you think people use for various forms of media
streaming?  They often use UDP and it has to scale, and they'd like to
move to DCCP at some point too which is another argument for a fully
protocol agnostic hash.

Why do you think Eric Dumazet gives a crap about UDP scalability and
is constantly testing it?  What about VOIP?  H.323, RTP, etc.?

Some of these cards can even statelessly offload UDP fragmentation
too, in silicon, not even in firmware.  What's their excuse for
screwing up the hash?

Look, this is a complete joke from every angle, at least admit that
fact.


* Re: [PATCH] bnx2x: add support for receive hashing
From: Rick Jones @ 2010-04-26 20:48 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <20100426.134051.232750473.davem@davemloft.net>

David Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Mon, 26 Apr 2010 13:19:31 -0700
> 
> 
>>As a networking guy I can see why it seems baffling, but stepping out
>>of myself and thinking like the customers with whom I've interacted
>>over the years, it is not baffling at all.
> 
> 
> <sarcasm>
> And hey nobody is using SCTP either, that's right, nobody...
> </sarcasm>
> 
> Look, don't try to defend this abomination of a situation with some
> "customers only use TCP" argument.  It only makes the situation look
> even more absurd.  

Do not confuse explanation with endorsement.

rick


* Re: [PATCH] bnx2x: add support for receive hashing
From: David Miller @ 2010-04-26 20:53 UTC (permalink / raw)
  To: rick.jones2; +Cc: therbert, eric.dumazet, netdev
In-Reply-To: <4BD5FC16.4070402@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Mon, 26 Apr 2010 13:48:22 -0700

> Do not confuse explanation with endorsement.

Ok, fair enough.

But I don't see even the "other perspective" argument being
valid.  Big shops still use UDP and it has to scale.

Or have they made multicast magically start working with TCP so
they can use it to do trades on the NASDAQ?


* Re: [PATCH] bnx2x: add support for receive hashing
From: Stephen Hemminger @ 2010-04-26 20:58 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, therbert, eric.dumazet, netdev
In-Reply-To: <20100426.134051.232750473.davem@davemloft.net>

On Mon, 26 Apr 2010 13:40:51 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Rick Jones <rick.jones2@hp.com>
> Date: Mon, 26 Apr 2010 13:19:31 -0700
> 
> > As a networking guy I can see why it seems baffling, but stepping out
> > of myself and thinking like the customers with whom I've interacted
> > over the years, it is not baffling at all.
> 
> <sarcasm>
> And hey nobody is using SCTP either, that's right, nobody...
> </sarcasm>
> 
> Look, don't try to defend this abomination of a situation with some
> "customers only use TCP" argument.  It only makes the situation look
> even more absurd.  
> 
> Furthermore, people test system scalability using tools like pktgen,
> which surprise surprise generates streams of UDP packets.  Most
> hardware based scalability testers spew UDP too.
> 
> Everything in the world points to "this toeplitz hash situation is
> stupid and inexcusable."
> 
> If UDP isn't used by anyone, then you tell me why the checksum engines
> of all of these chips can handle them just fine.  Maybe the guy who
> works on the checksum logic blocks doesn't talk to the guy who works
> on the hashing ones?  Maybe the checksum guy can find the ports in a
> UDP packet, but the hashing dude can't locate them?
> 
> What the heck do you think people use for various forms of media
> streaming?  They often use UDP and it has to scale, and they'd like to
> move to DCCP at some point too which is another argument for a fully
> protocol agnostic hash.
> 
> Why do you think Eric Dumazet gives a crap about UDP scalability and
> is constantly testing it?  What about VOIP?  H.323, RTP, etc.?
> 
> Some of these cards can even statelessly offload UDP fragmentation
> too, in silicon, not even in firmware.  What's their excuse for
> screwing up the hash?
> 
> Look, this is a complete joke from every angle, at least admit that
> fact.

I think it is fair to blame Microsoft for this as well. The vendors
follow what Microsoft tells them to do with the NDIS spec. It looks like
IPv6 didn't make it in until the NDIS 6.2 (Win7) spec.

-- 


* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: jamal @ 2010-04-26 21:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David S. Miller, Tom Herbert, Stephen Hemminger,
	netdev
In-Reply-To: <1272290584.19143.43.camel@edumazet-laptop>

On Mon, 2010-04-26 at 16:03 +0200, Eric Dumazet wrote:

> 
> Jamal, I have a Nehalem setup now, and I can see
> _raw_spin_lock_irqsave() abuse is not coming from network tree, but from
> clockevents_notify()

yikes. Thanks Eric - I should've been able to figure that one out. But
why is this thing expensive? I will run the test tomorrow and see if I
see the same thing.

cheers,
jamal





* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: jamal @ 2010-04-26 21:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David S. Miller, Tom Herbert, Stephen Hemminger,
	netdev, Andi Kleen
In-Reply-To: <1272293707.19143.51.camel@edumazet-laptop>

On Mon, 2010-04-26 at 16:55 +0200, Eric Dumazet wrote:

> Another interesting finding:
> 
> - if all packets are received on a single queue, max speed seems to be
> 1.200.000 packets per second on my machine :-(

Well, if it is any consolation, it is not as bad as sky2 hardware ;-> I can't
do more than 750Kpps.
Also, it seems you use VLANs - max pps will be lower than without VLANs
by probably 6-70Kpps (doesn't explain the 1.2Mpps of course).

cheers,
jamal



* [PATCH] net/sb1250: setup the pdevice within the soc code
From: Sebastian Andrzej Siewior @ 2010-04-26 21:07 UTC (permalink / raw)
  To: netdev; +Cc: Ralf Baechle
In-Reply-To: <20100426210022.GE27656@linux-mips.org>

Doing it within the driver does not look good, and it surely isn't how
platform devices were meant to be used.

Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 arch/mips/sibyte/swarm/platform.c |   54 +++++++++++++++++
 drivers/net/sb1250-mac.c          |  120 +------------------------------------
 2 files changed, 55 insertions(+), 119 deletions(-)

diff --git a/arch/mips/sibyte/swarm/platform.c b/arch/mips/sibyte/swarm/platform.c
index 54847fe..0973352 100644
--- a/arch/mips/sibyte/swarm/platform.c
+++ b/arch/mips/sibyte/swarm/platform.c
@@ -83,3 +83,57 @@ static int __init swarm_pata_init(void)
 device_initcall(swarm_pata_init);
 
 #endif /* defined(CONFIG_SIBYTE_SWARM) || defined(CONFIG_SIBYTE_LITTLESUR) */
+
+#define sb1250_dev_struct(num) \
+	static struct resource sb1250_res##num = {		\
+		.name = "SB1250 MAC " __stringify(num),		\
+		.flags = IORESOURCE_MEM,		\
+		.start = A_MAC_CHANNEL_BASE(num),	\
+		.end = A_MAC_CHANNEL_BASE(num + 1) -1,	\
+	};\
+	static struct platform_device sb1250_dev##num = {	\
+		.name = "sb1250-mac",			\
+		.id = num,				\
+		.resource = &sb1250_res##num,		\
+		.num_resources = 1,			\
+	}
+
+sb1250_dev_struct(0);
+sb1250_dev_struct(1);
+sb1250_dev_struct(2);
+sb1250_dev_struct(3);
+
+static struct platform_device *sb1250_devs[] __initdata = {
+	&sb1250_dev0,
+	&sb1250_dev1,
+	&sb1250_dev2,
+	&sb1250_dev3,
+};
+
+static int __init sb1250_device_init(void)
+{
+	int ret;
+
+	/* Set the number of available units based on the SOC type.  */
+	switch (soc_type) {
+	case K_SYS_SOC_TYPE_BCM1250:
+	case K_SYS_SOC_TYPE_BCM1250_ALT:
+		ret = platform_add_devices(sb1250_devs, 3);
+		break;
+	case K_SYS_SOC_TYPE_BCM1120:
+	case K_SYS_SOC_TYPE_BCM1125:
+	case K_SYS_SOC_TYPE_BCM1125H:
+	case K_SYS_SOC_TYPE_BCM1250_ALT2:       /* Hybrid */
+		ret = platform_add_devices(sb1250_devs, 2);
+		break;
+	case K_SYS_SOC_TYPE_BCM1x55:
+	case K_SYS_SOC_TYPE_BCM1x80:
+		ret = platform_add_devices(sb1250_devs, 4);
+		break;
+	default:
+		ret = -ENODEV;
+		break;
+	}
+	return ret;
+}
+device_initcall(sb1250_device_init);
diff --git a/drivers/net/sb1250-mac.c b/drivers/net/sb1250-mac.c
index fc503a1..459bc59 100644
--- a/drivers/net/sb1250-mac.c
+++ b/drivers/net/sb1250-mac.c
@@ -332,7 +332,6 @@ static int sbmac_mii_write(struct mii_bus *bus, int phyaddr, int regidx,
  ********************************************************************* */
 
 static char sbmac_string[] = "sb1250-mac";
-static char sbmac_pretty[] = "SB1250 MAC";
 
 static char sbmac_mdio_string[] = "sb1250-mac-mdio";
 
@@ -2668,114 +2667,6 @@ static int __exit sbmac_remove(struct platform_device *pldev)
 	return 0;
 }
 
-
-static struct platform_device **sbmac_pldev;
-static int sbmac_max_units;
-
-static int __init sbmac_platform_probe_one(int idx)
-{
-	struct platform_device *pldev;
-	struct {
-		struct resource r;
-		char name[strlen(sbmac_pretty) + 4];
-	} *res;
-	int err;
-
-	res = kzalloc(sizeof(*res), GFP_KERNEL);
-	if (!res) {
-		printk(KERN_ERR "%s.%d: unable to allocate memory\n",
-		       sbmac_string, idx);
-		err = -ENOMEM;
-		goto out_err;
-	}
-
-	/*
-	 * This is the base address of the MAC.
-	 */
-	snprintf(res->name, sizeof(res->name), "%s %d", sbmac_pretty, idx);
-	res->r.name = res->name;
-	res->r.flags = IORESOURCE_MEM;
-	res->r.start = A_MAC_CHANNEL_BASE(idx);
-	res->r.end = A_MAC_CHANNEL_BASE(idx + 1) - 1;
-
-	pldev = platform_device_register_simple(sbmac_string, idx, &res->r, 1);
-	if (IS_ERR(pldev)) {
-		printk(KERN_ERR "%s.%d: unable to register platform device\n",
-		       sbmac_string, idx);
-		err = PTR_ERR(pldev);
-		goto out_kfree;
-	}
-
-	if (!pldev->dev.driver) {
-		err = 0;		/* No hardware at this address. */
-		goto out_unregister;
-	}
-
-	sbmac_pldev[idx] = pldev;
-	return 0;
-
-out_unregister:
-	platform_device_unregister(pldev);
-
-out_kfree:
-	kfree(res);
-
-out_err:
-	return err;
-}
-
-static void __init sbmac_platform_probe(void)
-{
-	int i;
-
-	/* Set the number of available units based on the SOC type.  */
-	switch (soc_type) {
-	case K_SYS_SOC_TYPE_BCM1250:
-	case K_SYS_SOC_TYPE_BCM1250_ALT:
-		sbmac_max_units = 3;
-		break;
-	case K_SYS_SOC_TYPE_BCM1120:
-	case K_SYS_SOC_TYPE_BCM1125:
-	case K_SYS_SOC_TYPE_BCM1125H:
-	case K_SYS_SOC_TYPE_BCM1250_ALT2:	/* Hybrid */
-		sbmac_max_units = 2;
-		break;
-	case K_SYS_SOC_TYPE_BCM1x55:
-	case K_SYS_SOC_TYPE_BCM1x80:
-		sbmac_max_units = 4;
-		break;
-	default:
-		return;				/* none */
-	}
-
-	sbmac_pldev = kcalloc(sbmac_max_units, sizeof(*sbmac_pldev),
-			      GFP_KERNEL);
-	if (!sbmac_pldev) {
-		printk(KERN_ERR "%s: unable to allocate memory\n",
-		       sbmac_string);
-		return;
-	}
-
-	/*
-	 * Walk through the Ethernet controllers and find
-	 * those who have their MAC addresses set.
-	 */
-	for (i = 0; i < sbmac_max_units; i++)
-		if (sbmac_platform_probe_one(i))
-			break;
-}
-
-
-static void __exit sbmac_platform_cleanup(void)
-{
-	int i;
-
-	for (i = 0; i < sbmac_max_units; i++)
-		platform_device_unregister(sbmac_pldev[i]);
-	kfree(sbmac_pldev);
-}
-
-
 static struct platform_driver sbmac_driver = {
 	.probe = sbmac_probe,
 	.remove = __exit_p(sbmac_remove),
@@ -2786,20 +2677,11 @@ static struct platform_driver sbmac_driver = {
 
 static int __init sbmac_init_module(void)
 {
-	int err;
-
-	err = platform_driver_register(&sbmac_driver);
-	if (err)
-		return err;
-
-	sbmac_platform_probe();
-
-	return err;
+	return platform_driver_register(&sbmac_driver);
 }
 
 static void __exit sbmac_cleanup_module(void)
 {
-	sbmac_platform_cleanup();
 	platform_driver_unregister(&sbmac_driver);
 }
 
-- 
1.6.6.1


