netdev.vger.kernel.org archive mirror
* Intel and TOE in the news
@ 2005-02-19  3:44 Jeff Garzik
  2005-02-19  4:10 ` Lennert Buytenhek
                   ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Jeff Garzik @ 2005-02-19  3:44 UTC (permalink / raw)
  To: Netdev


http://www.theregister.co.uk/2005/02/18/intel_tcpip_attack/

Intel to attack greedy TCP/IP stack

Intel next year will plant itself square in the middle of the budding 
market for systems that speed network traffic by rolling out something 
called I/OAT.

I/OAT stands for I/O Acceleration Technology and it will be previewed 
for the first time at next month's Intel Developer Forum (IDF) in San 
Francisco. Intel remains cagey about what exactly I/OAT is, but it 
dangled a few details today in front of reporters ahead of the IDF event.

[...]

Intel plans to sidestep the need for separate TOE cards by building this 
technology into its server processor package - the chip itself, chipset 
and network controller. This should reduce some of the time a processor 
typically spends waiting for memory to feed back information and improve 
overall application processing speeds.

[...]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: Intel and TOE in the news
  2005-02-19  3:44 Intel and TOE in the news Jeff Garzik
@ 2005-02-19  4:10 ` Lennert Buytenhek
  2005-02-19 19:46   ` David S. Miller
  2005-03-02 13:48   ` Lennert Buytenhek
  2005-02-21 13:59 ` P
  2005-02-21 22:44 ` Stephen Hemminger
  2 siblings, 2 replies; 74+ messages in thread
From: Lennert Buytenhek @ 2005-02-19  4:10 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Netdev

On Fri, Feb 18, 2005 at 10:44:45PM -0500, Jeff Garzik wrote:

> Intel plans to sidestep the need for separate TOE cards by building this 
> technology into its server processor package - the chip itself, chipset 
> and network controller. This should reduce some of the time a processor 
> typically spends waiting for memory to feed back information and improve 
> overall application processing speeds.

I wonder if they could just take the network processing circuitry from
the IXP2800 (an extra 16-core (!) RISCy processor on-die, dedicated to
doing just network stuff, and a 10gbps pipe going straight into the CPU
itself) and graft it onto the Xeon.

Now _that_ would be something worth experiencing.


--L


* Re: Intel and TOE in the news
  2005-02-19  4:10 ` Lennert Buytenhek
@ 2005-02-19 19:46   ` David S. Miller
  2005-02-19 20:27     ` Andi Kleen
  2005-02-19 20:29     ` Lennert Buytenhek
  2005-03-02 13:48   ` Lennert Buytenhek
  1 sibling, 2 replies; 74+ messages in thread
From: David S. Miller @ 2005-02-19 19:46 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: jgarzik, netdev

On Sat, 19 Feb 2005 05:10:07 +0100
Lennert Buytenhek <buytenh@wantstofly.org> wrote:

> On Fri, Feb 18, 2005 at 10:44:45PM -0500, Jeff Garzik wrote:
> 
> > Intel plans to sidestep the need for separate TOE cards by building this 
> > technology into its server processor package - the chip itself, chipset 
> > and network controller. This should reduce some of the time a processor 
> > typically spends waiting for memory to feed back information and improve 
> > overall application processing speeds.
> 
> I wonder if they could just take the network processing circuitry from
> the IXP2800 (an extra 16-core (!) RISCy processor on-die, dedicated to
> doing just network stuff, and a 10gbps pipe going straight into the CPU
> itself) and graft it onto the Xeon.
> 
> Now _that_ would be something worth experiencing.

No, that would be garbage.

Read what they are doing.  The idea is not to have all of this network
protocol logic off-cpu, the idea is to "reduce some of the time
a processor typically spends waiting for memory to feed back information"

Think about which part of the network I/O equation that is working on.
It's not protocol offload, that's for sure.


* Re: Intel and TOE in the news
  2005-02-19 19:46   ` David S. Miller
@ 2005-02-19 20:27     ` Andi Kleen
  2005-02-19 20:32       ` Lennert Buytenhek
                         ` (2 more replies)
  2005-02-19 20:29     ` Lennert Buytenhek
  1 sibling, 3 replies; 74+ messages in thread
From: Andi Kleen @ 2005-02-19 20:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: jgarzik, netdev

"David S. Miller" <davem@davemloft.net> writes:

> Read what they are doing.  The idea is not to have all of this network
> protocol logic off-cpu, the idea is to "reduce some of the time
> a processor typically spends waiting for memory to feed back information"
>
> Think about what part of the network I/O equation that is working on.
> It's not protocol offload, that's for sure.

<speculating freely>

It would be nice if the NIC could asynchronously trigger prefetches in
the CPU. Currently a lot of the packet processing cost goes
to waiting for read cache misses.

E.g. 

- NIC receives packet. 
- Tells target CPU to prefetch RX descriptor and headers. 
- CPU later looks at them and doesn't have to wait for a cache miss.

Drawback is that you would need to tell the NIC in advance
on which CPU you want to process the packet, but with Linux
IRQ affinity that's easy to figure out. 

-Andi


* Re: Intel and TOE in the news
  2005-02-19 19:46   ` David S. Miller
  2005-02-19 20:27     ` Andi Kleen
@ 2005-02-19 20:29     ` Lennert Buytenhek
  1 sibling, 0 replies; 74+ messages in thread
From: Lennert Buytenhek @ 2005-02-19 20:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: jgarzik, netdev

On Sat, Feb 19, 2005 at 11:46:24AM -0800, David S. Miller wrote:

> > I wonder if they could just take the network processing circuitry from
> > the IXP2800 (an extra 16-core (!) RISCy processor on-die, dedicated to
> > doing just network stuff, and a 10gbps pipe going straight into the CPU
> > itself) and graft it onto the Xeon.
> > 
> > Now _that_ would be something worth experiencing.
> 
> No, that would be garbage.
> 
> Read what they are doing.  The idea is not to have all of this network
> protocol logic off-cpu, the idea is to "reduce some of the time
> a processor typically spends waiting for memory to feed back information"

I agree that offloading just for the sake of offloading is silly.

The reason a 1.4GHz IXP2800 processes 15Mpps while a high-end PC hardly
does 1Mpps is exactly because the PC spends all of its cycles stalling on
memory and PCI reads (i.e. 'latency'), and the IXP2800 has various ways
of mitigating this cost that the PC doesn't have.  First of all, the IXP
has 16 cores which are 8-way 'hyperthreaded' each (128 threads total.)
Second, the SDRAM controller and the NIC circuitry are all on-chip, 
which saves you having to cross the FSB and the PCI bus a gazillion
times for everything you do.  (An L2 miss is hundreds of wasted cycles,
and just reading the interrupt status register of the e1000 that's on
a dedicated 64bit 100MHz PCI bus is ~2500 wasted cycles on my Xeon
2.4GHz.)  Third, SDRAM is treated as an async resource -- i.e. you
submit a memory read or write, and get an async signal when it's done,
so you can do other stuff instead of stalling.

The goal of the IXP2800 design, AFAICS, was not to create the possibility
to offload the TCP stack per se, but to create an architecture that is
better suited to the kind of nonlocal reference patterns that you see
with network traffic.  It wasn't even specifically designed for
offloading things as such.

For something somewhat more conventional, look at the Broadcom BCM1250:
a dual core MIPS CPU, relatively slow (600MHz-1GHz), but with three
GigE MACs and an SDRAM controller inside the CPU.  It gives the PC a
good run for its money for networking stuff.

Any kind of "make networking go faster" solution will be all about
reducing the cost of latency, and less about increasing raw processing
power.  Multi-core CPUs help not because they have more MIPS, but
because if they stall, it's only one core that stalls and all the
other cores just keep on going.  For certain tasks, I'll take a 4-core
1.0GHz CPU over a single-core 4.0GHz one any day.


--L


* Re: Intel and TOE in the news
  2005-02-19 20:27     ` Andi Kleen
@ 2005-02-19 20:32       ` Lennert Buytenhek
  2005-02-20 16:46       ` Eugene Surovegin
  2005-02-20 19:45       ` rick jones
  2 siblings, 0 replies; 74+ messages in thread
From: Lennert Buytenhek @ 2005-02-19 20:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David S. Miller, jgarzik, netdev

On Sat, Feb 19, 2005 at 09:27:31PM +0100, Andi Kleen wrote:

> It would be nice if the NIC could asynchronously trigger prefetches
> in the CPU. Currently a lot of the packet processing cost goes to
> waiting for read cache misses.

I've been told that this is something that the BCM-1250 can do.
The MACs are in the CPU, and they can (reportedly) DMA the packet
headers straight into L2.


--L


* Re: Intel and TOE in the news
  2005-02-19 20:27     ` Andi Kleen
  2005-02-19 20:32       ` Lennert Buytenhek
@ 2005-02-20 16:46       ` Eugene Surovegin
  2005-02-21 14:01         ` jamal
  2005-02-20 19:45       ` rick jones
  2 siblings, 1 reply; 74+ messages in thread
From: Eugene Surovegin @ 2005-02-20 16:46 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David S. Miller, jgarzik, netdev

On Sat, Feb 19, 2005 at 09:27:31PM +0100, Andi Kleen wrote:
> <speculating freely>
> 
> It would be nice if the NIC could asynchronously trigger prefetches in
> the CPU. Currently a lot of the packet processing cost goes
> to waiting for read cache misses.

Just FYI :), Freescale 85xx TSECs can prefetch buffers into L2 cache. 
IIRC they call it buffer "stashing" and gianfar driver has a config 
option to enable such behavior.

But in embedded world you usually don't see flashy PR releases for 
such features :).

--
Eugene.


* Re: Intel and TOE in the news
  2005-02-19 20:27     ` Andi Kleen
  2005-02-19 20:32       ` Lennert Buytenhek
  2005-02-20 16:46       ` Eugene Surovegin
@ 2005-02-20 19:45       ` rick jones
  2005-02-20 21:20         ` Michael Richardson
  2005-02-20 21:29         ` Andi Kleen
  2 siblings, 2 replies; 74+ messages in thread
From: rick jones @ 2005-02-20 19:45 UTC (permalink / raw)
  To: netdev

> <speculating freely>
>
> It would be nice if the NIC could asynchronously trigger prefetches in
> the CPU. Currently a lot of the packet processing cost goes
> to waiting for read cache misses.
>
> E.g.
>
> - NIC receives packet.
> - Tells target CPU to prefetch RX descriptor and headers.
> - CPU later looks at them and doesn't have to wait for a cache miss.
>
> Drawback is that you would need to tell the NIC in advance
> on which CPU you want to process the packet, but with Linux
> IRQ affinity that's easy to figure out.

With all the interrupt avoidance that is going on these days, would 
prefetching in the driver be sufficient?  Presumably the driver is 
going to be processing multiple packets at a time on an interrupt/etc 
so having it issue prefetches in SW would seem to help with all but the 
very first packet.

rick jones
Wisdom teeth are impacted, people are affected by the effects of events


* Re: Intel and TOE in the news
  2005-02-20 19:45       ` rick jones
@ 2005-02-20 21:20         ` Michael Richardson
  2005-02-20 21:29         ` Andi Kleen
  1 sibling, 0 replies; 74+ messages in thread
From: Michael Richardson @ 2005-02-20 21:20 UTC (permalink / raw)
  To: rick jones; +Cc: netdev



>>>>> "rick" == rick jones <rick.jones2@hp.com> writes:
    >> - NIC receives packet.  - Tells target CPU to prefetch RX
    >> descriptor and headers.  - CPU later looks at them and doesn't
    >> have to wait for a cache miss.
    >> 
    >> Drawback is that you would need to tell the NIC in advance on
    >> which CPU you want to process the packet, but with Linux IRQ
    >> affinity that's easy to figure out.

    rick> With all the interrupt avoidance that is going-on these days,
    rick> would prefetching in the driver be sufficient?  Presumably the
    rick> driver is going to be processing multiple packets at a time on
    rick> an interrupt/etc so having it issue prefetches in SW would
    rick> seem to help with all but the very first packet.

  I did prefetching of subsequent descriptor rings in the 200MHz days,
when the descriptors lived in the NIC card. It's rather hard to write in
C :-)
  I did prefetching of packets on a PowerPC system (actually, it was
inside a TOE-like device, living in a Linux 1U). This was done with
altivec instructions, and was rather easy.
  On Intel, one can use MMX or floating point instructions to do the
prefetch. 
  
  The interrupt avoidance mechanism actually makes the prefetch easier,
since you are already in the bottom half.
  The question is: how much to prefetch?

  BTW: in 1996 the IETF IPv6 WG briefly toyed with the idea of swapping
       the source and destination addresses in the IPv6 header. Having
       the dst first would save a cache miss. They decided not to do this.

- -- 
] Michael Richardson          Xelerance Corporation, Ottawa, ON |  firewalls  [
] mcr @ xelerance.com           Now doing IPsec training, see   |net architect[
] http://www.sandelman.ca/mcr/    www.xelerance.com/training/   |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [




* Re: Intel and TOE in the news
  2005-02-20 19:45       ` rick jones
  2005-02-20 21:20         ` Michael Richardson
@ 2005-02-20 21:29         ` Andi Kleen
  2005-02-20 22:43           ` Leonid Grossman
  1 sibling, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2005-02-20 21:29 UTC (permalink / raw)
  To: rick jones; +Cc: netdev

rick jones <rick.jones2@hp.com> writes:

>> <speculating freely>
>>
>> It would be nice if the NIC could asynchronously trigger prefetches in
>> the CPU. Currently a lot of the packet processing cost goes
>> to waiting for read cache misses.
>>
>> E.g.
>>
>> - NIC receives packet.
>> - Tells target CPU to prefetch RX descriptor and headers.
>> - CPU later looks at them and doesn't have to wait for a cache miss.
>>
>> Drawback is that you would need to tell the NIC in advance
>> on which CPU you want to process the packet, but with Linux
>> IRQ affinity that's easy to figure out.
>
> With all the interrupt avoidance that is going on these days, would
> prefetching in the driver be sufficient?  Presumably the driver is
> going to be processing multiple packets at a time on an interrupt/etc
> so having it issue prefetches in SW would seem to help with all but
> the very first packet.

Yes, we came up with this idea some years ago too ;-). It was even
tried in some simple variants, but didn't work very well:

- The time between finding out you have a packet and it being processed
is often too short to make it worthwhile. That gets worse with NAPI 
under high load.
- You have to fetch the RX descriptor anyways to find out where
the packet memory is to prefetch the header, and that is a cache
miss too.
(presumably you could keep a second, software-only cache-hot table that
lets you figure this out faster; that hasn't been tried so far) 
- It really requires a NIC that tells you in the RX descriptor
if a packet is IP (some do, but other popular ones don't). 
Otherwise the network driver has to eat an early cache miss
anyways to read the 802.x protocol ID for passing the packet up
the network stack.
(one possible fix for that would be to shift the protocol parsing
to later to avoid this, but that would be an incompatible change in 
the driver interface) 

I guess with more intrusive changes Linux could do this better.
e.g. if you have the cache-hot secondary table and a really cheap
way to find out from the NIC on an interrupt how many packets
it accepted, you could do aggressive prefetching and do the protocol
lookup later with a callback to the driver. But this has problems too:

- Even on modern CPUs you cannot do too many prefetches in parallel
because you're overwhelming the load/store units. At some point
new prefetches just get ignored. On older CPUs this problem is even worse.

Jamal and Robert did some experiments with routing on this and they
also ran into this.

If the NIC initiated the transfers the bandwidth of the CPU would be
much more evenly used because the transfers are spaced out in time
as the packets arrive. Software prefetch will always be bursty.

However I agree that some smaller software-only improvements
could probably still be made in this area on Linux.

-Andi


* RE: Intel and TOE in the news
  2005-02-20 21:29         ` Andi Kleen
@ 2005-02-20 22:43           ` Leonid Grossman
  2005-02-20 23:07             ` Andi Kleen
  2005-02-21 13:44             ` jamal
  0 siblings, 2 replies; 74+ messages in thread
From: Leonid Grossman @ 2005-02-20 22:43 UTC (permalink / raw)
  To: 'Andi Kleen', 'rick jones'; +Cc: netdev, 'Alex Aizman'

Andi,
This is an interesting idea, we'll play around...

BTW - does anybody know if it's possible to indicate multiple receive
packets?

In other OS drivers we have an option to indicate a "packet train" that got
received during an interrupt, but I'm not sure if/how it's doable in Linux.

We are adding Linux driver support for message-signaled interrupts and
header separation (just recently figured out how to indicate a chained skb for
a packet that has IP and TCP headers separated by the ASIC);
If a packet train indication works then the driver could prefetch the
descriptor ring segment and also a rx buffer segment that holds headers
stored back-to-back, before indicating the train. 

Leonid



* Re: Intel and TOE in the news
  2005-02-20 22:43           ` Leonid Grossman
@ 2005-02-20 23:07             ` Andi Kleen
  2005-02-21  1:57               ` Alex Aizman
  2005-02-21  3:31               ` Leonid Grossman
  2005-02-21 13:44             ` jamal
  1 sibling, 2 replies; 74+ messages in thread
From: Andi Kleen @ 2005-02-20 23:07 UTC (permalink / raw)
  To: Leonid Grossman; +Cc: 'rick jones', netdev, 'Alex Aizman'

On Sun, Feb 20, 2005 at 02:43:59PM -0800, Leonid Grossman wrote:
> This is an interesting idea, we'll play around...

What exactly? The software only shadow table?

> 
> BTW - does anybody know if it's possible to indicate multiple receive
> packets?
> 
> In other OS drivers we have an option to indicate a "packet train" that got
> received during an interrupt, but I'm not sure if/how it's doable in Linux.

You can always call netif_rx() multiple times from the interrupt. The
function doesn't do the full packet processing, but just stuffs the packet
into a CPU queue that is processed at a lower-priority interrupt (softirq).
Doesn't work for NAPI, unfortunately; netif_receive_skb always
runs the full protocol stack.

> We are adding Linux driver support for message-signaled interrupts and
> header separation (just recently figured out how to indicate chained skb for

I had an experimental patch to enable MSI (without -X) for your cards,
but didn't push it out because I wasn't too happy with it.

Most interesting would be to use per CPU TX completion interrupts using
MSI-X and avoid bouncing packets around between CPUs.

> a packet that has IP and TCP headers separated by the ASIC);
> If a packet train indication works then the driver could prefetch the
> descriptor ring segment and also a rx buffer segment that holds headers
> stored back-to-back, before indicating the train. 

Jamal and I tried that some time ago, but it did not help too much,
probably because the protocol processing overhead was not big enough.
However, that was with NAPI; it might be worth trying without
it. 

Problem is that you need to fill in skb->protocol/pkt_type before you can
pass the packet up; you can perhaps derive it from the RX descriptor (card
has a bit that says "this is IP" and "it's unicast for my MAC"). 
But the RX descriptor access is already a cache miss that stalls you.  

To make the prefetching work well for this would probably require a callback
to the driver so that you can do this later after your prefetch succeeded.

-Andi


* RE: Intel and TOE in the news
  2005-02-20 23:07             ` Andi Kleen
@ 2005-02-21  1:57               ` Alex Aizman
  2005-02-21  2:37                 ` Jeff Garzik
  2005-02-21 11:37                 ` Andi Kleen
  2005-02-21  3:31               ` Leonid Grossman
  1 sibling, 2 replies; 74+ messages in thread
From: Alex Aizman @ 2005-02-21  1:57 UTC (permalink / raw)
  To: 'Andi Kleen', 'Leonid Grossman'
  Cc: 'rick jones', netdev

> You can always call netif_rx() multiple times from the 
> interrupt. 

That's what we do, simply because interrupts are coalesced. 

> The function doesn't do the full packet 
> processing,  but just stuffs the packet into a CPU queue that 
> is processed at a lower priority interrupt (softirq). 

There's netif_rx's own per-packet overhead (including the call itself)
that arguably could be optimized.

> Most interesting would be to use per CPU TX completion 
> interrupts using MSI-X and avoid bouncing packets around between CPUs.

Our experimental Linux driver already supports MSI and MSI-X (the latter not
tested). Once/if multi-MSI support in Linux becomes available, it'd be
practically possible to scale TCP connections with the number of CPUs.
Alternative: wait until the Xframe II adapter w/MSI-X. 


Alex



* Re: Intel and TOE in the news
  2005-02-21  1:57               ` Alex Aizman
@ 2005-02-21  2:37                 ` Jeff Garzik
  2005-02-21 19:34                   ` Alex Aizman
  2005-02-21 11:37                 ` Andi Kleen
  1 sibling, 1 reply; 74+ messages in thread
From: Jeff Garzik @ 2005-02-21  2:37 UTC (permalink / raw)
  To: alex
  Cc: 'Andi Kleen', 'Leonid Grossman',
	'rick jones', netdev

Alex Aizman wrote:
> Our experimental Linux driver already supports MSI and MSI-X (the latter not
> tested). Once/if multi-MSI support in Linux becomes available it'd be
> practically possible to scale TCP connections with a number of CPUs.
> Alternative: wait until Xframe II adapter w/MSI-X.. 


The infiniband Linux driver is already using multi-MSI.  You are behind 
the times :)

	Jeff


* RE: Intel and TOE in the news
  2005-02-20 23:07             ` Andi Kleen
  2005-02-21  1:57               ` Alex Aizman
@ 2005-02-21  3:31               ` Leonid Grossman
  2005-02-21 11:50                 ` Andi Kleen
  1 sibling, 1 reply; 74+ messages in thread
From: Leonid Grossman @ 2005-02-21  3:31 UTC (permalink / raw)
  To: 'Andi Kleen', 'Leonid Grossman'
  Cc: 'rick jones', netdev, 'Alex Aizman'

 

> -----Original Message-----
> From: netdev-bounce@oss.sgi.com 
> [mailto:netdev-bounce@oss.sgi.com] On Behalf Of Andi Kleen
> Sent: Sunday, February 20, 2005 3:07 PM
> To: Leonid Grossman
> Cc: 'rick jones'; netdev@oss.sgi.com; 'Alex Aizman'
> Subject: Re: Intel and TOE in the news
> 
> On Sun, Feb 20, 2005 at 02:43:59PM -0800, Leonid Grossman wrote:
> > This is an interesting idea, we'll play around...
> 
> What exactly? The software only shadow table?
> 
> > 
> > BTW - does anybody know if it's possible to indicate 
> multiple receive 
> > packets?
> > 
> > In other OS drivers we have an option to indicate a "packet train" 
> > that got received during an interrupt, but I'm not sure 
> if/how it's doable in Linux.
> 
> You can always call netif_rx() multiple times from the 
> interrupt. The function doesn't do the full packet 
> processing,  but just stuffs the packet into a CPU queue that 
> is processed at a lower priority interrupt (softirq). 
> Doesn't work for NAPI unfortunately though; netif_receive_skb 
> always does the protocol stack.

Yes, this is what we currently do; I was rather thinking about the option to
indicate multiple packets in a single call (say, as a linked list). 
An alternative (rather, complementary) approach is to collapse consecutive
packets from the same session into a single large frame; we are going to try

> 
> > We are adding Linux driver support for message-signaled 
> interrupts and 
> > header separation (just recently figured out how to 
> indicate chained 
> > skb for
> 
> I had an experimental patch to enable MSI (without -X) for 
> your cards, but didn't push it out because i wasn't too happy with it.
> 

We have single MSI working now. Jeff - thanks for the pointer to multiple
MSI usage in IB, we will have a look (to catch up with the times :-). 
I guess MSIs are not that interesting in themselves - it's more what the
driver can do with them to optimize tx/rx processing... 

> Most interesting would be to use per CPU TX completion 
> interrupts using MSI-X and avoid bouncing packets around between CPUs.

Do you mean indicating rx packets to the same cpu that tx (for the same
session) came from, or something else?

> 
> > a packet that has IP and TCP headers separated by the ASIC); If a 
> > packet train indication works then the driver could prefetch the 
> > descriptor ring segment and also a rx buffer segment that holds 
> > headers stored back-to-back, before indicating the train.
> 
> Jamal and I tried that some time ago, but it did not help much.
> Probably because the protocol processing overhead was not big enough.
> However, that was with NAPI; it might perhaps be worth trying without it. 
> 
> Problem is that you need to fill in skb->protocol/pkt_type before 
> you can pass the packet up; you can perhaps derive it from 
> the RX descriptor (the card has a bit that says "this is IP" and 
> "it's unicast for my MAC"). 
> But the RX descriptor access is already a cache miss that 
> stalls you.  
> 
> To make the prefetching work well for this would probably 
> require a callback to the driver so that you can do this 
> later after your prefetch succeeded.

Instead of requiring a callback, a driver can prefetch descriptors and
headers for the packets that are going to be processed on the next interrupt
- by then, the prefetch will likely have succeeded.

Leonid  

> 
> -Andi
> 
> 
> 


* Re: Intel and TOE in the news
  2005-02-21  1:57               ` Alex Aizman
  2005-02-21  2:37                 ` Jeff Garzik
@ 2005-02-21 11:37                 ` Andi Kleen
  1 sibling, 0 replies; 74+ messages in thread
From: Andi Kleen @ 2005-02-21 11:37 UTC (permalink / raw)
  To: Alex Aizman; +Cc: 'Leonid Grossman', 'rick jones', netdev

On Sun, Feb 20, 2005 at 05:57:56PM -0800, Alex Aizman wrote:
> 
> That's what we do, simply because interrupts are coalesced. 
> 
> > The function doesn't do the full packet 
> > processing,  but just stuffs the packet into a CPU queue that 
> > is processed at a lower priority interrupt (softirq). 
> 
> There's netif_rx's own per-packet overhead (including the call itself)
> that arguably could be optimized..

netif_rx should be pretty cheap. It could probably be optimized more
(e.g. no local_irq_save if you know you're coming from
an interrupt, or somehow avoiding the atomic reference count
increase on the device), but I suspect there are other
areas that could be improved first.

> > Most interesting would be to use per CPU TX completion 
> > interrupts using MSI-X and avoid bouncing packets around between CPUs.
> 
> Our experimental Linux driver already supports MSI and MSI-X (the latter not
> tested). Once/if multi-MSI support in Linux becomes available, it will be
> practically possible to scale TCP connections with the number of CPUs.

It's already available, although the API is still a bit clumsy.

> Alternative: wait until Xframe II adapter w/MSI-X.. 

How does that help with MSI?

-Andi


* Re: Intel and TOE in the news
  2005-02-21  3:31               ` Leonid Grossman
@ 2005-02-21 11:50                 ` Andi Kleen
  2005-02-21 13:28                   ` Thomas Graf
  0 siblings, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2005-02-21 11:50 UTC (permalink / raw)
  To: Leonid Grossman; +Cc: 'rick jones', netdev, 'Alex Aizman'

On Sun, Feb 20, 2005 at 07:31:55PM -0800, Leonid Grossman wrote:
> Yes, this is what we currently do; I was rather thinking about the option to
> indicate multiple packets in a single call (say as a linked list). 

For the non-NAPI case the packet is just put into a queue
anyway. If you want to process packets as lists, then just the consumer
of the queue would need to be changed. I agree that it would be a good
idea to lower locking overhead. That would only help much, though,
if all the packets in a list belong to the same stream; otherwise
you need multiple locks anyway for the different sockets, and it would be useless.

For NAPI there would need to be some higher level changes for this.

The main problem is that someone has to go through all the protocol layers
and make sure they can process lists. Also it needs careful handling
in netfilter.

> > Most interesting would be to use per CPU TX completion 
> > interrupts using MSI-X and avoid bouncing packets around between CPUs.
> 
> Do you mean indicating rx packets to the same cpu that tx (for the same
> session) came from, or something else?

Just freeing TX packets on the same CPU they were submitted on.
This way the skb will always stay in the per-CPU slab cache.

Should be straightforward: you hash the CPU number to the 8 transmit
queues in dev_queue_xmit, then give each queue its own TX MSI and set the
irq affinity of each interrupt to its CPU.

If you have more than 8 CPUs there will still be some bouncing,
but e.g. on a NUMA system you could keep it at least node-local
or nearby in the machine topology (2.6 now has the necessary topology
information in the kernel for this; just the distance information
still has to be obtained from ACPI).

Should be all possible to do from the driver without stack changes.

Using it usefully in RX is probably much harder and would need stack changes.

-Andi


* Re: Intel and TOE in the news
  2005-02-21 11:50                 ` Andi Kleen
@ 2005-02-21 13:28                   ` Thomas Graf
  2005-02-21 14:03                     ` jamal
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Graf @ 2005-02-21 13:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Leonid Grossman, 'rick jones', netdev,
	'Alex Aizman'

* Andi Kleen <20050221115006.GB87576@muc.de> 2005-02-21 12:50
> The main problem is that someone has to go through all the protocol layers
> and make sure they can process lists. Also it needs careful handling
> in netfilter.

Special handling is also required for the ingress qdisc. The classifiers
or tc_classify() must be changed to iterate over the lists and unlink
the skbs if one of them is to be dropped. The mirred action must probably
split the list after cloning the skbs. Given that the lists are session based,
all session-based qdiscs could benefit from this and enqueue/dequeue the
lists rather than single skbs. Other qdiscs would have to split the lists,
but could return a new list upon dequeue.


* RE: Intel and TOE in the news
  2005-02-20 22:43           ` Leonid Grossman
  2005-02-20 23:07             ` Andi Kleen
@ 2005-02-21 13:44             ` jamal
  2005-02-21 16:52               ` Leonid Grossman
  1 sibling, 1 reply; 74+ messages in thread
From: jamal @ 2005-02-21 13:44 UTC (permalink / raw)
  To: Leonid Grossman
  Cc: 'Andi Kleen', 'rick jones', netdev,
	'Alex Aizman'

On Sun, 2005-02-20 at 17:43, Leonid Grossman wrote:
> Andi,
> This is an interesting idea, we'll play around...
> 
> BTW - does anybody know if it's possible to indicate multiple receive
> packets?
> 
> In other OS drivers we have an option to indicate a "packet train" that got
> received during an interrupt, 

Explain what you mean by "packet train". If you mean a burst of random
packets that just happened to arrive back to back, then it is not
useful. NAPI already does a sufficiently good job of processing these by
pulling them off the DMA ring and processing to completion.
OTOH, if you mean a train of packets terminating at exactly the same
location, i.e. related to the same flow, then this would be useful.
I suspect the latter would require extreme intelligence that may not
be suited for your NIC.
This leaves you with only the niceties of splitting a packet into
header/data, which may allow for nice scatter-gather techniques. Remains
to be seen (in other words, the proof is in the pudding: make it,
experiment, and show that it tastes good).

cheers,
jamal


* Re: Intel and TOE in the news
  2005-02-19  3:44 Intel and TOE in the news Jeff Garzik
  2005-02-19  4:10 ` Lennert Buytenhek
@ 2005-02-21 13:59 ` P
  2005-02-21 14:10   ` jamal
  2005-02-21 22:44 ` Stephen Hemminger
  2 siblings, 1 reply; 74+ messages in thread
From: P @ 2005-02-21 13:59 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Netdev

Jeff Garzik wrote:
> 
> http://www.theregister.co.uk/2005/02/18/intel_tcpip_attack/

I wonder, is this ETA in conjunction with multicore CPUs?
http://www.hoti.org/archive/Hoti11_program/papers/hoti11_11_regnier_g.pdf

-- 
Pádraig Brady - http://www.pixelbeat.org


* Re: Intel and TOE in the news
  2005-02-20 16:46       ` Eugene Surovegin
@ 2005-02-21 14:01         ` jamal
  0 siblings, 0 replies; 74+ messages in thread
From: jamal @ 2005-02-21 14:01 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: Andi Kleen, David S. Miller, jgarzik, netdev

On Sun, 2005-02-20 at 11:46, Eugene Surovegin wrote:
> On Sat, Feb 19, 2005 at 09:27:31PM +0100, Andi Kleen wrote:
> > <speculating freely>
> > 
> > It would be nice if the NIC could asynchronously trigger prefetches in
> > the CPU. Currently a lot of the packet processing cost goes
> > to waiting for read cache misses.
> 
> Just FYI :), Freescale 85xx TSECs can prefetch buffers into L2 cache. 
> IIRC they call it buffer "stashing" and gianfar driver has a config 
> option to enable such behavior.
> 
> But in embedded world you usually don't see flashy PR releases for 
> such features :).

Yes ;-> Big Bad Intel is now going to do this, so the rest of the world
had better listen.
I have a modified sb1250 driver that I converted to NAPI that does this
(the in-kernel driver does neither NAPI nor stashing packets into cache).
Reminds me I have to find that driver and submit it.
In any case, the problem could be x86; about every MIPS board I have
seen (I heard PMC-Sierra's as well) can do this. To be fair to Big Bad
Intel, they may have made a PR or two ;->

cheers,
jamal


* Re: Intel and TOE in the news
  2005-02-21 13:28                   ` Thomas Graf
@ 2005-02-21 14:03                     ` jamal
  2005-02-21 14:17                       ` Thomas Graf
  0 siblings, 1 reply; 74+ messages in thread
From: jamal @ 2005-02-21 14:03 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Andi Kleen, Leonid Grossman, 'rick jones', netdev,
	'Alex Aizman'


Everything in the stack would have to be rewritten, not just that one
piece.
The question is: is it worth it? My experimentation shows it is only in a few
specialized cases.

cheers,
jamal

On Mon, 2005-02-21 at 08:28, Thomas Graf wrote:
> * Andi Kleen <20050221115006.GB87576@muc.de> 2005-02-21 12:50
> > The main problem is that someone has to go through all the protocol layers
> > and make sure they can process lists. Also it needs careful handling
> > in netfilter.
> 
> Special handling is also required for the ingress qdisc. The classifiers
> or tc_classify() must be changed to iterate over the lists and unlink
> the skbs if one of them is to be dropped. The mirred action must probably
> split the list after cloning the skbs. Given the lists are session based
> all session based qdiscs could benefit from this and enqueue/dequeue the
> lists rather than single skbs. Other qdiscs would have to split the lists
> but could return a new list upon dequeue.
> 
> 


* Re: Intel and TOE in the news
  2005-02-21 13:59 ` P
@ 2005-02-21 14:10   ` jamal
  0 siblings, 0 replies; 74+ messages in thread
From: jamal @ 2005-02-21 14:10 UTC (permalink / raw)
  To: P; +Cc: Jeff Garzik, Netdev

On Mon, 2005-02-21 at 08:59, P@draigBrady.com wrote:
> Jeff Garzik wrote:
> > 
> > http://www.theregister.co.uk/2005/02/18/intel_tcpip_attack/
> 
> I wonder is this ETA in conjunction with multicore CPUs?
> http://www.hoti.org/archive/Hoti11_program/papers/hoti11_11_regnier_g.pdf

I have a feeling ETA is one of the things behind The Register announcement.
If you ask my opinion, it's not a very smart idea. Now they have to write
a special stack just to sit in this one Xeon processor they have, while
the other(s) run Linux. They will not get 3x what Linux can do today for a
simple app such as L3 packet forwarding, for example. And when it gets to
actually providing the features Linux provides, they will be either at
the same speed or even worse, depending on where they started.
Broadcom, btw, is preaching the same angle.

What intel needs to do is kill its chipset division ;->

cheers,
jamal


* Re: Intel and TOE in the news
  2005-02-21 14:03                     ` jamal
@ 2005-02-21 14:17                       ` Thomas Graf
  2005-02-21 14:31                         ` jamal
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Graf @ 2005-02-21 14:17 UTC (permalink / raw)
  To: jamal
  Cc: Andi Kleen, Leonid Grossman, 'rick jones', netdev,
	'Alex Aizman'

* jamal <1108994621.1089.158.camel@jzny.localdomain> 2005-02-21 09:03
> 
> Everything in the stack would have to be rewritten, not just that one
> piece.
> The question is: is it worth it? My experimentation shows it is only in a few
> specialized cases.

Assuming we can deliver a chain of skbs to enqueue (session based or not),
the locking times should decrease. I'm not sure whether it's worth
rewriting the whole stack (I wouldn't have any use for it) or just
establishing a fast path. We could, for example, allow the ingress qdisc to
redirect packets directly to an egress qdisc and have "dynamic" fast forwarding.
We can add an action looking up the route and rewriting the packet as
needed, and enqueue it to the right qdisc immediately. The redirect action
is already on the way there; now if we can reduce the locking overhead a bit
more, the fast-forwarding routing throughput should increase.


* Re: Intel and TOE in the news
  2005-02-21 14:17                       ` Thomas Graf
@ 2005-02-21 14:31                         ` jamal
  2005-02-21 15:34                           ` Thomas Graf
  2005-02-21 15:38                           ` Robert Olsson
  0 siblings, 2 replies; 74+ messages in thread
From: jamal @ 2005-02-21 14:31 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Andi Kleen, Leonid Grossman, 'rick jones', netdev,
	'Alex Aizman'


I have done the experiments and even posted the patches on netdev last
year to batch on enqueue.
Despite Andi's assertion that there's value in amortizing the locks, the
benefits are largely missing at a generic level, unfortunately.
Locking overhead is like the 50th item on the list of things you have to
worry about - so I wouldn't even start worrying about this.
In fact, performance goes down when you batch in some cases, depending on
the hardware used. My investigation shows that if you have a fast CPU
and a fast bus, there's always only one packet in flight within the
stack. Batching by definition requires more than one packet.

cheers,
jamal

On Mon, 2005-02-21 at 09:17, Thomas Graf wrote:
> * jamal <1108994621.1089.158.camel@jzny.localdomain> 2005-02-21 09:03
> > 
> > Everything in the stack would have to be re-written not just that one
> > piece. 
> > The question is: Is it worth it? My experimentation shows, only in a few
> > speacilized cases.
> 
> Assuming we can deliver a chain of skbs to enqueue (session based or not),
> the locking times should decrease. I'm not sure whether it's worth to
> rewrite the whole stack (I wouldn't have any use for it) or just establish
> a fast path. We could for example allow the ingress qdisc to redirect
> packets directly to a egress qdisc and have "dynamic" fast forwarding.
> We can add an action looking up the route and rewriting the packet as
> needed and enqueue it to the right qdisc immediately. The redirect action
> is already on that way, now if we can reduce the locking overhead a bit
> more the fast forwarding routing t-put should increase.
> 


* Re: Intel and TOE in the news
  2005-02-21 14:31                         ` jamal
@ 2005-02-21 15:34                           ` Thomas Graf
  2005-02-21 15:48                             ` jamal
  2005-02-21 15:38                           ` Robert Olsson
  1 sibling, 1 reply; 74+ messages in thread
From: Thomas Graf @ 2005-02-21 15:34 UTC (permalink / raw)
  To: jamal
  Cc: Andi Kleen, Leonid Grossman, 'rick jones', netdev,
	'Alex Aizman'

* jamal <1108996313.1090.178.camel@jzny.localdomain> 2005-02-21 09:31
> 
> I have done the experiments and even posted the patches on netdev last
> year to batch on enqueue. 

Right, I vaguely remember. I have a head like a sieve. Can you
remember the subject of the post? I can't find it in the archive.

> Despite Andis assertion that theres value in amortizing the locks, the
> benefits are highly missing on a generic level unfortunately.
> Locking overhead is like the 50th item on things you have to worry about
> - so i wouldnt even start worrying about this. 

OK.

> In fact, performance goes down when you batch in some cases, depending on
> the hardware used. My investigation shows that if you have a fast CPU
> and a fast bus, there's always only one packet in flight within the
> stack. Batching by definition requires more than one packet.

Makes sense, but we probably have multiple packets in the stack if qdiscs are
involved. What bottlenecks remain in that case? One is probably
transmission not being able to keep up because receiving is using too many
resources, with the transmitter dropping packets and making the bottleneck
even worse. Can we reduce the dropping by pushing multiple skbs to the NIC,
for example by having dequeue return a batch of skbs?


* Re: Intel and TOE in the news
  2005-02-21 14:31                         ` jamal
  2005-02-21 15:34                           ` Thomas Graf
@ 2005-02-21 15:38                           ` Robert Olsson
  2005-02-21 15:50                             ` jamal
  1 sibling, 1 reply; 74+ messages in thread
From: Robert Olsson @ 2005-02-21 15:38 UTC (permalink / raw)
  To: hadi
  Cc: Thomas Graf, Andi Kleen, Leonid Grossman, 'rick jones',
	netdev, 'Alex Aizman'


jamal writes:

 > In fact, performance goes down when you batch in some cases, depending on
 > the hardware used. My investigation shows that if you have a fast CPU
 > and a fast bus, there's always only one packet in flight within the
 > stack. Batching by definition requires more than one packet.

 Hello!

 Yes, when queue length/batch size increases you risk loading the L2 
 twice for the same skb, which is the most expensive operation.... 
 Forwarding profiles mostly show the functions where cache misses occur.

					--ro


* Re: Intel and TOE in the news
  2005-02-21 15:34                           ` Thomas Graf
@ 2005-02-21 15:48                             ` jamal
  2005-02-21 16:40                               ` Thomas Graf
  0 siblings, 1 reply; 74+ messages in thread
From: jamal @ 2005-02-21 15:48 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Andi Kleen, Leonid Grossman, 'rick jones', netdev,
	'Alex Aizman'

On Mon, 2005-02-21 at 10:34, Thomas Graf wrote:
> * jamal <1108996313.1090.178.camel@jzny.localdomain> 2005-02-21 09:31
> > 
> > I have done the experiments and even posted the patches on netdev last
> > year to batch on enqueue. 
> 
> Right, I slightly remember. I have a head like a sieve. Can you
> remmeber the subject of the post? I can't find it in the archive.
> 

I can't remember - the patches are on another machine; the last time I posted
was with some folks from .it who were trying to improve routing
performance.

> > In fact, performance goes down when you batch in some cases, depending on
> > the hardware used. My investigation shows that if you have a fast CPU
> > and a fast bus, there's always only one packet in flight within the
> > stack. Batching by definition requires more than one packet.
> 
> Make sense but we probably have multiple packets in the stack if qdiscs are
> involved. 

No - that's the problem. There's never more than one packet in flight
within the stack, except in the case of some hardware - which I pointed
out as problematic in my SUCON presentation.

Maybe I should write a paper about this - I spent Christmas collecting a
lot of data using relayfs.

cheers,
jamal


* Re: Intel and TOE in the news
  2005-02-21 15:38                           ` Robert Olsson
@ 2005-02-21 15:50                             ` jamal
  0 siblings, 0 replies; 74+ messages in thread
From: jamal @ 2005-02-21 15:50 UTC (permalink / raw)
  To: Robert Olsson
  Cc: Thomas Graf, Andi Kleen, Leonid Grossman, 'rick jones',
	netdev, 'Alex Aizman'

On Mon, 2005-02-21 at 10:38, Robert Olsson wrote:

>  Yes, when queue length/batch size increases you risk loading the L2 
>  twice for the same skb, which is the most expensive operation.... 

You are probably onto something here. I have to rerun with varying batch
sizes to see if this is true (although I think I did try with even 2 and
still saw performance degradation).

I will be revisiting these patches in about a month.

cheers,
jamal


* Re: Intel and TOE in the news
  2005-02-21 15:48                             ` jamal
@ 2005-02-21 16:40                               ` Thomas Graf
  2005-02-21 17:03                                 ` jamal
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Graf @ 2005-02-21 16:40 UTC (permalink / raw)
  To: jamal
  Cc: Andi Kleen, Leonid Grossman, 'rick jones', netdev,
	'Alex Aizman'

* jamal <1109000925.1076.3.camel@jzny.localdomain> 2005-02-21 10:48
> On Mon, 2005-02-21 at 10:34, Thomas Graf wrote:
> > Make sense but we probably have multiple packets in the stack if qdiscs are
> > involved. 
> 
> No - that's the problem. There's never more than one packet in flight
> within the stack, except in the case of some hardware - which I pointed
> out as problematic in my SUCON presentation.

I read your slides again, but I guess I'm missing the information you
provided in your talk. One of the disadvantages of organizing a
conference is that one can't listen to the talks. ;->

> Maybe i should write a paper about this - i spent christmas collecting a
> lot of data using relayfs; 

Would be nice, or at least provide the data. I haven't done any data
collection except some basic data to check ematch performance depending
on the size of the evaluation tree, i.e. the number of ematches attached
to a classifier, to find out whether further optimizations are needed.

So basically you're telling me that it makes no difference whether a
full qdisc dequeues in single steps or in batches?


* RE: Intel and TOE in the news
  2005-02-21 13:44             ` jamal
@ 2005-02-21 16:52               ` Leonid Grossman
  2005-02-21 17:11                 ` jamal
  2005-02-22 17:27                 ` Intel and TOE in the news Andi Kleen
  0 siblings, 2 replies; 74+ messages in thread
From: Leonid Grossman @ 2005-02-21 16:52 UTC (permalink / raw)
  To: hadi
  Cc: 'Andi Kleen', 'rick jones', netdev,
	'Alex Aizman'

 

> -----Original Message-----
> From: netdev-bounce@oss.sgi.com 
> [mailto:netdev-bounce@oss.sgi.com] On Behalf Of jamal
> Sent: Monday, February 21, 2005 5:45 AM
> To: Leonid Grossman
> Cc: 'Andi Kleen'; 'rick jones'; netdev@oss.sgi.com; 'Alex Aizman'
> Subject: RE: Intel and TOE in the news
> 

> > In other OS drivers we have an option to indicate a "packet train" 
> > that got received during an interrupt,
> 
> Explain what you mean by "packet train". If you mean a burst 
> of random packets that just happened to arrive back to back, 
> then it is not useful. NAPI already does a sufficiently good 
> job of processing these by pulling them off the DMA ring and 
> processing to completion.

Yes, the question was with regards to a burst of random packets.
I agree that this may not be too useful and arguably doesn't warrant a
significant stack change. The reason I asked is that one of our engineers
was under the impression (from reading the Linux TCP/IP Stack book) that
the feature is already supported.

WRT the burst of packets related to the same flow - we are hoping to be
able to collapse the burst into a single oversized frame and pass it to the
stack; this way no, or very minimal, changes to the stack will be needed.
There is enough intelligence on the NIC to do that efficiently; we just need
to try it and see how well this works.

Leonid 

> OTOH, if you mean a train of packets terminating at exactly 
> the same location, i.e. related to the same flow, then this 
> would be useful.
> I suspect the latter would require extreme intelligence that 
> may not be suited for your NIC. 
> This leaves you with only the niceties of splitting a packet 
> into header/data, which may allow for nice scatter-gather 
> techniques. Remains to be seen (in other words, the proof is 
> in the pudding: make it, experiment, and show that it tastes good).
> 
> cheers,
> jamal
> 
> 
> 


* Re: Intel and TOE in the news
  2005-02-21 16:40                               ` Thomas Graf
@ 2005-02-21 17:03                                 ` jamal
  2005-02-21 20:12                                   ` patrick mcmanus
  2005-02-21 21:41                                   ` Thomas Graf
  0 siblings, 2 replies; 74+ messages in thread
From: jamal @ 2005-02-21 17:03 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Andi Kleen, Leonid Grossman, 'rick jones', netdev,
	'Alex Aizman'

On Mon, 2005-02-21 at 11:40, Thomas Graf wrote:
> * jamal <1109000925.1076.3.camel@jzny.localdomain> 2005-02-21 10:48
> > On Mon, 2005-02-21 at 10:34, Thomas Graf wrote:
> > > Make sense but we probably have multiple packets in the stack if qdiscs are
> > > involved. 
> > 
> > No - that's the problem. There's never more than one packet in flight
> > within the stack, except in the case of some hardware - which I pointed
> > out as problematic in my SUCON presentation.
> 
> I read your slides again but I guess I'm missing the information you
> provided in your speech. One of the disadvantages of organizing a
> conference is that one can't listen to the speeches. ;->
> 

Most of this came after SUCON actually, after the BSD folks claimed to be
doing 1Mpps forwarding (only to find out later that they used CSA).
Look at the case where we have a 32-bit bus in those slides; in that
scenario we do see an improvement with batching (but really for the
wrong reasons). On the next piece of hardware we in fact see a degradation
in performance.


> So basically you're telling me that it doesn't make a difference for a
> full qdisc to dequeue in single steps or in batches?

I said in some cases it does make sense (as in the case of the 32-bit bus
above), and in some it doesn't (as in the case of most Xeon boards: fast
CPU, fast PCI-X).
Any solution should be one where things work all the time, or most of the
time, and are not very hardware specific. If we can hit 95% of the cases,
then it would make sense to submit such a patch. Otherwise
it was a good exercise for me, but useless in general. I do plan to work
on trying out some heuristics, like discovering a few things (such as CPU
speed, runtime congestion, etc.) before activating the batching code.
Won't have time for at least a month.

cheers,
jamal


* RE: Intel and TOE in the news
  2005-02-21 16:52               ` Leonid Grossman
@ 2005-02-21 17:11                 ` jamal
  2005-02-21 18:02                   ` Leonid Grossman
  2005-03-14 20:22                   ` [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters Alex Aizman
  2005-02-22 17:27                 ` Intel and TOE in the news Andi Kleen
  1 sibling, 2 replies; 74+ messages in thread
From: jamal @ 2005-02-21 17:11 UTC (permalink / raw)
  To: Leonid Grossman
  Cc: 'Andi Kleen', 'rick jones', netdev,
	'Alex Aizman'

On Mon, 2005-02-21 at 11:52, Leonid Grossman wrote:
>  
> WRT the burst of packets related to the same flow - we are hoping to be
> able to collapse the burst into a single oversized frame and pass it to the
> stack; this way no, or very minimal, changes to the stack will be needed.
> There is enough intelligence on the NIC to do that efficiently; we just need
> to try it and see how well this works.

Indeed, it would be nice to see what you come up with. I think there may be
value in sending one huge chunk packet which is itself actually a
collection of several independent packets, when you have a huge number of
small packets. The benefit being that you amortize the cost of DMA setup.
But then you may need to be able to break them down in the driver; is
this what you are talking about?

cheers,
jamal


* RE: Intel and TOE in the news
  2005-02-21 17:11                 ` jamal
@ 2005-02-21 18:02                   ` Leonid Grossman
  2005-02-22 18:02                     ` Stephen Hemminger
  2005-03-14 20:22                   ` [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters Alex Aizman
  1 sibling, 1 reply; 74+ messages in thread
From: Leonid Grossman @ 2005-02-21 18:02 UTC (permalink / raw)
  To: hadi, 'Leonid Grossman'
  Cc: 'Andi Kleen', 'rick jones', netdev,
	'Alex Aizman'

 

> -----Original Message-----
> From: jamal [mailto:hadi@cyberus.ca] 
> Sent: Monday, February 21, 2005 9:11 AM
> To: Leonid Grossman
> Cc: 'Andi Kleen'; 'rick jones'; netdev@oss.sgi.com; 'Alex Aizman'
> Subject: RE: Intel and TOE in the news
> 
> On Mon, 2005-02-21 at 11:52, Leonid Grossman wrote:
> >  
> > WRT the burst of packets related to the same flow - we are hoping
> > to be able to collapse the burst into a single oversized frame and
> > pass it to the stack; this way no, or very minimal, changes to the
> > stack will be needed.
> > There is enough intelligence on the NIC to do that efficiently, we
> > just need to try and see how well this works.
> 
> Indeed, it would be nice to see what you come up with. I think 
> there may be value in sending one huge chunk packet which 
> is itself actually a collection of several independent packets 
> when you have a huge number of small packets. The benefit 
> being that you amortize the cost of DMA setup.
> But then you may need to be able to break them down in the 
> driver; is this what you are talking about?

Pretty much. Actually, the ASIC separates the headers, so the driver doesn't
need to break packets down.
The driver just chains the payloads (for several packets from a same-flow
burst) and builds the header for this single oversized frame. Kind of an
inverse TSO, but on the receive side. 
We are going to post an experimental driver code in 2-3 weeks, along with
this "Large Receive Offload" algorithm, for review.

Cheers, Leonid

> 
> cheers,
> jamal
> 
> 


* RE: Intel and TOE in the news
  2005-02-21  2:37                 ` Jeff Garzik
@ 2005-02-21 19:34                   ` Alex Aizman
  2005-02-21 20:34                     ` Jeff Garzik
  0 siblings, 1 reply; 74+ messages in thread
From: Alex Aizman @ 2005-02-21 19:34 UTC (permalink / raw)
  To: 'Jeff Garzik', 'Andi Kleen',
	'Leonid Grossman', 'rick jones', netdev

> > Alternative: wait until Xframe II adapter w/MSI-X..
> 
> How does that help with MSI?

It does not help with MSI; it helps to scale.

Btw, there's one alternative to the MSI/MSI-X idea that in theory should
help to scale with CPUs. In a month or so I might get time to try it
out.

> The infiniband Linux driver is already using multi-MSI.  You 
> are behind the times :)

That's just great. Where, which kernel?

2.6.11-rc4 MSI-HOWTO still says "Due to the non-contiguous fashion in
vector assignment of the existing Linux kernel, this version does not
support multiple messages regardless of a device function is capable of
supporting more than one vector." 

2.6.11-rc4 MTHCA driver still does request_irq() just once for MSI
(note: MSI, not MSI-X). 

> Despite Andi's assertion that there's value in amortizing the 
> locks, the benefits are largely missing at a generic level, 
> unfortunately. Locking overhead is like the 50th item on 
> the list of things you have to worry about 
> - so I wouldn't even start worrying about this. 

That's probably true. Maybe not 50th, more like 5th, but still.

>  Yes, when queue length/batch increases you risk loading the L2 
>  twice for the same skb, which is the most expensive operation.... 
>  Forwarding profiles show most functions where cache misses occur.

I wonder if alloc_skb_from_cache() will help to relieve the memory
pressure at multi-Gbps receive rates.

Alex


* Re: Intel and TOE in the news
  2005-02-21 17:03                                 ` jamal
@ 2005-02-21 20:12                                   ` patrick mcmanus
  2005-02-21 21:12                                     ` jamal
  2005-02-21 21:41                                   ` Thomas Graf
  1 sibling, 1 reply; 74+ messages in thread
From: patrick mcmanus @ 2005-02-21 20:12 UTC (permalink / raw)
  To: hadi; +Cc: netdev@oss.sgi.com

On Mon, 2005-02-21 at 12:03, jamal wrote:
> after the BSD folks claimed to be
> doing 1Mpps forwarding (only to find out they used CSA later).
                           ^^^^^^^^^^^^^^^^^^^^^^^^

I can't grok this. Can you elucidate?

Thanks,
Pat


* Re: Intel and TOE in the news
  2005-02-21 19:34                   ` Alex Aizman
@ 2005-02-21 20:34                     ` Jeff Garzik
  2005-02-22  0:50                       ` Alex Aizman
  0 siblings, 1 reply; 74+ messages in thread
From: Jeff Garzik @ 2005-02-21 20:34 UTC (permalink / raw)
  To: Alex Aizman
  Cc: 'Andi Kleen', 'Leonid Grossman',
	'rick jones', netdev

Alex Aizman wrote:
> 2.6.11-rc4 MTHCA driver still does request_irq() just once for MSI
> (note: MSI, not MSI-X). 


I doubt that will change.

request_irq() is passed a "cookie" that enables your interrupt handler. 
  This cookie can be associated with more than one interrupt vector.

	Jeff


* Re: Intel and TOE in the news
  2005-02-21 20:12                                   ` patrick mcmanus
@ 2005-02-21 21:12                                     ` jamal
  2005-03-06 11:21                                       ` Harald Welte
  0 siblings, 1 reply; 74+ messages in thread
From: jamal @ 2005-02-21 21:12 UTC (permalink / raw)
  To: patrick mcmanus; +Cc: netdev@oss.sgi.com

On Mon, 2005-02-21 at 15:12, patrick mcmanus wrote:
> On Mon, 2005-02-21 at 12:03, jamal wrote:
> > after the BSD folks claimed to be
> > doing 1Mpps forwarding (only to find out they used CSA later).
>                            ^^^^^^^^^^^^^^^^^^^^^^^^
> 
> I can't grok this. Can you elucidate?
> 

Credit should go to Harald Welte for pointing this out to me a while
back.
CSA is essentially a direct link to the Northbridge System Controller
(or MCH); there are some specialized motherboards that have this
feature. So imagine an e1000 with no PCI-X bridge, directly connected to
the MCH. The funny thing is that it was supposed to be a cost-reduction
concept (no more PCI-X bridge). Refer to my earlier comment about
Intel and its chipset division.

cheers,
jamal


* Re: Intel and TOE in the news
  2005-02-21 17:03                                 ` jamal
  2005-02-21 20:12                                   ` patrick mcmanus
@ 2005-02-21 21:41                                   ` Thomas Graf
  1 sibling, 0 replies; 74+ messages in thread
From: Thomas Graf @ 2005-02-21 21:41 UTC (permalink / raw)
  To: jamal
  Cc: Andi Kleen, Leonid Grossman, 'rick jones', netdev,
	'Alex Aizman'

* jamal <1109005385.1074.68.camel@jzny.localdomain> 2005-02-21 12:03
> Most of this came after SUCON actually after the BSD folks claimed to be
> doing 1Mpps forwarding (only to find out they used CSA later).

Got it, found your patches.

> Look at the case where we have a 32 bit bus in those slides; in that
> scenario we do see an improvement with batching (but really for the
> wrong reasons). On the next piece of hardware we in fact see a
> degradation in performance.

I see, thanks for the elaboration.

> I said in some cases it does make sense (as in the case of 32 bit bus
> above) in some it doesnt (as in the case of  most xeon boards, fast CPU,
> fast PCI-X). 

Yes, of course. I left the 32bit bus out; it shouldn't be used as a
data point for optimizations, since everyone interested in such
optimizations is unlikely to use such hardware.

> Any solution should be one where things work all or most of the
> time and are not very hardware specific. If we can hit 95% of the cases,
> then it would make sense to submit such a patch. Otherwise
> it was a good exercise for me, but useless in general. I do plan to work
> on trying out some heuristics, like discovering a few things (such as CPU
> speed, runtime congestion etc) before activating the batching code.
> Won't have time for at least 1 month.

Agreed. Thanks.


* Re: Intel and TOE in the news
  2005-02-19  3:44 Intel and TOE in the news Jeff Garzik
  2005-02-19  4:10 ` Lennert Buytenhek
  2005-02-21 13:59 ` P
@ 2005-02-21 22:44 ` Stephen Hemminger
  2 siblings, 0 replies; 74+ messages in thread
From: Stephen Hemminger @ 2005-02-21 22:44 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Netdev

What TCP Offload Engines already exist and are shipping on Linux?
How do they interface and work?


* RE: Intel and TOE in the news
  2005-02-21 20:34                     ` Jeff Garzik
@ 2005-02-22  0:50                       ` Alex Aizman
  0 siblings, 0 replies; 74+ messages in thread
From: Alex Aizman @ 2005-02-22  0:50 UTC (permalink / raw)
  To: 'Jeff Garzik'
  Cc: 'Andi Kleen', 'Leonid Grossman',
	'rick jones', netdev

"Cookie" or no "cookie", demux in the OS or in the driver, it does not matter
(well, it matters just a little bit :). The important thing is to run the MSI
handler on two (for starters) CPUs. Will check on it this week with 2.6.11.

Alex

> -----Original Message-----
> From: Jeff Garzik [mailto:jgarzik@pobox.com] 
> Sent: Monday, February 21, 2005 12:35 PM
> To: Alex Aizman
> Cc: 'Andi Kleen'; 'Leonid Grossman'; 'rick jones'; netdev@oss.sgi.com
> Subject: Re: Intel and TOE in the news
> 
> Alex Aizman wrote:
> > 2.6.11-rc4 MTHCA driver still does request_irq() just once for MSI
> > (note: MSI, not MSI-X). 
> 
> 
> I doubt that will change.
> 
> request_irq() is passed a "cookie" that enables your 
> interrupt handler. 
>   This cookie can be associated with more than one interrupt vector.
> 
> 	Jeff
> 
> 
> 


* Re: Intel and TOE in the news
  2005-02-21 16:52               ` Leonid Grossman
  2005-02-21 17:11                 ` jamal
@ 2005-02-22 17:27                 ` Andi Kleen
  1 sibling, 0 replies; 74+ messages in thread
From: Andi Kleen @ 2005-02-22 17:27 UTC (permalink / raw)
  To: Leonid Grossman; +Cc: hadi, 'rick jones', netdev, 'Alex Aizman'

On Mon, Feb 21, 2005 at 08:52:13AM -0800, Leonid Grossman wrote:
> Yes, the question was with regards to the burst of random packets.
> I agree that this may not be too useful and arguably doesn't warrant a
> significant stack change. The reason I asked is that one of our engineers
> was under the impression (from reading the Linux TCP/IP Stack book) that
> the feature is already supported.

There is some support for processing a list of skbs, but it's only used
for IP fragments belonging to the same IP packet, and it is also
only valid for some small parts of the stack between the IP and transport
layers.

The only IP protocols that support it are UDP and RAW.
TCP doesn't support it; or rather, it would always try to reassemble
them first.

In short, it's a hack to avoid one copy of data for NFS over
fragmented UDP.
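A minimal sketch of the idea (illustrative userspace C, not the real struct sk_buff): the fragments of one datagram travel as a chain, and a consumer walks the chain instead of forcing a copy into one linear buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a chain of buffers that all belong to one
 * fragmented IP datagram; field names are illustrative only. */
struct frag {
    size_t len;          /* payload bytes in this fragment */
    struct frag *next;   /* next fragment of the same datagram */
};

/* Total payload of the datagram: walk the chain, no copying needed. */
size_t datagram_len(const struct frag *head)
{
    size_t total = 0;
    for (; head != NULL; head = head->next)
        total += head->len;
    return total;
}
```

Two 1480-byte fragments plus a 520-byte tail sum to a 3480-byte datagram without ever being linearized.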

I guess he thought it was a more general facility, or the author
of the book didn't make it clear enough that it is quite a special-case
hack.

> WRT to the burst of packets related to the same flow - we are hoping to be
> able to collapse the burst into a single oversized frame and pass it to the
> stack, this way no or very minimal changes to the stack will be needed. 
> There is enough intelligence on the NIC to do that efficiently, we just need
> to try and see how well this works.

Hopefully you don't need too many cache misses to figure this out though.

-Andi


* Re: Intel and TOE in the news
  2005-02-21 18:02                   ` Leonid Grossman
@ 2005-02-22 18:02                     ` Stephen Hemminger
  2005-02-22 18:07                       ` Andi Kleen
  0 siblings, 1 reply; 74+ messages in thread
From: Stephen Hemminger @ 2005-02-22 18:02 UTC (permalink / raw)
  To: Leonid Grossman
  Cc: hadi, 'Leonid Grossman', 'Andi Kleen',
	'rick jones', netdev, 'Alex Aizman'

On Mon, 21 Feb 2005 10:02:47 -0800
"Leonid Grossman" <leonid.grossman@neterion.com> wrote:

>  
> 
> > -----Original Message-----
> > From: jamal [mailto:hadi@cyberus.ca] 
> > Sent: Monday, February 21, 2005 9:11 AM
> > To: Leonid Grossman
> > Cc: 'Andi Kleen'; 'rick jones'; netdev@oss.sgi.com; 'Alex Aizman'
> > Subject: RE: Intel and TOE in the news
> > 
> > On Mon, 2005-02-21 at 11:52, Leonid Grossman wrote:
> > >  
> > > WRT to the burst of packets related to the same flow - we 
> > are hoping 
> > > to be able to collapse the burst into a single oversized frame and 
> > > pass it to the stack, this way no or very minimal changes 
> > to the stack will be needed.
> > > There is enough intelligence on the NIC to do that efficiently, we 
> > > just need to try and see how well this works.
> > 
> > Indeed, would be nice to see what you come up with. I think 
> > there may be value in sending one huge chunk packet which 
> > itself is actually a collection of several independet packets 
> > when you have a huge amount of small packets. The benefit 
> > being you amortize the cost of DMA setup.
> > But then you may need to be able to break them down in the 
> > driver; is this what you are talking about?
> 
> Pretty much. Actually the ASIC separates the headers, so the driver doesn't
> need to break packets down.
> The driver just chains the payload (for several packets from a same-flow
> burst) and builds the header for this single oversized frame. Kind of an
> inverted TSO, but on the receive side.
> We are going to post experimental driver code in 2-3 weeks, along with
> this "Large Receive Offload" algorithm, for review.

Be careful; this may have the same problems that the original TSO code did.
Make sure to force the PUSH flag on these jumbo receives, or the TCP
"every other segment" ACK logic will be busted.  There are other parts of TCP
that depend on packet count as well, and this inverse TSO logic will break them.


* Re: Intel and TOE in the news
  2005-02-22 18:02                     ` Stephen Hemminger
@ 2005-02-22 18:07                       ` Andi Kleen
  2005-02-22 20:51                         ` Leonid Grossman
  0 siblings, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2005-02-22 18:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Leonid Grossman, hadi, 'rick jones', netdev,
	'Alex Aizman'

> Be careful, this may have the same problems that original TSO code did.
> Make sure and force the PUSH flag on these jumbo receives or the TCP
> "every other segment" ACK logic will be busted.  There are other parts of TCP
> that depend on packet count as well and this inverse TSO logic will break them.

The Linux TCP RX path didn't care about PSH last time I checked. It should
make no difference how PSH is set on RX, or whether it is set at all. At least
unless Leonid wants to run his driver on Darwin too, but that would be
someone else's problem.

-Andi


* RE: Intel and TOE in the news
  2005-02-22 18:07                       ` Andi Kleen
@ 2005-02-22 20:51                         ` Leonid Grossman
  2005-02-22 21:20                           ` Rick Jones
  2005-02-22 21:43                           ` Andi Kleen
  0 siblings, 2 replies; 74+ messages in thread
From: Leonid Grossman @ 2005-02-22 20:51 UTC (permalink / raw)
  To: 'Andi Kleen', 'Stephen Hemminger'
  Cc: 'Leonid Grossman', hadi, 'rick jones', netdev,
	'Alex Aizman'




> -----Original Message-----
> From: Andi Kleen [mailto:ak@muc.de] 
> Sent: Tuesday, February 22, 2005 10:07 AM
> To: Stephen Hemminger
> Cc: Leonid Grossman; hadi@cyberus.ca; 'rick jones'; 
> netdev@oss.sgi.com; 'Alex Aizman'
> Subject: Re: Intel and TOE in the news
> 
> > Be careful, this may have the same problems that original 
> TSO code did.
> > Make sure and force the PUSH flag on these jumbo receives 
> or the TCP 
> > "every other segment" ACK logic will be busted.  There are 
> other parts of TCP
> > that depend on packet count as well and this inverse TSO 
> logic will break them.


The TSO case is probably different - TSO hardware sets PSH on the last
fragment only (and only if the flag was set on the original large tx
packet). The receiver doesn't really know whether the sender is TSO-capable
and will ACK the same way - will it not?
Anyway, with LRO we do change the rx packet count, so affecting parts of TCP
that depend on packet count is indeed a concern; I guess we'll find out soon
enough whether there are real issues with the approach :-)

Leonid

> 
> Linux TCP RX path didn't care about PSH last time I checked. 
> It should make no difference how PSH is set on RX or if it is 
> set at all. At least unless Leonid wants to run his driver on 
> Darwin too, but that would be someone else's problem.
> 
> -Andi
> 


* Re: Intel and TOE in the news
  2005-02-22 20:51                         ` Leonid Grossman
@ 2005-02-22 21:20                           ` Rick Jones
  2005-02-22 21:30                             ` Leonid Grossman
  2005-02-22 21:43                           ` Andi Kleen
  1 sibling, 1 reply; 74+ messages in thread
From: Rick Jones @ 2005-02-22 21:20 UTC (permalink / raw)
  To: netdev

I always thought the biggest issue with "RSO" would be deciding how long to wait 
for another segment to paste-in with the rest?

rick jones

all the previous talk about header/data split makes me wax nostalgic for the 
FDDI adaptors of the late 80's and early 90's that did that - back when MTUs 
were >= page size and life was good :) :) :)


* RE: Intel and TOE in the news
  2005-02-22 21:20                           ` Rick Jones
@ 2005-02-22 21:30                             ` Leonid Grossman
  2005-02-22 21:42                               ` Rick Jones
  0 siblings, 1 reply; 74+ messages in thread
From: Leonid Grossman @ 2005-02-22 21:30 UTC (permalink / raw)
  To: 'Rick Jones', netdev

No wait - we "paste in" only packets that have already been received when a
traffic interrupt comes...
At 10GbE, the entire window will likely be sitting there :-)

Leonid 

> -----Original Message-----
> From: netdev-bounce@oss.sgi.com 
> [mailto:netdev-bounce@oss.sgi.com] On Behalf Of Rick Jones
> Sent: Tuesday, February 22, 2005 1:20 PM
> To: netdev@oss.sgi.com
> Subject: Re: Intel and TOE in the news
> 
> I always thought the biggest issue with "RSO" would be 
> deciding how long to wait for another segment to paste-in 
> with the rest?
> 
> rick jones
> 
> all the previous talk about header/data split makes me wax 
> nostalgic for the FDDI adaptors of the late 80's and early 
> 90's that did that - back when MTU's were >= page size and 
> life was good :) :) :)
> 
> 


* Re: Intel and TOE in the news
  2005-02-22 21:30                             ` Leonid Grossman
@ 2005-02-22 21:42                               ` Rick Jones
  2005-02-22 22:10                                 ` Leonid Grossman
  0 siblings, 1 reply; 74+ messages in thread
From: Rick Jones @ 2005-02-22 21:42 UTC (permalink / raw)
  To: netdev

Leonid Grossman wrote:
> No wait, "paste in" only packets that are already received when a traffic
> interrupt comes...

Ah, so it is ass-u-me-ing that there is little or no packet loss?

How about when the driver is polling instead of taking interrupts?  Will the 
driver "see" the RSO segment-in-progress or will it be hidden?

rick jones


* Re: Intel and TOE in the news
  2005-02-22 20:51                         ` Leonid Grossman
  2005-02-22 21:20                           ` Rick Jones
@ 2005-02-22 21:43                           ` Andi Kleen
  2005-02-22 22:17                             ` Leonid Grossman
  1 sibling, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2005-02-22 21:43 UTC (permalink / raw)
  To: Leonid Grossman
  Cc: 'Stephen Hemminger', hadi, 'rick jones', netdev,
	'Alex Aizman'

> TSO case is probably different - TSO hardware just sets PSH on the last
> fragment only (and only if the flag was set on the original large tx
> packet). Receiver doesn't really know if the sender is TSO-capable or not
> and will ACK the same way - will it not?
> Anyway, with LRO we do change rx packet count so affecting parts of TCP that
> depend on packet count is indeed a concern; I guess we'll find out soon
> enough whether there are real issues with the approach :-)

Linux doesn't depend on packet count; it keeps an estimate called rcv_mss
of the biggest seen packet size and acks every 2*rcv_mss worth of data.

Your scheme would likely result in acking every two of your large packets
as soon as the connection is out of "quickack" mode. So there would
be on-wire differences.
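The rule described here can be modeled in a few lines of userspace C. This is a sketch of the idea only, not the actual Linux rcv_mss estimation or delayed-ACK code:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the receiver's ACK rule: remember the largest segment seen
 * (rcv_mss) and emit an ACK once unacknowledged data reaches 2*rcv_mss. */
struct rcv_state {
    uint32_t rcv_mss;    /* largest segment size observed so far */
    uint32_t unacked;    /* bytes received since the last ACK */
};

/* Feed one received segment; returns 1 if it triggers an ACK. */
int rx_segment(struct rcv_state *s, uint32_t len)
{
    if (len > s->rcv_mss)
        s->rcv_mss = len;        /* rcv_mss grows with the biggest packet */
    s->unacked += len;
    if (s->unacked >= 2 * s->rcv_mss) {
        s->unacked = 0;          /* the ACK covers everything so far */
        return 1;
    }
    return 0;
}
```

With plain 1448-byte segments this acks every second segment. Feed it 8688-byte LRO frames (six coalesced segments each) and rcv_mss inflates to the frame size, so an ACK still comes only every second frame, i.e. every twelve wire segments - exactly the stretch-ACK behavior raised later in the thread.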

> > Linux TCP RX path didn't care about PSH last time I checked. 

Actually it does, for measuring rcv_mss with very small MTUs.
Shouldn't matter in practice, though.

-Andi


* RE: Intel and TOE in the news
  2005-02-22 21:42                               ` Rick Jones
@ 2005-02-22 22:10                                 ` Leonid Grossman
  0 siblings, 0 replies; 74+ messages in thread
From: Leonid Grossman @ 2005-02-22 22:10 UTC (permalink / raw)
  To: 'Rick Jones', netdev

 

> -----Original Message-----
> From: netdev-bounce@oss.sgi.com 
> [mailto:netdev-bounce@oss.sgi.com] On Behalf Of Rick Jones
> Sent: Tuesday, February 22, 2005 1:42 PM
> To: netdev@oss.sgi.com
> Subject: Re: Intel and TOE in the news
> 
> Leonid Grossman wrote:
> > No wait, "paste in" only packets that are already received when a 
> > traffic interrupt comes...
> 
> Ah, so it is ass-u-me-ing that there is little or no packet loss?

This is a reasonable assumption in a datacenter, isn't it :-)?
At any rate, there is still no wait - if the driver doesn't find the next
sequence (no matter whether the packet is lost or just not yet received at
the time the driver looks at the receive buffer), it will indicate the
accumulated RSO right away.
 
> 
> How about when the driver is polling instead of taking 
> interrupts?  Will the driver "see" the RSO 
> segment-in-progress or will it be hidden?

It doesn't matter what triggered the driver to wake up and look at the
receive buffer.
It will still see, in real time, the packets that have been received - no
matter whether the interrupts for these packets have arrived or not.

Leonid

> 
> rick jones
> 
> 


* RE: Intel and TOE in the news
  2005-02-22 21:43                           ` Andi Kleen
@ 2005-02-22 22:17                             ` Leonid Grossman
  2005-02-22 22:42                               ` Andi Kleen
  0 siblings, 1 reply; 74+ messages in thread
From: Leonid Grossman @ 2005-02-22 22:17 UTC (permalink / raw)
  To: 'Andi Kleen', 'Leonid Grossman'
  Cc: 'Stephen Hemminger', hadi, 'rick jones', netdev,
	'Alex Aizman'

 

> -----Original Message-----
> From: Andi Kleen [mailto:ak@muc.de] 
> Sent: Tuesday, February 22, 2005 1:44 PM
> To: Leonid Grossman
> Cc: 'Stephen Hemminger'; hadi@cyberus.ca; 'rick jones'; 
> netdev@oss.sgi.com; 'Alex Aizman'
> Subject: Re: Intel and TOE in the news
> 
> > TSO case is probably different - TSO hardware just sets PSH on the 
> > last fragment only (and only if the flag was set on the 
> original large 
> > tx packet). Receiver doesn't really know if the sender is 
> TSO-capable 
> > or not and will ACK the same way - will it not?
> > Anyway, with LRO we do change rx packet count so affecting parts of 
> > TCP that depend on packet count is indeed a concern; I guess we'll 
> > find out soon enough whether there are real issues with the 
> approach 
> > :-)
> 
> Linux doesn't depend on packet count; it keeps an estimate 
> called rcv_mss about the biggest seen packet size and acks 
> every 2*rcv_mss worth of data.
> 
> Your scheme would likely result in acking every two of your 
> large packets as soon as the connection is out of "quickack" 
> mode. So there would be on wire differences. 

Sounds good, thanks! We should be OK then.
Leonid
> 
> > > Linux TCP RX path didn't care about PSH last time I checked. 
> 
> Actually it does for measuring the rcv_mss for very small MTUs. 
> Shouldn't matter in practice though.
> 
> -Andi
> 


* Re: Intel and TOE in the news
  2005-02-22 22:17                             ` Leonid Grossman
@ 2005-02-22 22:42                               ` Andi Kleen
  2005-02-22 22:51                                 ` Leonid Grossman
  0 siblings, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2005-02-22 22:42 UTC (permalink / raw)
  To: Leonid Grossman
  Cc: 'Stephen Hemminger', hadi, 'rick jones', netdev,
	'Alex Aizman'

On Tue, Feb 22, 2005 at 02:17:35PM -0800, Leonid Grossman wrote:
>  
> 
> > -----Original Message-----
> > From: Andi Kleen [mailto:ak@muc.de] 
> > Sent: Tuesday, February 22, 2005 1:44 PM
> > To: Leonid Grossman
> > Cc: 'Stephen Hemminger'; hadi@cyberus.ca; 'rick jones'; 
> > netdev@oss.sgi.com; 'Alex Aizman'
> > Subject: Re: Intel and TOE in the news
> > 
> > > TSO case is probably different - TSO hardware just sets PSH on the 
> > > last fragment only (and only if the flag was set on the 
> > original large 
> > > tx packet). Receiver doesn't really know if the sender is 
> > TSO-capable 
> > > or not and will ACK the same way - will it not?
> > > Anyway, with LRO we do change rx packet count so affecting parts of 
> > > TCP that depend on packet count is indeed a concern; I guess we'll 
> > > find out soon enough whether there are real issues with the 
> > approach 
> > > :-)
> > 
> > Linux doesn't depend on packet count; it keeps an estimate 
> > called rcv_mss about the biggest seen packet size and acks 
> > every 2*rcv_mss worth of data.
> > 
> > Your scheme would likely result in acking every two of your 
> > large packets as soon as the connection is out of "quickack" 
> > mode. So there would be on wire differences. 
> 
> Sounds good, thanks! We should be OK then.

Are you sure? It definitely wouldn't look like a conventional
TCP ack clock on the wire; you would get an ack only every
(BIGPACKETSIZE/MSS) * 2 packets. 

Quite a stretch ACK.

I'm not saying it wouldn't work (and maybe Rick et al. are right
and acking overhead is the next TCP frontier), but it would definitely be
quite different. It needs careful testing on how it behaves with 
packet loss.

-Andi


* RE: Intel and TOE in the news
  2005-02-22 22:42                               ` Andi Kleen
@ 2005-02-22 22:51                                 ` Leonid Grossman
  0 siblings, 0 replies; 74+ messages in thread
From: Leonid Grossman @ 2005-02-22 22:51 UTC (permalink / raw)
  To: 'Andi Kleen', 'Leonid Grossman'
  Cc: 'Stephen Hemminger', hadi, 'rick jones', netdev,
	'Alex Aizman'

 

> -----Original Message-----
> From: Andi Kleen [mailto:ak@muc.de] 
> Sent: Tuesday, February 22, 2005 2:43 PM
> To: Leonid Grossman
> Cc: 'Stephen Hemminger'; hadi@cyberus.ca; 'rick jones'; 
> netdev@oss.sgi.com; 'Alex Aizman'
> Subject: Re: Intel and TOE in the news
> 
> On Tue, Feb 22, 2005 at 02:17:35PM -0800, Leonid Grossman wrote:
> >  
> > 
> > > -----Original Message-----
> > > From: Andi Kleen [mailto:ak@muc.de]
> > > Sent: Tuesday, February 22, 2005 1:44 PM
> > > To: Leonid Grossman
> > > Cc: 'Stephen Hemminger'; hadi@cyberus.ca; 'rick jones'; 
> > > netdev@oss.sgi.com; 'Alex Aizman'
> > > Subject: Re: Intel and TOE in the news
> > > 
> > > > TSO case is probably different - TSO hardware just sets 
> PSH on the 
> > > > last fragment only (and only if the flag was set on the
> > > original large
> > > > tx packet). Receiver doesn't really know if the sender is
> > > TSO-capable
> > > > or not and will ACK the same way - will it not?
> > > > Anyway, with LRO we do change rx packet count so 
> affecting parts 
> > > > of TCP that depend on packet count is indeed a concern; I guess 
> > > > we'll find out soon enough whether there are real 
> issues with the
> > > approach
> > > > :-)
> > > 
> > > Linux doesn't depend on packet count; it keeps an estimate called 
> > > rcv_mss about the biggest seen packet size and acks every 
> 2*rcv_mss 
> > > worth of data.
> > > 
> > > Your scheme would likely result in acking every two of your large 
> > > packets as soon as the connection is out of "quickack"
> > > mode. So there would be on wire differences. 
> > 
> > Sounds good, thanks! We should be OK then.
> 
> Are you sure? It definitely wouldn't look like a conventional 
> TCP ack clock on the wire, but you would get an ack only every
> (BIGPACKETSIZE/MSS) * 2 packets. 
> 
> Quite a stretch ACK.
> 
> I'm not saying it wouldn't work (and maybe Rick et.al. are 
> right and Acking overhead is the next TCP frontier), but it 
> would be definitely quite different. It needs careful testing 
> on how it behaves with packet loss.

Sure, but I think the main RSO application will be in a datacenter; this
assumes very little packet loss.
I agree that corner cases will need to be tested very carefully, and there
may be scenarios where the feature will not work well and may need to be
turned off. 

In general, I don't expect RSO benefits to be dramatic (although rx
processing for 1500 MTU is still one of the biggest bottlenecks for 10GbE,
so every bit will help) - certainly not as big or as transparent as the
TSO benefits.

Leonid

> 
> -Andi
> 


* Re: Intel and TOE in the news
  2005-02-19  4:10 ` Lennert Buytenhek
  2005-02-19 19:46   ` David S. Miller
@ 2005-03-02 13:48   ` Lennert Buytenhek
  2005-03-02 17:34     ` Leonid Grossman
  1 sibling, 1 reply; 74+ messages in thread
From: Lennert Buytenhek @ 2005-03-02 13:48 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Netdev

On Sat, Feb 19, 2005 at 05:10:07AM +0100, Lennert Buytenhek wrote:

> > Intel plans to sidestep the need for separate TOE cards by building this 
> > technology into its server processor package - the chip itself, chipset 
> > and network controller. This should reduce some of the time a processor 
> > typically spends waiting for memory to feed back information and improve 
> > overall application processing speeds.
> 
> I wonder if they could just take the network processing circuitry from
> the IXP2800 (an extra 16-core (!) RISCy processor on-die, dedicated to
> doing just network stuff, and a 10gbps pipe going straight into the CPU
> itself) and graft it onto the Xeon.

It indeed appears to be something like the IXP2000.

	http://www.intel.com/technology/ioacceleration/index.htm

Quote from ServerNetworkIOAccel.pdf (which is otherwise content-free):

	Lightweight Threading

	[...] Rather than providing multiple hardware contexts in a
	processor like Hyper-Threading (HT) Technology from Intel, a
	single hardware context contains the network stack with
	multiple software-controlled threads.  When a packet
	thread triggers a memory event a scheduler within the network
	stack selects an alternate packet thread and loads the CPU
	execution pipeline. Processing continues in the shadow of a
	memory access. [...] Stall conditions, triggered by requests
	to slow memory devices, are nearly eliminated.

They can also DMA packet headers straight into L1/L2 ('Direct Cache
Access', innovation!), just like other products have been able to do
for ages now.

Not many other details up yet.


--L


* RE: Intel and TOE in the news
  2005-03-02 13:48   ` Lennert Buytenhek
@ 2005-03-02 17:34     ` Leonid Grossman
  0 siblings, 0 replies; 74+ messages in thread
From: Leonid Grossman @ 2005-03-02 17:34 UTC (permalink / raw)
  To: 'Lennert Buytenhek', 'Jeff Garzik'; +Cc: 'Netdev'

 

> -----Original Message-----
> From: netdev-bounce@oss.sgi.com 
> [mailto:netdev-bounce@oss.sgi.com] On Behalf Of Lennert Buytenhek
> Sent: Wednesday, March 02, 2005 5:48 AM
> To: Jeff Garzik
> Cc: Netdev
> Subject: Re: Intel and TOE in the news
> 
> On Sat, Feb 19, 2005 at 05:10:07AM +0100, Lennert Buytenhek wrote:
> 
> > > Intel plans to sidestep the need for separate TOE cards 
> by building 
> > > this technology into its server processor package - the 
> chip itself, 
> > > chipset and network controller. This should reduce some 
> of the time 
> > > a processor typically spends waiting for memory to feed back 
> > > information and improve overall application processing speeds.
> > 
> > I wonder if they could just take the network processing 
> circuitry from 
> > the IXP2800 (an extra 16-core (!) RISCy processor on-die, 
> dedicated to 
> > doing just network stuff, and a 10gbps pipe going straight into the 
> > CPU
> > itself) and graft it onto the Xeon.
> 
> It indeed appears to be something like the IXP2000.
> 
> 	http://www.intel.com/technology/ioacceleration/index.htm
> 
> Quote from ServerNetworkIOAccel.pdf (which is otherwise content-free):
> 
> 	Lightweight Threading
> 
> 	[...] Rather than providing multiple hardware contexts in a
> 	processor like Hyper-Threading (HT) Technology from Intel, a
> 	single hardware context contains the network stack with
> 	multiple software-controlled threads.  When a packet
> 	thread triggers a memory event a scheduler within the network
> 	stack selects an alternate packet thread and loads the CPU
> 	execution pipeline. Processing continues in the shadow of a
> 	memory access. [...] Stall conditions, triggered by requests
> 	to slow memory devices, are nearly eliminated.
> 
> They can also DMA packet headers straight into L1/L2 ('Direct 
> Cache Access', innovation!), just like other products have 
> been able to do for ages now.
> 
> Not much other details up yet.

It was a good presentation; I suspect some/most of you guys may be able to
get it through your company attendees. At any rate, don't worry - details
will probably come out soon enough, since kernel support should be a "must
have" for the entire concept to work :-) 

On the NIC side, I suspect we will not see much in I/O AT GbE compared to
what we are already shipping as 10GbE Xframe ASIC features (header
separation for potential prefetching, stateless/state-aware offloads, etc.)
- the feat would be to make these assists a de-facto standard (so both NIC
vendors and kernel developers have motivation to support them) and to fully
utilize them by integrating with the rest of the hw/OS in the system; 
I'm actually very happy to see Intel pushing this ... 

Leonid 
  

> 
> 
> --L
> 
> 


* Re: Intel and TOE in the news
  2005-02-21 21:12                                     ` jamal
@ 2005-03-06 11:21                                       ` Harald Welte
  0 siblings, 0 replies; 74+ messages in thread
From: Harald Welte @ 2005-03-06 11:21 UTC (permalink / raw)
  To: jamal; +Cc: patrick mcmanus, netdev@oss.sgi.com


Sorry for the late catch-up; I didn't have time to read all my mailing
lists for a while.

On Mon, Feb 21, 2005 at 04:12:42PM -0500, jamal wrote:
> CSA is essentially a direct link to the Northbridge System Controller
(or MCH); there are some specialized motherboards that have this
> feature. 

It's actually becoming more and more common.  But MCHs only come with
a single CSA port, so you will never benefit from it if you need
more than one port.

Like everything Intel seems to do (I've read about everything I could
find in their recent press releases), they concentrate on the end
host case, not packet forwarding :(

-- 
- Harald Welte <laforge@gnumonks.org>          	        http://gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)



* [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-02-21 17:11                 ` jamal
  2005-02-21 18:02                   ` Leonid Grossman
@ 2005-03-14 20:22                   ` Alex Aizman
  2005-03-14 20:38                     ` David S. Miller
  2005-03-15  5:14                     ` Scott Feldman
  1 sibling, 2 replies; 74+ messages in thread
From: Alex Aizman @ 2005-03-14 20:22 UTC (permalink / raw)
  To: netdev; +Cc: leonid, 'Jeff Garzik'

This is to announce "xge", an experimental 10GbE driver for the Neterion,
Inc. (formerly S2io, Inc.) family of adapters.

Motivation
==========
The production ("s2io") driver for the Xframe-I adapter was accepted into
the kernel starting with 2.6.5, and is currently part of the 2.6 kernel.

Why re-write an accepted and well-performing production-quality driver? 

As Neterion 10GbE ASICs evolve and add more stateless and state-aware TCP
assists, we felt that a forward-looking re-design was warranted. 

Going forward, this experimental driver will hopefully serve as a vehicle to
support ASIC features like Large Receive Offload and UDP LSO (based upon
header splitting, multiple logical Tx/Rx paths, MSI/MSI-X usage, etc.) -
both in the driver and in the kernel. Hopefully, some of these TCP assists
will eventually be supported in a de-facto standard way, like checksum
offload and TSO are supported today.

At present, the experimental driver runs on the shipping hardware as
well. Over the last five months it was tested to the same extent as the
"s2io" driver in the kernel, and in some scenarios offers a performance
advantage.

Once reviewed by the community, we would advocate that the "xge" driver be
included in the kernel as an addition, and going forward as a replacement
for the "s2io" driver.

HAL-based
=========
Most Neterion drivers are HAL (Hardware Abstraction Layer) based. This is
always both a curse and a blessing; in our experience it was the latter by a
big margin. While the current "s2io" driver in the kernel doesn't share HAL
code with other drivers, the "xge" driver is HAL-based. 

For those reviewers who consider this a minus, we hope you will find the HAL
code in full compliance with Linux guidelines (in fact, it was written by
our Linux team). Performance-wise, there was no negative impact discovered
either. Testing-wise, this HAL has undergone numerous stress, functional,
and performance tests "under" different drivers on a variety of platforms.

Since the HAL is "common code", you will find a GPL license on the
Linux-only files and a GPL license plus exception on the HAL code.

Going forward, we don't expect community developers to make extensive
modifications to the HAL files - but if that happens, we would appreciate
the changes being released under the same "GPL plus exception" terms. You
are of course not required to do so, but keeping the default terms on the
hw-specific part of the code will help us maintain consistency of the
code - for the common benefit.


What about maintenance, support and documentation?
==================================================
The Neterion Linux team is committed to maintaining the driver going
forward; please post any requests/suggestions on the list and to
alex@neterion.com

For developers who intend to work with the hw-dependent code, we will make
available both the HAL API guide and the ASIC programming manual; please
send requests to leonid@neterion.com


What about 10GbE setup?
=======================
If there is enough interest in working with the driver, we will put in
place a back-to-back 10GbE setup for remote access, available 24 hrs a day
on a "first come, first served" self-scheduled basis; we realize that 10GbE
gear is still fairly expensive.


Driver Status
=============
The xge driver was code complete by October 2004. Since then it has
accumulated significant mileage in terms of stress, performance, and
functional testing. The driver, although carrying experimental status, is
estimated to be very close to production quality.

Kernels: the driver supports both 2.6 and 2.4 kernels.

Platforms: AMD Opteron (TM) (64bit and 32bit), Itanium(TM) based (including
SGI Altix), Xeon 32bit, and PPC(TM) based (in particular, pSeries) systems.

Features:

- Neterion Xframe-I and Xframe-II adapters (the latter were just recently
announced by Neterion; end-user availability is still TBD);

- full checksum offload;

- jumbo frames (up to 9600 bytes);

- no limit on the number, size, or alignment of Tx fragments;

- broadcast, multicast, and promiscuous mode;

- TCP Segmentation Offload (TSO);

- ethtool (fully);

- multiple Xframe I and/or Xframe II adapters present in the host;

- multiple receive and transmit rings;

- MSI and MSI-X (the latter Xframe-II only);

- L2, L3, and L4 header splitting;

- Receive Traffic Hashing (Xframe-II only);

- NAPI.

The driver also contains a number of optimization features.

Performance numbers for Xframe-I 10GbE adapter: 7.6 Gbps transmit and 7.6
Gbps receive (jumbo, 2.4GHz dual Opteron), limited by PCI-X 133 bus. Note
that Xframe-II adapter removes the PCI-X bottleneck.

For more information see the readme file: xge.txt.

TODO
=====

(1) Testing so far is 100% successful, but production-grade testing still
remains TBD.

(2) MSI. The driver was tested with a single MSI (see
Documentation/MSI-HOWTO.txt). Multiple MSIs remain TBD.

(3) Large Receive Offload. Unlike most other features, LRO retains its
experimental status. It needs to be debugged and fully tested. Note that
to a certain extent the "experimental" status also applies to MSI and
Receive Traffic Hashing.

(4) NAPI performance. 1500 MTU receive performance is marginally better with
NAPI enabled. We expect to get more than a marginal gain out of NAPI.


Download
=========

Here's the FTP site to download a single gzipped patch (filename
xge-v.1.1.2816-2.6.11.patch.gz, cksum 3463616385). This patch will replace
the "s2io" driver with the new "xge" driver.

ftp: ns1.s2io.com
user: linuxsrc
password: opensource

Signed-off-by: Dmitry Yusupov <dima@neterion.com>
Signed-off-by: Raghavendra Koushik <raghavendra.koushik@neterion.com>
Signed-off-by: Alex Aizman <alex@neterion.com>


Regards,
Neterion's team.


* Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-14 20:22                   ` [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters Alex Aizman
@ 2005-03-14 20:38                     ` David S. Miller
  2005-03-14 20:53                       ` Leonid Grossman
  2005-03-15  5:14                     ` Scott Feldman
  1 sibling, 1 reply; 74+ messages in thread
From: David S. Miller @ 2005-03-14 20:38 UTC (permalink / raw)
  To: alex; +Cc: netdev, leonid, jgarzik

On Mon, 14 Mar 2005 12:22:51 -0800
"Alex Aizman" <alex@neterion.com> wrote:

> For these reviewers who consider this a minus, we hope you will find the HAL
> code in full compliance with Linux guidelines (in fact, it was written by
> our Linux team). Performance-wise, there was no negative impact discovered
> either. Testing-wise, this HAL has undergone numerous stress, functional,
> and performance tests "under" different drivers on a variety of platforms.

So you wrote a non-HAL version of this driver and compared the
results?  Simply comparing against the existing s2io driver
does not count.

If you're simply comparing against s2io, and your driver is faster
than s2io is already, imagine how much faster it might be without
that HAL layer.

I totally reject this driver, HAL is unacceptable for in-tree drivers.
We've been over this a thousand times.


* RE: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-14 20:38                     ` David S. Miller
@ 2005-03-14 20:53                       ` Leonid Grossman
  2005-03-14 23:27                         ` Andi Kleen
  0 siblings, 1 reply; 74+ messages in thread
From: Leonid Grossman @ 2005-03-14 20:53 UTC (permalink / raw)
  To: 'David S. Miller', alex; +Cc: netdev, leonid, jgarzik

Hi David,
S2io and Neterion are the same company; we just got our company name changed
recently. 
So the non-HAL version of this driver is in fact the "s2io" driver in the
kernel - sorry we did not make that clear in the readme file.

I've seen layered architectures for MAC drivers result in a performance hit
in the past, so this was a significant concern for me as well - for a 10GbE
product, performance is an overriding objective.

We compared performance results of the HAL-based driver vs. the monolithic
approach on multiple platforms, and did not see any performance delta that
could be attributed to the approach itself. In fact, in some cases this
experimental driver is faster - but I assume this is due to the lessons we
learned, not the layered approach itself; performance-wise I see it as a
draw.

Do you have other objections to the submission? We'd like to see if these
could be addressed; going forward we see significant benefits both for
S2io/Neterion (and our customers) and for the community in using this driver.

Regards, Leonid

-----Original Message-----
From: David S. Miller [mailto:davem@davemloft.net] 
Sent: Monday, March 14, 2005 12:38 PM
To: alex@neterion.com
Cc: netdev@oss.sgi.com; leonid@neterion.com; jgarzik@pobox.com
Subject: Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters

On Mon, 14 Mar 2005 12:22:51 -0800
"Alex Aizman" <alex@neterion.com> wrote:

> For these reviewers who consider this a minus, we hope you will find the
HAL
> code in full compliance with Linux guidelines (in fact, it was written by
> our Linux team). Performance-wise, there was no negative impact discovered
> either. Testing-wise, this HAL has undergone numerous stress, functional,
> and performance tests "under" different drivers on a variety of platforms.

So you wrote a non-HAL version of this driver and compared the
results?  Simply comparing against the existing s2io driver
does not count.

If you're simply comparing against s2io, and your driver is faster
than s2io is already, imagine how much faster it might be without
that HAL layer.

I totally reject this driver, HAL is unacceptable for in-tree drivers.
We've been over this a thousand times.


* Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-14 20:53                       ` Leonid Grossman
@ 2005-03-14 23:27                         ` Andi Kleen
  2005-03-14 23:45                           ` Jeff Garzik
  2005-03-15  1:07                           ` Alex Aizman
  0 siblings, 2 replies; 74+ messages in thread
From: Andi Kleen @ 2005-03-14 23:27 UTC (permalink / raw)
  To: Leonid Grossman; +Cc: netdev, leonid, jgarzik, alex

"Leonid Grossman" <leonid.grossman@neterion.com> writes:
>
> Do you have other objections to the submission? We'd like to see if these
> could be addressed; going forward we see significant benefits both for
> S2io/Neterion (and our customers) and for community to use this driver.

I guess the main objection to the HAL comes not from performance
issues (Usually the only thing that really counts for performance
is data cache misses and the HAL is unlikely to affect this much), but
the coding style etc.. Indeed it does not look too Linux like. 

One thing that's frowned upon in Linux is lots of wrappers for
standard functions (like spin_lock etc.). I would recommend at least
replacing them with the standard Linux functions.
In principle this could even be done with some kind of preprocessor.
Also, fewer ifdefs would be nice.

A possible compromise might be to get rid of all the HAL parts that
wrap Linux functionality, and then only use a leaner low-level
library to access the more difficult parts of the hardware.  This
would involve moving more code into the Linux-specific layers. This
should be more low-level code - nothing like the high-level queue
handling functions you currently have - with the high-level logic
all in Linux code.

-Andi

P.S.: The patch would be much easier to read if it created new
files instead of changing the old ones. This makes sense since
the new driver will probably live next to the old one for some time.


* Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-14 23:27                         ` Andi Kleen
@ 2005-03-14 23:45                           ` Jeff Garzik
  2005-03-15  0:32                             ` Leonid Grossman
  2005-03-15  1:07                           ` Alex Aizman
  1 sibling, 1 reply; 74+ messages in thread
From: Jeff Garzik @ 2005-03-14 23:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Leonid Grossman, netdev, leonid, alex, David S. Miller

Andi Kleen wrote:
> "Leonid Grossman" <leonid.grossman@neterion.com> writes:
> 
>>Do you have other objections to the submission? We'd like to see if these
>>could be addressed; going forward we see significant benefits both for
>>S2io/Neterion (and our customers) and for community to use this driver.
> 
> 
> I guess the main objection to the HAL comes not from performance
> issues (Usually the only thing that really counts for performance
> is data cache misses and the HAL is unlikely to affect this much), but
> the coding style etc.. Indeed it does not look too Linux like. 

Well, not coding style, but code analysis and maintenance issues.

HALs are generally type-opaque, breaking checker-style tools and sparse 
checks.

"it looks like Linux code" has implications on bug finding and fixing, 
and long term maintenance of the code.  You want to make it easy for 
someone to make the same change across N net drivers.

Because most ->hard_start_xmit() hooks were written in a similar 
fashion, it was easy and quick to deploy fixes for the skb_padto() 
security bug across many net drivers.

A lot of tiny costs that mostly wind up as noise:  additional branching 
/ derefs.

My biggest objection is that HALs increase the overall "cost" of 
maintaining a piece of code, and serve as a barrier against outside 
(non-primary-author) kernel hacker involvement.

Remember, this driver is going to be with us for -10- years or more.

	Jeff


* RE: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-14 23:45                           ` Jeff Garzik
@ 2005-03-15  0:32                             ` Leonid Grossman
  0 siblings, 0 replies; 74+ messages in thread
From: Leonid Grossman @ 2005-03-15  0:32 UTC (permalink / raw)
  To: 'Jeff Garzik', 'Andi Kleen'
  Cc: netdev, leonid, alex, 'David S. Miller'



> -----Original Message-----
> From: Jeff Garzik [mailto:jgarzik@pobox.com]
> Sent: Monday, March 14, 2005 3:46 PM
> To: Andi Kleen
> Cc: Leonid Grossman; netdev@oss.sgi.com; leonid@neterion.com;
> alex@neterion.com; David S. Miller
> Subject: Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE
> Adapters
> 
> Andi Kleen wrote:
> > "Leonid Grossman" <leonid.grossman@neterion.com> writes:
> >
> >>Do you have other objections to the submission? We'd like to see if
> these
> >>could be addressed; going forward we see significant benefits both for
> >>S2io/Neterion (and our customers) and for community to use this driver.
> >
> >
> > I guess the main objection to the HAL comes not from performance
> > issues (Usually the only thing that really counts for performance
> > is data cache misses and the HAL is unlikely to affect this much), but
> > the coding style etc.. Indeed it does not look too Linux like.
> 
> Well, not coding style, but code analysis and maintenance issues.
> 
> HALs are generally type-opaque, breaking checker-style tools and sparse
> checks.
> 
> "it looks like Linux code" has implications on bug finding and fixing,
> and long term maintenance of the code.  You want to make it easy for
> someone to make the same change across N net drivers.
> 
> Because most ->hard_start_xmit() hooks were written in a similar
> fashion, it was easy and quick to deploy fixes for the skb_padto()
> security bug across many net drivers.
> 
> A lot of tiny costs that mostly wind up as noise:  additional branching
> / derefs.
> 
> My biggest objection is that HALs increase the overall "cost" of
> maintaining a piece of code, and serve as a barrier against outside
> (non-primary-author) kernel hacker involvement.

There are several valid arguments against HAL-based MAC drivers, but the
maintenance cost is probably one of the main arguments in the "for" column.

My assumption is that the vast majority of the maintenance of the
hw-dependent code will be done by the Neterion team - and for us the
HAL-based approach allows us to fix a bug or add a new feature in the
hw-dependent code once and for all, across all drivers. In our experience,
third-party contributions tend to go into the OS-specific part of the
driver, not the HAL - but if there is a meaningful contribution to the HAL
from one of our Unix drivers, it will be useful to pick it up in a
transparent fashion. 

Agreed that HAL makes code analysis and changes by outside hackers
somewhat more difficult - but hopefully not by much, and again I expect most
outside contributions to be made above the HAL level.
Also, we are trying to ease the pain by releasing HAL API and ASIC
Programming Manual documentation.

> 
> Remember, this driver is going to be with us for -10- years or more.

Yes, and this was actually a second reason for the submission. The present
"s2io" driver is fine for the current and next ASIC; beyond that we will
probably have to re-write it anyway. The new submission is 2+ years
"younger" and has a much better chance of requiring only an incremental
upgrade. Since 10GbE is just taking off and not a lot of people are familiar
with the code yet, we feel it's easier to take this hit earlier rather than
later. 

To sum it up, I agree that HAL is a trade-off - but for this hardware it has
been working well for us in a multi-platform environment; the hope is that
you will find a (smaller, perhaps) subset of these benefits useful in the
Linux environment as well.

Leonid

> 
> 	Jeff
> 


* RE: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-14 23:27                         ` Andi Kleen
  2005-03-14 23:45                           ` Jeff Garzik
@ 2005-03-15  1:07                           ` Alex Aizman
  2005-03-15  1:29                             ` Rick Jones
  2005-03-15 15:07                             ` Leonid Grossman
  1 sibling, 2 replies; 74+ messages in thread
From: Alex Aizman @ 2005-03-15  1:07 UTC (permalink / raw)
  To: 'Andi Kleen', 'Leonid Grossman'; +Cc: netdev, leonid, jgarzik

Andi Kleen writes: 

> I guess the main objection to the HAL comes not from 
> performance issues 

But the second or the third objection comes from there, I guess... As far as
the data path goes, HAL as a "layer" completely disappears. There are just a
few inline instructions that post descriptors and process completed
descriptors. These same instructions are unavoidable; they'd be present HAL
or no-HAL. There are no HAL locks on the data path (the locks are compiled
out), no HAL (as a "layer") induced overhead. Note that performance was one
persistent "paranoia" from the very start of this project.

The numbers also tell the tale. We have 7.6Gbps jumbo throughput, the
bottleneck is PCI, not the host. We have 13us 1byte netpipe latency. Here's
for example today's netpipe run:

[root@localhost root]# ./nptcp -a -t -l 256 -u 98304 -i 256 -p 5100 -P - h
17.1.1.227
Latency: 0.000013
Now starting main loop
  0:       256 bytes    7 times -->  131.37 Mbps in 0.000015 sec
  1:       512 bytes   65 times -->  239.75 Mbps in 0.000016 sec
  2:       768 bytes 7701 times -->  181.37 Mbps in 0.000032 sec
  3:      1024 bytes 5168 times -->  212.35 Mbps in 0.000037 sec
  4:      1280 bytes 5102 times -->  209.95 Mbps in 0.000047 sec
  5:      1536 bytes 4303 times -->  211.65 Mbps in 0.000055 sec
  6:      1792 bytes 3765 times -->  238.44 Mbps in 0.000057 sec
  7:      2048 bytes 3739 times -->  267.33 Mbps in 0.000058 sec
  8:      2304 bytes 3744 times -->  297.43 Mbps in 0.000059 sec
  9:      2560 bytes 3761 times -->  319.77 Mbps in 0.000061 sec
 10:      2816 bytes 3685 times -->  349.80 Mbps in 0.000061 sec
 11:      3072 bytes 3701 times -->  344.98 Mbps in 0.000068 sec
 12:      3328 bytes 3374 times -->  372.86 Mbps in 0.000068 sec
 13:      3584 bytes 3389 times -->  400.46 Mbps in 0.000068 sec
...

> (Usually the only thing that really counts 
> for performance is data cache misses and the HAL is unlikely 
> to affect this much), but the coding style etc.. Indeed it 
> does not look too Linux like. 
> 
> One thing that's frowned upon in Linux are lots of wrappers 
> for standard functions (like spin_lock etc.). I would 
> recommend to at least replace them with the standard Linux functions.

There's always a tradeoff, a balancing act. The wrappers are the price to
pay for reusable and extremely well tested code. Note also that a small
company like Neterion does not have the luxury of *not* re-using the code. 

Having said that, there are a couple of ideas that can be worked in
relatively easily.

> In principle this can be even done with some kind of 
> preprocessor Also less ifdefs would be nice.

Exactly. A simple script that'll remove the extra #ifdefs and wrappers.

> 
> An possible compromise might be to get rid of all the HAL 
> parts that wraps Linux functionality, and then only use a 
> leaner low level library to access the more difficult parts 
> of the hardware.  This would involve moving more code into 
> the Linux specific layers. This should be more low level 
> code, nothing like the high level queue handling functions 
> you currently have etc., with the high level logic all in Linux code

Well, I guess we can do a lot in that direction. Looking forward to getting
specific review comments, etc. However, the main question remains: will a
HAL-based driver (because even after the script-produced "surgery" it'll
continue to be HAL-based) ever get accepted?

> 
> -Andi
> 
> P.S.: The patch would be much easier to read if it created 
> new files instead of changing the old ones. 

Can be done, very easily. Obviously, only one of the drivers has to be
selected.

> This makes sense 
> since the new driver will probably live next to the old one 
> for some time.
> 
> 

Alex


* Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-15  1:07                           ` Alex Aizman
@ 2005-03-15  1:29                             ` Rick Jones
  2005-03-15  2:28                               ` Leonid Grossman
  2005-03-15 15:07                             ` Leonid Grossman
  1 sibling, 1 reply; 74+ messages in thread
From: Rick Jones @ 2005-03-15  1:29 UTC (permalink / raw)
  To: netdev

Alex Aizman wrote:
> Andi Kleen writes: 
> 
> 
>>I guess the main objection to the HAL comes not from 
>>performance issues 
> 
> 
> But the second or the third objection comes from there, I guess... As far as
> the data path, HAL as a "layer" completely disappears. There are just a few
> inline instructions that post descriptors and process completed descriptors.
> These same instructions are unavoidable; they'd be present HAL or no-HAL.
> There are no HAL locks on the data path (the locks are compiled out), no HAL
> (as a "layer") induced overhead. Note that the performance was one
> persistent "paranoia" from the very start of this project.
> 
> The numbers also tell the tale. We have 7.6Gbps jumbo throughput, the
> bottleneck is PCI, not the host.

That would seem to suggest comparing (in netperf terminology) service 
demands between HAL and no HAL.  JumboFrame can compensate for a host of ills :) 
I really do _not_ mean to imply there are any ills for which compensation is 
required, just suggesting that folks get into the habit of including CPU 
utilization.  And since we cannot count on JumboFrame being there end-to-end, 
performance with 1500 byte frames, while perhaps a bit unpleasant, is still 
important.

> We have 13us 1byte netpipe latency. 

So 76,000 transactions per second on something like single-byte netperf 
TCP_RR?!? Or am I mis-interpreting the netpipe latency figure?

I am of course biased, but netperf (compiled with -DUSE_PROCSTAT under Linux, 
something else for other OSes - feel free to contact me about it) tests along 
the lines of:

netperf -c -C -t TCP_STREAM -H <remote> -l <length> -i 10,3 -- -s 256K -S 256K 
-m 32K

and

netperf -c -C -t TCP_RR -H <remote> -l <length> -i 10,3

are generally useful.  If you have the same system type at each end, the -C can 
be dropped from the TCP_RR test since it _should_ be symmetric.  If -C dumps 
core on the TCP_STREAM test, drop it and add a TCP_MAERTS test to get receive 
service demand.

rick jones


* RE: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-15  1:29                             ` Rick Jones
@ 2005-03-15  2:28                               ` Leonid Grossman
  0 siblings, 0 replies; 74+ messages in thread
From: Leonid Grossman @ 2005-03-15  2:28 UTC (permalink / raw)
  To: 'Rick Jones', netdev



> -----Original Message-----
> From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On
> Behalf Of Rick Jones
> Sent: Monday, March 14, 2005 5:30 PM
> To: netdev@oss.sgi.com
> Subject: Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE
> Adapters
> 
> Alex Aizman wrote:
> > Andi Kleen writes:
> >
> >
> >>I guess the main objection to the HAL comes not from
> >>performance issues
> >
> >
> > But the second or the third objection comes from there, I guess... As
> far as
> > the data path, HAL as a "layer" completely disappears. There is just a
> few
> > inline instructions that post descriptors and process completed
> descriptors.
> > These same instructions are unavoidable; they'd be present HAL or no-
> HAL.
> > There's no HAL locks on the data path (the locks are compiled out), no
> HAL
> > (as a "layer") induced overhead. Note that the performance was one
> > persistent "paranoia" from the very start of this project.
> >
> > The numbers also tell the tale. We have 7.6Gbps jumbo throughput, the
> > bottleneck is PCI, not the host.
> 
> That would seem to suggest then comparing (using netperf terminology)
> service
> demands between HAL and no HAL.  JumboFrame can compensate for a host of
> ills :)
> I really do _not_ mean to imply there are any ills for which compensation
> is
> required, just suggesting to get folks into the habit of including CPU
> utilization.  And since we cannot count on JumboFrame being there end-to-
> end,
> performance with 1500 byte frames, while perhaps a bit unpleasant, is
> still
> important.

Hi Rick,
With Jumbo frames, the performance and %cpu for the two drivers are on par.
With 1500, the HAL driver actually shows somewhat better throughput and
%cpu - though the approach has nothing to do with it; the important point is
that there is no performance drop that could be traced to the usage of HAL
per se.
Leonid.

> 
> > We have 13us 1byte netpipe latency.
> 
> So 76,000 transactions per second on something like single-byte netperf
> TCP_RR?!? Or am I mis-interpreting the netpipe latency figure?
> 
> I am of course biased, but netperf (compiled with -DUSE_PROCSTAT under
> Linux,
> somethign else for other OSes - feel free to contact me about it) tests
> along
> the lines of:
> 
> netperf -c -C -t TCP_STREAM -H <remote> -l <length> -i 10,3 -- -s 256K -S
> 256K
> -m 32K
> 
> and
> 
> netperf -c -C -t TCP_RR -H <remote> -l <length> -i 10,3
> 
> are generally useful.  If you have the same system type at each end, the -
> C can
> be dropped from the TCP_RR test since it _should_ be symmetric.  If -C
> dumps
> core on the TCP_STREAM test, drop it and add a TCP_MAERTS test to get
> receive
> service demand.
> 
> rick jones


* Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-14 20:22                   ` [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters Alex Aizman
  2005-03-14 20:38                     ` David S. Miller
@ 2005-03-15  5:14                     ` Scott Feldman
  2005-03-15  5:59                       ` Matt Mackall
  2005-03-15  6:02                       ` Leonid Grossman
  1 sibling, 2 replies; 74+ messages in thread
From: Scott Feldman @ 2005-03-15  5:14 UTC (permalink / raw)
  To: <alex@neterion.com>
  Cc: <netdev@oss.sgi.com>, <leonid@neterion.com>,
	'Jeff Garzik'


On Mar 14, 2005, at 12:22 PM, Alex Aizman wrote:

> HAL-based
> =========
> Most Neterion drivers are HAL (Hardware Abstraction Layer) based. This 
> is
> always a curse and blessing; in our experience this was the latter by 
> a big
> margin. While the current "s2io" driver in the kernel doesn't share 
> HAL code
> with other driver, the "xge" driver is HAL-based.

e1000 and ixgb are HAL-based, which is why there is always push back 
when someone in the community modifies *_hw.[ch].  I'd hate to see more 
of this in the kernel, but I can definitely relate to the "testing 
across multiple OSes" gain.

Here's an (old?) idea: remember the NDIS-wrapper project?  I think the 
reverse is much more interesting.  A linux-wrapper takes a plain old 
Linux driver and wraps it with whatever is needed to make it an NDIS 
driver.  Or FreeBSD, or whatever.  Let's pretend this is trivial for a 
second.  What do we gain?  1) one clean Linux driver to maintain, 2) 
testability on other OSes, and 3) access to other OSes' certification 
kits.  Licensing is clean: the Linux driver is GPL and the 
linux-wrapper code is GPL.  Can't the world revolve around Linux and 
let everyone else be burdened with the abstraction layer overhead?

-scott


* Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-15  5:14                     ` Scott Feldman
@ 2005-03-15  5:59                       ` Matt Mackall
  2005-03-15  6:02                       ` Leonid Grossman
  1 sibling, 0 replies; 74+ messages in thread
From: Matt Mackall @ 2005-03-15  5:59 UTC (permalink / raw)
  To: Scott Feldman
  Cc: <alex@neterion.com>, <netdev@oss.sgi.com>,
	<leonid@neterion.com>, 'Jeff Garzik'

On Mon, Mar 14, 2005 at 09:14:59PM -0800, Scott Feldman wrote:
> 
> On Mar 14, 2005, at 12:22 PM, Alex Aizman wrote:
> 
> >HAL-based
> >=========
> >Most Neterion drivers are HAL (Hardware Abstraction Layer) based. This 
> >is
> >always a curse and blessing; in our experience this was the latter by 
> >a big
> >margin. While the current "s2io" driver in the kernel doesn't share 
> >HAL code
> >with other driver, the "xge" driver is HAL-based.
> 
> e1000 and ixgb are HAL-based, which is why there is always push back 
> when someone in the community modifies *_hw.[ch].  I'd hate to see more 
> of this in the kernel, but I can definitely relate to the "testing 
> across multiple OSes" gain.
> 
> Here's an (old?) idea: remember the NDIS-wrapper project?  I think the 
> reverse is much more interesting.  A linux-wrapper takes a plain old 
> Linux driver and wraps it with what ever is needed to make it an NDIS 
> driver.  Or FreeBSD, or whatever.  Let's pretend this is trivial for a 
> second.  What do we gain?  1) one clean Linux driver to maintain, 2) 
> testability on other OSes, and 3) access to other OSes' certification 
> kits.  Licensing is clean: the Linux driver is GPL and the 
> linux-wrapper code is GPL.  Can't the world revolve around Linux and 
> let everyone else be burdened with the abstraction layer overhead?

Depends. Vendors of non-GPL OSes can't ship such drivers for risk of
their product becoming a derived work to the extent they're relying on
such drivers to make their system useful.

You can't get around the GPL by putting a GPL wrapper with an
exception around something.

-- 
Mathematics is the supreme nostalgia of our time.


* RE: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-15  5:14                     ` Scott Feldman
  2005-03-15  5:59                       ` Matt Mackall
@ 2005-03-15  6:02                       ` Leonid Grossman
  1 sibling, 0 replies; 74+ messages in thread
From: Leonid Grossman @ 2005-03-15  6:02 UTC (permalink / raw)
  To: 'Scott Feldman', alex; +Cc: netdev, leonid, 'Jeff Garzik'



> -----Original Message-----
> From: Scott Feldman [mailto:sfeldma@pobox.com]
> Sent: Monday, March 14, 2005 9:15 PM
> To: <alex@neterion.com>
> Cc: <netdev@oss.sgi.com>; <leonid@neterion.com>; 'Jeff Garzik'
> Subject: Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE
> Adapters
> 
> 
> On Mar 14, 2005, at 12:22 PM, Alex Aizman wrote:
> 
> > HAL-based
> > =========
> > Most Neterion drivers are HAL (Hardware Abstraction Layer) based. This
> > is
> > always a curse and blessing; in our experience this was the latter by
> > a big
> > margin. While the current "s2io" driver in the kernel doesn't share
> > HAL code
> > with other driver, the "xge" driver is HAL-based.
> 
> e1000 and ixgb are HAL-based, which is why there is always push back
> when someone in the community modifies *_hw.[ch].  I'd hate to see more
> of this in the kernel, but I can definitely relate to the "testing
> across multiple OSes" gain.

I'll be thrilled to see meaningful community changes to xge HAL files :-)

> 
> Here's an (old?) idea: remember the NDIS-wrapper project?  I think the
> reverse is much more interesting.  A linux-wrapper takes a plain old
> Linux driver and wraps it with what ever is needed to make it an NDIS
> driver.  Or FreeBSD, or whatever.  Let's pretend this is trivial for a
> second.  What do we gain?  1) one clean Linux driver to maintain, 2)
> testability on other OSes, and 3) access to other OSes' certification
> kits.  Licensing is clean: the Linux driver is GPL and the
> linux-wrapper code is GPL.  Can't the world revolve around Linux and
> let everyone else be burdened with the abstraction layer overhead?

Believe it or not, something like this could be (and has been) done, but it
did not stick. People did even spookier things, like a wrapper on top of
another OS's binary driver - some of you may remember the Novell NLM vs.
Unixware driver project :-)

These attempts were always short-lived; my guess is that the teams behind
them did not have the long-term motivation and/or the expertise to get it
done and (more importantly) to maintain the framework as OS driver models
evolved and diverged.

On the other hand, the HAL model is more alive than ever - Intel, Neterion
and most other modern NIC vendors I'm aware of use a HAL approach.
This is actually much more common among non-Linux OSes, since there are
still significant (warranted or not) GPL-related concerns.

The HAL discussion is an old one indeed, and the truth is still in the eye
of the beholder :-)

People who are solely responsible for supporting multiple OS drivers for the
same hardware (that is, NIC vendors) have a strong bias in favor of the HAL
approach, for all the right reasons (mainly, maintainability and support).

People who are solely responsible for supporting drivers for multiple NICs
on the same OS have a strong bias against the HAL approach - also for some
pretty valid reasons.

Linux is one (but not the only) case when the drivers are maintained by both
groups, and the biases/preferences collide...

Interestingly enough, as time goes by and everybody is getting thin on
resources, the HAL approach tends to gain "market share" rather fast - at
present, the Linux "s2io" driver is one of the very few Neterion drivers
that does not use HAL.


> 
> -scott


* RE: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-15  1:07                           ` Alex Aizman
  2005-03-15  1:29                             ` Rick Jones
@ 2005-03-15 15:07                             ` Leonid Grossman
  2005-03-15 15:55                               ` Leonid Grossman
  1 sibling, 1 reply; 74+ messages in thread
From: Leonid Grossman @ 2005-03-15 15:07 UTC (permalink / raw)
  To: alex, davem; +Cc: netdev, leonid, jgarzik, 'Andi Kleen'


> Alex Aizman writes:
> However, the main question remains: will the
> HAL-based driver (because even after the script-produced "surgery" it'll
> continue to be HAL based) ever get accepted?

Hi all, 
We truly appreciate the time spent looking at the code, and the feedback.

I guess Alex is asking the right question - before we start code changes, it
would be great to get a rough consensus on whether this HAL-based driver
(after the suggested changes) will be acceptable to the community - or
whether yet another in-tree HAL driver will still be "one too many".

In particular - after this discussion, does David's statement below still
stand (I'm not sure there was an unconditional rejection of the HAL model
from anyone else)?

>David Miller writes:
>I totally reject this driver, HAL is unacceptable for in-tree drivers.
>We've been over this a thousand times.

If it stands, we are prepared to recall the submission and keep the current
"Linux and everything else" status quo for the 10GbE Xframe drivers.
It's not the best maintenance option (both for us and, arguably, even for
non-primary-author kernel hackers) but it's workable.

Thanks again,
Leonid


* RE: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-15 15:07                             ` Leonid Grossman
@ 2005-03-15 15:55                               ` Leonid Grossman
  2005-03-19 20:15                                 ` Andi Kleen
  0 siblings, 1 reply; 74+ messages in thread
From: Leonid Grossman @ 2005-03-15 15:55 UTC (permalink / raw)
  To: 'Leonid Grossman', alex, davem
  Cc: netdev, leonid, jgarzik, 'Andi Kleen'


> 
> > Leonid Grossman writes:
> It's not the best maintenance option (both for us and arguably, even for a
> non-primary-author kernel hackers) but it's workable.

The statement about non-primary-author kernel hacker interests needs some
clarification, before people start throwing things at me.
This is the last argument before I shut up, I promise :-)

Arguably, a hacker will be much more interested in changing the non-HAL part
of a driver. This part of the code has to look and feel the same as any
other Linux net driver. If it doesn't, then we've done a poor job and need
to go back.

In our experience, the vast majority of code fixes and new features in the
HAL code comes from our team (hey, we planted all these bugs to begin with
:-)), to a much lesser extent from other OS developers, and only then from
the Linux community.

This is normal - for example, if the Large Receive Offload discussed earlier
is to be done, I'd expect a hacker to focus on the kernel changes and the
generic driver changes (to make the feature common to all net drivers that
want it), and a Neterion developer to focus on the hw-specific changes.

There will be nothing wrong if they trade places, but the first scenario
seems more likely.

So, for a hacker it's harder to understand the HAL part of the code - but
then he is not doing the bulk of the maintenance of the ASIC-specific code
that he's arguably least interested in maintaining...
Leonid


* Re: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-15 15:55                               ` Leonid Grossman
@ 2005-03-19 20:15                                 ` Andi Kleen
  2005-03-19 22:19                                   ` Leonid Grossman
  0 siblings, 1 reply; 74+ messages in thread
From: Andi Kleen @ 2005-03-19 20:15 UTC (permalink / raw)
  To: Leonid Grossman; +Cc: netdev, leonid, jgarzik, 'Andi Kleen'

"Leonid Grossman" <leonid.grossman@neterion.com> writes:

>> 
>> > Leonid Grossman writes:
>> It's not the best maintenance option (both for us and arguably, even for a
>> non-primary-author kernel hackers) but it's workable.
>
> The statement about non-primary-author kernel hacker interests needs some
> clarification, before people started to throw things at me. 
> This is the last argument before I shut up, I promise :-)


First, you won't like this: Linux kernel maintainers usually don't give
firm promises to merge anything.  They just say "with that I won't merge it",
and even that is sometimes flexible. It is not like a company that gives
commitments or signs contracts; it is really the final patch that counts.

The best way to get code merged is to write it the way the maintainers
want it, but of course that does not always happen.

However, when the code is clean, the biggest issues are fixed,
and only some relatively small ones are left, I think you have a good
chance of seeing your code merged.


>
> Arguably, a hacker will be much more interested in changing the non-HAL part
> of a driver. This part of the code has to look and feel the same as any
> other Linux net driver. If it doesn't, then we've done a poor job and need
> to go back.

There are currently some issues, mostly the "LAL" in it (the Linux
Adaptation Layer that wraps everything) and all the ugly function
parameters (IN and __STATIC_* and the wrapped list functions).  That is
what poked me in the eye at first look.

I personally don't care that much about the latter stuff, which
is more cosmetic, but a lot of other people strongly object to 
code that does not look like other Linux code, so I would recommend
fixing it.


What I also did not like was that you had high-level logic in these
HAL parts - like handling packet queues etc.  Normally, IMHO, high-level
logic should be directly in the functions that are called from the
kernel.  This way it is easy to follow the main logic of the driver for
someone who is familiar with Linux network drivers in general.

LALs are strongly frowned upon, and I personally don't like them at all
either.  AFAIK there are one or two drivers in the kernel tree that have
one, and it is considered a historic accident by everybody I know ,=)


> In our experience, the vast majority of code fixes and new features in the
> HAL code comes from our team (hey, we planted all these bugs to begin with
> :-)), and to a much lesser extend from other OS developers, and only then
> from Linux community. 

The problem is usually that when there is some bug in your driver
and someone not in your company wants to debug it because the bug
is causing them problems (that is one of the great advantages of free 
software - they can do it themselves), they want relatively clean
code.  The same goes for a maintainer who is hunting some issue and
needs to change your driver slightly for some infrastructure change.

So even though most changes will likely come from you, there is a need
for other people to understand and change your code.

-Andi


* RE: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-19 20:15                                 ` Andi Kleen
@ 2005-03-19 22:19                                   ` Leonid Grossman
  2005-03-20 13:40                                     ` jamal
  0 siblings, 1 reply; 74+ messages in thread
From: Leonid Grossman @ 2005-03-19 22:19 UTC (permalink / raw)
  To: 'Andi Kleen'; +Cc: netdev, leonid, jgarzik


> First you wont like it, but Linux kernel maintainers usually dont give
> firm promises to merge anything.  They just say "with that i wont merge
> it"
> and even that is flexible sometimes. It is not like a company who
> gives commitments or signs contracts, it is really the final patch that
> counts.
> 
> The best way to get code merged is to write it like the maintainers
> wants it, but of course that does not happen always.
> 
> However when the code is clean and the biggest issues are fixed
> and only some relatively small ones are left I think you have a good
> chance to see your code merged.

Sure. I was not asking for a promise to merge the code, only for a rough
consensus that a HAL-based approach is not by itself a showstopper.
Without such a consensus, it doesn't matter whether we address the other
issues, right?
 
> 
> >
> > Arguably, a hacker will be much more interested in changing the non-HAL
> part
> > of a driver. This part of the code has to look and feel the same as any
> > other Linux net driver. If it doesn't, then we've done a poor job and
> need
> > to go back.
> 
> There are currently some issues, mostly the "LAL" in it (Linux Adaption
> Layer that wraps everything) and all the ugly function parameters (IN and
> __STATIC_* and the wrapped list functions ). That is what just poked in my
> eyes at the first look.
> 
> I personally dont care that much about the later stuff which
> is more cosmetic, but a lot of other people strongly object to
> code that does not look like other Linux code so I would recommend to
> fix it.

Agreed; we are fixing this for the new submission.

> 
> 
> What I also did not like was that you had high level logic in these
> HAL parts - like handling packet queues etc. Normally IMHO high level
> logic should be directly in the functions that are directly called
> from the kernel. This way it is easy to follow the main logic
> of the driver if someone is familiar with linux network drivers
> in general

In general, I agree that the low-level logic (AKA the HAL code) needs to be
as small as possible, and should not include code that is typically common
to Linux (or any other) network drivers.
Unfortunately, MAC driver models for different operating systems have
diverged so much that the interface to the low-level logic ends up higher
than anyone would like...
This is a valid comment though; we will see if there is room for
improvement.

Skip...


> The problem is usually that when there is some bug in your driver
> and someone not in your company wants to debug it because the bug
> is causing them problems (that is one of the great advantages of free
> software that they can do it themselves), then they want relatively
> clean code. The same happens for the maintainer who is hunting some
> issue and needs to change your driver slightly for some infrastructure
> change.
> 
> So even though most changes come likely from you there is a need
> for other people to understand and change your code.

I agree, to a point - if this driver is much harder for a hacker or a
maintainer to understand (compared to its "s2io" sibling), then that is not
a good thing and has to be addressed as much as is practical.

Hopefully the follow-up submissions will get to the point where the HAL
code still has some non-Linux "accent" but is sufficiently clean for people
to understand.
This will be a tradeoff for some, but hopefully a good one - I still think
the majority of low-level code fixes will come from our team (sometimes by
virtue of fixing a problem on a different OS).

Thanks for the detailed feedback!
Leonid


> 
> -Andi


* RE: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-19 22:19                                   ` Leonid Grossman
@ 2005-03-20 13:40                                     ` jamal
  2005-03-20 20:13                                       ` Leonid Grossman
  0 siblings, 1 reply; 74+ messages in thread
From: jamal @ 2005-03-20 13:40 UTC (permalink / raw)
  To: Leonid Grossman; +Cc: 'Andi Kleen', netdev, leonid, jgarzik

On Sat, 2005-03-19 at 17:19, Leonid Grossman wrote:

> Sure. I was not asking for a promise to merge the code, only for a rough
> consensus that a HAL-based approach by itself is not a showstopper. 
> Without such a consensus, it doesn't matter if we address other issues,
> right?

Typically there's only one motivation for HALs - maintaining the same code
base for 20 OSes. It makes life easier for a small company (not sure if
yours fits that category). It doesn't matter how differently you position
or spin it (no offense intended); that is the main, if not the only,
motivation.

OTOH, you shift the burden of the extra maintenance work to the people
on the Linux side, who now may have to understand what your layer does.
This is why people hate it.
I don't think it is sufficient to say your company will be updating the
driver:
- Years from now your company may not be around anymore, but your
hardware may still be. Who is going to fix the bugs then?
- APIs, mechanisms etc. change on Linux as well. You become a bottleneck
if nobody understands your HAL, because they have to wait for you to make
the changes.
Traditionally, whoever changes a driver interface ensures all drivers are
updated. Think positively: that work is now offloaded from you.

My suggestion: have a dedicated resource just for Linux - it is big
enough to justify it; maintain the HAL for the other OSes, which will never
change. I bet whatever works for NDIS will probably continue to work for
vxworks for the next 10 years - not much innovation going on there ;-> or
you could claim (bah!) there's stability on those OSes.

cheers,
jamal


* RE: [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters
  2005-03-20 13:40                                     ` jamal
@ 2005-03-20 20:13                                       ` Leonid Grossman
  0 siblings, 0 replies; 74+ messages in thread
From: Leonid Grossman @ 2005-03-20 20:13 UTC (permalink / raw)
  To: hadi; +Cc: 'Andi Kleen', netdev, jgarzik

Hi Jamal,
You are right on target; it's all about finding the best driver support
model - for the end users, not for a Neterion developer or even for a Linux
hacker.
At the moment, the "s2io" driver is supported in exactly the way you
suggest; the model works well, but we think it's possible to do better -
again, for the end user's sake.

If we could write a driver, put it in the kernel and let it "heal itself",
then I'd agree that the current model would be the best. We'd have been
happy to stay with it, rather than go through a considerable extra effort -
preserving the status quo is always easier.

I'd argue that the vast majority of 10GbE NICs will initially be developed
by smaller companies like Neterion, shipped by large server OEMs, and
supported (from the end-user perspective) by these two entities, as well as
by the Linux community and the major Linux distributors.

In our experience, even early adopters of 10GbE adapters in the Linux space
rarely attempt to fix hardware-dependent bugs themselves - they ask for
support from one of the three entities above (a NIC vendor, a server vendor
and a Linux distributor), with the lion's share of urgent level 2 and level
3 issues ending up with the NIC vendor. As 10GbE adoption accelerates, I
expect this trend will accelerate as well - I wish we could shift the
support burden to someone else, but the chances of that happening are
getting slimmer over time :-).

BTW, this is precisely the reason NIC vendors favor the HAL model. Software
teams are about the same size whether it's a big or a small company - and
no NIC vendor likes spending 30% of its resources on one (out of 10) device
drivers.

So, at least from what we see in our space, open source is a great thing,
but it doesn't make driver support (at least for the low-level NIC code)
fundamentally different from other operating systems, at least not for the
foreseeable future.
IMHO, nor should it - the ability to have the source and fix hw-dependent
bugs is great, but I'd argue that the much bigger value of the open source
model is in the (more strategic) ability to identify and fix, in a timely
fashion, the existing networking bottlenecks that are common to most/all
NIC drivers.

An example - at present, there are no Linux UDP TSO APIs.  We see a
customer need for it, and we have the ASIC support as well; from the end
user's (and about everybody else's) perspective, it would be great to have
these APIs added by the people who know Linux best.
Whether the hw-dependent part of UDP TSO is done by a hacker or is
existing, working HAL code from a Neterion developer is not that important
to the end user - though the latter would be the more likely and more
efficient way to go.

BTW, all the arguments above make sense only if the high-level part of the
driver is very much "Linux-like" - we are obviously motivated to keep the
high-level code amenable to changes by the community, since this is where
the value for the end user is.

We could talk about some of the other points you brought up (for example,
the statement about other OSes not changing applies only to a small subset
of "legacy" OSes - I suspect the rest will be changing NIC APIs rather
fast, and a HAL is actually a good way to deal with that change across the
board), but as you pointed out, the support model is the key here...

Cheers, Leonid





Thread overview: 74+ messages
2005-02-19  3:44 Intel and TOE in the news Jeff Garzik
2005-02-19  4:10 ` Lennert Buytenhek
2005-02-19 19:46   ` David S. Miller
2005-02-19 20:27     ` Andi Kleen
2005-02-19 20:32       ` Lennert Buytenhek
2005-02-20 16:46       ` Eugene Surovegin
2005-02-21 14:01         ` jamal
2005-02-20 19:45       ` rick jones
2005-02-20 21:20         ` Michael Richardson
2005-02-20 21:29         ` Andi Kleen
2005-02-20 22:43           ` Leonid Grossman
2005-02-20 23:07             ` Andi Kleen
2005-02-21  1:57               ` Alex Aizman
2005-02-21  2:37                 ` Jeff Garzik
2005-02-21 19:34                   ` Alex Aizman
2005-02-21 20:34                     ` Jeff Garzik
2005-02-22  0:50                       ` Alex Aizman
2005-02-21 11:37                 ` Andi Kleen
2005-02-21  3:31               ` Leonid Grossman
2005-02-21 11:50                 ` Andi Kleen
2005-02-21 13:28                   ` Thomas Graf
2005-02-21 14:03                     ` jamal
2005-02-21 14:17                       ` Thomas Graf
2005-02-21 14:31                         ` jamal
2005-02-21 15:34                           ` Thomas Graf
2005-02-21 15:48                             ` jamal
2005-02-21 16:40                               ` Thomas Graf
2005-02-21 17:03                                 ` jamal
2005-02-21 20:12                                   ` patrick mcmanus
2005-02-21 21:12                                     ` jamal
2005-03-06 11:21                                       ` Harald Welte
2005-02-21 21:41                                   ` Thomas Graf
2005-02-21 15:38                           ` Robert Olsson
2005-02-21 15:50                             ` jamal
2005-02-21 13:44             ` jamal
2005-02-21 16:52               ` Leonid Grossman
2005-02-21 17:11                 ` jamal
2005-02-21 18:02                   ` Leonid Grossman
2005-02-22 18:02                     ` Stephen Hemminger
2005-02-22 18:07                       ` Andi Kleen
2005-02-22 20:51                         ` Leonid Grossman
2005-02-22 21:20                           ` Rick Jones
2005-02-22 21:30                             ` Leonid Grossman
2005-02-22 21:42                               ` Rick Jones
2005-02-22 22:10                                 ` Leonid Grossman
2005-02-22 21:43                           ` Andi Kleen
2005-02-22 22:17                             ` Leonid Grossman
2005-02-22 22:42                               ` Andi Kleen
2005-02-22 22:51                                 ` Leonid Grossman
2005-03-14 20:22                   ` [ANNOUNCE] Experimental Driver for Neterion/S2io 10GbE Adapters Alex Aizman
2005-03-14 20:38                     ` David S. Miller
2005-03-14 20:53                       ` Leonid Grossman
2005-03-14 23:27                         ` Andi Kleen
2005-03-14 23:45                           ` Jeff Garzik
2005-03-15  0:32                             ` Leonid Grossman
2005-03-15  1:07                           ` Alex Aizman
2005-03-15  1:29                             ` Rick Jones
2005-03-15  2:28                               ` Leonid Grossman
2005-03-15 15:07                             ` Leonid Grossman
2005-03-15 15:55                               ` Leonid Grossman
2005-03-19 20:15                                 ` Andi Kleen
2005-03-19 22:19                                   ` Leonid Grossman
2005-03-20 13:40                                     ` jamal
2005-03-20 20:13                                       ` Leonid Grossman
2005-03-15  5:14                     ` Scott Feldman
2005-03-15  5:59                       ` Matt Mackall
2005-03-15  6:02                       ` Leonid Grossman
2005-02-22 17:27                 ` Intel and TOE in the news Andi Kleen
2005-02-19 20:29     ` Lennert Buytenhek
2005-03-02 13:48   ` Lennert Buytenhek
2005-03-02 17:34     ` Leonid Grossman
2005-02-21 13:59 ` P
2005-02-21 14:10   ` jamal
2005-02-21 22:44 ` Stephen Hemminger
