* Re: Van Jacobson's net channels and real-time [not found] <Pine.LNX.4.44L0.0604201819040.19330-100000@lifa01.phys.au.dk> @ 2006-04-20 19:09 ` David S. Miller 2006-04-21 16:52 ` Ingo Oeser 2006-04-22 19:30 ` bert hubert 0 siblings, 2 replies; 24+ messages in thread From: David S. Miller @ 2006-04-20 19:09 UTC (permalink / raw) To: simlo; +Cc: linux-kernel, mingo, netdev [ Maybe ask questions like this on "netdev" where the networking developers hang out? Added to CC: ] Van fell off the face of the planet after giving his presentation and never published his code, only his slides. I've started to make a slow attempt at implementing his ideas, nothing but pure infrastructure so far, but you can look at what I have here: kernel.org:/pub/scm/linux/kernel/git/davem/vj-2.6.git don't expect major progress and don't expect anything beyond a simple channel to softint packet processing on receive any time soon. Going all the way to the socket is a large endeavor and will require a lot of restructuring to do it right, so expect this to take on the order of months. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-20 19:09 ` Van Jacobson's net channels and real-time David S. Miller @ 2006-04-21 16:52 ` Ingo Oeser 2006-04-22 11:48 ` Jörn Engel 2006-04-23 5:56 ` David S. Miller 2006-04-22 19:30 ` bert hubert 1 sibling, 2 replies; 24+ messages in thread From: Ingo Oeser @ 2006-04-21 16:52 UTC (permalink / raw) To: David S. Miller; +Cc: simlo, linux-kernel, mingo, netdev, Ingo Oeser Hi David, nice to see you getting started with it. I'm not sure about the queue logic there. 1867 /* Caller must have exclusive producer access to the netchannel. */ 1868 int netchannel_enqueue(struct netchannel *np, struct netchannel_buftrailer *bp) 1869 { 1870 unsigned long tail; 1871 1872 tail = np->netchan_tail; 1873 if (tail == np->netchan_head) 1874 return -ENOMEM; This looks wrong, since empty and full are the same condition in your case. 1891 struct netchannel_buftrailer *__netchannel_dequeue(struct netchannel *np) 1892 { 1893 unsigned long head = np->netchan_head; 1894 struct netchannel_buftrailer *bp = np->netchan_queue[head]; 1895 1896 BUG_ON(np->netchan_tail == head); See? What about something like struct netchannel { /* This is only read/written by the writer (producer) */ unsigned long write_ptr; struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES]; /* This is modified by both */ atomic_t filled_entries; /* cache_line_align this? */ /* This is only read/written by the reader (consumer) */ unsigned long read_ptr; }; This would prevent this bug from the beginning and still let us use the full queue size. If cacheline bouncing because of the shared filled_entries becomes an issue, you are receiving or sending a lot. Then you can enqueue and dequeue multiple entries and commit the counts later. This can be done with an atomic_read, atomic_add and atomic_sub on filled_entries. Maybe even cheaper with local_t instead of atomic_t later on. But I guess the cacheline bouncing will be a non-issue, since the whole point of netchannels was to keep traffic as local to a CPU as possible, right? Would you like to see a sample patch relative to your tree, to show you what I mean? Regards Ingo Oeser ^ permalink raw reply [flat|nested] 24+ messages in thread
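A minimal userspace sketch of the counter-based ring Ingo proposes, assuming C11 atomics in place of the kernel's atomic_t; the names (ring_put, ring_get, RING_ENTRIES) are illustrative, not taken from the vj-2.6 tree. The explicit fill count is what makes empty (filled == 0) distinct from full (filled == RING_ENTRIES), so every slot stays usable:

#include <stdatomic.h>

#define RING_ENTRIES 256	/* hypothetical queue size */

struct ring {
	/* written only by the producer */
	unsigned long write_ptr;
	void *slots[RING_ENTRIES];

	/* written by both sides -- the shared counter */
	atomic_ulong filled;

	/* written only by the consumer */
	unsigned long read_ptr;
};

/* producer side: returns 0 on success, -1 when full */
static int ring_put(struct ring *r, void *buf)
{
	if (atomic_load(&r->filled) == RING_ENTRIES)
		return -1;			/* full, distinct from empty */
	r->slots[r->write_ptr] = buf;
	r->write_ptr = (r->write_ptr + 1) % RING_ENTRIES;
	atomic_fetch_add(&r->filled, 1);	/* publish the entry */
	return 0;
}

/* consumer side: returns NULL when empty */
static void *ring_get(struct ring *r)
{
	void *buf;

	if (atomic_load(&r->filled) == 0)
		return NULL;			/* empty, distinct from full */
	buf = r->slots[r->read_ptr];
	r->read_ptr = (r->read_ptr + 1) % RING_ENTRIES;
	atomic_fetch_sub(&r->filled, 1);	/* release the slot */
	return buf;
}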
* Re: Van Jacobson's net channels and real-time 2006-04-21 16:52 ` Ingo Oeser @ 2006-04-22 11:48 ` Jörn Engel 2006-04-22 13:29 ` Ingo Oeser 2006-04-23 5:51 ` David S. Miller 1 sibling, 2 replies; 24+ messages in thread From: Jörn Engel @ 2006-04-22 11:48 UTC (permalink / raw) To: Ingo Oeser Cc: David S. Miller, simlo, linux-kernel, mingo, netdev, Ingo Oeser On Fri, 21 April 2006 18:52:47 +0200, Ingo Oeser wrote: > What about something like > > struct netchannel { > /* This is only read/written by the writer (producer) */ > unsigned long write_ptr; > struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES]; > > /* This is modified by both */ > atomic_t filled_entries; /* cache_line_align this? */ > > /* This is only read/written by the reader (consumer) */ > unsigned long read_ptr; > }; > > This would prevent this bug from the beginning and still let us use the > full queue size. > > If cacheline bouncing because of the shared filled_entries becomes an issue, > you are receiving or sending a lot. Unless I completely misunderstand something, one of the main points of the netchannels is to have *zero* fields written to by both producer and consumer. Receiving and sending a lot can be expected to be the common case, so taking a performance hit in this case is hardly a good idea. I haven't looked at DaveM's implementation at all, but Van simply separated the fields into consumer-written and producer-written, with proper alignment between them. Some consumer-written fields are also read by the producer and vice versa. But none of this results in cacheline ping-pong. If your description of the problem is correct, it should only mean that the implementation has a problem, not the concept. Jörn -- Time? What's that? Time is only worth what you do with it. -- Theo de Raadt ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 11:48 ` Jörn Engel @ 2006-04-22 13:29 ` Ingo Oeser 2006-04-22 13:49 ` Jörn Engel ` (2 more replies) 2006-04-23 5:51 ` David S. Miller 1 sibling, 3 replies; 24+ messages in thread From: Ingo Oeser @ 2006-04-22 13:29 UTC (permalink / raw) To: Jörn Engel Cc: Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev Hi Jörn, On Saturday, 22. April 2006 13:48, Jörn Engel wrote: > Unless I completely misunderstand something, one of the main points of > the netchannels is to have *zero* fields written to by both producer > and consumer. Hmm, for me the main point was to keep the complete processing of a single packet within one CPU/core, where this is a non-issue. > Receiving and sending a lot can be expected to be the > common case, so taking a performance hit in this case is hardly a good > idea. There is no hit. If you receive/send in bursts you can simply aggregate them up to a certain queueing threshold. The queue design outlined can split the queueing into reserve and commit stages, where the producer can be told how much it can produce and the consumer is told how much it can consume. Within their areas the producer and consumer can freely move around. So this is not exactly a queue, but a dynamic double buffer :-) So maybe doing queueing with the classic head/tail variant is better here, but the other variant might replace it without problems and allows for some nice improvements. Regards Ingo Oeser ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 13:29 ` Ingo Oeser @ 2006-04-22 13:49 ` Jörn Engel 2006-04-23 0:05 ` Ingo Oeser 2006-04-23 5:52 ` David S. Miller 2006-04-23 9:23 ` Avi Kivity 2 siblings, 1 reply; 24+ messages in thread From: Jörn Engel @ 2006-04-22 13:49 UTC (permalink / raw) To: Ingo Oeser Cc: Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Sat, 22 April 2006 15:29:58 +0200, Ingo Oeser wrote: > On Saturday, 22. April 2006 13:48, Jörn Engel wrote: > > Unless I completely misunderstand something, one of the main points of > > the netchannels is to have *zero* fields written to by both producer > > and consumer. > > Hmm, for me the main point was to keep the complete processing > of a single packet within one CPU/core, where this is a non-issue. That was another main point, yes. And the endpoints should be as little burden on the bottlenecks as possible. One bottleneck is the receive interrupt, which shouldn't wait for cachelines from other CPUs too much. Jörn -- Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life. -- Charles Shultz ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 13:49 ` Jörn Engel @ 2006-04-23 0:05 ` Ingo Oeser 2006-04-23 5:50 ` David S. Miller 2006-04-24 16:42 ` Auke Kok 0 siblings, 2 replies; 24+ messages in thread From: Ingo Oeser @ 2006-04-23 0:05 UTC (permalink / raw) To: Jörn Engel Cc: Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Saturday, 22. April 2006 15:49, Jörn Engel wrote: > That was another main point, yes. And the endpoints should be as > little burden on the bottlenecks as possible. One bottleneck is the > receive interrupt, which shouldn't wait for cachelines from other CPUs > too much. That's right. This will be made a non-issue with early demuxing on the NIC and MSI (or was it MSI-X?) which will select the right CPU based on hardware channels. In the meantime I would reduce the effects by only committing on a full buffer or on leaving the interrupt handler. This would be OK, because you have to wake up the process anyway on a full buffer, and if it slept because of an empty buffer. You only lose if your application didn't sleep yet and you need to leave the interrupt handler because there is no more work. In this case the atomic_add would be significant. All this is quite similar to how we do the pagevec stuff in mm/ already. Regards Ingo Oeser ^ permalink raw reply [flat|nested] 24+ messages in thread
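A sketch of the deferred commit Ingo describes, building on the counter-based ring sketched earlier; produce_one() and commit_batch() are made-up names. The producer keeps a private pending count and dirties the shared counter once per batch rather than once per packet:

#include <stdatomic.h>

struct batched_ring {
	unsigned long write_ptr;	/* producer-private */
	unsigned long pending;		/* produced but not yet published */
	atomic_ulong filled;		/* shared with the consumer */
};

static void produce_one(struct batched_ring *r)
{
	/* ... place the buffer at write_ptr ... */
	r->write_ptr++;
	r->pending++;
}

/* called on a full buffer or when leaving the interrupt handler,
 * so the shared cacheline is dirtied once per batch */
static void commit_batch(struct batched_ring *r)
{
	if (r->pending) {
		atomic_fetch_add(&r->filled, r->pending);
		r->pending = 0;
	}
}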
* Re: Van Jacobson's net channels and real-time 2006-04-23 0:05 ` Ingo Oeser @ 2006-04-23 5:50 ` David S. Miller 2006-04-24 16:42 ` Auke Kok 1 sibling, 0 replies; 24+ messages in thread From: David S. Miller @ 2006-04-23 5:50 UTC (permalink / raw) To: ioe-lkml; +Cc: joern, netdev, simlo, linux-kernel, mingo, netdev From: Ingo Oeser <ioe-lkml@rameria.de> Date: Sun, 23 Apr 2006 02:05:32 +0200 > On Saturday, 22. April 2006 15:49, Jörn Engel wrote: > > That was another main point, yes. And the endpoints should be as > > little burden on the bottlenecks as possible. One bottleneck is the > > receive interrupt, which shouldn't wait for cachelines from other CPUs > > too much. > > That's right. This will be made a non-issue with early demuxing > on the NIC and MSI (or was it MSI-X?) which will select > the right CPU based on hardware channels. It is not clear that MSI'ing the RX interrupt to multiple CPUs is the answer. Consider the fact that by doing so you're reducing the amount of batch work each interrupt does by a factor of N. One of the biggest gains of NAPI, btw, is that it batches packet receive. If you don't believe the benefits of this, put a simple cycle counter sample around netif_receive_skb() calls and note the difference between the first packet processed and subsequent ones; it's several orders of magnitude faster to process subsequent packets within a batch. I've done this before on tg3 with sparc64 and posted the numbers on netdev about a year or so ago. If you are doing something like netchannels, it helps to batch so that the demuxing table stays hot in the CPU cache. There is even talk of dedicating a thread on enormously multi-threaded cpus just to the NIC hardware interrupt, so it could run net channels to the socket processes running on the other strands. ^ permalink raw reply [flat|nested] 24+ messages in thread
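The measurement DaveM describes is easy to approximate in userspace. A toy sketch of the sampling technique, assuming x86 and GCC or Clang for __rdtsc(); process_packet() merely stands in for netif_receive_skb(), so this placeholder workload will not reproduce the cold-versus-warm gap he measured, but the method is the same:

#include <stdio.h>
#include <x86intrin.h>

#define BATCH 64

static volatile unsigned long sink;

static void process_packet(int i)
{
	sink += i * 2654435761UL;	/* placeholder for real demux work */
}

int main(void)
{
	unsigned long long cycles[BATCH];
	int i;

	for (i = 0; i < BATCH; i++) {
		unsigned long long t0 = __rdtsc();
		process_packet(i);
		cycles[i] = __rdtsc() - t0;
	}
	/* the first iteration runs with cold caches, the rest warm */
	printf("first: %llu cycles, mid-batch: %llu, last: %llu\n",
	       cycles[0], cycles[BATCH / 2], cycles[BATCH - 1]);
	return 0;
}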
* Re: Van Jacobson's net channels and real-time 2006-04-23 0:05 ` Ingo Oeser 2006-04-23 5:50 ` David S. Miller @ 2006-04-24 16:42 ` Auke Kok 2006-04-24 16:59 ` linux-os (Dick Johnson) 1 sibling, 1 reply; 24+ messages in thread From: Auke Kok @ 2006-04-24 16:42 UTC (permalink / raw) To: Ingo Oeser Cc: Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev Ingo Oeser wrote: > On Saturday, 22. April 2006 15:49, Jörn Engel wrote: >> That was another main point, yes. And the endpoints should be as >> little burden on the bottlenecks as possible. One bottleneck is the >> receive interrupt, which shouldn't wait for cachelines from other CPUs >> too much. > > That's right. This will be made a non-issue with early demuxing > on the NIC and MSI (or was it MSI-X?) which will select > the right CPU based on hardware channels. MSI-X. With MSI you still have only one CPU handling all MSI interrupts and that doesn't look any different than ordinary interrupts. MSI-X will allow much better interrupt handling across several CPUs. Auke ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-24 16:42 ` Auke Kok @ 2006-04-24 16:59 ` linux-os (Dick Johnson) 2006-04-24 17:19 ` Rick Jones ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: linux-os (Dick Johnson) @ 2006-04-24 16:59 UTC (permalink / raw) To: Auke Kok Cc: Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Mon, 24 Apr 2006, Auke Kok wrote: > Ingo Oeser wrote: >> On Saturday, 22. April 2006 15:49, Jörn Engel wrote: >>> That was another main point, yes. And the endpoints should be as >>> little burden on the bottlenecks as possible. One bottleneck is the >>> receive interrupt, which shouldn't wait for cachelines from other CPUs >>> too much. >> >> That's right. This will be made a non-issue with early demuxing >> on the NIC and MSI (or was it MSI-X?) which will select >> the right CPU based on hardware channels. > > MSI-X. With MSI you still have only one CPU handling all MSI interrupts and > that doesn't look any different than ordinary interrupts. MSI-X will allow > much better interrupt handling across several CPUs. > > Auke > - Message signaled interrupts are just a kludge to save a trace on a PC board (read: make junk cheaper still). They are not faster and may even be slower. They will not be the salvation of any interrupt latency problems. The solution for increasing networking speed, where the bit-rate on the wire gets close to the bit-rate on the bus, is to put more and more of the networking code inside the network board. The CPU gets interrupted after most things (like network handshakes) are complete. Cheers, Dick Johnson Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips). Warning : 98.36% of all statistics are fiction, book release in April. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-24 16:59 ` linux-os (Dick Johnson) @ 2006-04-24 17:19 ` Rick Jones 2006-04-24 18:12 ` linux-os (Dick Johnson) 2006-04-24 23:17 ` Michael Chan 2006-04-25 1:49 ` Auke Kok 2 siblings, 1 reply; 24+ messages in thread From: Rick Jones @ 2006-04-24 17:19 UTC (permalink / raw) To: linux-os (Dick Johnson) Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev >>>That's right. This will be made a non-issue with early demuxing >>>on the NIC and MSI (or was it MSI-X?) which will select >>>the right CPU based on hardware channels. >> >>MSI-X. With MSI you still have only one CPU handling all MSI interrupts and >>that doesn't look any different than ordinary interrupts. MSI-X will allow >>much better interrupt handling across several CPUs. >> >>Auke >>- > > > Message signaled interrupts are just a kludge to save a trace on a > PC board (read: make junk cheaper still). They are not faster and > may even be slower. They will not be the salvation of any interrupt > latency problems. The solution for increasing networking speed, > where the bit-rate on the wire gets close to the bit-rate on the > bus, is to put more and more of the networking code inside the > network board. The CPU gets interrupted after most things (like > network handshakes) are complete. if the issue is bus vs network bitrates would offloading really buy that much? i suppose that for minimum sized packets not DMA'ing the headers across the bus would be a decent win, but down at small packet sizes where headers would be 1/3 to 1/2 the stuff DMA'd around, I would think one is talking more about CPU path lengths than bus bitrates. and up at "full size" segments, since everyone is so fond of bulk transfer tests, the transfer saved by not shoving headers across the bus is what, 54/1448 or ~3.75% spreading interrupts via MSI-X seems nice and all, but i keep wondering if the header field-based distribution that is (will be) done by the NICs is putting the cart before the horse - should the NIC essentially be telling the system the CPU on which to run the application, or should the CPU on which the application runs be telling "networking" where it should be happening? rick jones ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-24 17:19 ` Rick Jones @ 2006-04-24 18:12 ` linux-os (Dick Johnson) 0 siblings, 0 replies; 24+ messages in thread From: linux-os (Dick Johnson) @ 2006-04-24 18:12 UTC (permalink / raw) To: Rick Jones Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Mon, 24 Apr 2006, Rick Jones wrote: >>>> That's right. This will be made a non-issue with early demuxing >>>> on the NIC and MSI (or was it MSI-X?) which will select >>>> the right CPU based on hardware channels. >>> >>> MSI-X. With MSI you still have only one CPU handling all MSI interrupts and >>> that doesn't look any different than ordinary interrupts. MSI-X will allow >>> much better interrupt handling across several CPUs. >>> >>> Auke >>> - >> >> >> Message signaled interrupts are just a kludge to save a trace on a >> PC board (read: make junk cheaper still). They are not faster and >> may even be slower. They will not be the salvation of any interrupt >> latency problems. The solution for increasing networking speed, >> where the bit-rate on the wire gets close to the bit-rate on the >> bus, is to put more and more of the networking code inside the >> network board. The CPU gets interrupted after most things (like >> network handshakes) are complete. > > if the issue is bus vs network bitrates would offloading really buy that > much? i suppose that for minimum sized packets not DMA'ing the headers > across the bus would be a decent win, but down at small packet sizes > where headers would be 1/3 to 1/2 the stuff DMA'd around, I would think > one is talking more about CPU path lengths than bus bitrates. > > and up at "full size" segments, since everyone is so fond of bulk > transfer tests, the transfer saved by not shoving headers across the bus > is what, 54/1448 or ~3.75% > > spreading interrupts via MSI-X seems nice and all, but i keep wondering > if the header field-based distribution that is (will be) done by the > NICs is putting the cart before the horse - should the NIC essentially > be telling the system the CPU on which to run the application, or should > the CPU on which the application runs be telling "networking" where it > should be happening? > > rick jones > Ideally, TCP/IP is so mature that one should be able to tell some hardware state-machine "Connect with 123.555.44.333, port 23" and it signals via interrupt when that happens. Then one should be able to say "send these data to that address" or "fill this buffer with data from that address". All the networking could be done on the board, perhaps with a dedicated CPU (as is now done) or all in silicon. So, the driver end of the networking software just handles buffers. There are interrupts that show status such as completions or time-outs, trivial stuff. Cheers, Dick Johnson Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips). Warning : 98.36% of all statistics are fiction, book release in April.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-24 16:59 ` linux-os (Dick Johnson) 2006-04-24 17:19 ` Rick Jones @ 2006-04-24 23:17 ` Michael Chan 2006-04-25 1:49 ` Auke Kok 2 siblings, 0 replies; 24+ messages in thread From: Michael Chan @ 2006-04-24 23:17 UTC (permalink / raw) To: linux-os (Dick Johnson) Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Mon, 2006-04-24 at 12:59 -0400, linux-os (Dick Johnson) wrote: > Message signaled interrupts are just a kludge to save a trace on a > PC board (read: make junk cheaper still). They are not faster and > may even be slower. They will not be the salvation of any interrupt > latency problems. MSI has two very nice properties: MSI is never shared and MSI guarantees that all DMA activities before the MSI have completed. When you take advantage of these guarantees in your MSI handler, there can be noticeable improvements compared to using INTA. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-24 16:59 ` linux-os (Dick Johnson) 2006-04-24 17:19 ` Rick Jones 2006-04-24 23:17 ` Michael Chan @ 2006-04-25 1:49 ` Auke Kok 2006-04-25 11:29 ` linux-os (Dick Johnson) 2 siblings, 1 reply; 24+ messages in thread From: Auke Kok @ 2006-04-25 1:49 UTC (permalink / raw) To: linux-os (Dick Johnson) Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev linux-os (Dick Johnson) wrote: > On Mon, 24 Apr 2006, Auke Kok wrote: > >> Ingo Oeser wrote: >>> On Saturday, 22. April 2006 15:49, Jörn Engel wrote: >>>> That was another main point, yes. And the endpoints should be as >>>> little burden on the bottlenecks as possible. One bottleneck is the >>>> receive interrupt, which shouldn't wait for cachelines from other CPUs >>>> too much. >>> That's right. This will be made a non-issue with early demuxing >>> on the NIC and MSI (or was it MSI-X?) which will select >>> the right CPU based on hardware channels. >> MSI-X. With MSI you still have only one CPU handling all MSI interrupts and >> that doesn't look any different than ordinary interrupts. MSI-X will allow >> much better interrupt handling across several CPUs. >> >> Auke >> - > > Message signaled interrupts are just a kludge to save a trace on a > PC board (read: make junk cheaper still). Yes. Also in PCI-Express there is no physical interrupt line anymore due to the architecture, so even classical interrupts are sent as a "message" over the bus. > They are not faster and may even be slower. Thus in the case of PCI-Express, MSI interrupts are just as fast as the ordinary ones. I have no numbers on whether MSI is faster or not than, e.g., interrupts on PCI-X, but generally speaking, the PCI-Express bus is not designed to be "low latency" at all; at best it gives you X latency, where X is something like microseconds. The MSI message itself only takes 10-20 nanoseconds though, but all the handling probably adds a large factor to that (1000 or so). No clue on classical interrupt line latency - anyone? > They will not be the salvation of any interrupt latency problems. This is also not the problem - we really don't care that our 100,000 packets arrive 20 usec slower per packet, just as long as the bus is not idle for those intervals. We would care a lot if 25,000 of those arrived directly at the proper CPU, without the need for one of the CPUs to arbitrate on every interrupt. That's the idea anyway. Nowadays with irq throttling we introduce a lot of designed latency anyway, especially with network devices. > The solution for increasing networking speed, > where the bit-rate on the wire gets close to the bit-rate on the > bus, is to put more and more of the networking code inside the > network board. The CPU gets interrupted after most things (like > network handshakes) are complete. That is a limited vision of the situation. You could argue that the current CPUs have so much power that they can easily do a lot of the processing instead of the hardware, and thus warm the caches for userspace, set up sockets, etc. This is the whole idea of Van Jacobson's net channels. Putting more offloading into the hardware just brings so many problems with it that are far easier solved in the OS. Cheers, Auke ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-25 1:49 ` Auke Kok @ 2006-04-25 11:29 ` linux-os (Dick Johnson) 2006-05-02 12:41 ` Vojtech Pavlik 0 siblings, 1 reply; 24+ messages in thread From: linux-os (Dick Johnson) @ 2006-04-25 11:29 UTC (permalink / raw) To: Auke Kok Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Mon, 24 Apr 2006, Auke Kok wrote: > linux-os (Dick Johnson) wrote: >> On Mon, 24 Apr 2006, Auke Kok wrote: >> >>> Ingo Oeser wrote: >>>> On Saturday, 22. April 2006 15:49, Jörn Engel wrote: >>>>> That was another main point, yes. And the endpoints should be as >>>>> little burden on the bottlenecks as possible. One bottleneck is the >>>>> receive interrupt, which shouldn't wait for cachelines from other CPUs >>>>> too much. >>>> That's right. This will be made a non-issue with early demuxing >>>> on the NIC and MSI (or was it MSI-X?) which will select >>>> the right CPU based on hardware channels. >>> MSI-X. With MSI you still have only one CPU handling all MSI interrupts and >>> that doesn't look any different than ordinary interrupts. MSI-X will allow >>> much better interrupt handling across several CPUs. >>> >>> Auke >>> - >> >> Message signaled interrupts are just a kludge to save a trace on a >> PC board (read: make junk cheaper still). > > Yes. Also in PCI-Express there is no physical interrupt line anymore due to > the architecture, so even classical interrupts are sent as a "message" over the bus. > >> They are not faster and may even be slower. > > Thus in the case of PCI-Express, MSI interrupts are just as fast as the > ordinary ones. I have no numbers on whether MSI is faster or not than, e.g., > interrupts on PCI-X, but generally speaking, the PCI-Express bus is not > designed to be "low latency" at all; at best it gives you X latency, where X > is something like microseconds. The MSI message itself only takes 10-20 > nanoseconds though, but all the handling probably adds a large factor to that > (1000 or so). No clue on classical interrupt line latency - anyone? > About 9 nanoseconds per foot of FR-4 (G10) trace, plus the access time through the gate-arrays (about 20 ns); so, from the time a device needs the CPU until it hits the interrupt pin, you typically have 30 to 50 nanoseconds. Of course the CPU is __much__ slower. However, these physical latencies are in series and cannot be compensated for, because the CPU can't see into the future. >> They will not be the salvation of any interrupt latency problems. > > This is also not the problem - we really don't care that our 100,000 packets > arrive 20 usec slower per packet, just as long as the bus is not idle for those > intervals. We would care a lot if 25,000 of those arrived directly at the > proper CPU, without the need for one of the CPUs to arbitrate on every > interrupt. That's the idea anyway. It forces driver-writers to loop in ISRs to handle new status changes that happened before an asserted interrupt even got to the CPU. This is bad. You end up polling in the ISR, with interrupts off. Turning on the interrupts exacerbates the problem; you may never leave the ISR! It becomes the new "idle task". To properly use interrupts, the hardware latency must be less than the CPU's response to the hardware stimulus. > > Nowadays with irq throttling we introduce a lot of designed latency anyway, > especially with network devices.
> >> The solution for increasing networking speed, >> where the bit-rate on the wire gets close to the bit-rate on the >> bus, is to put more and more of the networking code inside the >> network board. The CPU gets interrupted after most things (like >> network handshakes) are complete. > > That is a limited vision of the situation. You could argue that the current > CPUs have so much power that they can easily do a lot of the processing > instead of the hardware, and thus warm the caches for userspace, set up sockets, > etc. This is the whole idea of Van Jacobson's net channels. Putting more > offloading into the hardware just brings so many problems with it that > are far easier solved in the OS. > > > Cheers, > > Auke > Cheers, Dick Johnson Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips). Warning : 98.36% of all statistics are fiction, book release in April. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-25 11:29 ` linux-os (Dick Johnson) @ 2006-05-02 12:41 ` Vojtech Pavlik 2006-05-02 15:58 ` Andi Kleen 0 siblings, 1 reply; 24+ messages in thread From: Vojtech Pavlik @ 2006-05-02 12:41 UTC (permalink / raw) To: linux-os (Dick Johnson) Cc: Auke Kok, Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Tue, Apr 25, 2006 at 07:29:40AM -0400, linux-os (Dick Johnson) wrote: > >> Message signaled interrupts are just a kludge to save a trace on a > >> PC board (read: make junk cheaper still). > > > > Yes. Also in PCI-Express there is no physical interrupt line anymore due to > > the architecture, so even classical interrupts are sent as a "message" over the bus. > > > >> They are not faster and may even be slower. > > > > Thus in the case of PCI-Express, MSI interrupts are just as fast as the > > ordinary ones. I have no numbers on whether MSI is faster or not than, e.g., > > interrupts on PCI-X, but generally speaking, the PCI-Express bus is not > > designed to be "low latency" at all; at best it gives you X latency, where X > > is something like microseconds. The MSI message itself only takes 10-20 > > nanoseconds though, but all the handling probably adds a large factor to that > > (1000 or so). No clue on classical interrupt line latency - anyone? > > About 9 nanoseconds per foot of FR-4 (G10) trace, plus the access time > through the gate-arrays (about 20 ns); so, from the time a device needs > the CPU until it hits the interrupt pin, you typically have 30 to > 50 nanoseconds. Of course the CPU is __much__ slower. However, these > physical latencies are in series and cannot be compensated for, because > the CPU can't see into the future. You seem to be missing the fact that most of today's interrupts are delivered through the APIC bus, which isn't fast at all. -- Vojtech Pavlik Director SuSE Labs ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-05-02 12:41 ` Vojtech Pavlik @ 2006-05-02 15:58 ` Andi Kleen 0 siblings, 0 replies; 24+ messages in thread From: Andi Kleen @ 2006-05-02 15:58 UTC (permalink / raw) To: Vojtech Pavlik Cc: linux-os (Dick Johnson), Auke Kok, Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Tuesday 02 May 2006 14:41, Vojtech Pavlik wrote: > You seem to be missing the fact that most of today's interrupts are > delivered through the APIC bus, which isn't fast at all. You mean slow, right? Modern x86s (anything newer than a P3) generally don't have a separate APIC bus anymore but just send messages over their main processor connection. -Andi ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 13:29 ` Ingo Oeser 2006-04-22 13:49 ` Jörn Engel @ 2006-04-23 5:52 ` David S. Miller 2006-04-23 9:23 ` Avi Kivity 2 siblings, 0 replies; 24+ messages in thread From: David S. Miller @ 2006-04-23 5:52 UTC (permalink / raw) To: ioe-lkml; +Cc: joern, netdev, simlo, linux-kernel, mingo, netdev From: Ingo Oeser <ioe-lkml@rameria.de> Date: Sat, 22 Apr 2006 15:29:58 +0200 > On Saturday, 22. April 2006 13:48, Jörn Engel wrote: > > Unless I completely misunderstand something, one of the main points of > > the netchannels is to have *zero* fields written to by both producer > > and consumer. > > Hmm, for me the main point was to keep the complete processing > of a single packet within one CPU/core, where this is a non-issue. Both are the important issues. You move the bulk of the packet processing work to the end cores of the system, yes. But you do so with an enormously SMP-friendly queue data structure so that it does not matter at all that the packet is received on one CPU, yet processed in socket context on another. If you elide either part of the implementation, you miss the entire point of net channels. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 13:29 ` Ingo Oeser 2006-04-22 13:49 ` Jörn Engel 2006-04-23 5:52 ` David S. Miller @ 2006-04-23 9:23 ` Avi Kivity 2 siblings, 0 replies; 24+ messages in thread From: Avi Kivity @ 2006-04-23 9:23 UTC (permalink / raw) To: Ingo Oeser Cc: Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev Ingo Oeser wrote: > Hi Jörn, > > On Saturday, 22. April 2006 13:48, Jörn Engel wrote: > >> Unless I completely misunderstand something, one of the main points of >> the netchannels is to have *zero* fields written to by both producer >> and consumer. >> > > Hmm, for me the main point was to keep the complete processing > of a single packet within one CPU/core, where this is a non-issue. > But the interrupt for a packet can be received by CPU 0 whereas the rest of the processing proceeds on CPU 1, so it still helps to keep the producer index and consumer index on separate cachelines. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 11:48 ` Jörn Engel 2006-04-22 13:29 ` Ingo Oeser @ 2006-04-23 5:51 ` David S. Miller 1 sibling, 0 replies; 24+ messages in thread From: David S. Miller @ 2006-04-23 5:51 UTC (permalink / raw) To: joern; +Cc: netdev, simlo, linux-kernel, mingo, netdev, ioe-lkml From: Jörn Engel <joern@wohnheim.fh-wedel.de> Date: Sat, 22 Apr 2006 13:48:46 +0200 > Unless I completely misunderstand something, one of the main points of > the netchannels is to have *zero* fields written to by both producer > and consumer. Receiving and sending a lot can be expected to be the > common case, so taking a performance hit in this case is hardly a good > idea. That's absolutely correct; this is absolutely critical to the implementation. If you're doing any atomic operations, or any write operations by both consumer and producer to the same cacheline, you've broken things :-) ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-21 16:52 ` Ingo Oeser 2006-04-22 11:48 ` Jörn Engel @ 2006-04-23 5:56 ` David S. Miller 2006-04-23 14:15 ` Ingo Oeser 1 sibling, 1 reply; 24+ messages in thread From: David S. Miller @ 2006-04-23 5:56 UTC (permalink / raw) To: netdev; +Cc: simlo, linux-kernel, mingo, netdev, ioe-lkml From: Ingo Oeser <netdev@axxeo.de> Date: Fri, 21 Apr 2006 18:52:47 +0200 > nice to see you getting started with it. Thanks for reviewing. > I'm not sure about the queue logic there. > > 1867 /* Caller must have exclusive producer access to the netchannel. */ > 1868 int netchannel_enqueue(struct netchannel *np, struct netchannel_buftrailer *bp) > 1869 { > 1870 unsigned long tail; > 1871 > 1872 tail = np->netchan_tail; > 1873 if (tail == np->netchan_head) > 1874 return -ENOMEM; > > This looks wrong, since empty and full are the same condition in your > case. Thanks, that's obviously wrong. I'll try to fix this up. > What about something like > > struct netchannel { > /* This is only read/written by the writer (producer) */ > unsigned long write_ptr; > struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES]; > > /* This is modified by both */ > atomic_t filled_entries; /* cache_line_align this? */ > > /* This is only read/written by the reader (consumer) */ > unsigned long read_ptr; > }; As stated elsewhere, if you add atomic operations you break the entire idea of net channels. They are meant to be SMP-efficient data structures where the producer has one cache line that only it dirties and the consumer has one cache line that likewise only it dirties. > If cacheline bouncing because of the shared filled_entries becomes an issue, > you are receiving or sending a lot. Cacheline bouncing is the core issue being addressed by this data structure, so we really can't consider your idea seriously. I've just got an off-by-one error; no need to wreck the entire data structure just to solve that :-) ^ permalink raw reply [flat|nested] 24+ messages in thread
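For reference, the classic fix for the off-by-one without a shared counter is to sacrifice one slot: head == tail means empty and (tail + 1) % N == head means full. A sketch under the constraints DaveM states, with each index written by exactly one side on its own cacheline; the __atomic loads and stores below are ordering annotations on plain loads and stores, not locked read-modify-write operations, and the field names are illustrative rather than the actual vj-2.6 layout:

#define NET_CHANNEL_ENTRIES 256		/* hypothetical size */
#define CACHE_BYTES 64

struct netchannel_sketch {
	/* producer-written cacheline */
	unsigned long tail __attribute__((aligned(CACHE_BYTES)));
	/* consumer-written cacheline */
	unsigned long head __attribute__((aligned(CACHE_BYTES)));
	void *queue[NET_CHANNEL_ENTRIES];
};

/* producer side: returns -1 when full (one slot deliberately unused) */
static int nc_enqueue(struct netchannel_sketch *np, void *bp)
{
	unsigned long tail = np->tail;
	unsigned long next = (tail + 1) % NET_CHANNEL_ENTRIES;

	/* reads the consumer's line, never writes it */
	if (next == __atomic_load_n(&np->head, __ATOMIC_ACQUIRE))
		return -1;
	np->queue[tail] = bp;
	/* ordered plain store publishes the slot */
	__atomic_store_n(&np->tail, next, __ATOMIC_RELEASE);
	return 0;
}

/* consumer side: returns NULL when empty */
static void *nc_dequeue(struct netchannel_sketch *np)
{
	unsigned long head = np->head;
	void *bp;

	if (head == __atomic_load_n(&np->tail, __ATOMIC_ACQUIRE))
		return NULL;
	bp = np->queue[head];
	__atomic_store_n(&np->head, (head + 1) % NET_CHANNEL_ENTRIES,
			 __ATOMIC_RELEASE);	/* free the slot */
	return bp;
}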
* Re: Van Jacobson's net channels and real-time 2006-04-23 5:56 ` David S. Miller @ 2006-04-23 14:15 ` Ingo Oeser 0 siblings, 0 replies; 24+ messages in thread From: Ingo Oeser @ 2006-04-23 14:15 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, simlo, linux-kernel, mingo, netdev Hi Dave, On Sunday, 23. April 2006 07:56, David S. Miller wrote: > > If cacheline bouncing because of the shared filled_entries becomes an issue, > > you are receiving or sending a lot. > > Cacheline bouncing is the core issue being addressed by this > data structure, so we really can't consider your idea seriously. Ok, I can see it now more clearly. Many thanks for clearing that up in the other replies. I had a major misunderstanding there. > I've just got an off-by-one error, no need to wreck the entire > data structure just to solve that :-) Yes, you are right. But even then I can still implement the reserve/commit once you provide the helpers for producer_space and consumer_space. Regards Ingo Oeser ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-20 19:09 ` Van Jacobson's net channels and real-time David S. Miller 2006-04-21 16:52 ` Ingo Oeser @ 2006-04-22 19:30 ` bert hubert 2006-04-23 5:53 ` David S. Miller 1 sibling, 1 reply; 24+ messages in thread From: bert hubert @ 2006-04-22 19:30 UTC (permalink / raw) To: David S. Miller; +Cc: simlo, linux-kernel, mingo, netdev On Thu, Apr 20, 2006 at 12:09:55PM -0700, David S. Miller wrote: > Going all the way to the socket is a large endeavor and will require a > lot of restructuring to do it right, so expect this to take on the > order of months. That's what you said about Niagara too :-) Good luck! -- http://www.PowerDNS.com Open source, database driven DNS Software http://netherlabs.nl Open and Closed source services ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 19:30 ` bert hubert @ 2006-04-23 5:53 ` David S. Miller 0 siblings, 0 replies; 24+ messages in thread From: David S. Miller @ 2006-04-23 5:53 UTC (permalink / raw) To: bert.hubert; +Cc: simlo, linux-kernel, mingo, netdev From: bert hubert <bert.hubert@netherlabs.nl> Date: Sat, 22 Apr 2006 21:30:24 +0200 > On Thu, Apr 20, 2006 at 12:09:55PM -0700, David S. Miller wrote: > > Going all the way to the socket is a large endeavor and will require a > > lot of restructuring to do it right, so expect this to take on the > > order of months. > > That's what you said about Niagara too :-) I'm just trying to keep the expectations low so it's easier to exceed them :-) ^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: Van Jacobson's net channels and real-time
@ 2006-04-24 17:28 Caitlin Bestler
0 siblings, 0 replies; 24+ messages in thread
From: Caitlin Bestler @ 2006-04-24 17:28 UTC (permalink / raw)
To: netdev
netdev-owner@vger.kernel.org wrote:
> Subject: Re: Van Jacobson's net channels and real-time
>
>
> On Mon, 24 Apr 2006, Auke Kok wrote:
>
>> Ingo Oeser wrote:
>>> On Saturday, 22. April 2006 15:49, Jörn Engel wrote:
>>>> That was another main point, yes. And the endpoints should be as
>>>> little burden on the bottlenecks as possible. One bottleneck is
>>>> the receive interrupt, which shouldn't wait for cachelines from
>>>> other CPUs too much.
>>>
>>> That's right. This will be made a non-issue with early demuxing on
>>> the NIC and MSI (or was it MSI-X?) which will select the right CPU
>>> based on hardware channels.
>>
>> MSI-X. With MSI you still have only one CPU handling all MSI
>> interrupts and that doesn't look any different than ordinary
>> interrupts. MSI-X will allow much better interrupt handling across
>> several CPUs.
>>
>> Auke
>> -
>
> Message signaled interrupts are just a kludge to save a trace
> on a PC board (read: make junk cheaper still). They are not
> faster and may even be slower. They will not be the salvation
> of any interrupt latency problems. The solution for
> increasing networking speed, where the bit-rate on the wire
> gets close to the bit-rate on the bus, is to put more and
> more of the networking code inside the network board. The CPU
> gets interrupted after most things (like network handshakes)
> are complete.
>
The number of hardware interrupts supported is a bit out of scope.
Whatever the capacity is, the key is to have as few meaningless
interrupts as possible.
In the context of netchannels this would mean that an interrupt
should only be fired when there is a sufficient number of packets
for the user-mode code to process. Fully offloading the protocol
to the hardware is certainly one option, one that I also think makes
sense, but the goal of netchannels is to try to optimize performance
while keeping TCP processing on the host.
More hardware offload is distinctly possible and relevant in this
context. Stateful offloads, such as TSO, are fully relevant.
Going directly from the NIC to the channel is also possible (after
the channel is set up by the kernel, of course). If the NIC is
aware of the channels directly, then interrupts can be limited to
packets that cross per-channel thresholds configured directly
by the ring consumer.
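A toy model of the per-channel threshold Caitlin describes, with all names hypothetical (no 2006-era NIC exposed exactly this interface): the ring consumer configures irq_threshold and re-arms after draining, and the producer, standing in for the NIC, raises an interrupt only when the fill level crosses the threshold:

#include <stdbool.h>

struct channel_irq_ctl {
	unsigned long head, tail;	/* free-running ring indexes */
	unsigned long irq_threshold;	/* set by the ring consumer */
	bool irq_armed;			/* consumer re-arms after draining */
};

static unsigned long fill_level(const struct channel_irq_ctl *c)
{
	/* producer never lets tail run more than one ring ahead */
	return c->tail - c->head;
}

/* producer side: checked after queueing each packet */
static bool should_interrupt(struct channel_irq_ctl *c)
{
	if (!c->irq_armed)
		return false;
	if (fill_level(c) < c->irq_threshold)
		return false;
	c->irq_armed = false;		/* one interrupt per threshold crossing */
	return true;
}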
^ permalink raw reply [flat|nested] 24+ messages in thread