* Re: Van Jacobson's net channels and real-time [not found] <Pine.LNX.4.44L0.0604201819040.19330-100000@lifa01.phys.au.dk> @ 2006-04-20 19:09 ` David S. Miller 2006-04-21 16:52 ` Ingo Oeser 2006-04-22 19:30 ` bert hubert 0 siblings, 2 replies; 24+ messages in thread From: David S. Miller @ 2006-04-20 19:09 UTC (permalink / raw) To: simlo; +Cc: linux-kernel, mingo, netdev [ Maybe ask questions like this on "netdev" where the networking developers hang out? Added to CC: ] Van fell off the face of the planet after giving his presentation and never published his code, only his slides. I've started to make a slow attempt at implementing his ideas, nothing but pure infrastructure so far, but you can look at what I have here: kernel.org:/pub/scm/linux/kernel/git/davem/vj-2.6.git don't expect major progress and don't expect anything beyond a simple channel to softint packet processing on receive any time soon. Going all the way to the socket is a large endeavor and will require a lot of restructuring to do it right, so expect this to take on the order of months. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-20 19:09 ` Van Jacobson's net channels and real-time David S. Miller @ 2006-04-21 16:52 ` Ingo Oeser 2006-04-22 11:48 ` Jörn Engel 2006-04-23 5:56 ` David S. Miller 2006-04-22 19:30 ` bert hubert 1 sibling, 2 replies; 24+ messages in thread From: Ingo Oeser @ 2006-04-21 16:52 UTC (permalink / raw) To: David S. Miller; +Cc: simlo, linux-kernel, mingo, netdev, Ingo Oeser Hi David, nice to see you getting started with it. I'm not sure about the queue logic there. 1867 /* Caller must have exclusive producer access to the netchannel. */ 1868 int netchannel_enqueue(struct netchannel *np, struct netchannel_buftrailer *bp) 1869 { 1870 unsigned long tail; 1871 1872 tail = np->netchan_tail; 1873 if (tail == np->netchan_head) 1874 return -ENOMEM; This looks wrong, since empty and full are the same condition in your case. 1891 struct netchannel_buftrailer *__netchannel_dequeue(struct netchannel *np) 1892 { 1893 unsigned long head = np->netchan_head; 1894 struct netchannel_buftrailer *bp = np->netchan_queue[head]; 1895 1896 BUG_ON(np->netchan_tail == head); See? What about something like struct netchannel { /* This is only read/written by the writer (producer) */ unsigned long write_ptr; struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES]; /* This is modified by both */ atomic_t filled_entries; /* cache_line_align this? */ /* This is only read/written by the reader (consumer) */ unsigned long read_ptr; }; This would prevent this bug from the beginning and still let us use the full queue size. If cacheline bouncing because of the shared filled_entries becomes an issue, you are receiving or sending a lot. Then you can enqueue and dequeue multiple entries and commit the counts later. This can be done with an atomic_read, atomic_add and atomic_sub on filled_entries. Maybe even cheaper with local_t instead of atomic_t later on. But I guess the cacheline bouncing will be a non-issue, since the whole point of netchannels was to keep traffic as local to a CPU as possible, right? Would you like to see a sample patch relative to your tree, to show you what I mean? Regards Ingo Oeser ^ permalink raw reply [flat|nested] 24+ messages in thread
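A minimal userspace sketch of the counter-based ring Ingo proposes, assuming C11 atomics in place of the kernel's atomic_t; the names (ring_put, ring_get, RING_ENTRIES) are illustrative, not taken from the vj-2.6 tree. The explicit fill count is what makes empty (filled == 0) distinct from full (filled == RING_ENTRIES), so every slot stays usable:

#include <stdatomic.h>

#define RING_ENTRIES 256	/* hypothetical queue size */

struct ring {
	/* written only by the producer */
	unsigned long write_ptr;
	void *slots[RING_ENTRIES];

	/* written by both sides -- the shared counter */
	atomic_ulong filled;

	/* written only by the consumer */
	unsigned long read_ptr;
};

/* producer side: returns 0 on success, -1 when full */
static int ring_put(struct ring *r, void *buf)
{
	if (atomic_load(&r->filled) == RING_ENTRIES)
		return -1;			/* full, distinct from empty */
	r->slots[r->write_ptr] = buf;
	r->write_ptr = (r->write_ptr + 1) % RING_ENTRIES;
	atomic_fetch_add(&r->filled, 1);	/* publish the entry */
	return 0;
}

/* consumer side: returns NULL when empty */
static void *ring_get(struct ring *r)
{
	void *buf;

	if (atomic_load(&r->filled) == 0)
		return NULL;			/* empty, distinct from full */
	buf = r->slots[r->read_ptr];
	r->read_ptr = (r->read_ptr + 1) % RING_ENTRIES;
	atomic_fetch_sub(&r->filled, 1);	/* release the slot */
	return buf;
}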
* Re: Van Jacobson's net channels and real-time 2006-04-21 16:52 ` Ingo Oeser @ 2006-04-22 11:48 ` Jörn Engel 2006-04-22 13:29 ` Ingo Oeser 2006-04-23 5:51 ` David S. Miller 1 sibling, 2 replies; 24+ messages in thread From: Jörn Engel @ 2006-04-22 11:48 UTC (permalink / raw) To: Ingo Oeser Cc: David S. Miller, simlo, linux-kernel, mingo, netdev, Ingo Oeser On Fri, 21 April 2006 18:52:47 +0200, Ingo Oeser wrote: > What about something like > > struct netchannel { > /* This is only read/written by the writer (producer) */ > unsigned long write_ptr; > struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES]; > > /* This is modified by both */ > atomic_t filled_entries; /* cache_line_align this? */ > > /* This is only read/written by the reader (consumer) */ > unsigned long read_ptr; > }; > > This would prevent this bug from the beginning and still let us use the > full queue size. > > If cacheline bouncing because of the shared filled_entries becomes an issue, > you are receiving or sending a lot. Unless I completely misunderstand something, one of the main points of the netchannels is to have *zero* fields written to by both producer and consumer. Receiving and sending a lot can be expected to be the common case, so taking a performance hit in this case is hardly a good idea. I haven't looked at DaveM's implementation at all, but Van simply separated the fields into consumer-written and producer-written, with proper alignment between them. Some consumer-written fields are also read by the producer and vice versa. But none of this results in cacheline ping-pong. If your description of the problem is correct, it should only mean that the implementation has a problem, not the concept. Jörn -- Time? What's that? Time is only worth what you do with it. -- Theo de Raadt ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 11:48 ` Jörn Engel @ 2006-04-22 13:29 ` Ingo Oeser 2006-04-22 13:49 ` Jörn Engel ` (2 more replies) 2006-04-23 5:51 ` David S. Miller 1 sibling, 3 replies; 24+ messages in thread From: Ingo Oeser @ 2006-04-22 13:29 UTC (permalink / raw) To: Jörn Engel Cc: Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev Hi Jörn, On Saturday, 22. April 2006 13:48, Jörn Engel wrote: > Unless I completely misunderstand something, one of the main points of > the netchannels is to have *zero* fields written to by both producer > and consumer. Hmm, for me the main point was to keep the complete processing of a single packet within one CPU/core, where this is a non-issue. > Receiving and sending a lot can be expected to be the > common case, so taking a performance hit in this case is hardly a good > idea. There is no hit. If you receive/send in bursts you can simply aggregate them up to a certain queueing threshold. The queue design outlined can split the queueing into reserve and commit stages, where the producer can be told how much it can produce and the consumer is told how much it can consume. Within their areas the producer and consumer can freely move around. So this is not exactly a queue, but a dynamic double buffer :-) So maybe doing queueing with the classic head/tail variant is better here, but the other variant might replace it without problems and allows for some nice improvements. Regards Ingo Oeser ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 13:29 ` Ingo Oeser @ 2006-04-22 13:49 ` Jörn Engel 2006-04-23 0:05 ` Ingo Oeser 2006-04-23 5:52 ` David S. Miller 2006-04-23 9:23 ` Avi Kivity 2 siblings, 1 reply; 24+ messages in thread From: Jörn Engel @ 2006-04-22 13:49 UTC (permalink / raw) To: Ingo Oeser Cc: Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Sat, 22 April 2006 15:29:58 +0200, Ingo Oeser wrote: > On Saturday, 22. April 2006 13:48, Jörn Engel wrote: > > Unless I completely misunderstand something, one of the main points of > > the netchannels is to have *zero* fields written to by both producer > > and consumer. > > Hmm, for me the main point was to keep the complete processing > of a single packet within one CPU/core, where this is a non-issue. That was another main point, yes. And the endpoints should be as little burden on the bottlenecks as possible. One bottleneck is the receive interrupt, which shouldn't wait for cachelines from other CPUs too much. Jörn -- Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life. -- Charles Shultz ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 13:49 ` Jörn Engel @ 2006-04-23 0:05 ` Ingo Oeser 2006-04-23 5:50 ` David S. Miller 2006-04-24 16:42 ` Auke Kok 0 siblings, 2 replies; 24+ messages in thread From: Ingo Oeser @ 2006-04-23 0:05 UTC (permalink / raw) To: Jörn Engel Cc: Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Saturday, 22. April 2006 15:49, Jörn Engel wrote: > That was another main point, yes. And the endpoints should be as > little burden on the bottlenecks as possible. One bottleneck is the > receive interrupt, which shouldn't wait for cachelines from other CPUs > too much. That's right. This will be made a non-issue with early demuxing on the NIC and MSI (or was it MSI-X?) which will select the right CPU based on hardware channels. In the meantime I would reduce the effects by only committing on a full buffer or on leaving the interrupt handler. This would be OK, because you have to wake up the process anyway on a full buffer, and if it slept because of an empty buffer. You only lose if your application didn't sleep yet and you need to leave the interrupt handler because there is no more work. In this case the atomic_add would be significant. All this is quite similar to how we do the pagevec stuff in mm/ already. Regards Ingo Oeser ^ permalink raw reply [flat|nested] 24+ messages in thread
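A sketch of the deferred commit Ingo describes, building on the counter-based ring sketched earlier; produce_one() and commit_batch() are made-up names. The producer keeps a private pending count and dirties the shared counter once per batch rather than once per packet:

#include <stdatomic.h>

struct batched_ring {
	unsigned long write_ptr;	/* producer-private */
	unsigned long pending;		/* produced but not yet published */
	atomic_ulong filled;		/* shared with the consumer */
};

static void produce_one(struct batched_ring *r)
{
	/* ... place the buffer at write_ptr ... */
	r->write_ptr++;
	r->pending++;
}

/* called on a full buffer or when leaving the interrupt handler,
 * so the shared cacheline is dirtied once per batch */
static void commit_batch(struct batched_ring *r)
{
	if (r->pending) {
		atomic_fetch_add(&r->filled, r->pending);
		r->pending = 0;
	}
}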
* Re: Van Jacobson's net channels and real-time 2006-04-23 0:05 ` Ingo Oeser @ 2006-04-23 5:50 ` David S. Miller 2006-04-24 16:42 ` Auke Kok 1 sibling, 0 replies; 24+ messages in thread From: David S. Miller @ 2006-04-23 5:50 UTC (permalink / raw) To: ioe-lkml; +Cc: joern, netdev, simlo, linux-kernel, mingo, netdev From: Ingo Oeser <ioe-lkml@rameria.de> Date: Sun, 23 Apr 2006 02:05:32 +0200 > On Saturday, 22. April 2006 15:49, Jörn Engel wrote: > > That was another main point, yes. And the endpoints should be as > > little burden on the bottlenecks as possible. One bottleneck is the > > receive interrupt, which shouldn't wait for cachelines from other CPUs > > too much. > > That's right. This will be made a non-issue with early demuxing > on the NIC and MSI (or was it MSI-X?) which will select > the right CPU based on hardware channels. It is not clear that MSI'ing the RX interrupt to multiple CPUs is the answer. Consider the fact that by doing so you're reducing the amount of batch work each interrupt does by a factor of N. One of the biggest gains of NAPI, btw, is that it batches packet receive. If you don't believe the benefits of this, put a simple cycle counter sample around netif_receive_skb() calls and note the difference between the first packet processed and subsequent ones; it's several orders of magnitude faster to process subsequent packets within a batch. I've done this before on tg3 with sparc64 and posted the numbers on netdev about a year or so ago. If you are doing something like netchannels, it helps to batch so that the demuxing table stays hot in the CPU cache. There is even talk of dedicating a thread on enormously multi-threaded cpus just to the NIC hardware interrupt, so it could run net channels to the socket processes running on the other strands. ^ permalink raw reply [flat|nested] 24+ messages in thread
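The measurement DaveM describes is easy to approximate in userspace. A toy sketch of the sampling technique, assuming x86 and GCC or Clang for __rdtsc(); process_packet() merely stands in for netif_receive_skb(), so this placeholder workload will not reproduce the cold-versus-warm gap he measured, but the method is the same:

#include <stdio.h>
#include <x86intrin.h>

#define BATCH 64

static volatile unsigned long sink;

static void process_packet(int i)
{
	sink += i * 2654435761UL;	/* placeholder for real demux work */
}

int main(void)
{
	unsigned long long cycles[BATCH];
	int i;

	for (i = 0; i < BATCH; i++) {
		unsigned long long t0 = __rdtsc();
		process_packet(i);
		cycles[i] = __rdtsc() - t0;
	}
	/* the first iteration runs with cold caches, the rest warm */
	printf("first: %llu cycles, mid-batch: %llu, last: %llu\n",
	       cycles[0], cycles[BATCH / 2], cycles[BATCH - 1]);
	return 0;
}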
* Re: Van Jacobson's net channels and real-time 2006-04-23 0:05 ` Ingo Oeser 2006-04-23 5:50 ` David S. Miller @ 2006-04-24 16:42 ` Auke Kok 2006-04-24 16:59 ` linux-os (Dick Johnson) 1 sibling, 1 reply; 24+ messages in thread From: Auke Kok @ 2006-04-24 16:42 UTC (permalink / raw) To: Ingo Oeser Cc: Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev Ingo Oeser wrote: > On Saturday, 22. April 2006 15:49, Jörn Engel wrote: >> That was another main point, yes. And the endpoints should be as >> little burden on the bottlenecks as possible. One bottleneck is the >> receive interrupt, which shouldn't wait for cachelines from other CPUs >> too much. > > That's right. This will be made a non-issue with early demuxing > on the NIC and MSI (or was it MSI-X?) which will select > the right CPU based on hardware channels. MSI-X. With MSI you still have only one CPU handling all MSI interrupts and that doesn't look any different than ordinary interrupts. MSI-X will allow much better interrupt handling across several CPUs. Auke ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-24 16:42 ` Auke Kok @ 2006-04-24 16:59 ` linux-os (Dick Johnson) 2006-04-24 17:19 ` Rick Jones ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: linux-os (Dick Johnson) @ 2006-04-24 16:59 UTC (permalink / raw) To: Auke Kok Cc: Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Mon, 24 Apr 2006, Auke Kok wrote: > Ingo Oeser wrote: >> On Saturday, 22. April 2006 15:49, Jörn Engel wrote: >>> That was another main point, yes. And the endpoints should be as >>> little burden on the bottlenecks as possible. One bottleneck is the >>> receive interrupt, which shouldn't wait for cachelines from other CPUs >>> too much. >> >> That's right. This will be made a non-issue with early demuxing >> on the NIC and MSI (or was it MSI-X?) which will select >> the right CPU based on hardware channels. > > MSI-X. With MSI you still have only one CPU handling all MSI interrupts and > that doesn't look any different than ordinary interrupts. MSI-X will allow > much better interrupt handling across several CPUs. > > Auke > - Message signaled interrupts are just a kludge to save a trace on a PC board (read: make junk cheaper still). They are not faster and may even be slower. They will not be the salvation of any interrupt latency problems. The solution for increasing networking speed, where the bit-rate on the wire gets close to the bit-rate on the bus, is to put more and more of the networking code inside the network board. The CPU gets interrupted after most things (like network handshakes) are complete. Cheers, Dick Johnson Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips). Warning : 98.36% of all statistics are fiction, book release in April. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-24 16:59 ` linux-os (Dick Johnson) @ 2006-04-24 17:19 ` Rick Jones 2006-04-24 18:12 ` linux-os (Dick Johnson) 2006-04-24 23:17 ` Michael Chan 2006-04-25 1:49 ` Auke Kok 2 siblings, 1 reply; 24+ messages in thread From: Rick Jones @ 2006-04-24 17:19 UTC (permalink / raw) To: linux-os (Dick Johnson) Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev >>>That's right. This will be made a non-issue with early demuxing >>>on the NIC and MSI (or was it MSI-X?) which will select >>>the right CPU based on hardware channels. >> >>MSI-X. With MSI you still have only one CPU handling all MSI interrupts and >>that doesn't look any different than ordinary interrupts. MSI-X will allow >>much better interrupt handling across several CPUs. >> >>Auke >>- > > > Message signaled interrupts are just a kludge to save a trace on a > PC board (read: make junk cheaper still). They are not faster and > may even be slower. They will not be the salvation of any interrupt > latency problems. The solution for increasing networking speed, > where the bit-rate on the wire gets close to the bit-rate on the > bus, is to put more and more of the networking code inside the > network board. The CPU gets interrupted after most things (like > network handshakes) are complete. if the issue is bus vs network bitrates would offloading really buy that much? i suppose that for minimum sized packets not DMA'ing the headers across the bus would be a decent win, but down at small packet sizes where headers would be 1/3 to 1/2 the stuff DMA'd around, I would think one is talking more about CPU path lengths than bus bitrates. and up at "full size" segments, since everyone is so fond of bulk transfer tests, the transfer saved by not shoving headers across the bus is what, 54/1448 or ~3.75% spreading interrupts via MSI-X seems nice and all, but i keep wondering if the header field-based distribution that is (will be) done by the NICs is putting the cart before the horse - should the NIC essentially be telling the system the CPU on which to run the application, or should the CPU on which the application runs be telling "networking" where it should be happening? rick jones ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-24 17:19 ` Rick Jones @ 2006-04-24 18:12 ` linux-os (Dick Johnson) 0 siblings, 0 replies; 24+ messages in thread From: linux-os (Dick Johnson) @ 2006-04-24 18:12 UTC (permalink / raw) To: Rick Jones Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Mon, 24 Apr 2006, Rick Jones wrote: >>>> That's right. This will be made a non-issue with early demuxing >>>> on the NIC and MSI (or was it MSI-X?) which will select >>>> the right CPU based on hardware channels. >>> >>> MSI-X. With MSI you still have only one CPU handling all MSI interrupts and >>> that doesn't look any different than ordinary interrupts. MSI-X will allow >>> much better interrupt handling across several CPUs. >>> >>> Auke >>> - >> >> >> Message signaled interrupts are just a kludge to save a trace on a >> PC board (read: make junk cheaper still). They are not faster and >> may even be slower. They will not be the salvation of any interrupt >> latency problems. The solution for increasing networking speed, >> where the bit-rate on the wire gets close to the bit-rate on the >> bus, is to put more and more of the networking code inside the >> network board. The CPU gets interrupted after most things (like >> network handshakes) are complete. > > if the issue is bus vs network bitrates would offloading really buy that > much? i suppose that for minimum sized packets not DMA'ing the headers > across the bus would be a decent win, but down at small packet sizes > where headers would be 1/3 to 1/2 the stuff DMA'd around, I would think > one is talking more about CPU path lengths than bus bitrates. > > and up at "full size" segments, since everyone is so fond of bulk > transfer tests, the transfer saved by not shoving headers across the bus > is what, 54/1448 or ~3.75% > > spreading interrupts via MSI-X seems nice and all, but i keep wondering > if the header field-based distribution that is (will be) done by the > NICs is putting the cart before the horse - should the NIC essentially > be telling the system the CPU on which to run the application, or should > the CPU on which the application runs be telling "networking" where it > should be happening? > > rick jones > Ideally, TCP/IP is so mature that one should be able to tell some hardware state-machine "Connect with 123.555.44.333, port 23" and it signals via interrupt when that happens. Then one should be able to say "send these data to that address" or "fill this buffer with data from that address". All the networking could be done on the board, perhaps with a dedicated CPU (as is now done) or all in silicon. So, the driver end of the networking software just handles buffers. There are interrupts that show status such as completions or time-outs, trivial stuff. Cheers, Dick Johnson Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips). Warning : 98.36% of all statistics are fiction, book release in April.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-24 16:59 ` linux-os (Dick Johnson) 2006-04-24 17:19 ` Rick Jones @ 2006-04-24 23:17 ` Michael Chan 2006-04-25 1:49 ` Auke Kok 2 siblings, 0 replies; 24+ messages in thread From: Michael Chan @ 2006-04-24 23:17 UTC (permalink / raw) To: linux-os (Dick Johnson) Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Mon, 2006-04-24 at 12:59 -0400, linux-os (Dick Johnson) wrote: > Message signaled interrupts are just a kludge to save a trace on a > PC board (read: make junk cheaper still). They are not faster and > may even be slower. They will not be the salvation of any interrupt > latency problems. MSI has two very nice properties: MSI is never shared and MSI guarantees that all DMA activities before the MSI have completed. When you take advantage of these guarantees in your MSI handler, there can be noticeable improvements compared to using INTA. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-24 16:59 ` linux-os (Dick Johnson) 2006-04-24 17:19 ` Rick Jones 2006-04-24 23:17 ` Michael Chan @ 2006-04-25 1:49 ` Auke Kok 2006-04-25 11:29 ` linux-os (Dick Johnson) 2 siblings, 1 reply; 24+ messages in thread From: Auke Kok @ 2006-04-25 1:49 UTC (permalink / raw) To: linux-os (Dick Johnson) Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev linux-os (Dick Johnson) wrote: > On Mon, 24 Apr 2006, Auke Kok wrote: > >> Ingo Oeser wrote: >>> On Saturday, 22. April 2006 15:49, Jörn Engel wrote: >>>> That was another main point, yes. And the endpoints should be as >>>> little burden on the bottlenecks as possible. One bottleneck is the >>>> receive interrupt, which shouldn't wait for cachelines from other CPUs >>>> too much. >>> That's right. This will be made a non-issue with early demuxing >>> on the NIC and MSI (or was it MSI-X?) which will select >>> the right CPU based on hardware channels. >> MSI-X. With MSI you still have only one CPU handling all MSI interrupts and >> that doesn't look any different than ordinary interrupts. MSI-X will allow >> much better interrupt handling across several CPUs. >> >> Auke >> - > > Message signaled interrupts are just a kludge to save a trace on a > PC board (read: make junk cheaper still). Yes. Also in PCI-Express there is no physical interrupt line anymore due to the architecture, so even classical interrupts are sent as a "message" over the bus. > They are not faster and may even be slower. Thus in the case of PCI-Express, MSI interrupts are just as fast as the ordinary ones. I have no numbers on whether MSI is faster or not than, e.g., interrupts on PCI-X, but generally speaking, the PCI-Express bus is not designed to be "low latency" at all; at best it gives you X latency, where X is something like microseconds. The MSI message itself only takes 10-20 nanoseconds though, but all the handling probably adds a large factor to that (1000 or so). No clue on classical interrupt line latency - anyone? > They will not be the salvation of any interrupt latency problems. This is also not the problem - we really don't care that our 100,000 packets arrive 20 usec slower per packet, just as long as the bus is not idle for those intervals. We would care a lot if 25,000 of those arrived directly at the proper CPU, without the need for one of the CPUs to arbitrate on every interrupt. That's the idea anyway. Nowadays with irq throttling we introduce a lot of designed latency anyway, especially with network devices. > The solution for increasing networking speed, > where the bit-rate on the wire gets close to the bit-rate on the > bus, is to put more and more of the networking code inside the > network board. The CPU gets interrupted after most things (like > network handshakes) are complete. That is a limited vision of the situation. You could argue that the current CPUs have so much power that they can easily do a lot of the processing instead of the hardware, and thus warm the caches for userspace, set up sockets, etc. This is the whole idea of Van Jacobson's net channels. Putting more offloading into the hardware just brings so many problems with it that are far easier solved in the OS. Cheers, Auke ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-25 1:49 ` Auke Kok @ 2006-04-25 11:29 ` linux-os (Dick Johnson) 2006-05-02 12:41 ` Vojtech Pavlik 0 siblings, 1 reply; 24+ messages in thread From: linux-os (Dick Johnson) @ 2006-04-25 11:29 UTC (permalink / raw) To: Auke Kok Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Mon, 24 Apr 2006, Auke Kok wrote: > linux-os (Dick Johnson) wrote: >> On Mon, 24 Apr 2006, Auke Kok wrote: >> >>> Ingo Oeser wrote: >>>> On Saturday, 22. April 2006 15:49, Jörn Engel wrote: >>>>> That was another main point, yes. And the endpoints should be as >>>>> little burden on the bottlenecks as possible. One bottleneck is the >>>>> receive interrupt, which shouldn't wait for cachelines from other CPUs >>>>> too much. >>>> That's right. This will be made a non-issue with early demuxing >>>> on the NIC and MSI (or was it MSI-X?) which will select >>>> the right CPU based on hardware channels. >>> MSI-X. With MSI you still have only one CPU handling all MSI interrupts and >>> that doesn't look any different than ordinary interrupts. MSI-X will allow >>> much better interrupt handling across several CPUs. >>> >>> Auke >>> - >> >> Message signaled interrupts are just a kludge to save a trace on a >> PC board (read: make junk cheaper still). > > Yes. Also in PCI-Express there is no physical interrupt line anymore due to > the architecture, so even classical interrupts are sent as a "message" over the bus. > >> They are not faster and may even be slower. > > Thus in the case of PCI-Express, MSI interrupts are just as fast as the > ordinary ones. I have no numbers on whether MSI is faster or not than, e.g., > interrupts on PCI-X, but generally speaking, the PCI-Express bus is not > designed to be "low latency" at all; at best it gives you X latency, where X > is something like microseconds. The MSI message itself only takes 10-20 > nanoseconds though, but all the handling probably adds a large factor to that > (1000 or so). No clue on classical interrupt line latency - anyone? > About 9 nanoseconds per foot of FR-4 (G10) trace, plus the access time through the gate-arrays (about 20 ns); so, from the time a device needs the CPU until it hits the interrupt pin, you typically have 30 to 50 nanoseconds. Of course the CPU is __much__ slower. However, these physical latencies are in series and cannot be compensated for, because the CPU can't see into the future. >> They will not be the salvation of any interrupt latency problems. > > This is also not the problem - we really don't care that our 100,000 packets > arrive 20 usec slower per packet, just as long as the bus is not idle for those > intervals. We would care a lot if 25,000 of those arrived directly at the > proper CPU, without the need for one of the CPUs to arbitrate on every > interrupt. That's the idea anyway. It forces driver-writers to loop in ISRs to handle new status changes that happened before an asserted interrupt even got to the CPU. This is bad. You end up polling in the ISR, with interrupts off. Turning on the interrupts exacerbates the problem; you may never leave the ISR! It becomes the new "idle task". To properly use interrupts, the hardware latency must be less than the CPU's response to the hardware stimulus. > > Nowadays with irq throttling we introduce a lot of designed latency anyway, > especially with network devices.
> >> The solution for increasing networking speed, >> where the bit-rate on the wire gets close to the bit-rate on the >> bus, is to put more and more of the networking code inside the >> network board. The CPU gets interrupted after most things (like >> network handshakes) are complete. > > That is a limited vision of the situation. You could argue that the current > CPUs have so much power that they can easily do a lot of the processing > instead of the hardware, and thus warm the caches for userspace, set up sockets, > etc. This is the whole idea of Van Jacobson's net channels. Putting more > offloading into the hardware just brings so many problems with it that > are far easier solved in the OS. > > > Cheers, > > Auke > Cheers, Dick Johnson Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips). Warning : 98.36% of all statistics are fiction, book release in April. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-25 11:29 ` linux-os (Dick Johnson) @ 2006-05-02 12:41 ` Vojtech Pavlik 2006-05-02 15:58 ` Andi Kleen 0 siblings, 1 reply; 24+ messages in thread From: Vojtech Pavlik @ 2006-05-02 12:41 UTC (permalink / raw) To: linux-os (Dick Johnson) Cc: Auke Kok, Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Tue, Apr 25, 2006 at 07:29:40AM -0400, linux-os (Dick Johnson) wrote: > >> Message signaled interrupts are just a kludge to save a trace on a > >> PC board (read: make junk cheaper still). > > > > Yes. Also in PCI-Express there is no physical interrupt line anymore due to > > the architecture, so even classical interrupts are sent as a "message" over the bus. > > > >> They are not faster and may even be slower. > > > > Thus in the case of PCI-Express, MSI interrupts are just as fast as the > > ordinary ones. I have no numbers on whether MSI is faster or not than, e.g., > > interrupts on PCI-X, but generally speaking, the PCI-Express bus is not > > designed to be "low latency" at all; at best it gives you X latency, where X > > is something like microseconds. The MSI message itself only takes 10-20 > > nanoseconds though, but all the handling probably adds a large factor to that > > (1000 or so). No clue on classical interrupt line latency - anyone? > > About 9 nanoseconds per foot of FR-4 (G10) trace, plus the access time > through the gate-arrays (about 20 ns); so, from the time a device needs > the CPU until it hits the interrupt pin, you typically have 30 to > 50 nanoseconds. Of course the CPU is __much__ slower. However, these > physical latencies are in series and cannot be compensated for, because > the CPU can't see into the future. You seem to be missing the fact that most of today's interrupts are delivered through the APIC bus, which isn't fast at all. -- Vojtech Pavlik Director SuSE Labs ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-05-02 12:41 ` Vojtech Pavlik @ 2006-05-02 15:58 ` Andi Kleen 0 siblings, 0 replies; 24+ messages in thread From: Andi Kleen @ 2006-05-02 15:58 UTC (permalink / raw) To: Vojtech Pavlik Cc: linux-os (Dick Johnson), Auke Kok, Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev On Tuesday 02 May 2006 14:41, Vojtech Pavlik wrote: > You seem to be missing the fact that most of today's interrupts are > delivered through the APIC bus, which isn't fast at all. You mean slow, right? Modern x86s (anything newer than a P3) generally don't have a separate APIC bus anymore but just send messages over their main processor connection. -Andi ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 13:29 ` Ingo Oeser 2006-04-22 13:49 ` Jörn Engel @ 2006-04-23 5:52 ` David S. Miller 2006-04-23 9:23 ` Avi Kivity 2 siblings, 0 replies; 24+ messages in thread From: David S. Miller @ 2006-04-23 5:52 UTC (permalink / raw) To: ioe-lkml; +Cc: joern, netdev, simlo, linux-kernel, mingo, netdev From: Ingo Oeser <ioe-lkml@rameria.de> Date: Sat, 22 Apr 2006 15:29:58 +0200 > On Saturday, 22. April 2006 13:48, Jörn Engel wrote: > > Unless I completely misunderstand something, one of the main points of > > the netchannels is to have *zero* fields written to by both producer > > and consumer. > > Hmm, for me the main point was to keep the complete processing > of a single packet within one CPU/core, where this is a non-issue. Both are the important issues. You move the bulk of the packet processing work to the end cores of the system, yes. But you do so with an enormously SMP-friendly queue data structure so that it does not matter at all that the packet is received on one CPU, yet processed in socket context on another. If you elide either part of the implementation, you miss the entire point of net channels. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 13:29 ` Ingo Oeser 2006-04-22 13:49 ` Jörn Engel 2006-04-23 5:52 ` David S. Miller @ 2006-04-23 9:23 ` Avi Kivity 2 siblings, 0 replies; 24+ messages in thread From: Avi Kivity @ 2006-04-23 9:23 UTC (permalink / raw) To: Ingo Oeser Cc: Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev Ingo Oeser wrote: > Hi Jörn, > > On Saturday, 22. April 2006 13:48, Jörn Engel wrote: > >> Unless I completely misunderstand something, one of the main points of >> the netchannels is to have *zero* fields written to by both producer >> and consumer. >> > > Hmm, for me the main point was to keep the complete processing > of a single packet within one CPU/core, where this is a non-issue. > But the interrupt for a packet can be received by CPU 0 whereas the rest of the processing proceeds on CPU 1, so it still helps to keep the producer index and consumer index on separate cachelines. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 11:48 ` Jörn Engel 2006-04-22 13:29 ` Ingo Oeser @ 2006-04-23 5:51 ` David S. Miller 1 sibling, 0 replies; 24+ messages in thread From: David S. Miller @ 2006-04-23 5:51 UTC (permalink / raw) To: joern; +Cc: netdev, simlo, linux-kernel, mingo, netdev, ioe-lkml From: Jörn Engel <joern@wohnheim.fh-wedel.de> Date: Sat, 22 Apr 2006 13:48:46 +0200 > Unless I completely misunderstand something, one of the main points of > the netchannels is to have *zero* fields written to by both producer > and consumer. Receiving and sending a lot can be expected to be the > common case, so taking a performance hit in this case is hardly a good > idea. That's absolutely correct; this is absolutely critical to the implementation. If you're doing any atomic operations, or any write operations by both consumer and producer to the same cacheline, you've broken things :-) ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-21 16:52 ` Ingo Oeser 2006-04-22 11:48 ` Jörn Engel @ 2006-04-23 5:56 ` David S. Miller 2006-04-23 14:15 ` Ingo Oeser 1 sibling, 1 reply; 24+ messages in thread From: David S. Miller @ 2006-04-23 5:56 UTC (permalink / raw) To: netdev; +Cc: simlo, linux-kernel, mingo, netdev, ioe-lkml From: Ingo Oeser <netdev@axxeo.de> Date: Fri, 21 Apr 2006 18:52:47 +0200 > nice to see you getting started with it. Thanks for reviewing. > I'm not sure about the queue logic there. > > 1867 /* Caller must have exclusive producer access to the netchannel. */ > 1868 int netchannel_enqueue(struct netchannel *np, struct netchannel_buftrailer *bp) > 1869 { > 1870 unsigned long tail; > 1871 > 1872 tail = np->netchan_tail; > 1873 if (tail == np->netchan_head) > 1874 return -ENOMEM; > > This looks wrong, since empty and full are the same condition in your > case. Thanks, that's obviously wrong. I'll try to fix this up. > What about something like > > struct netchannel { > /* This is only read/written by the writer (producer) */ > unsigned long write_ptr; > struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES]; > > /* This is modified by both */ > atomic_t filled_entries; /* cache_line_align this? */ > > /* This is only read/written by the reader (consumer) */ > unsigned long read_ptr; > }; As stated elsewhere, if you add atomic operations you break the entire idea of net channels. They are meant to be SMP-efficient data structures where the producer has one cache line that only it dirties and the consumer has one cache line that likewise only it dirties. > If cacheline bouncing because of the shared filled_entries becomes an issue, > you are receiving or sending a lot. Cacheline bouncing is the core issue being addressed by this data structure, so we really can't consider your idea seriously. I've just got an off-by-one error; no need to wreck the entire data structure just to solve that :-) ^ permalink raw reply [flat|nested] 24+ messages in thread
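For reference, the classic fix for the off-by-one without a shared counter is to sacrifice one slot: head == tail means empty and (tail + 1) % N == head means full. A sketch under the constraints DaveM states, with each index written by exactly one side on its own cacheline; the __atomic loads and stores below are ordering annotations on plain loads and stores, not locked read-modify-write operations, and the field names are illustrative rather than the actual vj-2.6 layout:

#define NET_CHANNEL_ENTRIES 256		/* hypothetical size */
#define CACHE_BYTES 64

struct netchannel_sketch {
	/* producer-written cacheline */
	unsigned long tail __attribute__((aligned(CACHE_BYTES)));
	/* consumer-written cacheline */
	unsigned long head __attribute__((aligned(CACHE_BYTES)));
	void *queue[NET_CHANNEL_ENTRIES];
};

/* producer side: returns -1 when full (one slot deliberately unused) */
static int nc_enqueue(struct netchannel_sketch *np, void *bp)
{
	unsigned long tail = np->tail;
	unsigned long next = (tail + 1) % NET_CHANNEL_ENTRIES;

	/* reads the consumer's line, never writes it */
	if (next == __atomic_load_n(&np->head, __ATOMIC_ACQUIRE))
		return -1;
	np->queue[tail] = bp;
	/* ordered plain store publishes the slot */
	__atomic_store_n(&np->tail, next, __ATOMIC_RELEASE);
	return 0;
}

/* consumer side: returns NULL when empty */
static void *nc_dequeue(struct netchannel_sketch *np)
{
	unsigned long head = np->head;
	void *bp;

	if (head == __atomic_load_n(&np->tail, __ATOMIC_ACQUIRE))
		return NULL;
	bp = np->queue[head];
	__atomic_store_n(&np->head, (head + 1) % NET_CHANNEL_ENTRIES,
			 __ATOMIC_RELEASE);	/* free the slot */
	return bp;
}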
* Re: Van Jacobson's net channels and real-time 2006-04-23 5:56 ` David S. Miller @ 2006-04-23 14:15 ` Ingo Oeser 0 siblings, 0 replies; 24+ messages in thread From: Ingo Oeser @ 2006-04-23 14:15 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, simlo, linux-kernel, mingo, netdev Hi Dave, On Sunday, 23. April 2006 07:56, David S. Miller wrote: > > If cacheline bouncing because of the shared filled_entries becomes an issue, > > you are receiving or sending a lot. > > Cacheline bouncing is the core issue being addressed by this > data structure, so we really can't consider your idea seriously. Ok, I can see it now more clearly. Many thanks for clearing that up in the other replies. I had a major misunderstanding there. > I've just got an off-by-one error, no need to wreck the entire > data structure just to solve that :-) Yes, you are right. But even then I can still implement the reserve/commit once you provide the helpers for producer_space and consumer_space. Regards Ingo Oeser ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-20 19:09 ` Van Jacobson's net channels and real-time David S. Miller 2006-04-21 16:52 ` Ingo Oeser @ 2006-04-22 19:30 ` bert hubert 2006-04-23 5:53 ` David S. Miller 1 sibling, 1 reply; 24+ messages in thread From: bert hubert @ 2006-04-22 19:30 UTC (permalink / raw) To: David S. Miller; +Cc: simlo, linux-kernel, mingo, netdev On Thu, Apr 20, 2006 at 12:09:55PM -0700, David S. Miller wrote: > Going all the way to the socket is a large endeavor and will require a > lot of restructuring to do it right, so expect this to take on the > order of months. That's what you said about Niagara too :-) Good luck! -- http://www.PowerDNS.com Open source, database driven DNS Software http://netherlabs.nl Open and Closed source services ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Van Jacobson's net channels and real-time 2006-04-22 19:30 ` bert hubert @ 2006-04-23 5:53 ` David S. Miller 0 siblings, 0 replies; 24+ messages in thread From: David S. Miller @ 2006-04-23 5:53 UTC (permalink / raw) To: bert.hubert; +Cc: simlo, linux-kernel, mingo, netdev From: bert hubert <bert.hubert@netherlabs.nl> Date: Sat, 22 Apr 2006 21:30:24 +0200 > On Thu, Apr 20, 2006 at 12:09:55PM -0700, David S. Miller wrote: > > Going all the way to the socket is a large endeavor and will require a > > lot of restructuring to do it right, so expect this to take on the > > order of months. > > That's what you said about Niagara too :-) I'm just trying to keep the expectations low so it's easier to exceed them :-) ^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: Van Jacobson's net channels and real-time
@ 2006-04-24 17:28 Caitlin Bestler
0 siblings, 0 replies; 24+ messages in thread
From: Caitlin Bestler @ 2006-04-24 17:28 UTC (permalink / raw)
To: netdev
netdev-owner@vger.kernel.org wrote:
> Subject: Re: Van Jacobson's net channels and real-time
>
>
> On Mon, 24 Apr 2006, Auke Kok wrote:
>
>> Ingo Oeser wrote:
>>> On Saturday, 22. April 2006 15:49, Jörn Engel wrote:
>>>> That was another main point, yes. And the endpoints should be as
>>>> little burden on the bottlenecks as possible. One bottleneck is
>>>> the receive interrupt, which shouldn't wait for cachelines from
>>>> other CPUs too much.
>>>
>>> That's right. This will be made a non-issue with early demuxing on
>>> the NIC and MSI (or was it MSI-X?) which will select the right CPU
>>> based on hardware channels.
>>
>> MSI-X. With MSI you still have only one CPU handling all MSI
>> interrupts and that doesn't look any different than ordinary
>> interrupts. MSI-X will allow much better interrupt handling across
>> several CPUs.
>>
>> Auke
>> -
>
> Message signaled interrupts are just a kludge to save a trace
> on a PC board (read: make junk cheaper still). They are not
> faster and may even be slower. They will not be the salvation
> of any interrupt latency problems. The solution for
> increasing networking speed, where the bit-rate on the wire
> gets close to the bit-rate on the bus, is to put more and
> more of the networking code inside the network board. The CPU
> gets interrupted after most things (like network handshakes)
> are complete.
>
The number of hardware interrupts supported is a bit out of scope.
Whatever the capacity is, the key is to have as few meaningless
interrupts as possible.
In the context of netchannels this would mean that an interrupt
should only be fired when there is a sufficient number of packets
for the user-mode code to process. Fully offloading the protocol
to the hardware is certainly one option, one that I also think makes
sense, but the goal of netchannels is to try to optimize performance
while keeping TCP processing on the host.
More hardware offload is distinctly possible and relevant in this
context. Stateful offloads, such as TSO, are fully relevant.
Going directly from the NIC to the channel is also possible (after
the channel is set up by the kernel, of course). If the NIC is
aware of the channels directly, then interrupts can be limited to
packets that cross per-channel thresholds configured directly
by the ring consumer.
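A toy model of the per-channel threshold Caitlin describes, with all names hypothetical (no 2006-era NIC exposed exactly this interface): the ring consumer configures irq_threshold and re-arms after draining, and the producer, standing in for the NIC, raises an interrupt only when the fill level crosses the threshold:

#include <stdbool.h>

struct channel_irq_ctl {
	unsigned long head, tail;	/* free-running ring indexes */
	unsigned long irq_threshold;	/* set by the ring consumer */
	bool irq_armed;			/* consumer re-arms after draining */
};

static unsigned long fill_level(const struct channel_irq_ctl *c)
{
	/* producer never lets tail run more than one ring ahead */
	return c->tail - c->head;
}

/* producer side: checked after queueing each packet */
static bool should_interrupt(struct channel_irq_ctl *c)
{
	if (!c->irq_armed)
		return false;
	if (fill_level(c) < c->irq_threshold)
		return false;
	c->irq_armed = false;		/* one interrupt per threshold crossing */
	return true;
}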
^ permalink raw reply [flat|nested] 24+ messages in thread