* Van Jacobson's net channels and real-time
From: Esben Nielsen @ 2006-04-20 16:29 UTC
To: linux-kernel; +Cc: Ingo Molnar

Before I start: where is VJ's code? I have not been able to find it anywhere.

With the preempt-realtime branch maturing and finding its way into the mainline kernel, using Linux (without sub-kernels) for real-time applications is becoming a realistic option without having to do a lot of kernel hacks of your own. But the network stack could be improved, and some of the ideas in Van Jacobson's net channels could be useful when receiving network packets with real-time latencies.

Finding the end point in the receive interrupt and sending the packet to the receiving process directly is a good idea, if it is fast enough to do so in interrupt context (and I think it can be done very fast). One problem in the current setup is that everything has to go through the soft interrupt. Even if you make a completely new, non-IP protocol, the latency for delivering the frame to your application is still limited by the latency of the IP stack, because the frame still has to go through the soft irq, which might be busy working on IP packets. Even if you open a raw socket, the latency is limited by the latency of the soft irq. At work we use a widely used commercial RTOS. It has exactly the same problem: every network packet is handled by the same thread.

Buffer management is another issue. On the RTOS above you make a buffer pool per network device for receiving packets. On Linux, received packets are taken from the global memory pool with GFP_ATOMIC. On both systems you can easily run out of buffers if they are not freed back to the pool fast enough. In that case you will just have to drop packets as they are received. Without having the code to VJ's net channels, it looks like they solve the problem: each end receiver provides its own receive resources. If a receiver can't cope with all the traffic, it will lose packets; the others won't. That makes it safe to run important real-time traffic alongside some unpredictable, low-priority TCP/IP traffic. If the TCP/IP receivers do not run fast enough, their packets will be dropped, but the driver will not drop the real-time packets. The nice thing about a real-time task is that you know its latency and therefore know how many receive buffers it needs to avoid losing packets in a worst-case scenario.

Implementing new protocols in user space is a good idea, too. The developer - who doesn't need to be a hard-core kernel hacker - can pick whatever language he wants and has far easier access to debugging tools than in the kernel. Unfortunately, it does not perform very well today. Using raw sockets is a way to do protocol stacks in user space now, but you can only listen for packets with a specific protocol id. Therefore you either have to make one thread or process in user space receive everything and forward it to the end receivers, or let all threads receive everything and throw away the packets not meant for them. Apparently the filter mechanism for VJ's net channels (if it is made general enough) would solve that problem, too.

Many real-time applications are time-triggered, i.e. they wake up every, say, 5 ms, poll their environment for new inputs, do their calculations, and then send out the results again. For such an application it would be very efficient to have the driver put the network packets in an mmap'ed area, but not try to wake up the application. The application simply polls the mmap'ed channel every 5 ms. Once this is set up, no system calls are issued for receiving packets at all! On Linux today - for packet-oriented protocols at least - the application has to issue a system call for every packet received, plus a system call at the end to check that there are no more packets to be read.

Esben
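[ As a concrete illustration of the time-triggered consumer described above, here is a minimal user-space sketch. It assumes a hypothetical driver that exports its receive ring via mmap(); struct chan_ring, its layout, and process() are invented for illustration and are not an existing kernel ABI. The task drains the ring and then sleeps for its 5 ms period, so no per-packet system call is made: ]

#include <stdint.h>
#include <time.h>

struct chan_ring {
	volatile uint32_t head;		/* written only by the driver */
	char pad[60];			/* keep the two indices on separate cachelines */
	volatile uint32_t tail;		/* written only by this consumer */
	struct { uint16_t len; char data[2046]; } slot[256];
};

void process(const void *buf, unsigned int len);	/* provided by the application */

static void rt_loop(struct chan_ring *ring)	/* ring obtained via mmap() */
{
	struct timespec period = { .tv_sec = 0, .tv_nsec = 5 * 1000 * 1000 };

	for (;;) {
		while (ring->tail != ring->head) {	/* drain new frames */
			uint32_t i = ring->tail % 256;

			process(ring->slot[i].data, ring->slot[i].len);
			ring->tail++;
		}
		/* the only system call left: once per period, not per packet */
		clock_nanosleep(CLOCK_MONOTONIC, 0, &period, NULL);
	}
}

[ A real implementation would additionally need memory barriers between reading the index and the slot contents, and an absolute-time sleep (TIMER_ABSTIME) to avoid period drift. ]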
* Re: Van Jacobson's net channels and real-time
From: David S. Miller @ 2006-04-20 19:09 UTC
To: simlo; +Cc: linux-kernel, mingo, netdev

[ Maybe ask questions like this on "netdev" where the networking
  developers hang out? Added to CC: ]

Van fell off the face of the planet after giving his presentation and never published his code, only his slides.

I've started to make a slow attempt at implementing his ideas - nothing but pure infrastructure so far - but you can look at what I have here:

	kernel.org:/pub/scm/linux/kernel/git/davem/vj-2.6.git

Don't expect major progress, and don't expect anything beyond a simple channel to softint packet processing on receive any time soon. Going all the way to the socket is a large endeavor and will require a lot of restructuring to do it right, so expect this to take on the order of months.
* Re: Van Jacobson's net channels and real-time
From: Ingo Oeser @ 2006-04-21 16:52 UTC
To: David S. Miller; +Cc: simlo, linux-kernel, mingo, netdev, Ingo Oeser

Hi David,

nice to see you getting started with it. I'm not sure about the queue logic there.

1867 /* Caller must have exclusive producer access to the netchannel. */
1868 int netchannel_enqueue(struct netchannel *np, struct netchannel_buftrailer *bp)
1869 {
1870 	unsigned long tail;
1871
1872 	tail = np->netchan_tail;
1873 	if (tail == np->netchan_head)
1874 		return -ENOMEM;

This looks wrong, since empty and full are the same condition in your case.

1891 struct netchannel_buftrailer *__netchannel_dequeue(struct netchannel *np)
1892 {
1893 	unsigned long head = np->netchan_head;
1894 	struct netchannel_buftrailer *bp = np->netchan_queue[head];
1895
1896 	BUG_ON(np->netchan_tail == head);

See? What about something like:

struct netchannel {
	/* This is only read/written by the writer (producer) */
	unsigned long write_ptr;
	struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES];

	/* This is modified by both */
	atomic_t filled_entries; /* cache_line_align this? */

	/* This is only read/written by the reader (consumer) */
	unsigned long read_ptr;
};

This would prevent this bug from the beginning and still let us use the full queue size.

If cacheline bouncing because of the shared filled_entries becomes an issue, you are receiving or sending a lot. Then you can enqueue and dequeue multiple entries and commit the counts later, with an atomic_read, atomic_add and atomic_sub on filled_entries. Maybe even cheaper with local_t instead of atomic_t later on.

But I guess the cacheline bouncing will be a non-issue, since the whole point of netchannels was to keep traffic as local to a CPU as possible, right?

Would you like to see a sample patch relative to your tree, to show you what I mean?

Regards

Ingo Oeser
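[ For reference, a minimal sketch of the shape the fix can take: free-running head/tail indices with a power-of-two ring size, so that empty (tail == head) and full (tail - head == NET_CHANNEL_ENTRIES) become distinct conditions while each index is still written by exactly one side. The names follow the snippet quoted above, but the bodies are illustrative, not the actual fix that went into the tree: ]

/* Caller must have exclusive producer access to the netchannel. */
int netchannel_enqueue(struct netchannel *np, struct netchannel_buftrailer *bp)
{
	unsigned long tail = np->netchan_tail;

	if (tail - np->netchan_head == NET_CHANNEL_ENTRIES)
		return -ENOMEM;				/* full */
	np->netchan_queue[tail & (NET_CHANNEL_ENTRIES - 1)] = bp;
	smp_wmb();					/* publish buffer before index */
	np->netchan_tail = tail + 1;
	return 0;
}

struct netchannel_buftrailer *__netchannel_dequeue(struct netchannel *np)
{
	unsigned long head = np->netchan_head;
	struct netchannel_buftrailer *bp;

	BUG_ON(np->netchan_tail == head);		/* caller checked non-empty */
	smp_rmb();					/* pairs with the smp_wmb() above */
	bp = np->netchan_queue[head & (NET_CHANNEL_ENTRIES - 1)];
	np->netchan_head = head + 1;
	return bp;
}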
* Re: Van Jacobson's net channels and real-time
From: Jörn Engel @ 2006-04-22 11:48 UTC
To: Ingo Oeser; +Cc: David S. Miller, simlo, linux-kernel, mingo, netdev, Ingo Oeser

On Fri, 21 April 2006 18:52:47 +0200, Ingo Oeser wrote:
> What about something like
>
> struct netchannel {
> 	/* This is only read/written by the writer (producer) */
> 	unsigned long write_ptr;
> 	struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES];
>
> 	/* This is modified by both */
> 	atomic_t filled_entries; /* cache_line_align this? */
>
> 	/* This is only read/written by the reader (consumer) */
> 	unsigned long read_ptr;
> };
>
> This would prevent this bug from the beginning and still let us use the
> full queue size.
>
> If cacheline bouncing because of the shared filled_entries becomes an
> issue, you are receiving or sending a lot.

Unless I completely misunderstand something, one of the main points of the netchannels is to have *zero* fields written to by both producer and consumer. Receiving and sending a lot can be expected to be the common case, so taking a performance hit in this case is hardly a good idea.

I haven't looked at Davem's implementation at all, but Van simply separated fields into consumer-written and producer-written, with proper alignment between them. Some consumer-written fields are also read by the producer and vice versa, but none of this results in cacheline ping-pong.

If your description of the problem is correct, it should only mean that the implementation has a problem, not the concept.

Jörn

--
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt
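[ A sketch of the field layout Jörn describes from Van's slides: each side owns its own cacheline and no field is written by both. The field names and the use of ____cacheline_aligned_in_smp are illustrative: ]

struct netchannel {
	/* written only by the producer (e.g. the driver's RX path) */
	unsigned long netchan_tail ____cacheline_aligned_in_smp;

	/* written only by the consumer (the socket end) */
	unsigned long netchan_head ____cacheline_aligned_in_smp;

	/* the slot array itself: each slot is written by the producer
	 * and read by the consumer, but never concurrently for the same
	 * index, so it causes no ping-pong */
	struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES]
		____cacheline_aligned_in_smp;
};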
* Re: Van Jacobson's net channels and real-time
From: Ingo Oeser @ 2006-04-22 13:29 UTC
To: Jörn Engel; +Cc: Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

Hi Jörn,

On Saturday, 22. April 2006 13:48, Jörn Engel wrote:
> Unless I completely misunderstand something, one of the main points of
> the netchannels is to have *zero* fields written to by both producer
> and consumer.

Hmm, for me the main point was to keep the complete processing of a single packet within one CPU/core, where this is a non-issue.

> Receiving and sending a lot can be expected to be the
> common case, so taking a performance hit in this case is hardly a good
> idea.

There is no hit. If you receive/send in bursts, you can simply aggregate them up to a certain queueing threshold. The queue design outlined can split the queueing into reserve and commit stages, where the producer is told how much it can produce and the consumer is told how much it can consume. Within their areas the producer and consumer can move around freely. So this is not exactly a queue, but a dynamic double buffer :-)

So maybe doing the queueing with the classic head/tail variant is better here, but the other variant might replace it without problems and allows for some nice improvements.

Regards

Ingo Oeser
* Re: Van Jacobson's net channels and real-time
From: Jörn Engel @ 2006-04-22 13:49 UTC
To: Ingo Oeser; +Cc: Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

On Sat, 22 April 2006 15:29:58 +0200, Ingo Oeser wrote:
> On Saturday, 22. April 2006 13:48, Jörn Engel wrote:
> > Unless I completely misunderstand something, one of the main points of
> > the netchannels is to have *zero* fields written to by both producer
> > and consumer.
>
> Hmm, for me the main point was to keep the complete processing
> of a single packet within one CPU/core, where this is a non-issue.

That was another main point, yes. And the endpoints should be as little burden on the bottlenecks as possible. One bottleneck is the receive interrupt, which shouldn't wait for cachelines from other CPUs too much.

Jörn

--
Why do musicians compose symphonies and poets write poems?
They do it because life wouldn't have any meaning for them if they didn't.
That's why I draw cartoons. It's my life.
-- Charles Shultz
* Re: Van Jacobson's net channels and real-time
From: Ingo Oeser @ 2006-04-23 0:05 UTC
To: Jörn Engel; +Cc: Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

On Saturday, 22. April 2006 15:49, Jörn Engel wrote:
> That was another main point, yes. And the endpoints should be as
> little burden on the bottlenecks as possible. One bottleneck is the
> receive interrupt, which shouldn't wait for cachelines from other CPUs
> too much.

That's right. This will become a non-issue with early demuxing on the NIC and MSI (or was it MSI-X?), which will select the right CPU based on hardware channels.

In the meantime I would reduce the effects by only committing on a full buffer or on leaving the interrupt handler. This would be OK, because here you have to wake up the process anyway on a full buffer, and if it slept because of an empty buffer. You lose only if your application didn't sleep yet and you need to leave the interrupt handler because there is no more work. In this case the atomic_add would be significant.

All this is quite similar to how we do the pagevec stuff in mm/ already.

Regards

Ingo Oeser
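[ What Ingo sketches here - committing once per interrupt rather than once per packet - could look roughly like the following; nic_fetch_rx_buffer(), nic_drop_rx_buffer(), and the wakeup helpers are hypothetical stand-ins: ]

static void nic_rx_irq(struct netchannel *np)
{
	unsigned long tail = np->netchan_tail;	/* private working copy */
	struct netchannel_buftrailer *bp;

	while ((bp = nic_fetch_rx_buffer()) != NULL) {
		if (tail - np->netchan_head == NET_CHANNEL_ENTRIES) {
			nic_drop_rx_buffer(bp);	/* ring full: drop */
			break;
		}
		np->netchan_queue[tail & (NET_CHANNEL_ENTRIES - 1)] = bp;
		tail++;
	}
	smp_wmb();
	np->netchan_tail = tail;	/* one store commits the whole batch */
	if (netchannel_consumer_asleep(np))	/* hypothetical wakeup path */
		netchannel_wakeup(np);
}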
* Re: Van Jacobson's net channels and real-time
From: David S. Miller @ 2006-04-23 5:50 UTC
To: ioe-lkml; +Cc: joern, netdev, simlo, linux-kernel, mingo, netdev

From: Ingo Oeser <ioe-lkml@rameria.de>
Date: Sun, 23 Apr 2006 02:05:32 +0200

> On Saturday, 22. April 2006 15:49, Jörn Engel wrote:
> > That was another main point, yes. And the endpoints should be as
> > little burden on the bottlenecks as possible. One bottleneck is the
> > receive interrupt, which shouldn't wait for cachelines from other CPUs
> > too much.
>
> That's right. This will become a non-issue with early demuxing
> on the NIC and MSI (or was it MSI-X?), which will select
> the right CPU based on hardware channels.

It is not clear that MSI'ing the RX interrupt to multiple cpus is the answer. Consider the fact that by doing so you're reducing the amount of batch work each interrupt does by a factor of N.

One of the biggest gains of NAPI, btw, is that it batches packet receive. If you don't believe the benefits of this, put a simple cycle counter sample around netif_receive_skb() calls and note the difference between the first packet processed and subsequent ones; it's several orders of magnitude faster to process subsequent packets within a batch. I've done this before on tg3 with sparc64 and posted the numbers on netdev about a year or so ago.

If you are doing something like netchannels, it helps to batch so that the demuxing table stays hot in the cpu cache.

There is even talk of dedicating a thread on enormously multi-threaded cpus just to the NIC hardware interrupt, so it could net-channel to the socket processes running on the other strands.
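[ The measurement Dave describes is easy to reproduce; a rough sketch - get_cycles() exists on most architectures, the sample buffer and helper name are illustrative: ]

#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <asm/timex.h>

static cycles_t rx_samples[64];
static int rx_nsamples;

/* drop-in replacement for netif_receive_skb() in a driver's poll loop */
static inline void timed_receive_skb(struct sk_buff *skb)
{
	cycles_t t0 = get_cycles();

	netif_receive_skb(skb);
	if (rx_nsamples < ARRAY_SIZE(rx_samples))
		rx_samples[rx_nsamples++] = get_cycles() - t0;
	/* rx_samples[0] (cold caches, first packet of the batch) should
	 * come out far above the rest when dumped later */
}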
* Re: Van Jacobson's net channels and real-time
From: Auke Kok @ 2006-04-24 16:42 UTC
To: Ingo Oeser; +Cc: Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

Ingo Oeser wrote:
> On Saturday, 22. April 2006 15:49, Jörn Engel wrote:
>> That was another main point, yes. And the endpoints should be as
>> little burden on the bottlenecks as possible. One bottleneck is the
>> receive interrupt, which shouldn't wait for cachelines from other CPUs
>> too much.
>
> That's right. This will become a non-issue with early demuxing
> on the NIC and MSI (or was it MSI-X?), which will select
> the right CPU based on hardware channels.

MSI-X. With MSI you still have only one CPU handling all MSI interrupts, which doesn't look any different from ordinary interrupts. MSI-X will allow much better interrupt handling across several CPUs.

Auke
* Re: Van Jacobson's net channels and real-time
From: linux-os (Dick Johnson) @ 2006-04-24 16:59 UTC
To: Auke Kok; +Cc: Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

On Mon, 24 Apr 2006, Auke Kok wrote:

> Ingo Oeser wrote:
>> On Saturday, 22. April 2006 15:49, Jörn Engel wrote:
>>> That was another main point, yes. And the endpoints should be as
>>> little burden on the bottlenecks as possible. One bottleneck is the
>>> receive interrupt, which shouldn't wait for cachelines from other CPUs
>>> too much.
>>
>> That's right. This will become a non-issue with early demuxing
>> on the NIC and MSI (or was it MSI-X?), which will select
>> the right CPU based on hardware channels.
>
> MSI-X. With MSI you still have only one CPU handling all MSI interrupts,
> which doesn't look any different from ordinary interrupts. MSI-X will allow
> much better interrupt handling across several CPUs.
>
> Auke

Message signaled interrupts are just a kludge to save a trace on a PC board (read: make junk cheaper still). They are not faster and may even be slower. They will not be the salvation of any interrupt latency problems. The solution for increasing networking speed, where the bit-rate on the wire gets close to the bit-rate on the bus, is to put more and more of the networking code inside the network board. The CPU gets interrupted after most things (like network handshakes) are complete.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
* Re: Van Jacobson's net channels and real-time
From: Rick Jones @ 2006-04-24 17:19 UTC
To: linux-os (Dick Johnson); +Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

>>> That's right. This will become a non-issue with early demuxing
>>> on the NIC and MSI (or was it MSI-X?), which will select
>>> the right CPU based on hardware channels.
>>
>> MSI-X. With MSI you still have only one CPU handling all MSI interrupts,
>> which doesn't look any different from ordinary interrupts. MSI-X will allow
>> much better interrupt handling across several CPUs.
>>
>> Auke
>
> Message signaled interrupts are just a kludge to save a trace on a
> PC board (read: make junk cheaper still). They are not faster and
> may even be slower. They will not be the salvation of any interrupt
> latency problems. The solution for increasing networking speed,
> where the bit-rate on the wire gets close to the bit-rate on the
> bus, is to put more and more of the networking code inside the
> network board. The CPU gets interrupted after most things (like
> network handshakes) are complete.

if the issue is bus vs network bitrates, would offloading really buy that much? i suppose that for minimum sized packets not DMA'ing the headers across the bus would be a decent win, but down at small packet sizes, where headers would be 1/3 to 1/2 the stuff DMA'd around, I would think one is talking more about CPU path lengths than bus bitrates.

and up at "full size" segments, since everyone is so fond of bulk transfer tests, the transfer saved by not shoving headers across the bus is what, 54/1448, or ~3.7%.

spreading interrupts via MSI-X seems nice and all, but i keep wondering if the header-field-based distribution that is (will be) done by the NICs is putting the cart before the horse - should the NIC essentially be telling the system the CPU on which to run the application, or should the CPU on which the application runs be telling "networking" where it should be happening?

rick jones
* Re: Van Jacobson's net channels and real-time
From: linux-os (Dick Johnson) @ 2006-04-24 18:12 UTC
To: Rick Jones; +Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

On Mon, 24 Apr 2006, Rick Jones wrote:

> if the issue is bus vs network bitrates, would offloading really buy that
> much? i suppose that for minimum sized packets not DMA'ing the headers
> across the bus would be a decent win, but down at small packet sizes,
> where headers would be 1/3 to 1/2 the stuff DMA'd around, I would think
> one is talking more about CPU path lengths than bus bitrates.
>
> and up at "full size" segments, since everyone is so fond of bulk
> transfer tests, the transfer saved by not shoving headers across the bus
> is what, 54/1448, or ~3.7%.
>
> spreading interrupts via MSI-X seems nice and all, but i keep wondering
> if the header-field-based distribution that is (will be) done by the
> NICs is putting the cart before the horse - should the NIC essentially
> be telling the system the CPU on which to run the application, or should
> the CPU on which the application runs be telling "networking" where it
> should be happening?
>
> rick jones

Ideally, TCP/IP is so mature that one should be able to tell some hardware state machine "Connect with 123.555.44.333, port 23" and have it signal via interrupt when that happens. Then one should be able to say "send these data to that address" or "fill this buffer with data from that address". All the networking could be done on the board, perhaps with a dedicated CPU (as is now done) or all in silicon. The driver end of the networking software would then just handle buffers, with interrupts that show status such as completions or time-outs - trivial stuff.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
* Re: Van Jacobson's net channels and real-time
From: Michael Chan @ 2006-04-24 23:17 UTC
To: linux-os (Dick Johnson); +Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

On Mon, 2006-04-24 at 12:59 -0400, linux-os (Dick Johnson) wrote:

> Message signaled interrupts are just a kludge to save a trace on a
> PC board (read: make junk cheaper still). They are not faster and
> may even be slower. They will not be the salvation of any interrupt
> latency problems.

MSI has 2 very nice properties: MSI is never shared, and MSI guarantees that all DMA activities before the MSI have completed. When you take advantage of these guarantees in your MSI handler, there can be noticeable improvements compared to using INTA.
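[ A sketch (2.6-era handler signature) of what taking advantage of those two guarantees can look like: no shared-line "is it ours?" status read, and no MMIO read to flush posted DMA writes before trusting the in-memory status block. The device plumbing is invented for illustration: ]

#include <linux/interrupt.h>
#include <linux/netdevice.h>

static irqreturn_t nic_msi(int irq, void *dev_id, struct pt_regs *regs)
{
	struct net_device *dev = dev_id;

	/* MSI is never shared, and all DMA posted before the MSI has
	 * completed, so the status block in memory is already valid:
	 * go straight to scheduling the poll, no MMIO read needed */
	netif_rx_schedule(dev);
	return IRQ_HANDLED;
}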
* Re: Van Jacobson's net channels and real-time
From: Auke Kok @ 2006-04-25 1:49 UTC
To: linux-os (Dick Johnson); +Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

linux-os (Dick Johnson) wrote:
> On Mon, 24 Apr 2006, Auke Kok wrote:
>
>> MSI-X. With MSI you still have only one CPU handling all MSI interrupts,
>> which doesn't look any different from ordinary interrupts. MSI-X will allow
>> much better interrupt handling across several CPUs.
>>
>> Auke
>
> Message signaled interrupts are just a kludge to save a trace on a
> PC board (read: make junk cheaper still).

yes. Also, in PCI-Express there is no physical interrupt line anymore due to the architecture, so even classical interrupts are sent as a "message" over the bus.

> They are not faster and may even be slower.

Thus in the case of PCI-Express, MSI interrupts are just as fast as the ordinary ones. I have no numbers on whether MSI is faster or not than e.g. interrupts on PCI-X, but generally speaking, the PCI-Express bus is not designed to be "low latency" at all; at best it gives you X latency, where X is something like microseconds. The MSI message itself only takes 10-20 nanoseconds, but all the handling probably adds a large factor to that (1000 or so). No clue on classical interrupt line latency - anyone?

> They will not be the salvation of any interrupt latency problems.

This is also not the problem - we really don't care that our 100,000 packets arrive 20 usec slower per packet, as long as the bus is not idle for those intervals. We would care a lot if 25,000 of those arrive directly at the proper CPU, without the need for one of the CPUs to arbitrate on every interrupt. That's the idea anyway.

Nowadays, with irq throttling, we introduce a lot of designed latency anyway, especially with network devices.

> The solution for increasing networking speed,
> where the bit-rate on the wire gets close to the bit-rate on the
> bus, is to put more and more of the networking code inside the
> network board. The CPU gets interrupted after most things (like
> network handshakes) are complete.

That is a limited vision of the situation. You could argue that current CPUs have so much power that they can easily do a lot of the processing instead of the hardware, and thus warm the caches for userspace, set up sockets, etc. This is the whole idea of Van Jacobson's net channels. Putting more offloading into the hardware brings so many problems with it that they are far easier solved in the OS.

Cheers,

Auke
* Re: Van Jacobson's net channels and real-time
From: linux-os (Dick Johnson) @ 2006-04-25 11:29 UTC
To: Auke Kok; +Cc: Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

On Mon, 24 Apr 2006, Auke Kok wrote:

> linux-os (Dick Johnson) wrote:
>> Message signaled interrupts are just a kludge to save a trace on a
>> PC board (read: make junk cheaper still).
>
> yes. Also, in PCI-Express there is no physical interrupt line anymore due to
> the architecture, so even classical interrupts are sent as a "message" over
> the bus.
>
>> They are not faster and may even be slower.
>
> Thus in the case of PCI-Express, MSI interrupts are just as fast as the
> ordinary ones. I have no numbers on whether MSI is faster or not than e.g.
> interrupts on PCI-X, but generally speaking, the PCI-Express bus is not
> designed to be "low latency" at all; at best it gives you X latency, where X
> is something like microseconds. The MSI message itself only takes 10-20
> nanoseconds, but all the handling probably adds a large factor to that
> (1000 or so). No clue on classical interrupt line latency - anyone?

About 9 nanoseconds per foot of FR-4 (G10) trace, plus the access time through the gate arrays (about 20 ns); so, from the time a device needs the CPU until it hits the interrupt pin, you typically have 30 to 50 nanoseconds. Of course the CPU is __much__ slower. However, these physical latencies are in series and cannot be compensated for, because the CPU can't see into the future.

>> They will not be the salvation of any interrupt latency problems.
>
> This is also not the problem - we really don't care that our 100,000 packets
> arrive 20 usec slower per packet, as long as the bus is not idle for those
> intervals. We would care a lot if 25,000 of those arrive directly at the
> proper CPU, without the need for one of the CPUs to arbitrate on every
> interrupt. That's the idea anyway.

It forces driver writers to loop in ISRs to handle new status changes that happened before an asserted interrupt even got to the CPU. This is bad. You end up polling in the ISR with interrupts off. Turning on interrupts exacerbates the problem: you may never leave the ISR! It becomes the new "idle task". To properly use interrupts, the hardware latency must be less than the CPU's response to the hardware stimulus.

> Nowadays, with irq throttling, we introduce a lot of designed latency anyway,
> especially with network devices.
>
>> The solution for increasing networking speed,
>> where the bit-rate on the wire gets close to the bit-rate on the
>> bus, is to put more and more of the networking code inside the
>> network board. The CPU gets interrupted after most things (like
>> network handshakes) are complete.
>
> That is a limited vision of the situation. You could argue that current
> CPUs have so much power that they can easily do a lot of the processing
> instead of the hardware, and thus warm the caches for userspace, set up
> sockets, etc. This is the whole idea of Van Jacobson's net channels.
> Putting more offloading into the hardware brings so many problems with it
> that they are far easier solved in the OS.
>
> Cheers,
>
> Auke

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
* Re: Van Jacobson's net channels and real-time
From: Vojtech Pavlik @ 2006-05-02 12:41 UTC
To: linux-os (Dick Johnson); +Cc: Auke Kok, Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

On Tue, Apr 25, 2006 at 07:29:40AM -0400, linux-os (Dick Johnson) wrote:

> >> Message signaled interrupts are just a kludge to save a trace on a
> >> PC board (read: make junk cheaper still).
> >
> > yes. Also, in PCI-Express there is no physical interrupt line anymore due to
> > the architecture, so even classical interrupts are sent as a "message" over
> > the bus.
> >
> >> They are not faster and may even be slower.
> >
> > Thus in the case of PCI-Express, MSI interrupts are just as fast as the
> > ordinary ones. I have no numbers on whether MSI is faster or not than e.g.
> > interrupts on PCI-X, but generally speaking, the PCI-Express bus is not
> > designed to be "low latency" at all; at best it gives you X latency, where X
> > is something like microseconds. The MSI message itself only takes 10-20
> > nanoseconds, but all the handling probably adds a large factor to that
> > (1000 or so). No clue on classical interrupt line latency - anyone?
>
> About 9 nanoseconds per foot of FR-4 (G10) trace, plus the access time
> through the gate arrays (about 20 ns); so, from the time a device needs
> the CPU until it hits the interrupt pin, you typically have 30 to
> 50 nanoseconds. Of course the CPU is __much__ slower. However, these
> physical latencies are in series and cannot be compensated for, because
> the CPU can't see into the future.

You seem to be missing the fact that most of today's interrupts are delivered through the APIC bus, which isn't fast at all.

--
Vojtech Pavlik
Director SuSE Labs
* Re: Van Jacobson's net channels and real-time
From: Andi Kleen @ 2006-05-02 15:58 UTC
To: Vojtech Pavlik; +Cc: linux-os (Dick Johnson), Auke Kok, Auke Kok, Ingo Oeser, Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

On Tuesday 02 May 2006 14:41, Vojtech Pavlik wrote:

> You seem to be missing the fact that most of today's interrupts are
> delivered through the APIC bus, which isn't fast at all.

You mean slow, right? Modern x86s (anything newer than a P3) generally don't have a separate APIC bus anymore, but just send messages over their main processor connection.

-Andi
* Re: Van Jacobson's net channels and real-time
From: David S. Miller @ 2006-04-23 5:52 UTC
To: ioe-lkml; +Cc: joern, netdev, simlo, linux-kernel, mingo, netdev

From: Ingo Oeser <ioe-lkml@rameria.de>
Date: Sat, 22 Apr 2006 15:29:58 +0200

> On Saturday, 22. April 2006 13:48, Jörn Engel wrote:
> > Unless I completely misunderstand something, one of the main points of
> > the netchannels is to have *zero* fields written to by both producer
> > and consumer.
>
> Hmm, for me the main point was to keep the complete processing
> of a single packet within one CPU/core, where this is a non-issue.

Both are the important issues.

You move the bulk of the packet processing work to the end cores of the system, yes. But you do so with an enormously SMP-friendly queue data structure, so that it does not matter at all that the packet is received on one cpu yet processed in socket context on another.

If you elide either part of the implementation, you miss the entire point of net channels.
* Re: Van Jacobson's net channels and real-time
From: Avi Kivity @ 2006-04-23 9:23 UTC
To: Ingo Oeser; +Cc: Jörn Engel, Ingo Oeser, David S. Miller, simlo, linux-kernel, mingo, netdev

Ingo Oeser wrote:
> Hi Jörn,
>
> On Saturday, 22. April 2006 13:48, Jörn Engel wrote:
>> Unless I completely misunderstand something, one of the main points of
>> the netchannels is to have *zero* fields written to by both producer
>> and consumer.
>
> Hmm, for me the main point was to keep the complete processing
> of a single packet within one CPU/core, where this is a non-issue.

But the interrupt for a packet can be received by cpu 0 whereas the rest of the processing proceeds on cpu 1, so it still helps to keep the producer index and consumer index on separate cachelines.

--
error compiling committee.c: too many arguments to function
* Re: Van Jacobson's net channels and real-time
From: David S. Miller @ 2006-04-23 5:51 UTC
To: joern; +Cc: netdev, simlo, linux-kernel, mingo, netdev, ioe-lkml

From: Jörn Engel <joern@wohnheim.fh-wedel.de>
Date: Sat, 22 Apr 2006 13:48:46 +0200

> Unless I completely misunderstand something, one of the main points of
> the netchannels is to have *zero* fields written to by both producer
> and consumer. Receiving and sending a lot can be expected to be the
> common case, so taking a performance hit in this case is hardly a good
> idea.

That's absolutely correct; this is absolutely critical to the implementation. If you're doing any atomic operations, or any write operations by both consumer and producer to the same cacheline, you've broken things :-)
* Re: Van Jacobson's net channels and real-time
From: David S. Miller @ 2006-04-23 5:56 UTC
To: netdev; +Cc: simlo, linux-kernel, mingo, netdev, ioe-lkml

From: Ingo Oeser <netdev@axxeo.de>
Date: Fri, 21 Apr 2006 18:52:47 +0200

> nice to see you getting started with it.

Thanks for reviewing.

> I'm not sure about the queue logic there.
>
> 1867 /* Caller must have exclusive producer access to the netchannel. */
> 1868 int netchannel_enqueue(struct netchannel *np, struct netchannel_buftrailer *bp)
> 1869 {
> 1870 	unsigned long tail;
> 1871
> 1872 	tail = np->netchan_tail;
> 1873 	if (tail == np->netchan_head)
> 1874 		return -ENOMEM;
>
> This looks wrong, since empty and full are the same condition in your
> case.

Thanks, that's obviously wrong. I'll try to fix this up.

> What about something like
>
> struct netchannel {
> 	/* This is only read/written by the writer (producer) */
> 	unsigned long write_ptr;
> 	struct netchannel_buftrailer *netchan_queue[NET_CHANNEL_ENTRIES];
>
> 	/* This is modified by both */
> 	atomic_t filled_entries; /* cache_line_align this? */
>
> 	/* This is only read/written by the reader (consumer) */
> 	unsigned long read_ptr;
> };

As stated elsewhere, if you add atomic operations you break the entire idea of net channels. They are meant to be SMP-efficient data structures, where the producer has one cache line that only it dirties and the consumer likewise has one cache line that only it dirties.

> If cacheline bouncing because of the shared filled_entries becomes an issue,
> you are receiving or sending a lot.

Cacheline bouncing is the core issue being addressed by this data structure, so we really can't consider your idea seriously. I've just got an off-by-one error; no need to wreck the entire data structure just to solve that :-)
* Re: Van Jacobson's net channels and real-time
From: Ingo Oeser @ 2006-04-23 14:15 UTC
To: David S. Miller; +Cc: netdev, simlo, linux-kernel, mingo, netdev

Hi Dave,

On Sunday, 23. April 2006 07:56, David S. Miller wrote:
> > If cacheline bouncing because of the shared filled_entries becomes an issue,
> > you are receiving or sending a lot.
>
> Cacheline bouncing is the core issue being addressed by this
> data structure, so we really can't consider your idea seriously.

OK, I can see it more clearly now. Many thanks for clearing that up in the other replies. I had a major misunderstanding there.

> I've just got an off-by-one error, no need to wreck the entire
> data structure just to solve that :-)

Yes, you are right. But even then I can still implement the reserve/commit scheme once you provide helpers for producer_space and consumer_space.

Regards

Ingo Oeser
* Re: Van Jacobson's net channels and real-time
From: bert hubert @ 2006-04-22 19:30 UTC
To: David S. Miller; +Cc: simlo, linux-kernel, mingo, netdev

On Thu, Apr 20, 2006 at 12:09:55PM -0700, David S. Miller wrote:
> Going all the way to the socket is a large endeavor and will require a
> lot of restructuring to do it right, so expect this to take on the
> order of months.

That's what you said about Niagara too :-)

Good luck!

--
http://www.PowerDNS.com      Open source, database driven DNS Software
http://netherlabs.nl      Open and Closed source services
* Re: Van Jacobson's net channels and real-time
From: David S. Miller @ 2006-04-23 5:53 UTC
To: bert.hubert; +Cc: simlo, linux-kernel, mingo, netdev

From: bert hubert <bert.hubert@netherlabs.nl>
Date: Sat, 22 Apr 2006 21:30:24 +0200

> On Thu, Apr 20, 2006 at 12:09:55PM -0700, David S. Miller wrote:
> > Going all the way to the socket is a large endeavor and will require a
> > lot of restructuring to do it right, so expect this to take on the
> > order of months.
>
> That's what you said about Niagara too :-)

I'm just trying to keep the expectations low so it's easier to exceed them :-)
* Re: Van Jacobson's net channels and real-time
From: Jan Kiszka @ 2006-04-21 8:53 UTC
To: Esben Nielsen; +Cc: linux-kernel, Ingo Molnar, David S. Miller

2006/4/20, Esben Nielsen <simlo@phys.au.dk>:
> Before I start: where is VJ's code? I have not been able to find it anywhere.
>
> With the preempt-realtime branch maturing and finding its way into the
> mainline kernel, using Linux (without sub-kernels) for real-time applications
> is becoming a realistic option without having to do a lot of kernel hacks
> of your own.

Well, commenting on this statement would likely create a thread of its own...

> But the network stack could be improved, and some of the
> ideas in Van Jacobson's net channels could be useful when receiving network
> packets with real-time latencies.

... so it's better to focus on a fruitful discussion of these interesting ideas, which may lay the ground for a future coexistence of both hard-RT and throughput-optimised networking stacks in the mainline kernel. I'm slightly sceptical, but maybe I'll be proven wrong.

My following remarks are biased toward hard-RT. What may appear problematic in this context could be a non-issue for scenarios where the overall throughput counts, not individual packet latencies.

> Finding the end point in the receive interrupt and sending the packet to
> the receiving process directly is a good idea, if it is fast enough to do
> so in interrupt context (and I think it can be done very fast).

This heavily depends on the protocol to parse. Single-packet messages based on TCP, UDP, or whatever are still easy to demux: some table for the frame type, some for the IP protocol, and another for the port (or an overall hash for a single table) -> here's the receiver.

But now think of fragmented IP packets. The first piece can be assigned normally, but the succeeding fragments require a dynamically added detection in that critical demux path (IP fragments are identified by src+dest IP, protocol, and an arbitrary ID). Each pending chain of fragments for a netchannel would create yet another demux rule. But I'm also curious to see the code Van Jacobson used for this.

BTW, that's the issue we also face in RTnet when handling UDP/IP under hard-RT constraints. We avoid unbounded demux complexity by setting a hard limit on the number of open chains. If you want to have a look at the code: www.rtnet.org

> One problem in the current setup is that everything has to go through the
> soft interrupt. Even if you make a completely new, non-IP
> protocol, the latency for delivering the frame to your application is
> still limited by the latency of the IP stack, because the frame still has
> to go through the soft irq, which might be busy working on IP packets.
> Even if you open a raw socket, the latency is limited by the latency of
> the soft irq. At work we use a widely used commercial RTOS. It has exactly
> the same problem: every network packet is handled by the same thread.

The question of _where_ to do that demultiplexing is actually not as critical as _how_ to do it - and with which complexity. For hard-RT, it should be O(1) or, if not feasible, O(n), where n is only influenced by the RT applications and their traffic footprints, but not by an unknown set of non-RT applications and communication links. [Even with O(1) demux, the pure number of incoming non-RT packets can still cause QoS crosstalk - a different issue.]

> Buffer management is another issue. On the RTOS above you make a buffer pool
> per network device for receiving packets. On Linux, received packets are
> taken from the global memory pool with GFP_ATOMIC. On both systems you can
> easily run out of buffers if they are not freed back to the pool fast
> enough. In that case you will just have to drop packets as they are
> received. Without having the code to VJ's net channels, it looks like they
> solve the problem: each end receiver provides its own receive resources.
> If a receiver can't cope with all the traffic, it will lose packets; the
> others won't. That makes it safe to run important real-time traffic
> alongside some unpredictable, low-priority TCP/IP traffic. If the TCP/IP
> receivers do not run fast enough, their packets will be dropped, but the
> driver will not drop the real-time packets. The nice thing about a
> real-time task is that you know its latency and therefore know how many
> receive buffers it needs to avoid losing packets in a worst-case scenario.

Yep, this is a core feature for RT networking. And this is essentially the way we have handled it in RTnet for quite some time: "Here is a filled buffer for you. Give me an empty one from your pool, and it will be yours. If you can't, I'll drop it!" The existing concept works quite well for single consumers. But it's still a tricky thing when considering multiple consumers sharing a physical buffer. RTnet doesn't handle this so far (except for packet capturing). I have some rough ideas for a generic solution in my mind, but RTnet users haven't asked for this loudly so far, thus no further effort was spent on it.

Actually the pre-allocation issue is not limited to skb-based networking. It's one reason why we have separate RT FireWire and RT USB projects. The restrictions and modifications they require make them unhandy for standard use but a perfect fit for deterministic applications.

Ok, to sum up what I see as the core topics for the first steps: we need A) a mechanism to use pre-allocated buffers for certain communication links and B) a smart early-demux algorithm of manageable complexity which decides which receiver an incoming packet has to be accounted to.

The former is largely a question of restructuring the existing code, but the latter is still unclear to me. Let me sketch my first idea:

struct pattern {
	unsigned long offset;
	unsigned long mask;
	unsigned long value;	/* buffer[offset] & mask == value? */
};

struct rule {
	struct list_head rules;
	int pattern_count;
	struct pattern pattern[MAX_PATTERNS_PER_RULE];
	struct netchannel *destination;
};

For each incoming packet, the NIC or a demux thread would then walk its list of rules, apply all patterns, and push the packet into the channel on a match. Kind of generic and protocol-agnostic, but it will surely not scale very well, especially when rules for fragmented messages start popping up. An optimisation might be to use hard-coded pattern checks for the well-known protocols (UDP, TCP, IP fragments, etc.). But maybe I'm just overlooking THE simple solution to the problem, am I?

Once we had those two core features in the kernel, it would start making sense to think about how to gracefully manage the other modifications to NIC drivers, protocols, and APIs that are required or desirable for hard-RT networking.

Looking forward to further discussions!

Jan
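[ Walking Jan's rule list for an incoming frame could look something like this; struct pattern and struct rule are as he defines them above, the list iteration is a sketch: ]

struct netchannel *demux(struct list_head *rule_list,
			 const unsigned char *buf, unsigned long len)
{
	struct rule *r;

	list_for_each_entry(r, rule_list, rules) {
		int i, match = 1;

		for (i = 0; i < r->pattern_count; i++) {
			struct pattern *p = &r->pattern[i];

			if (p->offset >= len ||
			    (buf[p->offset] & p->mask) != p->value) {
				match = 0;
				break;
			}
		}
		if (match)
			return r->destination;
	}
	return NULL;	/* no match: hand the frame to the regular stack */
}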
* Re: Van Jacobson's net channels and real-time 2006-04-21 8:53 ` Jan Kiszka @ 2006-04-24 14:22 ` Esben Nielsen 2006-04-27 8:09 ` Jan Kiszka 0 siblings, 1 reply; 31+ messages in thread From: Esben Nielsen @ 2006-04-24 14:22 UTC (permalink / raw) To: Jan Kiszka; +Cc: Esben Nielsen, linux-kernel, Ingo Molnar, David S. Miller On Fri, 21 Apr 2006, Jan Kiszka wrote: > 2006/4/20, Esben Nielsen <simlo@phys.au.dk>: >> Before I start, where is VJ's code? I have not been able to find it anywhere. >> >> With the preempt-realtime branch maturing and finding it's way into the >> mainline kernel, using Linux (without sub-kernels) for real-time applications >> is becomming an realistic option without having to do a lot of hacks in the >> kernel on your own. > > Well, commenting on this statement would likely create a thread of its own... We have had that last year I think... > >> But the network stack could be improved and some of the >> ideas in Van Jacobson's net channels could be usefull when receiving network >> packages with real-time latencies. > > ... so it's better to focus on a fruitful discussion on these > interesting ideas which may lay the ground for a future coexistence of > both hard-RT and throughput optimised networking stacks in the > mainline kernel. I'm slightly sceptical, but maybe I'll be proven > wrong. Scaling over many CPUs and RT share some common techniques. The two goals are not the same but they are not completely opposit, nor orthorgonal goals, but rather like two arrows pointing in the same general direction. > > My following remarks are biased toward hart-RT. What may appear > problematic in this context could be a non-issue for scenarios where > the overall throughput counts, not individual packet latencies. > >> >> Finding the end point in the receive interrupt and send of the packet to >> the receiving process directly is a good idea if it is fast enough to do >> so in the interrupt context (and I think it can be done very fast). One > > This heavily depends on the protocol to parse. Single-packet messages > based on TCP, UDP, or whatever, are yet easy to demux: some table for > the frame type, some for the IP protocol, and another for the port (or > an overall hash for a single table) -> here's the receiver. > > But now think of fragmented IP packets. The first piece can be > assigned normally, but the succeeding fragments require a dynamically > added detection in that critical demux path (IP fragments are > identified by src+dest IP, protocol, and an arbitrary ID). Each > pending chain of fragments for a netchannel would create yet another > demux rule. But I'm also curious to see the code used for this by Van > Jacobson. Turn off fragmentation :-) Web servers often do that (giving a lot of trouble to pppoe users). IPv6 is also defined without fragmentation at this level, right? A good first solution would be to send framented IP through the usual IP stack. > > BTW, that's the issue we also face in RTnet when handling UDP/IP under > hart-RT constraints. We avoid unbounded demux complexity by setting a > hard limit on the number of open chains. If you want to have a look at > the code: www.rtnet.org I am only on the net about 30 min every 2nd day. I write mails offline and send them later - that is why I am so late at answering. > >> problem in the current setup, is that everything has to go through the >> soft interrupt. 
That is even if you make a completely new, non-IP >> protocol, the latency for delivering the frame to your application is >> still limited by the latency of the IP-stack because it still have to go >> through soft irq which might be busy working on IP packages. Even if you >> open a raw socket, the latency is limited to the latency of the soft irq. >> At work we use a widely used commercial RTOS. It got exactly the same >> problem of having every network packet being handled by the same thread. > > The question of _where_ to do that demultiplexing is actually not that > critical as _how_ to do it - and with which complexity. For hard-RT, > it should to be O(1) or, if not feasible, O(n), where n is only > influenced by the RT applications and their traffic footprints, but > not by an unknown set of non-RT applications and communication links. > [Even with O(1) demux, the pure numbers of incoming non-RT packets can > still cause QoS crosstalk - a different issue.] Yep, ofcourse. But not obviouse to people in not familiar with deterministic RT. I assume that you mean the same by "hard RT" as I mean by "deterministic RT". Old discussions on lkml has shown that there is a lot of disagreement about what is meant :-) > >> >> Buffer management is another issue. On the RTOS above you make a buffer pool >> per network device for receiving packages. On Linux received packages are taken >> from the global memory pool with GFP_ATOMIC. On both systems you can easily run >> out of buffers if they are not freed back to the pool fast enough. In that >> case you will just have to drop packages as they are received. Without >> having the code to VJ's net channels, it looks like they solve the problem: >> Each end receiver provides his own receive resources. If a receiver can't cope >> with all the traffic, it will loose packages, the others wont. That makes it >> safe to run important real-time traffic along with some unpredictable, low >> priority TCP/IP traffic. If the TCP/IP receivers does not run fast enough, >> their packages will be dropped, but the driver will not drop the real-time >> packages. The nice thing about a real-time task is that you know it's latency >> and therefore know how many receive buffers it needs to avoid loosing >> packages in a worst case scenario. > > Yep, this is a core feature for RT networking. And this is essentially > the way we handle it in RTnet for quite some time: "Here is a filled > buffer for you. Give me an empty one from your pool, and it will be > yours. If you can't, I'll drop it!" The existing concept works quite > well for single consumers. But it's still a tricky thing when > considering multiple consumers sharing a physical buffer. RTnet > doesn't handle this so far (except for packet capturing). I have some > rough ideas for a generic solution in my mind, but RTnet users didn't > ask for this so far loudly, thus no further effort was spent on it. > Exchanging skbs instead of simply handing over skbs is ofcourse a good idea, but it will slow down the stack slightly. VJ _might_ have made stuff more effective. > Actually the pre-allocation issue is not only limited to skb-based > networking. It's one reason why we have separate RT Firewire and RT USB > projects. The restrictions and modifications they require make them > unhandy for standard use but perfectly fitting for deterministic > applications. 
> Actually the pre-allocation issue is not only limited to skb-based
> networking. It's one reason why we have separate RT Firewire and RT USB
> projects. The restrictions and modifications they require make them
> unhandy for standard use but perfectly fitting for deterministic
> applications.
>
> Ok, to sum up what I see as the core topics for the first steps: we
> need A) a mechanism to use pre-allocated buffers for certain
> communication links and B) a smart early-demux algorithm of manageable
> complexity which decides what receiver has to be accounted for an
> incoming packet.
>
> The former is widely a question of restructuring the existing code,
> but the latter is still unclear to me. Let me sketch my first idea:
>
> struct pattern {
>     unsigned long offset;
>     unsigned long mask;
>     unsigned long value;    /* buffer[offset] & mask == value? */
> };
>
> struct rule {
>     struct list_head rules;
>     int pattern_count;
>     struct pattern pattern[MAX_PATTERNS_PER_RULE];
>     struct netchannel *destination;
> };
>
> For each incoming packet, the NIC or a demux thread would then walk
> its list of rules, apply all patterns, and push the packet into the
> channel on match. Kind of generic and protocol-agnostic, but it will
> surely not scale very well, specifically when allowing rules for
> fragmented messages popping up. An optimisation might be to use
> hard-coded pattern checks for the well-known protocols (UDP, TCP, IP
> fragment, etc.). But maybe I'm just overlooking THE simple solution of
> the problem now, am I?
>

I came up with a simple, quite general idea - but not general enough to
include fragmentation. See below.

> Once we had those two core features in the kernel, it would start
> making sense to think about how to manage other modifications to NIC
> drivers, protocols, and APIs gracefully that are required or desirable
> for hard-RT networking.
>
> Looking forward to further discussions!
>

You will have it :-)

> Jan
>

Here is a simple filter idea. The kernel has to enforce a maximum filter
length to make the filtering deterministic.

filter.h:
----------------------------------------------------------------------
/*
 * Copyright (c) 2006 Esben Nielsen
 *
 * Released under the terms of the GNU GPL v2.0.
 */

#ifndef FILTER_H
#define FILTER_H

struct netchannel;

enum rx_action_type {
    LOOK_AT_BYTE,
    GIVE_TO_CHANNEL
};

struct rx_action {
    int usage;
    enum rx_action_type type;
    union {
        struct {
            unsigned long offset;
            struct rx_action *actions[256];
            struct rx_action *not_that_long;
        } look_at_byte;
        struct {
            struct netchannel *channel;
            struct rx_action *cont;
        } give_to_channel;
    } args;
};

#endif
----------------------------------------------------------------------
filter.c:
----------------------------------------------------------------------
/*
 * Copyright (c) 2006 Esben Nielsen
 *
 * Released under the terms of the GNU GPL v2.0.
 *
 * Note: a sketch - it mixes user-space malloc/free with kernel-style
 * BUG_ON/mutex/RCU, so it is not a finished kernel patch.
 */
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <errno.h>

#include "filter.h"

extern void netchannel_receive(struct netchannel *channel,
                               const unsigned char *data,
                               unsigned long length);

/* Walk the action tree for one received frame. The walk is bounded by
 * the depth of the tree, i.e. by the maximum filter length. */
void do_rx_actions(struct rx_action *act, const unsigned char *data,
                   unsigned long length)
{
 start:
    switch (act->type) {
    case LOOK_AT_BYTE:
        if (act->args.look_at_byte.offset >= length)
            act = act->args.look_at_byte.not_that_long;
        else
            act = act->args.look_at_byte.actions
                      [data[act->args.look_at_byte.offset]];
        goto start;
    case GIVE_TO_CHANNEL:
        netchannel_receive(act->args.give_to_channel.channel,
                           data, length);
        act = act->args.give_to_channel.cont;
        if (act)
            goto start;
        break;
    default:
        BUG_ON(1);
    }
}

extern struct netchannel * const default_netchannel;

struct rx_action default_action;

struct rx_action *get_action(struct rx_action *act)
{
    act->usage++;
    return act;
}

struct rx_action *alloc_rx_action(enum rx_action_type type)
{
    struct rx_action *res = malloc(sizeof(struct rx_action));
    if (res) {
        int i;
        res->usage = 1;
        res->type = type;
        switch (type) {
        case LOOK_AT_BYTE:
            for (i = 0; i < 256; i++)
                res->args.look_at_byte.actions[i] =
                    get_action(&default_action);
            res->args.look_at_byte.not_that_long =
                get_action(&default_action);
            break;
        case GIVE_TO_CHANNEL:
            res->args.give_to_channel.channel = default_netchannel;
            res->args.give_to_channel.cont = NULL;
            break;
        default:
            BUG_ON(1);
        }
    }
    return res;
}

void free_rx_action(struct rx_action **a_ref)
{
    int i;
    struct rx_action *a = *a_ref;
    *a_ref = NULL;
    if (!a)
        return;
    a->usage--;
    if (a->usage)
        return;
    switch (a->type) {
    case LOOK_AT_BYTE:
        for (i = 0; i < 256; i++)
            free_rx_action(&a->args.look_at_byte.actions[i]);
        free_rx_action(&a->args.look_at_byte.not_that_long);
        break;
    case GIVE_TO_CHANNEL:
        free_rx_action(&a->args.give_to_channel.cont);
        break;
    default:
        BUG_ON(1);
    }
    free(a);
}

struct rx_action *make_look_at_byte(unsigned long offset, unsigned char val,
                                    struct rx_action *todo)
{
    struct rx_action *act;
    if (!todo)
        return NULL;
    act = alloc_rx_action(LOOK_AT_BYTE);
    if (!act) {
        free_rx_action(&todo);
        return NULL;
    }
    act->args.look_at_byte.offset = offset;
    free_rx_action(&act->args.look_at_byte.actions[val]);
    act->args.look_at_byte.actions[val] = todo;
    return act;
}

/* Ethernet frame type 0x0800 -> IPv4 */
struct rx_action *ethernet_to_ip(struct rx_action *todo)
{
    return make_look_at_byte(12, 0x08, make_look_at_byte(13, 0, todo));
}

struct rx_action *ethernet_to_udp(struct rx_action *todo)
{
    return ethernet_to_ip(make_look_at_byte(23, 17 /* IPPROTO_UDP */,
                                            todo));
}

/* Assumes a 20-byte IP header, i.e. no IP options. */
struct rx_action *ethernet_to_udp_port(struct rx_action *todo, uint16_t port)
{
    return ethernet_to_udp(make_look_at_byte(36, port >> 8,
                           make_look_at_byte(37, port & 0xFF, todo)));
}

struct rx_action *merge_rx_actions(struct rx_action *act1,
                                   struct rx_action *act2);

struct rx_action *merge_give_to_channel(struct rx_action *give,
                                        struct rx_action *other)
{
    int was_not_zero;
    struct rx_action *res = alloc_rx_action(GIVE_TO_CHANNEL);
    if (!res)
        return NULL;
    BUG_ON(give->type != GIVE_TO_CHANNEL);
    res->args.give_to_channel.channel = give->args.give_to_channel.channel;
    /* Test the source's cont: the freshly allocated res->cont is
     * always NULL at this point. */
    was_not_zero = (give->args.give_to_channel.cont != NULL);
    res->args.give_to_channel.cont =
        merge_rx_actions(give->args.give_to_channel.cont, other);
    if (was_not_zero && !res->args.give_to_channel.cont) {
        free_rx_action(&res);
        return NULL;
    }
    return res;
}
struct rx_action *merge_rx_actions(struct rx_action *act1,
                                   struct rx_action *act2)
{
    if (!act1 || act1 == &default_action)
        return act2 ? get_action(act2) : NULL;
    if (!act2 || act2 == &default_action)
        return get_action(act1);

    switch (act1->type) {
    case LOOK_AT_BYTE:
        switch (act2->type) {
        case LOOK_AT_BYTE:
            if (act1->args.look_at_byte.offset ==
                act2->args.look_at_byte.offset) {
                int i;
                struct rx_action *res = alloc_rx_action(LOOK_AT_BYTE);
                if (!res)
                    return NULL;
                res->args.look_at_byte.offset =
                    act1->args.look_at_byte.offset;
                for (i = 0; i < 256; i++) {
                    free_rx_action(&res->args.look_at_byte.actions[i]);
                    res->args.look_at_byte.actions[i] =
                        merge_rx_actions(
                            act1->args.look_at_byte.actions[i],
                            act2->args.look_at_byte.actions[i]);
                    if (!res->args.look_at_byte.actions[i]) {
                        free_rx_action(&res);
                        return NULL;
                    }
                }
                res->args.look_at_byte.not_that_long =
                    merge_rx_actions(
                        act1->args.look_at_byte.not_that_long,
                        act2->args.look_at_byte.not_that_long);
                if (!res->args.look_at_byte.not_that_long) {
                    free_rx_action(&res);
                    return NULL;
                }
                return res;
            }
            /* Make act1 the action with the lower offset. */
            if (act2->args.look_at_byte.offset <
                act1->args.look_at_byte.offset) {
                struct rx_action *tmp;
                tmp = act1;
                act1 = act2;
                act2 = tmp;
            }
            if (act1->args.look_at_byte.offset <
                act2->args.look_at_byte.offset) {
                int i;
                struct rx_action *res = alloc_rx_action(LOOK_AT_BYTE);
                if (!res)
                    return NULL;
                res->args.look_at_byte.offset =
                    act1->args.look_at_byte.offset;
                for (i = 0; i < 256; i++) {
                    free_rx_action(&res->args.look_at_byte.actions[i]);
                    res->args.look_at_byte.actions[i] =
                        merge_rx_actions(
                            act1->args.look_at_byte.actions[i], act2);
                    if (!res->args.look_at_byte.actions[i]) {
                        free_rx_action(&res);
                        return NULL;
                    }
                }
                res->args.look_at_byte.not_that_long =
                    merge_rx_actions(
                        act1->args.look_at_byte.not_that_long, act2);
                if (!res->args.look_at_byte.not_that_long) {
                    free_rx_action(&res);
                    return NULL;
                }
                return res;
            } else
                BUG_ON(1);
        case GIVE_TO_CHANNEL:
            return merge_give_to_channel(act2, act1);
        }
        BUG_ON(1);
        break;
    case GIVE_TO_CHANNEL:
        return merge_give_to_channel(act1, act2);
    }
    BUG_ON(1);
    return NULL;
}

void init_rx_actions(void)
{
    default_action.usage = 1;
    default_action.type = GIVE_TO_CHANNEL;
    default_action.args.give_to_channel.channel = default_netchannel;
    default_action.args.give_to_channel.cont = NULL;
}

struct netdevice {
    struct rx_action *action;
    struct mutex change_action_lock;
};

int add_action_to_device(struct netdevice *dev, struct rx_action *act)
{
    struct rx_action *new, *old;

    mutex_lock(&dev->change_action_lock);
    new = merge_rx_actions(dev->action, act);
    if (!new) {
        mutex_unlock(&dev->change_action_lock);
        return -ENOMEM;
    }
    old = dev->action;
    dev->action = new;
    mutex_unlock(&dev->change_action_lock);
    synchronize_rcu();    /* wait for readers of the old tree */
    free_rx_action(&old);
    return 0;
}
----------------------------------------------------------------------
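
To illustrate how the pieces above fit together, here is a hypothetical
setup (my_rt_channel and the eth0 device are assumed to exist; refcount
handling of the merged rule is elided):

----------------------------------------------------------------------
/* Hypothetical usage: steer UDP destination port 5001 into an RT
 * channel; everything else keeps hitting default_netchannel. */
#include <errno.h>
#include "filter.h"

/* struct netdevice and add_action_to_device() as in filter.c above. */
extern struct netchannel *my_rt_channel;    /* created elsewhere */
extern struct netdevice *eth0;              /* likewise */

int setup_rt_filter(void)
{
    struct rx_action *deliver, *rule;

    init_rx_actions();

    deliver = alloc_rx_action(GIVE_TO_CHANNEL);
    if (!deliver)
        return -ENOMEM;
    deliver->args.give_to_channel.channel = my_rt_channel;

    /* Ethernet -> IPv4 -> UDP -> port 5001, one byte at a time;
     * on failure the builder chain frees 'deliver' itself. */
    rule = ethernet_to_udp_port(deliver, 5001);
    if (!rule)
        return -ENOMEM;

    return add_action_to_device(eth0, rule);
}

/* In the driver's RX path (readers of dev->action would run under
 * rcu_read_lock() to pair with the synchronize_rcu() above): */
void rx_one_frame(struct netdevice *dev,
                  const unsigned char *data, unsigned long len)
{
    do_rx_actions(dev->action, data, len);
}
----------------------------------------------------------------------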
^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Van Jacobson's net channels and real-time
  2006-04-24 14:22 ` Esben Nielsen
@ 2006-04-27  8:09   ` Jan Kiszka
  2006-04-27  8:16     ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2006-04-27 8:09 UTC (permalink / raw)
  To: Esben Nielsen; +Cc: linux-kernel, Ingo Molnar, David S. Miller

2006/4/24, Esben Nielsen <simlo@phys.au.dk>:
> On Fri, 21 Apr 2006, Jan Kiszka wrote:
>
> > 2006/4/20, Esben Nielsen <simlo@phys.au.dk>:
> >>
> >> Finding the end point in the receive interrupt and send of the packet to
> >> the receiving process directly is a good idea if it is fast enough to do
> >> so in the interrupt context (and I think it can be done very fast). One
> >
> > This heavily depends on the protocol to parse. Single-packet messages
> > based on TCP, UDP, or whatever, are yet easy to demux: some table for
> > the frame type, some for the IP protocol, and another for the port (or
> > an overall hash for a single table) -> here's the receiver.
> >
> > But now think of fragmented IP packets. The first piece can be
> > assigned normally, but the succeeding fragments require a dynamically
> > added detection in that critical demux path (IP fragments are
> > identified by src+dest IP, protocol, and an arbitrary ID). Each
> > pending chain of fragments for a netchannel would create yet another
> > demux rule. But I'm also curious to see the code used for this by Van
> > Jacobson.
>
> Turn off fragmentation :-) Web servers often do that (giving a lot of
> trouble to pppoe users). IPv6 is also defined without fragmentation at
> this level, right?

As far as I see it, the demux situation is not that different with IPv6 -
as long as you do not prepare the fragments specifically. I'm thinking of
IP options carrying the destination port, which is so far only contained
in the first fragment. But such tweaks only work if all participants
follow the rules. Anyway, worth keeping in mind.

> A good first solution would be to send fragmented IP through the usual IP
> stack.
>

Although this excludes protocols which exploit this feature. But you are
right, one problem after the other.

>
> >
> > BTW, that's the issue we also face in RTnet when handling UDP/IP under
> > hard-RT constraints. We avoid unbounded demux complexity by setting a
> > hard limit on the number of open chains. If you want to have a look at
> > the code: www.rtnet.org
>
> I am only on the net about 30 min every 2nd day. I write mails offline and
> send them later - that is why I am so late at answering.
>

Different situation on my side - same effect. :-/

>
> >
> > >> problem in the current setup, is that everything has to go through the
> > >> soft interrupt. That is even if you make a completely new, non-IP
> > >> protocol, the latency for delivering the frame to your application is
> > >> still limited by the latency of the IP-stack because it still have to go
> > >> through soft irq which might be busy working on IP packages. Even if you
> > >> open a raw socket, the latency is limited to the latency of the soft irq.
> > >> At work we use a widely used commercial RTOS. It got exactly the same
> > >> problem of having every network packet being handled by the same thread.
> > >
> > > The question of _where_ to do that demultiplexing is actually not that
> > > critical as _how_ to do it - and with which complexity. For hard-RT,
> > > it should be O(1) or, if not feasible, O(n), where n is only
> > > influenced by the RT applications and their traffic footprints, but
> > > not by an unknown set of non-RT applications and communication links.
> > > [Even with O(1) demux, the pure numbers of incoming non-RT packets can
> > > still cause QoS crosstalk - a different issue.]
>
> Yep, of course. But it is not obvious to people not familiar with
> deterministic RT. I assume that you mean the same by "hard RT" as I mean
> by "deterministic RT". Old discussions on lkml have shown that there is a
> lot of disagreement about what is meant :-)

I tend to be sluggish, I know. Actually, when being pedantic, soft RT
can also be deterministic in failing to meet the specified deadline once
in a while :). But I'm sure we mean the same: the required logical and
temporal properties must always be fulfilled, i.e. without even rare
exceptions.

>
> >
> >>
> >> Buffer management is another issue. On the RTOS above you make a buffer pool
> >> per network device for receiving packages. On Linux received packages are taken
> >> from the global memory pool with GFP_ATOMIC. On both systems you can easily run
> >> out of buffers if they are not freed back to the pool fast enough. In that
> >> case you will just have to drop packages as they are received. Without
> >> having the code to VJ's net channels, it looks like they solve the problem:
> >> Each end receiver provides his own receive resources. If a receiver can't cope
> >> with all the traffic, it will loose packages, the others wont. That makes it
> >> safe to run important real-time traffic along with some unpredictable, low
> >> priority TCP/IP traffic. If the TCP/IP receivers does not run fast enough,
> >> their packages will be dropped, but the driver will not drop the real-time
> >> packages. The nice thing about a real-time task is that you know it's latency
> >> and therefore know how many receive buffers it needs to avoid loosing
> >> packages in a worst case scenario.
> >
> > Yep, this is a core feature for RT networking. And this is essentially
> > the way we handle it in RTnet for quite some time: "Here is a filled
> > buffer for you. Give me an empty one from your pool, and it will be
> > yours. If you can't, I'll drop it!" The existing concept works quite
> > well for single consumers. But it's still a tricky thing when
> > considering multiple consumers sharing a physical buffer. RTnet
> > doesn't handle this so far (except for packet capturing). I have some
> > rough ideas for a generic solution in my mind, but RTnet users didn't
> > ask for this so far loudly, thus no further effort was spent on it.
> >
>
> Exchanging skbs instead of simply handing over skbs is of course a good
> idea, but it will slow down the stack slightly. VJ _might_ have made
> stuff more effective.

I'm scratching my head, digging for the reasons why I once considered
and then dropped the idea of accounting, i.e. maintaining counters
instead of passing empty buffers. One reason likely was that RTnet is
not built on top of strict one-way channels. This means you either have
to maintain a central pool for free buffers (not that scalable) or will
run into trouble having enough real buffers in the local pools of NICs,
sockets, etc. when they are actually needed. Might be feasible, though,
under the constraint of single producer / single consumer.
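
For the record, such an accounting scheme could be sketched like this
(C11 atomics; hypothetical names, single producer / single consumer
assumed):

----------------------------------------------------------------------
/* Counter-based accounting instead of buffer exchange: buffers come
 * from a shared pool, but each channel may only consume what it has
 * pre-paid as credits. */
#include <stdatomic.h>

struct channel_account {
    atomic_int credits;     /* buffers this receiver may still take */
};

/* RX path: charge the receiving channel before allocating. */
static int charge_channel(struct channel_account *acc)
{
    int c = atomic_load(&acc->credits);

    while (c > 0)
        if (atomic_compare_exchange_weak(&acc->credits, &c, c - 1))
            return 0;       /* charged - allocate and deliver */
    return -1;              /* out of credits - drop the packet */
}

/* Receiver frees a buffer: the credit goes back. */
static void refund_channel(struct channel_account *acc)
{
    atomic_fetch_add(&acc->credits, 1);
}
----------------------------------------------------------------------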
> I came up with a simple, quite general idea - but not general enough
> to include fragmentation. See below.
>

I like it! It's a bit memory-hungry, but I guess this can be improved.
See the data structure suggestions below.

> > Once we had those two core features in the kernel, it would start
> > making sense to think about how to manage other modifications to NIC
> > drivers, protocols, and APIs gracefully that are required or desirable
> > for hard-RT networking.
> >
> > Looking forward to further discussions!
> >
> You will have it :-)

Oh, yeah, I'm afraid. ;)

>
> > Jan
> >
>
> Here is a simple filter idea. The kernel has to enforce a maximum
> filter length to make the filtering deterministic.
>
> filter.h:
> ----------------------------------------------------------------------
> /*
>  * Copyright (c) 2006 Esben Nielsen
>  *
>  * Released under the terms of the GNU GPL v2.0.
>  */
>
> #ifndef FILTER_H
> #define FILTER_H
>
> struct netchannel;
>
> enum rx_action_type {
>     LOOK_AT_BYTE,
>     GIVE_TO_CHANNEL
> };

I would additionally introduce COMPARE_BYTE in order to replace the
table-based lookup where the number of byte values to distinguish is
simply too small. Look e.g. at the high byte of the Ethernet frame type
considering ETH_P_ARP and ETH_P_IP (the common case) - they are
identical.

> struct rx_action {
>     int usage;
>     enum rx_action_type type;
>     union {
>         struct {
>             unsigned long offset;
>             struct rx_action *actions[256];
>             struct rx_action *not_that_long;
>         } look_at_byte;
>         struct {
>             struct netchannel *channel;
>             struct rx_action *cont;
>         } give_to_channel;
>     } args;
> };

What about this:

struct rx_action_hdr {
    int usage;
    enum rx_action_type type;
};

struct rx_demux_byte {
    struct rx_action_hdr hdr;
    unsigned long offset;
    struct rx_action *actions[256];
    struct rx_action *not_that_long;
};

struct rx_compare_byte {
    struct rx_action_hdr hdr;
    unsigned long offset;
    unsigned char value;
    struct rx_action *action;
    struct rx_action *not_that_long;
};

struct rx_give_to_channel {
    struct rx_action_hdr hdr;
    struct netchannel *channel;
    struct rx_action *cont;
};
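
A walk step for such a COMPARE_BYTE node might look like this (a sketch
reusing the structs above; the 'mismatch' branch is an added assumption -
the layout does not say where a non-matching byte should go, e.g. to the
default action):

----------------------------------------------------------------------
/* Sketch: one comparison instead of a 256-entry table when only a
 * single byte value matters at this offset. */
struct rx_compare_byte_ext {
    struct rx_action_hdr hdr;
    unsigned long offset;
    unsigned char value;
    struct rx_action *action;           /* byte matches 'value' */
    struct rx_action *mismatch;         /* byte differs (assumed field) */
    struct rx_action *not_that_long;
};

static struct rx_action *step_compare_byte(struct rx_compare_byte_ext *cb,
                                           const unsigned char *data,
                                           unsigned long length)
{
    if (cb->offset >= length)
        return cb->not_that_long;       /* packet too short */
    if (data[cb->offset] == cb->value)
        return cb->action;              /* match: continue the chain */
    return cb->mismatch;
}
----------------------------------------------------------------------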
Sorry, I haven't looked at further details of your implementation yet.

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Van Jacobson's net channels and real-time
  2006-04-27  8:09 ` Jan Kiszka
@ 2006-04-27  8:16   ` David S. Miller
  2006-04-27 10:00     ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: David S. Miller @ 2006-04-27 8:16 UTC (permalink / raw)
  To: jan.kiszka; +Cc: simlo, linux-kernel, mingo

From: "Jan Kiszka" <jan.kiszka@googlemail.com>
Date: Thu, 27 Apr 2006 10:09:06 +0200

> What about this:

Can I recommend a trip to the local university engineering library for
a quick read-up on the current state of the art wrt. packet
classification algorithms?

Barring that, a read of chapter 12, "Packet Classification", from
Networking Algorithmics will give you a great primer.

I'm suggesting this because all I see is fishing around with painfully
inefficient algorithms.

In any event, the initial net channel implementation will likely just
do straight socket hash lookups, identical to how TCP does socket
lookups in the current stack: a full match on established sockets and,
failing that, a fall back to the listening socket lookup, which allows
some forms of wildcarding.
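
In outline, that two-step lookup is (hypothetical helper names, heavily
simplified from the real TCP hash code):

----------------------------------------------------------------------
#include <stdint.h>

struct sock;                            /* opaque here */

/* Stand-ins, not real kernel symbols: */
struct sock *ehash_lookup(uint32_t saddr, uint16_t sport,
                          uint32_t daddr, uint16_t dport);
struct sock *lhash_lookup(uint32_t daddr, uint16_t dport);

struct sock *channel_sock_lookup(uint32_t saddr, uint16_t sport,
                                 uint32_t daddr, uint16_t dport)
{
    struct sock *sk;

    /* 1. Established sockets: exact (saddr, sport, daddr, dport). */
    sk = ehash_lookup(saddr, sport, daddr, dport);
    if (sk)
        return sk;

    /* 2. Listening sockets: (daddr, dport), where the local address
     *    may be wildcarded (INADDR_ANY). */
    return lhash_lookup(daddr, dport);
}
----------------------------------------------------------------------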
Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Van Jacobson's net channels and real-time
  2006-04-27  8:16 ` David S. Miller
@ 2006-04-27 10:00   ` Jan Kiszka
  2006-04-27 19:50     ` David S. Miller
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2006-04-27 10:00 UTC (permalink / raw)
  To: David S. Miller; +Cc: simlo, linux-kernel, mingo

2006/4/27, David S. Miller <davem@davemloft.net>:
>
> Can I recommend a trip to the local university engineering library for
> a quick read-up on the current state of the art wrt. packet
> classification algorithms?
>
> Barring that, a read of chapter 12, "Packet Classification", from
> Networking Algorithmics will give you a great primer.
>
> I'm suggesting this because all I see is fishing around with painfully
> inefficient algorithms.
>
> In any event, the initial net channel implementation will likely just
> do straight socket hash lookups, identical to how TCP does socket
> lookups in the current stack: a full match on established sockets and,
> failing that, a fall back to the listening socket lookup, which allows
> some forms of wildcarding.
>

Sorry that you had to remind us of the different primary goals. I think
we should look for something pluggable that supports both large-scale
rule tables and the small ones of embedded RT systems.

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: Van Jacobson's net channels and real-time
  2006-04-27 10:00 ` Jan Kiszka
@ 2006-04-27 19:50   ` David S. Miller
  0 siblings, 0 replies; 31+ messages in thread
From: David S. Miller @ 2006-04-27 19:50 UTC (permalink / raw)
  To: jan.kiszka; +Cc: simlo, linux-kernel, mingo

From: "Jan Kiszka" <jan.kiszka@googlemail.com>
Date: Thu, 27 Apr 2006 12:00:53 +0200

> Sorry that you had to remind us of the different primary goals. I think
> we should look for something pluggable that supports both large-scale
> rule tables and the small ones of embedded RT systems.

Even for such objectives, very specific understanding exists in the
algorithmic community about what is known to work best for various
kinds of packet classification.

^ permalink raw reply	[flat|nested] 31+ messages in thread
[parent not found: <63KcN-6lD-25@gated-at.bofh.it>]
[parent not found: <64wrg-2cg-41@gated-at.bofh.it>]
[parent not found: <64wAE-2Cs-9@gated-at.bofh.it>]
[parent not found: <64AkV-8cG-7@gated-at.bofh.it>]
[parent not found: <65cqo-5tR-33@gated-at.bofh.it>]
[parent not found: <65cJF-66i-11@gated-at.bofh.it>]
* Re: Van Jacobson's net channels and real-time
  [not found] ` <65cJF-66i-11@gated-at.bofh.it>
@ 2006-04-24 23:48   ` Robert Hancock
  0 siblings, 0 replies; 31+ messages in thread
From: Robert Hancock @ 2006-04-24 23:48 UTC (permalink / raw)
  To: linux-kernel

linux-os (Dick Johnson) wrote:
> Message signaled interrupts are just a kludge to save a trace on a
> PC board (read make junk cheaper still). They are not faster and
> may even be slower.

Save a trace on the PC board? How about no, since the devices still need
to support INTx interrupts anyway. And yes, they can be faster, mainly
because, being an in-band signal, MSI simplifies some PCI-posting-related
issues, and there is no need to worry about sharing.

> They will not be the salvation of any interrupt
> latency problems. The solution for increasing networking speed,
> where the bit-rate on the wire gets close to the bit-rate on the
> bus, is to put more and more of the networking code inside the
> network board. The CPU gets interrupted after most things (like
> network handshakes) are complete.

You mean like these? http://linux-net.osdl.org/index.php/TOE

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread
Thread overview: 31+ messages (newest: 2006-05-02 16:03 UTC)
2006-04-20 16:29 Van Jacobson's net channels and real-time Esben Nielsen
2006-04-20 19:09 ` David S. Miller
2006-04-21 16:52 ` Ingo Oeser
2006-04-22 11:48 ` Jörn Engel
2006-04-22 13:29 ` Ingo Oeser
2006-04-22 13:49 ` Jörn Engel
2006-04-23 0:05 ` Ingo Oeser
2006-04-23 5:50 ` David S. Miller
2006-04-24 16:42 ` Auke Kok
2006-04-24 16:59 ` linux-os (Dick Johnson)
2006-04-24 17:19 ` Rick Jones
2006-04-24 18:12 ` linux-os (Dick Johnson)
2006-04-24 23:17 ` Michael Chan
2006-04-25 1:49 ` Auke Kok
2006-04-25 11:29 ` linux-os (Dick Johnson)
2006-05-02 12:41 ` Vojtech Pavlik
2006-05-02 15:58 ` Andi Kleen
2006-04-23 5:52 ` David S. Miller
2006-04-23 9:23 ` Avi Kivity
2006-04-23 5:51 ` David S. Miller
2006-04-23 5:56 ` David S. Miller
2006-04-23 14:15 ` Ingo Oeser
2006-04-22 19:30 ` bert hubert
2006-04-23 5:53 ` David S. Miller
2006-04-21 8:53 ` Jan Kiszka
2006-04-24 14:22 ` Esben Nielsen
2006-04-27 8:09 ` Jan Kiszka
2006-04-27 8:16 ` David S. Miller
2006-04-27 10:00 ` Jan Kiszka
2006-04-27 19:50 ` David S. Miller
[not found] <63KcN-6lD-25@gated-at.bofh.it>
[not found] ` <64wrg-2cg-41@gated-at.bofh.it>
[not found] ` <64wAE-2Cs-9@gated-at.bofh.it>
[not found] ` <64AkV-8cG-7@gated-at.bofh.it>
[not found] ` <65cqo-5tR-33@gated-at.bofh.it>
[not found] ` <65cJF-66i-11@gated-at.bofh.it>
2006-04-24 23:48 ` Robert Hancock