* [Question] Explanation of zero-copy networking
From: Alexander Eichhorn @ 2001-05-07 13:43 UTC
To: linux-kernel

Hi all,

we are currently developing (as part of my dissertation) a research platform to study some new ideas in constructing transport systems to support applications with realtime requirements (e.g. multimedia) and new networking technologies. The test platform consists of typical multimedia elements, such as sources, filters, sinks and transport modules, which can be distributed across a set of computers.

To achieve the principle of sparing resource usage - which we consider fundamental for multimedia systems - we are looking for new (already implemented or planned) mechanisms to avoid copying the data streams where possible (device I/O, especially network I/O; user-to-user IPC).

That's why I'd like to ask if one of the net-core developers could give us a (more or less - depends on what you've documented so far) detailed description of the newly implemented zero-copy mechanisms in the network stack. We are interested in how to use it (changed network API?) and also in the internal architecture.

We already had a look at the kernel mailing-list archives and some search engines, but all we found were fragments of the puzzle. Before digging into the source code we are trying this way to get an overall description.

Our second question: Are there any plans for constructing a general copy-avoidance infrastructure (something like what UVM in NetBSD does) and new IPC mechanisms on top of it yet?

Thanks in advance.

Alexander Eichhorn

--
Alexander Eichhorn
Technical University of Ilmenau
Computer Science And Automation Faculty
Distributed Systems and Operating Systems Department
Phone +49 3677 69 4557, Fax +49 3677 69 4541

* Re: [Question] Explanation of zero-copy networking
From: Alan Cox @ 2001-05-07 13:56 UTC
To: alexander.eichhorn
Cc: linux-kernel

> documented so far) detailed description of the newly
> implemented zero-copy mechanisms in the network stack.
> We are interested in how to use it (changed network API?)
> and also in the internal architecture.

It is built around sendfile. Trying to do zero copy on pages with user-space mappings gets so horribly non-pretty that it is better to build the API from the physical side of things.

> Our second question: Are there any plans for constructing
> a general copy-avoidance infrastructure (something like what UVM in
> NetBSD does) and new IPC mechanisms on top of it yet?

Andrea Arcangeli has O_DIRECT file I/O for the ext2 file system. Several patches for kiovec-based single-copy pipes have been posted too.

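For readers who want to see what "built around sendfile" looks like from user space, here is a minimal sketch. It only uses the standard Linux sendfile(2) call; the helper name `send_file_path` and its error policy are assumptions made for illustration and do not come from the thread. Whether the data path underneath is actually copy-free depends on the kernel version and on NIC support, which this sketch cannot check.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): push the whole file at
 * `path` to `out_fd` with sendfile(2).
 * Returns the number of bytes sent, or -1 on error. */
static off_t send_file_path(int out_fd, const char *path)
{
    struct stat st;
    off_t offset = 0;
    int in_fd;

    assert(path != NULL);
    in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    while (offset < st.st_size) {
        /* The kernel moves the data; sendfile() advances `offset` itself. */
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, try again */
        if (n <= 0)
            break;              /* real error, or the file shrank under us */
    }

    close(in_fd);
    return offset == st.st_size ? offset : -1;
}
```

A caller would pass a connected, blocking socket as `out_fd`. The user-space buffer that a read()/write() loop needs never appears here, which is the point of building the API "from the physical side".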
* Re: [Question] Explanation of zero-copy networking
From: Richard B. Johnson @ 2001-05-07 16:12 UTC
To: Alan Cox
Cc: alexander.eichhorn, linux-kernel

On Mon, 7 May 2001, Alan Cox wrote:
> > documented so far) detailed description of the newly
> > implemented zero-copy mechanisms in the network stack.
> > We are interested in how to use it (changed network API?)
> > and also in the internal architecture.
>
> It is built around sendfile. Trying to do zero copy on pages with user-space
> mappings gets so horribly non-pretty that it is better to build the API from the
> physical side of things.
>
> > Our second question: Are there any plans for constructing
> > a general copy-avoidance infrastructure (something like what UVM in
> > NetBSD does) and new IPC mechanisms on top of it yet?
>
> Andrea Arcangeli has O_DIRECT file I/O for the ext2 file system. Several
> patches for kiovec-based single-copy pipes have been posted too.

The Networking RFCs talk about "not copying data" as they attempt to give pointers on improving network speed. However, PCI to memory copying runs at about 300 megabytes per second on modern PCs and memory to memory copying runs at over 1,000 megabytes per second. In the future, these speeds will increase.

I don't advise retrofitting network code to improve the speed of older machines. Instead, time should be spent to improve the robustness and capability of the networking speed and accommodating the new breeds of GHz network boards.

In case anybody is interested, Networking remains a serial communications element. As such, it functions as a low-pass filter. The speed of a serial communications link is set primarily by the dominant pole of the link's transfer function, which in the frequency domain, is information_rate * 2.

With a 100 megabits/second link we have 200 MHz as the dominant pole. The 2 comes from Shannon: it takes 2 carrier events to determine if anything has changed (to transfer information). Therefore, if we can detect changes 100 million times per second, the information carrier must have been at least 200 MHz. This is the dominant pole.

With a 300 Megabyte / second transfer via PCI, the information carrier must have been 300 * 8 * 2 = 4,800 MHz. This is 4,800/200 = 24 times the frequency of the dominant pole of the network transfer function. This is so far removed from the dominant pole of the system's transfer function that even doubling the PCI speed (66 MHz vs. 33 MHz) will have no measurable effect upon networking speed.

With existing kernels, you can perform network speed tests using "lo", removing the network board from the speed test. You will note that the network speed, due to software, is over 10 times faster (30 times on some machines) than when the hardware I/O is used. This shows that the network code, alone, cannot be improved very much to provide an improvement in throughput.

However, a new breed of GHz boards are now available. These boards have a dominant pole of 1000 * 2 = 2000 MHz. This is roughly one-half of the PCI bandwidth, and roughly the same as a 66 MHz bus. This is where some work needs to be done.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk.

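The arithmetic in the message above can be replayed in a few lines. This is only a sketch that applies the message's own rule of thumb (carrier frequency = information rate * 2) to the figures it quotes; it does not check whether that rule is a sound model of a network link. The function names are hypothetical.

```c
#include <assert.h>

/* Rule of thumb used in the message:
 * carrier frequency (MHz) = information rate (Mbit/s) * 2. */
static double carrier_mhz(double mbit_per_s)
{
    assert(mbit_per_s >= 0.0);
    return mbit_per_s * 2.0;
}

/* Convert megabytes/s to megabits/s (8 bits per byte). */
static double mbyte_to_mbit(double mbyte_per_s)
{
    return mbyte_per_s * 8.0;
}
```

With these, `carrier_mhz(100)` is 200 for the 100 Mbit/s link, `carrier_mhz(mbyte_to_mbit(300))` is 4800 for the 300 MB/s PCI figure, their ratio is 24, and `carrier_mhz(1000)` is 2000 for a gigabit board, which matches the numbers in the message.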
* Re: [Question] Explanation of zero-copy networking
From: Francois Romieu @ 2001-05-07 17:53 UTC
To: alexander.eichhorn
Cc: linux-kernel

Richard B. Johnson <root@chaos.analogic.com> wrote:
[...]
> when the hardware I/O is used. This shows that the network code, alone,
> cannot be improved very much to provide an improvement in throughput.

It shows that cached code performs well with ~0us latency device/memory.

Networking is about latency and pps too. They both dramatically reduce the (axe-)evaluated bandwidth.

--
Ueimor

* Re: [Question] Explanation of zero-copy networking
From: Blue Lang @ 2001-05-07 18:00 UTC
To: Francois Romieu
Cc: alexander.eichhorn, linux-kernel

On Mon, 7 May 2001, Francois Romieu wrote:
> It shows that cached code performs well with ~0us latency device/memory.
>
> Networking is about latency and pps too. They both dramatically reduce
> the (axe-)evaluated bandwidth.

I think his point is more along the lines of return on investment. You can tweak Linux to move from 9MB/sec to 9.5MB/sec on a 100Mb link, or you can spend those same developer cycles getting much larger returns out of much sexier hardware.

Now, who's gonna supply us with those NICs? ;)

--
Blue Lang
http://www.gator.net/~blue
Unix Administrator, Veritas Software
2315 McMullan Circle, Raleigh, North Carolina, USA
919 835 1540

* Re: [Question] Explanation of zero-copy networking
From: dean gaudet @ 2001-05-07 18:25 UTC
To: Richard B. Johnson
Cc: Alan Cox, alexander.eichhorn, linux-kernel

On Mon, 7 May 2001, Richard B. Johnson wrote:
> when the hardware I/O is used. This shows that the network code, alone,
> cannot be improved very much to provide an improvement in throughput.

doesn't your analysis assume that we've got nothing else interesting to do while doing the network i/o? for example, i may want to do something else which needs the memory bandwidth i'd otherwise spend on a single-copy...

-dean

* Re: [Question] Explanation of zero-copy networking
From: Richard B. Johnson @ 2001-05-07 19:54 UTC
To: dean gaudet
Cc: Alan Cox, alexander.eichhorn, linux-kernel

On Mon, 7 May 2001, dean gaudet wrote:
> On Mon, 7 May 2001, Richard B. Johnson wrote:
>
> > when the hardware I/O is used. This shows that the network code, alone,
> > cannot be improved very much to provide an improvement in throughput.
>
> doesn't your analysis assume that we've got nothing else interesting to do
> while doing the network i/o? for example, i may want to do something else
> which needs the memory bandwidth i'd otherwise spend on a single-copy...
>
> -dean

Yes and no. It is assumed by most everybody that a single CPU cycle saved in doing something is automatically available for doing something else. This is never the case unless you have a completely polled OS environment that is not doing I/O.

In any OS that preempts using a timer, CPU activity (actual work being done) bunches up around that timer interval. The same is true for interrupts. This happens because, to a single measurement task, the CPU seems slower as it keeps getting taken away. So, we end up with a lot of CPU activity bunched up around interrupts and timer-ticks, with not much happening elsewhere.

In Unix, a system call does not produce a context-switch unless the task is required to sleep while waiting for I/O.

So, the kernel is going to send a packet to another host on behalf of the system caller. It copies the data, (partial checksum) assembles the packet, finishes the checksum, then sends it. The CPU is given to somebody else while waiting for the packet to get somewhere and be ACKed. But, think about a server where EVERY task is waiting for I/O to complete! These CPU cycles, that you saved by eliminating a copy (or two), are now wasted spinning.

Let's say the first packet got sent quicker because of the reduced latency of the copy. After that, you still are waiting for I/O.

Reduced to the limit, look at using zero CPU cycles to send and receive packets. Now, with a server loaded to its natural ability, i.e., bandwidth limited by the round-trip loop bandwidth, you still have all the tasks waiting for I/O to complete.

Basically, "no copy" is an academic exercise. It makes the first packet get sent more quickly, after which everything slows to the natural bandwidth of the system.

Suppose you used a server for multicast only; in other words, you just spewed out unidirectional data. You would still slow to the rate at which the media can take the data. And CPUs can obtain or generate these data a lot faster than 100-base can sink them.

When we get to media that can sink data as fast as we can generate them (it), then we have to worry about memory copy speed. However, these new devices are actually an IP subsystem. They generate and receive entire datagrams. To fully utilize these devices, the datagram generation and reception (the basis of all TCP/IP networking) will have to be moved out of the kernel and into these boards. The kernel code will only handle interfaces, connections, and rules.

Cheers,
Dick Johnson

* Re: [Question] Explanation of zero-copy networking
From: dean gaudet @ 2001-05-07 20:23 UTC
To: Richard B. Johnson
Cc: Alan Cox, alexander.eichhorn, linux-kernel

On Mon, 7 May 2001, Richard B. Johnson wrote:
> When we get to media that can sink data as fast as we can generate
> them (it), then we have to worry about memory copy speed. However,
> these new devices are actually an IP subsystem. They generate and
> receive entire datagrams. To fully utilize these devices, the
> datagram generation and reception (the basis of all TCP/IP networking)
> will have to be moved out of the kernel and into these boards. The
> kernel code will only handle interfaces, connections, and rules.

heh, and then these things will be expensive, so few will buy them and they'll remain in older process technologies (like .21u) because there's no economy of scale, while CPUs jump ahead to fewer and fewer microns (.13u, .10u), and in a moore's law doubling or so someone will come up with the bright idea to move everything back to the CPU again and use mostly dumb i/o devices. (or they'll use a bunch of general purpose computers clustered behind inexpensive switches to achieve the same thing at a fraction of the cost.)

we've never seen this happen before! :)

-dean

* Re: [Question] Explanation of zero-copy networking
From: Bjorn Wesen @ 2001-05-08 11:09 UTC
To: Richard B. Johnson
Cc: linux-kernel

On Mon, 7 May 2001, Richard B. Johnson wrote:
> Basically, "no copy" is an academic exercise. It makes the first
> packet get sent more quickly, after which everything slows to
> the natural bandwidth of the system.
>
> If you used a server for multicast-only. In other words, you
> just spewed out unidirectional data, you still slow to the rate
> at which the media can take the data. And CPUs can obtain or
> generate these data a lot faster than 100-base can sink them.

This is an awfully PC-centric way of putting things. You assume that the only ones who use Linux are those with a 1 GHz CPU and those 66 MHz PCI boards and whatever. You simply cannot make that assumption anymore; the diversity of Linux HW these days is so broad that the sweet spot between CPU cycles, memory bandwidth etc. which controls the code optimization fluctuates wildly.

A simple kernel profile of one of our embedded Linux systems, for example, shows csum_partial_copy limiting the performance. Now for us zero-copy cannot be implemented anyway because we don't have a checksumming ethernet controller, but if we had, we could perhaps enhance performance by 50% by skipping the copy.

And there definitely are no 1 GHz embedded CPUs in the same price range to choose instead, or Rambus memories etc. Raw power simply is not an option sometimes.

It's still true of course that it's not obvious that the cycles spent on copying can be used for anything better in all cases. However, the beauty of open source is that there is no need to debate over whether something should be done or not. If someone feels the need, it will be coded, and if it's good people will use it.

In this case, if anyone gets a 200% boost in performance, they probably won't listen to the argument that "it's academic" afterwards :) And some others might go twiddle their hardware and skip the zero-copy mechanism altogether.

-BW

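The message above names csum_partial_copy, the kernel routine that copies data and computes the checksum in the same pass. As a rough illustration of what "copy and checksum in one loop" means, here is a portable sketch of a 16-bit ones' complement sum (in the style of RFC 1071) folded into a byte copy. It is not the kernel's implementation, which is hand-tuned per architecture; the name `copy_and_sum` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only, not the kernel routine: copy `len` bytes from `src`
 * to `dst` and return the 16-bit ones' complement sum (not inverted) of
 * the copied data, treating it as big-endian 16-bit words. */
static uint16_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    assert(dst != NULL && src != NULL);

    /* Each byte is read once and used for both the copy and the sum. */
    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint64_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {                  /* odd trailing byte, padded with zero */
        dst[i] = src[i];
        sum += (uint64_t)src[i] << 8;
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

If the NIC computes the checksum itself, the summing half of this loop disappears, and with scatter/gather DMA the copying half can go too. That is why the message ties zero-copy to having a checksumming ethernet controller.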
* Re: [Question] Explanation of zero-copy networking
From: Alexander Eichhorn @ 2001-05-08 17:30 UTC
To: root
Cc: linux-kernel

First, thanks for the (unexpectedly large) discussion and hints!

Second: sorry for the multimedia-centric viewpoint, but I think it's an important task for future operating systems development (or better: for a real-world OS like Linux) to have sophisticated support for a _large diversity_ in application requirements, and realtime/multimedia apps have been neglected for too long.

"Richard B. Johnson" wrote:
> So, the kernel is going to send a packet to another host on
> behalf of the system caller. It copies the data, (partial
> checksum) assembles the packet, finishes the checksum, then
> sends it. The CPU is given to somebody else while waiting
> for the packet to get somewhere and be ACKed. But, think
> about a server where EVERY task is waiting for I/O to
> complete! These CPU cycles, that you saved by eliminating
> a copy (or two), are now wasted spinning.
>
> Basically, "no copy" is an academic exercise. It makes the first
> packet get sent more quickly, after which everything slows to
> the natural bandwidth of the system.

These are the semantics of a typical client/server request/reply protocol as used in "traditional" applications. But they aren't appropriate for the communication of realtime media streams because they break the strict timing constraints. Here we need asynchronous communication (*non-blocking semantics*).

> If you used a server for multicast-only. In other words, you
> just spewed out unidirectional data, you still slow to the rate
> at which the media can take the data. And CPUs can obtain or
> generate these data a lot faster than 100-base can sink them.
>
> When we get to media that can sink data as fast as we can generate
> them (it), then we have to worry about memory copy speed. However,
> these new devices are actually an IP subsystem. They generate and
> receive entire datagrams. To fully utilize these devices, the
> datagram generation and reception (the basis of all TCP/IP networking)
> will have to be moved out of the kernel and into these boards. The
> kernel code will only handle interfaces, connections, and rules.

Ohhhh, these are the arguments of people who would rather invest in more resources than in clever algorithms. It's comparable to the old war between the ATM folks and the IP/Ethernet folks; concepts against "brute" resources.

1. You don't take into account that there are not only high-end PCs and workstations with enormous CPU and memory resources! Devices for "pervasive ubiquitous computing" (don't blame me for this fashion word), for example, are mostly embedded systems with scarce resources, happy to have enough CPU cycles for video codecs.

2. On the other hand there are Video-on-Demand servers with (not only one) high-speed NICs, large SANs or disk arrays for video storage with gigabit/Infiniband connections, <fill in your favorite toy>. Here the problem is not only saturating the links (for economic reasons), but also guaranteeing low delay and jitter to every connection. I think we should extend the usability of Linux to this class of servers too.

3. Have a look at the various papers on high-performance networking. The gap between the growth in network bandwidth and the growth in CPU and bus performance is increasing. Today the system buses are not considered to be in the "window of scarcity" (today we have 100MBit Ethernets and 133++MB/s PCI). Tomorrow our operating system concepts have to cope with 1, 10, ?? Gigabit Ethernets, Infiniband, ... who knows. This means: scale CPU and memory-bus performance accordingly, or use resource-sparing IPC mechanisms and implement computationally complex algorithms (checksum calculations, encryption) in hardware.

Besides continuous-media applications, other applications that need to move data chunks much larger than the CPU caches will benefit from such infrastructures too. (Both classes of systems from above will be affected.)

For those applications copy avoidance is so fundamental, or copying is so expensive, because copying needs all three basic system resources (CPU, memory, and bandwidth of local communication facilities, i.e. buses) at the same time (synchronously)!

Many researchers recognized this problem and developed techniques to overcome the dusty OS concepts (UNet, UVM, ...). Unfortunately they need special hardware (NICs), have partially too much overhead, or are not general enough. The one thing this shows us is that there is still some work to be done.

Regards,
Alexander Eichhorn

* Re: [Question] Explanation of zero-copy networking
From: Reto Baettig @ 2001-05-09 9:56 UTC
To: alexander.eichhorn
Cc: root, linux-kernel

> considered to be in the "window of scarcity" (today we have 100MBit
> Ethernets and 133++MB/s PCI). Tomorrow our operating system concepts
> have to cope with 1, 10, ?? Gigabit Ethernets, Infiniband,
> ... who knows.

We had to write our own RPC mechanism because with the standard stacks we had no chance of achieving our goals. We would have loved to use TCP/IP but it was not possible with Linux 2.2.

Today we achieve almost 200MB/s over our RPC stack, and this with the CPUs almost idle. With TCP/IP and Gig-E we only came up to 60-70MB/s, and then the system was completely busy and unresponsive (Linux 2.4 is supposed to be better, but I doubt that we get a CPU load this low without zerocopy networking).

We would like to look at the zerocopy ideas of Linux 2.4 and try to implement our RPC mechanism over zerocopy TCP (if something like this exists). We just started with this idea and don't know exactly where to start yet (we are looking for something like a de-facto zerocopy standard for sockets)...

Any ideas are welcome.

Reto

* Re: [Question] Explanation of zero-copy networking
From: Pekka Pietikainen @ 2001-05-07 18:30 UTC
To: linux-kernel

On Mon, May 07, 2001 at 12:12:57PM -0400, Richard B. Johnson wrote:
> you can perform network speed tests using "lo", removing the network
> board from the speed test. You will note that the network speed, due
> to software, is over 10 times faster (30 times on some machines) than
> when the hardware I/O is used. This shows that the network code, alone,
> cannot be improved very much to provide an improvement in throughput.

I'd say more like a factor of 2.

Socket bandwidth using localhost: 141.63 MB/sec
Socket bandwidth using 192.168.9.3: 74.79 MB/sec

(with the boxes being able to do ~= 100MB/s when the receiver CPU/mem bandwidth isn't limiting things).

So I have slow pIII/500 class machines with fast networking. You could rerun the test with your favourite multi-gigabit network and latest 1.7GHz PC and still have a similar ratio. Being on the bleeding edge isn't easy, and waiting for a few years for faster hardware isn't a solution for everyone ;)

Zero-copy mostly helps against CPU use (where it'll make your heavily loaded server able to serve a lot more connections), not so much against bandwidth. The receiver will still run into problems with the copy it has to do unless you do some very evil tricks like header-splitting+MMU tricks or run protocols designed to be accelerated in hardware.

Not that zero-copy is problem-free. If your bus starts corrupting random bits there's no way of really noticing it, since the NIC happily creates a correct TCP checksum based on the corrupt data. It's not like hardware engineers can be expected to design hardware that works according to spec :)

Then there's the interrupt problem, which someone will have to solve before they start shipping 10gigE NICs that use 1500-byte frames: 850000 interrupts/s without mitigation. Wheeee!!!!

--
Pekka Pietikainen

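The interrupt figure at the end of the message above is easy to re-derive. The sketch below assumes one interrupt per frame, counts only the 1500 frame bytes (no preamble, inter-frame gap or other wire overhead), and uses hypothetical function names. Under those assumptions a 10 Gbit/s link delivers roughly 833,000 frames per second, the same ballpark as the figure quoted.

```c
#include <assert.h>

/* Frames per second on a link, ignoring all per-frame wire overhead
 * (a simplifying assumption of this sketch). */
static double frames_per_second(double link_bits_per_s, double frame_bytes)
{
    assert(frame_bytes > 0.0);
    return link_bits_per_s / (frame_bytes * 8.0);
}

/* Interrupt rate if the NIC raises one interrupt per `frames_per_irq`
 * frames; 1.0 models "no mitigation". */
static double interrupts_per_second(double link_bits_per_s,
                                    double frame_bytes,
                                    double frames_per_irq)
{
    assert(frames_per_irq >= 1.0);
    return frames_per_second(link_bits_per_s, frame_bytes) / frames_per_irq;
}
```

For example, `interrupts_per_second(10e9, 1500.0, 1.0)` is about 833,333, while batching 32 frames per interrupt (an arbitrary illustrative value) brings it down to about 26,000.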
* Re: [Question] Explanation of zero-copy networking
From: Venkatesh Ramamurthy @ 2001-05-07 19:00 UTC
To: Pekka Pietikainen, linux-kernel

> Then there's the interrupt problem, which someone will have to solve
> before they start shipping 10gigE NICs that use 1500-byte frames, 850000
> interrupts/s without mitigation. Wheeee!!!!

In these situations polling helps more than interrupt-driven I/O. When there is heavy I/O (read: more interrupts per second), we should automatically switch to polling mode; once the I/O drops we can go back to interrupt-driven mode. But how do we decide when to switch modes?

Just my 2 cents .....

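One common way to answer "when to switch modes" is hysteresis: enter polling above a high-water packet rate and leave it only below a lower one, so the driver does not flap between modes near a single threshold. The sketch below is purely illustrative; the type names, the function and the idea of keying on a measured packets-per-second value are assumptions, not something proposed in the thread or taken from any kernel API.

```c
#include <assert.h>

enum rx_mode { RX_INTERRUPT, RX_POLL };

/* Hypothetical policy: two thresholds form a hysteresis band. */
struct rx_policy {
    unsigned long enter_poll_pps;   /* switch to polling above this rate */
    unsigned long leave_poll_pps;   /* switch back below this rate */
};

/* Decide the next receive mode from the current mode and the packet
 * rate measured over the last interval. */
static enum rx_mode next_rx_mode(enum rx_mode cur, unsigned long pps,
                                 const struct rx_policy *p)
{
    assert(p->leave_poll_pps <= p->enter_poll_pps);
    if (cur == RX_INTERRUPT && pps > p->enter_poll_pps)
        return RX_POLL;
    if (cur == RX_POLL && pps < p->leave_poll_pps)
        return RX_INTERRUPT;
    return cur;                     /* inside the band: keep the mode */
}
```

With thresholds of, say, 50000 and 10000 packets per second (made-up numbers), a rate of 30000 leaves whichever mode is currently active unchanged.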
* Re: [Question] Explanation of zero-copy networking
From: Jamie Lokier @ 2001-05-08 7:18 UTC
To: Richard B. Johnson
Cc: Alan Cox, alexander.eichhorn, linux-kernel

Richard B. Johnson wrote:
> However, PCI to memory copying runs at about 300 megabytes per
> second on modern PCs and memory to memory copying runs at over 1,000
> megabytes per second. In the future, these speeds will increase.

That would be "big expensive modern PCs" then. Our clusters of 700MHz boxes are strictly limited to 132 megabytes per second over PCI...

-- Jamie

* Re: [Question] Explanation of zero-copy networking
From: Eric W. Biederman @ 2001-05-09 15:13 UTC
To: Jamie Lokier
Cc: Richard B. Johnson, Alan Cox, alexander.eichhorn, linux-kernel

Jamie Lokier <lk@tantalophile.demon.co.uk> writes:
> Richard B. Johnson wrote:
> > However, PCI to memory copying runs at about 300 megabytes per
> > second on modern PCs and memory to memory copying runs at over 1,000
> > megabytes per second. In the future, these speeds will increase.
>
> That would be "big expensive modern PCs" then. Our clusters of 700MHz
> boxes are strictly limited to 132 megabytes per second over PCI...

300 Megabytes per second is definitely an odd number for a PCI bus. But 132 Megabytes per second is actually high; the continuous burst speeds (in units of 1024*1024 bytes) are:

32bit 33MHz: 33*1000*1000*32/(1024*1024*8) = 125.9 Megabytes/second
64bit 33MHz: 33*1000*1000*64/(1024*1024*8) = 251.8 Megabytes/second
32bit 66MHz: 66*1000*1000*32/(1024*1024*8) = 251.8 Megabytes/second
64bit 66MHz: 66*1000*1000*64/(1024*1024*8) = 503.5 Megabytes/second

The possibility of getting continuous bursts is actually low; if nothing else you have an interrupt acknowledgement 100 times per second. But if you are pushing the bus it should deliver close to its burst potential. The ISA traffic doing subtractive decode can be nasty, though, because you get 4 PCI cycles before you even get acknowledgement from the PCI/ISA bridge that there is something to transfer to.

Eric

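The burst figures in the message above all come from one formula: clock rate times bus width, divided by 8 bits per byte and by 1024*1024. A small sketch with a hypothetical function name reproduces them and agrees with the listed values to within rounding. It also shows where the 132 figure comes from: 33 MHz * 4 bytes is 132 million bytes per second, which is about 125.9 when expressed in units of 1024*1024 bytes.

```c
#include <assert.h>

/* Theoretical continuous-burst PCI throughput in units of 1024*1024
 * bytes per second, for a given clock (MHz) and bus width (bits). */
static double pci_burst_mib_per_s(double clock_mhz, unsigned bus_bits)
{
    assert(bus_bits % 8 == 0);
    double bytes_per_s = clock_mhz * 1000.0 * 1000.0 * (bus_bits / 8);
    return bytes_per_s / (1024.0 * 1024.0);
}
```

Calling it with (33, 32), (33, 64), (66, 32) and (66, 64) gives roughly 125.9, 251.8, 251.8 and 503.5.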
* Re: [Question] Explanation of zero-copy networking
From: dean gaudet @ 2001-05-07 18:21 UTC
To: Alan Cox
Cc: alexander.eichhorn, linux-kernel

On Mon, 7 May 2001, Alan Cox wrote:
> > documented so far) detailed description of the newly
> > implemented zero-copy mechanisms in the network stack.
> > We are interested in how to use it (changed network API?)
> > and also in the internal architecture.
>
> It is built around sendfile. Trying to do zero copy on pages with user-space
> mappings gets so horribly non-pretty that it is better to build the API from the
> physical side of things.

so there's still single copy for write() of a mmap()ed page?

since i'm naive about the high-end databases -- do they have a mechanism to access zero-copy? i suppose sendfile() on a raw device fd would work... nice.

-dean

* Re: [Question] Explanation of zero-copy networking
From: Alan Cox @ 2001-05-07 21:59 UTC
To: dean gaudet
Cc: Alan Cox, alexander.eichhorn, linux-kernel

> so there's still single copy for write() of a mmap()ed page?

An mmap page will go direct to disk. But mmap() isn't a good model for streaming I/O.

* Re: [Question] Explanation of zero-copy networking
From: Jamie Lokier @ 2001-05-08 16:20 UTC
To: Alan Cox
Cc: dean gaudet, alexander.eichhorn, linux-kernel

Alan Cox wrote:
> > so there's still single copy for write() of a mmap()ed page?
>
> An mmap page will go direct to disk.

Looking at the 2.4.4 code, mmap() of a file followed by write() to a socket will copy the data once. I could be mistaken (only glanced at the code quickly) but I base that on the only call to ->sendpage being through sendfile.

So yes, there's a single-copy overhead for mmap()+write().

-- Jamie

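For contrast with sendfile(), the mmap()+write() path discussed in the last few messages looks like this from user space. The sketch only shows the call sequence; the single copy Jamie describes happens inside the kernel's write() path and is not visible in the code. The helper name `write_mapped_file` and its error handling are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the file at `path` read-only and write() the
 * mapping to `out_fd`. Returns bytes written, or -1 on error. This sketch
 * treats an empty file as an error because mmap() cannot map zero bytes. */
static off_t write_mapped_file(int out_fd, const char *path)
{
    struct stat st;
    off_t done = 0;
    char *map;
    int in_fd;

    assert(path != NULL);
    in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;
    if (fstat(in_fd, &st) < 0 || st.st_size == 0) {
        close(in_fd);
        return -1;
    }
    map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, in_fd, 0);
    close(in_fd);                   /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    while (done < st.st_size) {
        /* write() hands the kernel a user-space address, so the data is
         * reached through the user mapping rather than by page reference. */
        ssize_t n = write(out_fd, map + done, (size_t)(st.st_size - done));
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal, try again */
        if (n <= 0)
            break;
        done += n;
    }

    munmap(map, (size_t)st.st_size);
    return done == st.st_size ? done : -1;
}
```

The call sequence avoids a user-space read() buffer, but according to the message above the 2.4.4 socket write path still copies the mapped pages once, whereas sendfile() can pass page references down via ->sendpage.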
Thread overview: 18+ messages

2001-05-07 13:43 [Question] Explanation of zero-copy networking Alexander Eichhorn
2001-05-07 13:56 ` Alan Cox
2001-05-07 16:12   ` Richard B. Johnson
2001-05-07 17:53     ` Francois Romieu
2001-05-07 18:00       ` Blue Lang
2001-05-07 18:25     ` dean gaudet
2001-05-07 19:54       ` Richard B. Johnson
2001-05-07 20:23         ` dean gaudet
2001-05-08 11:09         ` Bjorn Wesen
2001-05-08 17:30         ` Alexander Eichhorn
2001-05-09  9:56           ` Reto Baettig
2001-05-07 18:30     ` Pekka Pietikainen
2001-05-07 19:00       ` Venkatesh Ramamurthy
2001-05-08  7:18     ` Jamie Lokier
2001-05-09 15:13       ` Eric W. Biederman
2001-05-07 18:21   ` dean gaudet
2001-05-07 21:59     ` Alan Cox
2001-05-08 16:20       ` Jamie Lokier