From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751076AbWDSWS2 (ORCPT ); Wed, 19 Apr 2006 18:18:28 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1751172AbWDSWS2 (ORCPT ); Wed, 19 Apr 2006 18:18:28 -0400
Received: from snowski.convera.com ([67.133.116.244]:64130 "EHLO cbmail.convera.com")
	by vger.kernel.org with ESMTP id S1751076AbWDSWS1 (ORCPT );
	Wed, 19 Apr 2006 18:18:27 -0400
Message-ID: <4446B756.7080102@chocky.org>
Date: Wed, 19 Apr 2006 15:19:02 -0700
From: Peter Naulls
User-Agent: Mail/News 1.5 (X11/20060228)
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org
Subject: Re: Linux 2.6.17-rc2
References: <20060419200001.fe2385f4.diegocg@gmail.com> <1145481827.8440.30.camel@lade.trondhjem.org>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 19 Apr 2006 22:18:26.0240 (UTC) FILETIME=[2DC60400:01C663FF]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Linus Torvalds wrote:
>
> On Wed, 19 Apr 2006, Trond Myklebust wrote:
>> Any chance this could be adapted to work with all those DMA (and RDMA)
>> engines that litter our motherboards? I'm thinking in particular of
>> stuff like the drm drivers, and userspace rdma.
>
> Absolutely. Especially with "vmsplice()" (the not-yet-implemented "move
> these user pages into a kernel buffer") it should be entirely possible to
> set up an efficient zero-copy setup that does NOT have any of the problems
> with aio and TLB shootdown etc.
>
> Note that a driver would have to support the splice_in() and splice_out()
> interfaces (which are basically just given the pipe buffers to do with as
> they wish), and perhaps more importantly: note that you need specialized
> apps that actually use splice() to do this.
>
> That's the biggest downside by far, and is why I'm not 100% convinced
> splice() usage will be all that wide-spread. If you look at sendfile(),
> it's been available for a long time, and is actually even almost portable
> across different OS's _and_ it is easy to use. But almost nobody actually
> does. I suspect the only users are some apache mods, perhaps a ftp deamon
> or two, and probably samba. And that's probably largely it.

I am. I'm developing a distributed file system responsible for
transferring GBs of files around a network. The biggest problem with
the traditional send/recv/poll approach we were using was heavy CPU
usage. Maxing out the gigabit network eats about 60% CPU. In some
simple experiments, sendfile reduced that to 10% or less (it varies a
lot with whatever else is going on).

One big problem I had is that sendfile is not symmetric (for quite
understandable reasons), which meant the overlying file system API
(it's a userspace library) had to undergo various changes to make
effective use of sendfile. Doing so in a sensible manner proved tricky,
but not impossible.

Anyway, CPU usage is still a big deal, which is why I'm interested in
these new zero-copy calls, whose discussion I've just caught up on. And
if I decide to use them, that means moving a whole load of machines to
2.6.17, some of which will be running 2.6.12 for at least a little
while longer.

I guess I might be asking for the opposite of this:

> So I'd expect this to be most useful for perhaps things like some HPC
> apps, where you can have specialized libraries for data communication. And
> servers, of course (but they might just continue to use the old
> "sendfile()" interface, without even knowing that it's not sendfile() any
> more, but just a wrapper around splice()).

i.e., a splice emulation that happens to use sendfile when it can.
I very much appreciate the conceptual improvements that splice has over
sendfile, but can anyone give some examples of significant CPU savings
that would not be possible using sendfile?