* Is sendfile all that sexy?
From: jamal @ 2001-01-14 18:29 UTC
To: linux-kernel, netdev
I thought i'd run some tests on the new zerocopy patches
(this is using a hacked ttcp which knows how to do sendfile
and does MSG_TRUNC for true zero-copy receive, if you know what i mean
;-> ).
2 back to back SMP 2*PII-450Mhz hooked up via 1M acenics (gigE).
MTU 9K.
Before getting excited i had the courage to give plain 2.4.0-pre3 a whirl
and some things bothered me.
test1:
------
regular ttcp, no ZC and no sendfile. send as much as you can in 15secs;
actually 8192 byte chunks, 2048 of them at a time. Repeat until 15 secs is
complete.
Repeat the test 5 times to narrow experimental deviation.
Throughput: ~99MB/sec (for those obsessed with Mbps ~810Mbps)
CPU abuse: server side 87% client side 22% (the CPU measurement could do
with some work and proper measure for SMP).
test2:
------
sendfile server.
created a file which is 8192*2048 bytes. Again the same 15 second
exercise as test1 (and the 5-set thing):
- throughput: 86MB/sec
- CPU: server 100%, client 17%
So i figured, no problem i'll re-run it with a file 10 times larger.
**I was disappointed to see no improvement.**
Looking at the system calls being made:
With the non-sendfile version, approximately 182K write-to-socket system
calls were made, each writing 8192 bytes; each call lasted on average
0.08ms.
With sendfile test2: 78 calls were made, each sending the file
size 8192*2048 bytes; each lasted about 199 msecs
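A quick cross-check of these figures, as a minimal sketch (the function
names are illustrative, and MB is assumed to mean 10^6 bytes):

```c
#include <stdio.h>

/* Throughput in MB/s, assuming 1 MB = 10^6 bytes. */
double throughput_mb_s(double calls, double bytes_per_call, double seconds)
{
    return calls * bytes_per_call / seconds / 1e6;
}

/* Total wall time spent inside the system calls, in seconds. */
double time_in_calls_s(double calls, double ms_per_call)
{
    return calls * ms_per_call / 1000.0;
}

/* Print both tests side by side using the figures quoted above. */
void print_crosscheck(void)
{
    const double file_bytes = 8192.0 * 2048.0;   /* test2 file size */

    printf("write():    %.1f MB/s, %.2f s in calls\n",
           throughput_mb_s(182000.0, 8192.0, 15.0),
           time_in_calls_s(182000.0, 0.08));
    printf("sendfile(): %.1f MB/s, %.2f s in calls\n",
           throughput_mb_s(78.0, file_bytes, 15.0),
           time_in_calls_s(78.0, 199.0));
}
```

Both results land close to the measured throughput (roughly 99 MB/s and
87 MB/s), and in both cases nearly all of the 15 seconds is spent inside
the system calls, so the call counts and the reported numbers are
consistent with each other.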
TWO observations:
- Given Linux's non-pre-emptability of the kernel i get the feeling that
sendfile could starve other user space programs. Imagine trying to send a
1Gig file on 10Mbps pipe in one shot.
- It doesn't matter if you break down the file into chunks for
self-pre-emption; sendfile is still a pig.
I have a feeling i am missing some very serious shit. So enlighten me.
Has anyone done similar tests?
Anyways, the struggle continues next with zc patches.
cheers,
jamal
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
* Re: Is sendfile all that sexy?
From: Ingo Molnar @ 2001-01-14 18:50 UTC
To: jamal; +Cc: linux-kernel, netdev

On Sun, 14 Jan 2001, jamal wrote:

> regular ttcp, no ZC and no sendfile. [...]
> Throughput: ~99MB/sec (for those obsessed with Mbps ~810Mbps)
> CPU abuse: server side 87% client side 22% [...]

> sendfile server.
> - throughput: 86MB/sec
> - CPU: server 100%, client 17%

i believe what you are seeing here is the overhead of the pagecache. When
using sendmsg() only, you do not read() the file every time, right? Is
ttcp using multiple threads? In that case if the sendfile() is using the
*same* file all the time, creating SMP locking overhead.

if this is the case, what result do you get if you use a separate,
isolated file per process? (And i bet that with DaveM's pagecache
scalability patch the situation would also get much better - the global
pagecache_lock hurts.)

	Ingo
* Re: Is sendfile all that sexy?
From: jamal @ 2001-01-14 19:02 UTC
To: Ingo Molnar; +Cc: linux-kernel, netdev

On Sun, 14 Jan 2001, Ingo Molnar wrote:

>
> i believe what you are seeing here is the overhead of the pagecache. When
> using sendmsg() only, you do not read() the file every time, right? Is

In that case just a user space buffer is sent, i.e. no file association.

> ttcp using multiple threads?

Only a single thread, single flow setup. Very primitive but simple.

> In that case if the sendfile() is using the
> *same* file all the time, creating SMP locking overhead.
>
> if this is the case, what result do you get if you use a separate,
> isolated file per process? (And i bet that with DaveM's pagecache
> scalability patch the situation would also get much better - the global
> pagecache_lock hurts.)
>

Already doing the single file, single process.
However, i do run by time which means i could read the file from the
beginning (offset 0) to the end then re-do it for as many times as
15secs would allow. Does this affect it?
I tried one 1.5 GB file, it was oopsing and given my setup right now i
can't trace it. So i am using about 170M which is read about 8 times in
the 15 secs.

cheers,
jamal
* Re: Is sendfile all that sexy?
From: Ingo Molnar @ 2001-01-14 19:09 UTC
To: jamal; +Cc: linux-kernel, netdev

On Sun, 14 Jan 2001, jamal wrote:

> Already doing the single file, single process. [...]

in this case there could still be valid performance differences, as
copying from user-space is cheaper than copying from the pagecache. To
rule out SMP interactions, you could try a UP-IOAPIC kernel on that box.

(I'm also curious what kind of numbers you'll get with the zerocopy
patch.)

> However, i do run by time which means i could read the file from the
> begining(offset 0) to the end then re-do it for as many times as
> 15secs would allow. Does this affect it? [...]

no, in the case of a single thread this should have minimum impact. But
i'd suggest to increase the /proc/sys/net/tcp*mem* values (to 1MB or
more).

	Ingo
* Re: Is sendfile all that sexy?
From: jamal @ 2001-01-14 19:18 UTC
To: Ingo Molnar; +Cc: linux-kernel, netdev

On Sun, 14 Jan 2001, Ingo Molnar wrote:

>
> in this case there could still be valid performance differences, as
> copying from user-space is cheaper than copying from the pagecache. To
> rule out SMP interactions, you could try a UP-IOAPIC kernel on that box.
>

Let me complete this with the ZC patches first, then i'll do that.
There are a few retransmits; maybe receiver IRQ affinity might help some.

> (I'm also curious what kind of numbers you'll get with the zerocopy
> patch.)

Working on it.

> no, in the case of a single thread this should have minimum impact. But
> i'd suggest to increase the /proc/sys/net/tcp*mem* values (to 1MB or
> more).

The upper thresholds to 1000000 ? I should have mentioned that i set
/proc/sys/net/core/*mem* to currently 262144.

cheers,
jamal
* Re: Is sendfile all that sexy?
From: Linus Torvalds @ 2001-01-14 20:22 UTC
To: linux-kernel

In article <Pine.GSO.4.30.0101141237020.12354-100000@shell.cyberus.ca>,
jamal <hadi@cyberus.ca> wrote:
>
>Before getting excited i had the courage to give plain 2.4.0-pre3 a whirl
>and some things bothered me.

Note that "sendfile(fd, file, len)" is never going to be faster than
"write(fd, userdata, len)".

That's not the point of sendfile(). The point of sendfile() is to be
faster than the _combination_ of:

	addr = mmap(file, ...len...);
	write(fd, addr, len);

or

	read(file, userdata, len);
	write(fd, userdata, len);

and in your case you're not comparing sendfile() against this
combination. You're just comparing sendfile() against a simple
"write()".

And no, I don't actually think that sendfile() is all that hot. It was
_very_ easy to implement, and can be considered a 5-minute hack to give
a feature that fit very well in the MM architecture, and that the Apache
folks had already been using on other architectures.

The only obvious use for it is file serving, and as high-performance
file serving tends to end up as a kernel module in the end anyway (the
only hold-out is samba, and that's been discussed too), "sendfile()"
really is more a proof of concept than anything else.

Does anybody but apache actually use it?

		Linus

PS. I still _like_ sendfile(), even if the above sounds negative. It's
basically a "cool feature" that has zero negative impact on the design
of the system. It uses the same "do_generic_file_read()" that is used
for normal "read()", and is also used by the loop device and by
in-kernel fileserving. But it's not really "important".
* Re: Is sendfile all that sexy?
From: Ingo Molnar @ 2001-01-14 20:38 UTC
To: Linus Torvalds; +Cc: Linux Kernel List

On 14 Jan 2001, Linus Torvalds wrote:

> Does anybody but apache actually use it?

There is a Samba patch as well that makes it sendfile() based. Various
other projects use it too (phttpd for example), some FTP servers i
believe, and khttpd and TUX.

	Ingo
* Re: Is sendfile all that sexy?
From: Linus Torvalds @ 2001-01-14 21:44 UTC
To: Ingo Molnar; +Cc: Linux Kernel List

On Sun, 14 Jan 2001, Ingo Molnar wrote:
>
> There is a Samba patch as well that makes it sendfile() based. Various
> other projects use it too (phttpd for example), some FTP servers i
> believe, and khttpd and TUX.

At least khttpd uses "do_generic_file_read()", not sendfile per se. I
assume TUX does too. Sendfile itself is mainly only useful from user
space..

		Linus
* Re: Is sendfile all that sexy?
From: Ingo Molnar @ 2001-01-14 21:49 UTC
To: Linus Torvalds; +Cc: Linux Kernel List

On Sun, 14 Jan 2001, Linus Torvalds wrote:

> > There is a Samba patch as well that makes it sendfile() based. Various
> > other projects use it too (phttpd for example), some FTP servers i
> > believe, and khttpd and TUX.
>
> At least khttpd uses "do_generic_file_read()", not sendfile per se. I
> assume TUX does too. Sendfile itself is mainly only useful from user
> space..

yes, you are right. TUX does it mainly to avoid some of the user-space
interfacing overhead present in sys_sendfile(), and to be able to control
packet boundaries. (ie. to have or not have the MSG_MORE flag). So TUX is
using its own sock_send_actor and own read_descriptor.

	Ingo
* Re: Is sendfile all that sexy?
From: Gerhard Mack @ 2001-01-14 21:54 UTC
To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel List

On Sun, 14 Jan 2001, Ingo Molnar wrote:

>
> On 14 Jan 2001, Linus Torvalds wrote:
>
> > Does anybody but apache actually use it?
>
> There is a Samba patch as well that makes it sendfile() based. Various
> other projects use it too (phttpd for example), some FTP servers i
> believe, and khttpd and TUX.

Proftpd to name one ftp server, nice little daemon uses linux-privs too.

	Gerhard

PS I wish someone would explain to me why distros insist on using WU
instead given its horrid security record.

--
Gerhard Mack

gmack@innerfire.net

<>< As a computer I find your faith in technology amusing.
* Re: Is sendfile all that sexy?
From: Linus Torvalds @ 2001-01-14 22:40 UTC
To: Gerhard Mack; +Cc: Ingo Molnar, Linux Kernel List

On Sun, 14 Jan 2001, Gerhard Mack wrote:
>
> PS I wish someone would explain to me why distros insist on using WU
> instead given its horrid security record.

I think it's a case of "better the devil you know..".

Think of all the security scares sendmail has historically had. But it's
a pretty secure piece of work now - and people know it backwards and
forward. Few people advocate switching from sendmail these days (sure,
they do exist, but what I'm saying is that a long track record that
includes security issues isn't necessarily bad, if it has gotten fixed).

Of course, you may be right on wuftpd. It obviously wasn't designed with
security in mind, other alternatives may be better.

		Linus
* Re: Is sendfile all that sexy?
From: J Sloan @ 2001-01-14 22:45 UTC
To: Kernel Mailing List

Linus Torvalds wrote:

> Of course, you may be right on wuftpd. It obviously wasn't designed with
> security in mind, other alternatives may be better.

I run proftpd on all my ftp servers - it's fast, configurable
and can do all the tricks I need - even red hat seems to
agree that proftpd is the way to go.

Visit any red hat ftp site and they are running proftpd -

So, why do they keep shipping us wu-ftpd instead?

That really frosts me.

jjs
* Re: Is sendfile all that sexy?
From: H. Peter Anvin @ 2001-01-15 20:15 UTC
To: linux-kernel

Followup to: <3A622C25.766F3BCE@pobox.com>
By author: J Sloan <jjs@pobox.com>
In newsgroup: linux.dev.kernel
>
> Linus Torvalds wrote:
>
> > Of course, you may be right on wuftpd. It obviously wasn't designed with
> > security in mind, other alternatives may be better.
>
> I run proftpd on all my ftp servers - it's fast, configurable
> and can do all the tricks I need - even red hat seems to
> agree that proftpd is the way to go.
>
> Visit any red hat ftp site and they are running proftpd -
>
> So, why do they keep shipping us wu-ftpd instead?
>
> That really frosts me.
>

proftpd is not what you want for an FTP server whose main function is
*non-*anonymous access. It is very much written for the sole purpose
of being a great FTP server for a large anonymous FTP site. If you're
running a site large enough to matter, you can replace an RPM or two.

	-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
* Re: Is sendfile all that sexy?
From: Michael Peddemors @ 2001-01-15 3:43 UTC
To: Gerhard Mack; +Cc: Ingo Molnar, Linux Kernel List

The two things I change every time are sendmail->qmail and wuftpd->proftpd

But remember, security bugs are caught because more people use one vs the
other.. Bugs in Proftpd weren't caught until more people started changing
from wu-ftpd... Often, all it means when one product has more bugs than
another, is that more people tried to find bugs in one than another...

(Yes, a plug to get everyone to test 2.4 here)

On Sun, 14 Jan 2001, Linus Torvalds wrote:
> On Sun, 14 Jan 2001, Gerhard Mack wrote:
> > PS I wish someone would explain to me why distros insist on using WU
> > instead given its horrid security record.
>
> Of course, you may be right on wuftpd. It obviously wasn't designed with
> security in mind, other alternatives may be better.
>
> Linus

--
--------------------------------------------------------
Michael Peddemors - Senior Consultant
Unix Administration - WebSite Hosting
Network Services - Programming
Wizard Internet Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604) 589-0037 Beautiful British Columbia, Canada
--------------------------------------------------------
* Re: Is sendfile all that sexy?
From: Florian Weimer @ 2001-01-15 13:02 UTC
To: Gerhard Mack; +Cc: Linux Kernel List

Gerhard Mack <gmack@innerfire.net> writes:

> PS I wish someone would explain to me why distros insist on using WU
> instead given its horrid security record.

The security record of Proftpd is not horrid, but embarrassing. They
once claimed to have fixed a vulnerability, but in fact introduced
another one...
* RE: Is sendfile all that sexy?
From: Tristan Greaves @ 2001-01-15 13:45 UTC
To: 'Linux Kernel List'

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Florian Weimer
> Sent: 15 January 2001 13:02
> To: Gerhard Mack
> Cc: Linux Kernel List
> Subject: Re: Is sendfile all that sexy?
>
> The security record of Proftpd is not horrid, but embarrassing. They
> once claimed to have fixed vulnerability, but in fact introduced
> another one...

Oh, come on, this is a classic event in bug fixing. All Software Has
Bugs [TM]. Nothing Is Completely Secure [TM].

As long as the vulnerabilities are fixed as they happen (where possible),
we should be happy.

Tris.
* Re: Is sendfile all that sexy?
From: Dan Hollis @ 2001-01-15 1:14 UTC
To: Linus Torvalds; +Cc: linux-kernel

On 14 Jan 2001, Linus Torvalds wrote:
> That's not the point of sendfile(). The point of sendfile() is to be
> faster than the _combination_ of:
> addr = mmap(file, ...len...);
> write(fd, addr, len);
> or
> read(file, userdata, len);
> write(fd, userdata, len);

And boy is it ever. It blows both away by more than double.

Not only that, the mmap one grinds my box into the ground with swapping,
while in the sendfile() case you can't even tell it's running except that
the drive is going like mad.

> Does anybody but apache actually use it?

I wonder why samba doesn't use it.

-Dan
* Re: Is sendfile all that sexy?
From: Jonathan Thackray @ 2001-01-15 15:24 UTC
To: linux-kernel

> Does anybody but apache actually use it?

Zeus uses it! (it was HP who added it to HP-UX first at our request :-)

> PS. I still _like_ sendfile(), even if the above sounds negative. It's
> basically a "cool feature" that has zero negative impact on the design
> of the system. It uses the same "do_generic_file_read()" that is used
> for normal "read()", and is also used by the loop device and by
> in-kernel fileserving. But it's not really "important".

It's a very useful system call and makes file serving much more
scalable, and I'm glad that most Un*xes now have support for it
(Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to
Linux is sendpath(), which does the open() before the sendfile()
all combined into one system call.

Ugh, I hear you all scream :-)

Jon.

--
Jonathan Thackray         Zeus House, Cowley Road, Cambridge CB4 OZT, UK
Software Engineer         +44 1223 525000, fax +44 1223 525100
Zeus Technology           http://www.zeus.com/
* Re: Is sendfile all that sexy?
From: Matti Aarnio @ 2001-01-15 15:36 UTC
To: Jonathan Thackray; +Cc: linux-kernel

On Mon, Jan 15, 2001 at 03:24:55PM +0000, Jonathan Thackray wrote:
> It's a very useful system call and makes file serving much more
> scalable, and I'm glad that most Un*xes now have support for it
> (Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to
> Linux is sendpath(), which does the open() before the sendfile()
> all combined into one system call.

	One thing about 'sendfile' (and likely 'sendpath') is that
	current (hammered into running binaries -> unchangeable)
	syscalls support only up to 2GB files at 32 bit systems.

	Glibc 2.2(9) at RedHat <sys/sendfile.h>:

	#ifdef __USE_FILE_OFFSET64
	# error "<sendfile.h> cannot be used with _FILE_OFFSET_BITS=64"
	#endif

	I do admit that doing sendfile() on some extremely large
	file is unlikely, but still...

> Ugh, I hear you all scream :-)
> Jon.
> --
> Jonathan Thackray Zeus House, Cowley Road, Cambridge CB4 OZT, UK
> Zeus Technology http://www.zeus.com/

/Matti Aarnio
* Re: Is sendfile all that sexy?
From: H. Peter Anvin @ 2001-01-15 20:17 UTC
To: linux-kernel

Followup to: <20010115173607.S25659@mea-ext.zmailer.org>
By author: Matti Aarnio <matti.aarnio@zmailer.org>
In newsgroup: linux.dev.kernel
>
> One thing about 'sendfile' (and likely 'sendpath') is that
> current (hammered into running binaries -> unchangeable)
> syscalls support only up to 2GB files at 32 bit systems.
>
> Glibc 2.2(9) at RedHat <sys/sendfile.h>:
>
> #ifdef __USE_FILE_OFFSET64
> # error "<sendfile.h> cannot be used with _FILE_OFFSET_BITS=64"
> #endif
>
> I do admit that doing sendfile() on some extremely large
> file is unlikely, but still...
>

2 GB isn't really that extremely large these days. This is an
unpleasant limitation.

	-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
* Re: Is sendfile all that sexy?
From: dean gaudet @ 2001-01-15 16:05 UTC
To: Jonathan Thackray; +Cc: linux-kernel

On Mon, 15 Jan 2001, Jonathan Thackray wrote:

> (Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to
> Linux is sendpath(), which does the open() before the sendfile()
> all combined into one system call.

how would sendpath() construct the Content-Length in the HTTP header?

it's totally unfortunate that the other unixes chose to combine writev()
into sendfile() rather than implementing TCP_CORK. TCP_CORK is useful for
FAR more than just sendfile() headers and footers. it's arguably the most
correct way to write server code. nagle/no-nagle in the default BSD API
both suck -- nagle because it delays packets which need to be sent;
no-nagle because it can send incomplete packets.

i'm completely happy that linus, davem and ingo refused to combine
writev() into sendfile() and suggested CORK when i pointed out the
header/trailer problem.

imnsho if you want to optimise static file serving then it's pretty
pointless to continue working in userland. nobody is going to catch up
with all the kernel-side implementations in linux, NT, and solaris.

-dean

p.s. linus, apache-1.3 does *not* use sendfile(). it's in apache-2.0,
which unfortunately is now performing like crap because they didn't listen
to some of my advice well over a year ago. a case of "let's make a pretty
API and hope performance works out"... where i told them "i've already
written code using the API you suggest, and it *doesn't* work." </rant>
thankfully linux now has TUX.
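A minimal sketch of the TCP_CORK pattern argued for here, assuming a
connected Linux TCP socket: cork the socket, write the header, sendfile()
the body, then uncork so the kernel can flush whatever partial packet
remains. The names set_cork() and send_response() are illustrative only
and are not taken from Apache, TUX or any other server.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Turn TCP_CORK on (1) or off (0). Returns 0 on success, -1 on error. */
static int set_cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}

/* Send `header` followed by the first `file_len` bytes of `file_fd`.
 * While the socket is corked the kernel holds back partial frames, so
 * the header and the start of the file can share a packet.
 * Returns 0 on success, -1 on error. */
int send_response(int sock, int file_fd, size_t file_len, const char *header)
{
    size_t hdr_len = strlen(header);
    size_t sent = 0;
    off_t offset = 0;

    if (set_cork(sock, 1) < 0)
        return -1;

    /* Header: plain write(), looping over short writes. */
    while (sent < hdr_len) {
        ssize_t n = write(sock, header + sent, hdr_len - sent);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        sent += (size_t)n;
    }

    /* Body: sendfile() advances `offset` by the amount it sent. */
    while ((size_t)offset < file_len) {
        ssize_t n = sendfile(sock, file_fd, &offset,
                             file_len - (size_t)offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        if (n == 0)
            break;                      /* file shorter than file_len */
    }

    /* Uncork: anything still queued is pushed out now. */
    return set_cork(sock, 0);

fail:
    set_cork(sock, 0);
    return -1;
}
```

The same cork/uncork bracket works around any sequence of write() calls,
which is the sense in which it is useful for more than sendfile() headers
and trailers.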
* Re: Is sendfile all that sexy?
From: Jonathan Thackray @ 2001-01-15 18:34 UTC
To: dean gaudet; +Cc: linux-kernel

> how would sendpath() construct the Content-Length in the HTTP header?

You'd still stat() the file to decide whether to use sendpath() to
send it or not, if it was Last-Modified: etc. Of course, you'd cache
stat() calls too for a few seconds. The main thing is that you save
a valuable fd and open() is expensive, even more so than stat().

> TCP_CORK is useful for FAR more than just sendfile() headers and
> footers. it's arguably the most correct way to write server code.

Agreed -- the hard-coded Nagle algorithm makes no sense these days.

> imnsho if you want to optimise static file serving then it's pretty
> pointless to continue working in userland. nobody is going to catch up
> with all the kernel-side implementations in linux, NT, and solaris.

Hmmm, there's a place for userland httpds that are within a few percent
of kernel ones (like Zeus is, when I last looked). But I agree, hybrid
approaches will become more common, although the trend towards server-side
dynamic pages negates this.

A kernel approach is a definite win if you're used to using a
limited-scalability userland httpd like Apache.

Jon.

--
Jonathan Thackray         Zeus House, Cowley Road, Cambridge CB4 OZT, UK
Software Engineer         +44 1223 525000, fax +44 1223 525100
Zeus Technology           http://www.zeus.com/
* Re: Is sendfile all that sexy?
From: Linus Torvalds @ 2001-01-15 18:46 UTC
To: linux-kernel

In article <14947.17050.127502.936533@leda.cam.zeus.com>,
Jonathan Thackray <jthackray@zeus.com> wrote:
>
>> how would sendpath() construct the Content-Length in the HTTP header?
>
>You'd still stat() the file to decide whether to use sendpath() to
>send it or not, if it was Last-Modified: etc. Of course, you'd cache
>stat() calls too for a few seconds. The main thing is that you save
>a valuable fd and open() is expensive, even more so than stat().

"open" expensive?

Maybe on HP-UX and other platforms. But give me numbers: I seriously
doubt that

	int fd = open(..)
	fstat(fd..);
	sendfile(fd..);
	close(fd);

is any slower than

	.. cache stat() in user space based on name ..
	sendpath(name, ..);

on any real load.

>> TCP_CORK is useful for FAR more than just sendfile() headers and
>> footers. it's arguably the most correct way to write server code.
>
>Agreed -- the hard-coded Nagle algorithm makes no sense these days.

The fact I dislike about the HP-UX implementation is that it is so
_obviously_ stupid.

And I have to say that I absolutely despise the BSD people. They did
sendfile() after both Linux and HP-UX had done it, and they must have
known about both implementations. And they chose the HP-UX braindamage,
and even brag about the fact that they were stupid and didn't understand
TCP_CORK (they don't say so in those exact words, of course - they just
show that they were stupid and clueless by the things they brag about).

Oh, well. Not everybody can be as goodlooking as me. It's a curse.
Linus
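For readers who want to see the open()/fstat()/sendfile()/close() sequence from the message above written out, here is a minimal sketch for a current Linux system, with TCP_CORK wrapped around the header and file data as discussed in the thread. It is an illustration under assumptions, not code from any poster: serve_file() and write_all() are hypothetical helper names, the HTTP header is a bare minimum, and out_fd is assumed to be a blocking descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: send a minimal HTTP header followed by the file.
 * Returns the number of file bytes sent, or -1 on error.
 */
static long serve_file(int out_fd, const char *path)
{
	struct stat st;
	char hdr[128];
	off_t off = 0;
	long ret = -1;
	int cork = 1;
	int len;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0)
		goto out_close;

	/* Cork so that header and file data can share packets. The call
	 * fails (and is ignored) when out_fd is not a TCP socket. */
	(void)setsockopt(out_fd, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));

	len = snprintf(hdr, sizeof(hdr),
		       "HTTP/1.0 200 OK\r\nContent-Length: %lld\r\n\r\n",
		       (long long)st.st_size);
	assert(len > 0 && (size_t)len < sizeof(hdr));
	if (write_all(out_fd, hdr, (size_t)len) < 0)
		goto out_uncork;

	/* sendfile() advances off by the number of bytes it sent. */
	while (off < st.st_size) {
		ssize_t n = sendfile(out_fd, fd, &off,
				     (size_t)(st.st_size - off));

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			goto out_uncork; /* error, or the file shrank */
	}
	ret = (long)off;

out_uncork:
	cork = 0;	/* uncork: push out any partial last packet */
	(void)setsockopt(out_fd, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));
out_close:
	close(fd);
	return ret;
}
```

The fstat() on the already-open descriptor is what supplies Content-Length here, so no separate stat() by name is needed.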
* [patch] sendpath() support, 2.4.0-test3/-ac9 2001-01-15 18:46 ` Linus Torvalds @ 2001-01-15 20:47 ` Ingo Molnar 2001-01-16 4:51 ` dean gaudet 0 siblings, 1 reply; 70+ messages in thread From: Ingo Molnar @ 2001-01-15 20:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: Linux Kernel List, Jonathan Thackray [-- Attachment #1: Type: TEXT/PLAIN, Size: 596 bytes --] On 15 Jan 2001, Linus Torvalds wrote: > int fd = open(..) > fstat(fd..); > sendfile(fd..); > close(fd); > > is any slower than > > .. cache stat() in user space based on name .. > sendpath(name, ..); > > on any real load. just for kicks i've implemented sendpath() support. (patch against 2.4.0-test and sample code attached) It appears to work just fine here. With a bit of reorganization in mm/filemap.c it was quite straightforward to do. Jonathan, is this what Zeus needs? If yes, it could be interesting to run a simple benchmark to compare sendpath() to open()+sendfile()? Ingo [-- Attachment #2: Type: TEXT/PLAIN, Size: 4020 bytes --] --- linux/mm/filemap.c.orig Mon Jan 15 22:43:21 2001 +++ linux/mm/filemap.c Mon Jan 15 23:09:55 2001 @@ -39,6 +39,8 @@ * page-cache, 21.05.1999, Ingo Molnar <mingo@redhat.com> * * SMP-threaded pagemap-LRU 1999, Andrea Arcangeli <andrea@suse.de> + * + * Started sendpath() support, (C) 2000 Ingo Molnar <mingo@redhat.com> */ atomic_t page_cache_size = ATOMIC_INIT(0); @@ -1450,15 +1452,15 @@ return written; } -asmlinkage ssize_t sys_sendfile(int out_fd, int in_fd, off_t *offset, size_t count) +/* + * Get input file, and verify that it is ok.. + */ +static struct file * get_verify_in_file (int in_fd, size_t count) { - ssize_t retval; - struct file * in_file, * out_file; - struct inode * in_inode, * out_inode; + struct inode * in_inode; + struct file * in_file; + int retval; - /* - * Get input file, and verify that it is ok.. 
- */ retval = -EBADF; in_file = fget(in_fd); if (!in_file) @@ -1474,10 +1476,21 @@ retval = locks_verify_area(FLOCK_VERIFY_READ, in_inode, in_file, in_file->f_pos, count); if (retval) goto fput_in; + return in_file; +fput_in: + fput(in_file); +out: + return ERR_PTR(retval); +} +/* + * Get output file, and verify that it is ok.. + */ +static struct file * get_verify_out_file (int out_fd, size_t count) +{ + struct file *out_file; + struct inode *out_inode; + int retval; - /* - * Get output file, and verify that it is ok.. - */ retval = -EBADF; out_file = fget(out_fd); if (!out_file) @@ -1491,6 +1504,29 @@ retval = locks_verify_area(FLOCK_VERIFY_WRITE, out_inode, out_file, out_file->f_pos, count); if (retval) goto fput_out; + return out_file; + +fput_out: + fput(out_file); +fput_in: + return ERR_PTR(retval); +} + +asmlinkage ssize_t sys_sendfile(int out_fd, int in_fd, off_t *offset, size_t count) +{ + ssize_t retval; + struct file * in_file, *out_file; + + in_file = get_verify_in_file(in_fd, count); + if (IS_ERR(in_file)) { + retval = PTR_ERR(in_file); + goto out; + } + out_file = get_verify_out_file(out_fd, count); + if (IS_ERR(out_file)) { + retval = PTR_ERR(out_file); + goto fput_in; + } retval = 0; if (count) { @@ -1524,6 +1560,56 @@ fput(in_file); out: return retval; +} + +asmlinkage ssize_t sys_sendpath(int out_fd, char *path, off_t *offset, size_t count) +{ + struct file in_file, *out_file; + read_descriptor_t desc; + loff_t pos = 0, *ppos; + struct nameidata nd; + int ret; + + out_file = get_verify_out_file(out_fd, count); + if (IS_ERR(out_file)) { + ret = PTR_ERR(out_file); + goto err; + } + ret = user_path_walk(path, &nd); + if (ret) + goto put_out; + ret = -EINVAL; + if (!nd.dentry || !nd.dentry->d_inode) + goto put_in_out; + + memset(&in_file, 0, sizeof(in_file)); + in_file.f_dentry = nd.dentry; + in_file.f_op = nd.dentry->d_inode->i_fop; + + ppos = &in_file.f_pos; + if (offset) { + if (get_user(pos, offset)) + goto put_in_out; + ppos = &pos; + } + 
desc.written = 0; + desc.count = count; + desc.buf = (char *) out_file; + desc.error = 0; + do_generic_file_read(&in_file, ppos, &desc, file_send_actor, 0); + + ret = desc.written; + if (!ret) + ret = desc.error; + if (offset) + put_user(pos, offset); + +put_in_out: + fput(out_file); +put_out: + path_release(&nd); +err: + return ret; } /* --- linux/arch/i386/kernel/entry.S.orig Mon Jan 15 22:42:47 2001 +++ linux/arch/i386/kernel/entry.S Mon Jan 15 22:43:12 2001 @@ -646,6 +646,7 @@ .long SYMBOL_NAME(sys_getdents64) /* 220 */ .long SYMBOL_NAME(sys_fcntl64) .long SYMBOL_NAME(sys_ni_syscall) /* reserved for TUX */ + .long SYMBOL_NAME(sys_sendpath) /* * NOTE!! This doesn't have to be exact - we just have @@ -653,6 +654,6 @@ * entries. Don't panic if you notice that this hasn't * been shrunk every time we add a new system call. */ - .rept NR_syscalls-221 + .rept NR_syscalls-223 .long SYMBOL_NAME(sys_ni_syscall) .endr [-- Attachment #3: Type: TEXT/PLAIN, Size: 593 bytes --] /* * Sample sendpath() code. It should mainly be used for sockets. */ #include <linux/unistd.h> #include <sys/sendfile.h> #include <stdlib.h> #include <unistd.h> #include <stdio.h> #include <fcntl.h> #define __NR_sendpath 223 _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size) int main (int argc, char **argv) { int out_fd; int ret; out_fd = open("./tmpfile", O_RDWR|O_CREAT|O_TRUNC, 0700); ret = sendpath(out_fd, "/usr/include/unistd.h", NULL, 300); printf("sendpath wrote %d bytes into ./tmpfile.\n", ret); return 0; } ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [patch] sendpath() support, 2.4.0-test3/-ac9
2001-01-15 20:47 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar
@ 2001-01-16 4:51 ` dean gaudet
2001-01-16 4:59 ` Linus Torvalds
2001-01-16 9:19 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar
0 siblings, 2 replies; 70+ messages in thread
From: dean gaudet @ 2001-01-16 4:51 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel List, Jonathan Thackray

On Mon, 15 Jan 2001, Ingo Molnar wrote:

> just for kicks i've implemented sendpath() support.
>
> _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size)

hey so how do you implement transmit timeouts with sendpath() ? (i.e.
drop the client after 30 seconds of no progress.)

-dean
* Re: [patch] sendpath() support, 2.4.0-test3/-ac9
2001-01-16 4:51 ` dean gaudet
@ 2001-01-16 4:59 ` Linus Torvalds
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
2001-01-16 9:19 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar
1 sibling, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2001-01-16 4:59 UTC (permalink / raw)
To: dean gaudet; +Cc: Ingo Molnar, Linux Kernel List, Jonathan Thackray

On Mon, 15 Jan 2001, dean gaudet wrote:

> On Mon, 15 Jan 2001, Ingo Molnar wrote:
>
> > just for kicks i've implemented sendpath() support.
> >
> > _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size)
>
> hey so how do you implement transmit timeouts with sendpath() ? (i.e.
> drop the client after 30 seconds of no progress.)

The whole "sendpath()" idea is just stupid.

You want to do a non-blocking send, so that you don't block on the
socket, and do some simple multiplexing in your server.

And "sendpath()" cannot do that without having to look up the name
again, and again, and again. Which makes the performance
"optimization" a horrible pessimisation.

Basically, sendpath() seems to be only useful for blocking and
uninterruptible file sending.

Bad design. I'm not touching it with a ten-foot pole.

Linus
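The point about non-blocking sends and the earlier question about transmit timeouts can be illustrated together. The sketch below keeps the input file open across calls, so each retry resumes at the saved offset with no path lookup. Everything named here (struct xfer, xfer_step(), the status enum) is hypothetical and only assumes Linux sendfile() on a socket that has O_NONBLOCK set; the caller's event loop is expected to call xfer_step() whenever the socket becomes writable or a timer fires.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-connection state kept by the server's event loop. */
struct xfer {
	int in_fd;		/* file, opened once per request */
	int out_fd;		/* client socket with O_NONBLOCK set */
	off_t off;		/* next file offset to send */
	off_t end;		/* one past the last byte to send */
	time_t last_progress;	/* when bytes last went out */
};

enum xfer_status { XFER_DONE, XFER_AGAIN, XFER_TIMEOUT, XFER_ERROR };

/*
 * Push as much as the socket will take right now. Never blocks.
 * "now" and "timeout" are in seconds; the caller supplies the clock.
 */
static enum xfer_status xfer_step(struct xfer *x, time_t now, time_t timeout)
{
	ssize_t n;

	assert(x->off <= x->end);
	if (x->off == x->end)
		return XFER_DONE;

	n = sendfile(x->out_fd, x->in_fd, &x->off, (size_t)(x->end - x->off));
	if (n > 0) {
		/* sendfile() already advanced x->off by n. */
		x->last_progress = now;
		return x->off == x->end ? XFER_DONE : XFER_AGAIN;
	}
	if (n == 0)
		return XFER_ERROR;	/* file is shorter than expected */
	if (errno == EAGAIN || errno == EINTR) {
		/* Socket buffer full: drop the client if it has stalled. */
		if (now - x->last_progress >= timeout)
			return XFER_TIMEOUT;
		return XFER_AGAIN;
	}
	return XFER_ERROR;
}
```

With a 30 second timeout, a client that accepts no data for that long gets XFER_TIMEOUT on the next call, which is the behaviour asked about earlier in the thread.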
* 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 4:59 ` Linus Torvalds
@ 2001-01-16 9:48 ` Ingo Molnar
2000-01-01 2:02 ` Pavel Machek
` (5 more replies)
0 siblings, 6 replies; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 9:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: dean gaudet, Linux Kernel List, Jonathan Thackray

On Mon, 15 Jan 2001, Linus Torvalds wrote:

> > _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size)

> You want to do a non-blocking send, so that you don't block on the
> socket, and do some simple multiplexing in your server.
>
> And "sendpath()" cannot do that without having to look up the name
> again, and again, and again. Which makes the performance
> "optimization" a horrible pessimisation.

yep, correct. But take a look at the trick it does with file descriptors,
i believe it could be a useful way of doing things. It basically
privatizes a struct file, without inserting it into the enumerated file
descriptors. This shows that 'native files' are possible: file struct
without file descriptor integers mapped to them.

ob'plug: this privatized file descriptor mechanism is used in TUX [TUX
privatizes files by putting them into the HTTP request structure - ie.
timeouts and continuation/nonblocking logic can be done with them]. But
TUX is trusted code, and it can pass a struct file to the VFS without
having to validate it, and TUX will also free such file descriptors.

But even user-space code could use 'native files', via the following, safe
mechanism:

1) current->native_files list, freed at exit_files() time.

2) "struct native_file" which embeds "struct file". It has the following
   fields:

	struct native_file {
		unsigned long master_fingerprint[8];
		unsigned long file_fingerprint[8];
		struct file file;
	};

'fingerprints' are 256 bit, true random numbers. master_fingerprint is
global to the kernel and is generated once per boot. It validates the
pointer of the structure. The master fingerprint is never known to
user-space.

file_fingerprint is a 256-bit identifier generated for this native file.
The file fingerprint and the (kernel) pointer to the native file is
returned to user-space. The cryptographical safety of these 256-bit random
numbers guarantees that no breach can occur in a reasonable period of
time. It's in essence an 'encrypted' communication between kernel and
user-space.

user-space thus can pass a pointer to the following structure:

	struct safe_kpointer {
		void *kaddr;
		unsigned long fingerprint[4];
	};

the kernel can validate kaddr by 1) validating the pointer via the master
fingerprint (every valid kernel pointer must point to a structure that
starts with the master fingerprint's copy). Then usage-permissions are
validated by checking the file fingerprint (the per-object fingerprint).

this is a safe, very fast [ O(1) ] object-permission model. (it's a
variation of a former idea of yours.) A process can pass object
fingerprints and kernel pointers to other processes too - thus the other
process can access the object too. Threads will 'naturally' share objects,
because fingerprints are typically stored in memory.

3) on closing a native file the fingerprint is destroyed (first byte of
   the master fingerprint copy is overwritten).

what do you think about this? I believe most of the file APIs can be /
should be reworked to use native files, and 'Unix files' would just be a
compatibility layer parallel to them. Then various applications could
convert to 'native file' usage - i believe file servers which have lots of
file descriptors would do this first.

(this 'fingerprint' mechanism can be used for any object, not only files.)

Ingo
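To make the three steps of the proposal concrete, here is a toy user-space model of the fingerprint check. It is a sketch of the idea only, under several assumptions that are not in the original message: fingerprints are stored as eight 32-bit words so that they are 256 bits on any platform, the handle carries the full 256-bit file fingerprint, an int stands in for struct file, randomness comes from /dev/urandom, and all function names are invented for illustration. The model also assumes kaddr always points at readable memory, which a real kernel could not take for granted when handed a pointer by user space.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FP_WORDS 8	/* 8 x 32 bits = 256 bits */

struct native_file {
	uint32_t master_copy[FP_WORDS];	/* copy of the per-boot master value */
	uint32_t file_fp[FP_WORDS];	/* per-object fingerprint */
	int payload;			/* stands in for struct file */
};

/* What "user space" holds: an address plus the object's fingerprint. */
struct safe_kpointer {
	struct native_file *kaddr;
	uint32_t fingerprint[FP_WORDS];
};

static uint32_t master_fp[FP_WORDS];	/* never handed out */

static int fill_random(void *buf, size_t len)
{
	FILE *f = fopen("/dev/urandom", "rb");
	size_t got;

	if (!f)
		return -1;
	got = fread(buf, 1, len, f);
	fclose(f);
	return got == len ? 0 : -1;
}

/* Compare all words without an early exit. */
static int fp_equal(const uint32_t *a, const uint32_t *b)
{
	uint32_t diff = 0;

	for (int i = 0; i < FP_WORDS; i++)
		diff |= a[i] ^ b[i];
	return diff == 0;
}

/* "Once per boot": generate the master fingerprint. */
static int native_init(void)
{
	return fill_random(master_fp, sizeof(master_fp));
}

static struct native_file *native_open(int payload, struct safe_kpointer *out)
{
	struct native_file *nf;

	assert(out != NULL);
	nf = malloc(sizeof(*nf));
	if (!nf)
		return NULL;
	memcpy(nf->master_copy, master_fp, sizeof(master_fp));
	if (fill_random(nf->file_fp, sizeof(nf->file_fp)) < 0) {
		free(nf);
		return NULL;
	}
	nf->payload = payload;
	out->kaddr = nf;
	memcpy(out->fingerprint, nf->file_fp, sizeof(nf->file_fp));
	return nf;
}

/* O(1) validation: master copy first, then the per-object fingerprint. */
static struct native_file *native_lookup(const struct safe_kpointer *kp)
{
	struct native_file *nf = kp->kaddr;

	if (!nf || !fp_equal(nf->master_copy, master_fp))
		return NULL;
	if (!fp_equal(nf->file_fp, kp->fingerprint))
		return NULL;
	return nf;
}

/* Step 3: invalidate by damaging the master copy. The memory is kept
 * alive here so that a later lookup in this model stays well defined. */
static void native_close(struct native_file *nf)
{
	nf->master_copy[0] = ~nf->master_copy[0];
}
```

The model shows why lookup is constant time: there is no table to search, only two fixed-size comparisons at the address the caller supplied.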
* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
@ 2000-01-01 2:02 ` Pavel Machek
2001-01-16 11:13 ` Andi Kleen
` (4 subsequent siblings)
5 siblings, 0 replies; 70+ messages in thread
From: Pavel Machek @ 2000-01-01 2:02 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray

Hi!

> struct safe_kpointer {
> void *kaddr;
> unsigned long fingerprint[4];
> };
>
> the kernel can validate kaddr by 1) validating the pointer via the master
> fingerprint (every valid kernel pointer must point to a structure that
> starts with the master fingerprint's copy). Then usage-permissions are
> validated by checking the file fingerprint (the per-object fingerprint).
>
> this is a safe, very fast [ O(1) ] object-permission model. (it's a
> variation of a former idea of yours.) A process can pass object
> fingerprints and kernel pointers to other processes too - thus the other
> process can access the object too. Threads will 'naturally' share objects,
> because fingerprints are typically stored in memory.

I do not know if I'd trust this.

First, (fd < current->fdlimit && current->fdlist[fd]) is O(1), too. Sure,
passing those is slightly hard, but we can do that already.

With your proposal, all hopes for fuser and revoke are out. Ouch; you
say process can pass it to other process. How will kernel know not to
free fd until _both_ freed it?

Plus, you are playing tricks with random numbers. Up to now, only ssh
and similar depended on random numbers. Now kernel relies on them
during boot. Notice that most important "master fingerprint" is
generated first. At that time you might not have enough entropy in your
pools.

Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
* Re: 'native files', 'object fingerprints' [was: sendpath()] 2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar 2000-01-01 2:02 ` Pavel Machek @ 2001-01-16 11:13 ` Andi Kleen 2001-01-16 11:26 ` Ingo Molnar 2001-01-16 13:57 ` 'native files', 'object fingerprints' [was: sendpath()] Jamie Lokier ` (3 subsequent siblings) 5 siblings, 1 reply; 70+ messages in thread From: Andi Kleen @ 2001-01-16 11:13 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray On Tue, Jan 16, 2001 at 10:48:34AM +0100, Ingo Molnar wrote: > this is a safe, very fast [ O(1) ] object-permission model. (it's a > variation of a former idea of yours.) A process can pass object > fingerprints and kernel pointers to other processes too - thus the other > process can access the object too. Threads will 'naturally' share objects, >... Just setuid etc. doesn't work with that because access cannot be easily revoked without disturbing other clients. To handle that you would probably need a "relookup if needed" mechanism similar to what NFSv4 has, so that you can force other users to relookup after you revoked a key. That complicates the use a lot though. Also the model depends on good secure random numbers, which is questionable in many environments (e.g. a diskless box where the random device effectively gets no new input) -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: 'native files', 'object fingerprints' [was: sendpath()] 2001-01-16 11:13 ` Andi Kleen @ 2001-01-16 11:26 ` Ingo Molnar 2001-01-16 11:37 ` Andi Kleen 0 siblings, 1 reply; 70+ messages in thread From: Ingo Molnar @ 2001-01-16 11:26 UTC (permalink / raw) To: Andi Kleen Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray On Tue, 16 Jan 2001, Andi Kleen wrote: > On Tue, Jan 16, 2001 at 10:48:34AM +0100, Ingo Molnar wrote: > > this is a safe, very fast [ O(1) ] object-permission model. (it's a > > variation of a former idea of yours.) A process can pass object > > fingerprints and kernel pointers to other processes too - thus the other > > process can access the object too. Threads will 'naturally' share objects, > >... > > Just setuid etc. doesn't work with that because access cannot be > easily revoked without disturbing other clients. well, you cannot easily close() an already shared file descriptor in another process's context either. Is revocation so important? Why is setuid() a problem? A native file is just like a normal file, with the difference that not an integer but a fingerprint identifies it, and that access and usage counts are not automatically inherited across some explicit sharing interface. perhaps we could get most of the advantages by allowing the relaxation of the 'allocate first free file descriptor number' rule for normal Unix files? > Also the model depends on good secure random numbers, which is > questionable in many environments (e.g. a diskless box where the > random device effectively gets no new input) true, although newer chipsets include hardware random generators. But indeed, object fingerprints (tokens? ids?) make the random generator a much more central thing. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: 'native files', 'object fingerprints' [was: sendpath()] 2001-01-16 11:26 ` Ingo Molnar @ 2001-01-16 11:37 ` Andi Kleen 2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar 0 siblings, 1 reply; 70+ messages in thread From: Andi Kleen @ 2001-01-16 11:37 UTC (permalink / raw) To: Ingo Molnar Cc: Andi Kleen, Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray On Tue, Jan 16, 2001 at 12:26:12PM +0100, Ingo Molnar wrote: > > On Tue, 16 Jan 2001, Andi Kleen wrote: > > > On Tue, Jan 16, 2001 at 10:48:34AM +0100, Ingo Molnar wrote: > > > this is a safe, very fast [ O(1) ] object-permission model. (it's a > > > variation of a former idea of yours.) A process can pass object > > > fingerprints and kernel pointers to other processes too - thus the other > > > process can access the object too. Threads will 'naturally' share objects, > > >... > > > > Just setuid etc. doesn't work with that because access cannot be > > easily revoked without disturbing other clients. > > well, you cannot easily close() an already shared file descriptor in > another process's context either. Is revocation so important? Why is > setuid() a problem? A native file is just like a normal file, with the > difference that not an integer but a fingerprint identifies it, and that > access and usage counts are not automatically inherited across some > explicit sharing interface. Actually on second thought exec() is more a problem than setuid(), because it requires closing for file descriptors. So if you could devise a security model that doesn't depend on exec giving you a clean plate -- then it could work, but would probably not be very unixy. 
I'm amazed how non flamed you can present radical API ideas though, I even get flamed for much smaller things (like using text errors to replace the hundreds of EINVALs in the rtnetlink message interface) ;);) > > perhaps we could get most of the advantages by allowing the relaxation of > the 'allocate first free file descriptor number' rule for normal Unix > files? Not sure I follow. You mean dup2() ? -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 11:37 ` Andi Kleen
@ 2001-01-16 12:04 ` Ingo Molnar
2001-01-16 12:09 ` Ingo Molnar
` (3 more replies)
0 siblings, 4 replies; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 12:04 UTC (permalink / raw)
To: Andi Kleen
Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray

On Tue, 16 Jan 2001, Andi Kleen wrote:

> > the 'allocate first free file descriptor number' rule for normal Unix
> > files?

> Not sure I follow. You mean dup2() ?

I'm sure you know this: when there are thousands of files open already,
much of the overhead of opening a new file comes from the mandatory POSIX
requirement of allocating the first not yet allocated file descriptor
integer to this file. Eg. if files 0, 1, 2, 10, 11 are already open, the
kernel must allocate file descriptor 3. Many utilities rely on this, and
the rule makes sense in a select() environment, because it compresses the
'file descriptor spectrum'. But in a non-select(), event-driven
environment it becomes unnecessary overhead.

- probably the most radical solution is what i suggested, to completely
  avoid the unique-mapping of file structures to an integer range, and use
  the address of the file structure (and some cookies) as an identification.

- a less radical solution would be to still map file structures to an
  integer range (file descriptors) and usage-maintain files per processes,
  but relax the 'allocate first non-allocated integer in the range' rule.
  I'm not sure exactly how simple this is, but something like this should
  work: on close()-ing file descriptors the freed file descriptors would be
  cached in a list (this needs a new, separate structure which must be
  allocated/freed as well). Something like:

	struct lazy_filedesc {
		int fd;
		struct file *file;
	};

	struct task {
		...
		struct lazy_filedesc *lazy_files;
		...
	};

the actual filedescriptor bit of a 'lazy file' would be cleared for real
on close(), and the '*file' argument is not a real file - it's NULL if at
close() time this process wasn't the last user of the file, or contains a
pointer to an allocated (but otherwise invalid) file structure. This must
happen to ensure the first-free-desc rule, and to optimize freeing/allocate
of file structures. Now, if the new code does a:

	fd = open(...,O_ANY);

then the kernel looks at the current->lazy_files list, and tries to set
the file descriptor bit in the current->files file table. If successful
then open() uses desc->fd and desc->file (if available) for opening the
new file, and unlinks+frees the lazy descriptor. If unsuccessful then
open() frees desc->file, frees and unlinks the descriptor and goes on to
look at the next descriptor.

- worst case overhead is the extra allocation overhead of the (very
  small) lazy file descriptor. Worst-case happens only if O_ANY
  allocation is mixed in a special way with normal open()s.

- Best-case overhead saves us a get_unused_fd() call, which can be *very*
  expensive (in terms of CPU time and cache footprint) if thousands of
  files are used. If O_ANY is used mostly, then the best-case is always
  triggered.

- (the number of lazy files must be limited to some sane value)

at exit_files() time the current->lazy_files list must be processed. On
exec() it does not get inherited. current->lazy_files has no effect on
task state or semantics otherwise, it's only an isolated 'information
cache'.

Have i missed something important?

Ingo
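The difference between the two allocation rules in this message can be modelled in a few lines of user-space C. This is only a sketch of the idea, not kernel code: struct fd_table, alloc_lowest(), alloc_any() and release_fd() are hypothetical names, the cache holds bare descriptor numbers rather than struct file pointers, and __builtin_ctzl is a GCC/Clang builtin rather than standard C.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_FDS   4096
#define MAX_LAZY  64	/* "limited to some sane value" */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS    (MAX_FDS / WORD_BITS)

struct fd_table {
	unsigned long open_bits[NWORDS];	/* bit set = descriptor in use */
	int lazy[MAX_LAZY];			/* recently closed descriptors */
	int nlazy;
};

static int bit_test(const struct fd_table *t, int fd)
{
	return (int)((t->open_bits[fd / WORD_BITS] >> (fd % WORD_BITS)) & 1UL);
}

static void bit_set(struct fd_table *t, int fd)
{
	t->open_bits[fd / WORD_BITS] |= 1UL << (fd % WORD_BITS);
}

static void bit_clear(struct fd_table *t, int fd)
{
	t->open_bits[fd / WORD_BITS] &= ~(1UL << (fd % WORD_BITS));
}

/* POSIX rule: lowest free descriptor. Scans the bitmap a word at a time. */
static int alloc_lowest(struct fd_table *t)
{
	for (size_t w = 0; w < NWORDS; w++) {
		unsigned long free_mask = ~t->open_bits[w];

		if (free_mask) {
			int fd = (int)(w * WORD_BITS) + __builtin_ctzl(free_mask);

			bit_set(t, fd);
			return fd;
		}
	}
	return -1;	/* table full */
}

/* O_ANY rule: reuse a cached descriptor if it is still free. */
static int alloc_any(struct fd_table *t)
{
	while (t->nlazy > 0) {
		int fd = t->lazy[--t->nlazy];

		if (!bit_test(t, fd)) {
			bit_set(t, fd);
			return fd;
		}
		/* Stale entry: a normal open() took this number meanwhile. */
	}
	return alloc_lowest(t);
}

/* close(): clear the bit for real, then remember the number. */
static void release_fd(struct fd_table *t, int fd)
{
	assert(fd >= 0 && fd < MAX_FDS && bit_test(t, fd));
	bit_clear(t, fd);
	if (t->nlazy < MAX_LAZY)
		t->lazy[t->nlazy++] = fd;
}
```

Because release_fd() clears the bitmap bit immediately, alloc_lowest() still returns the lowest free number for callers that depend on it, and a cached entry that such a caller has reused is simply skipped by alloc_any(). That corresponds to the "worst case" described in the message.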
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar
@ 2001-01-16 12:09 ` Ingo Molnar
2001-01-16 12:13 ` Peter Samuelson
` (2 subsequent siblings)
3 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 12:09 UTC (permalink / raw)
To: Andi Kleen
Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray

On Tue, 16 Jan 2001, Ingo Molnar wrote:

> struct lazy_filedesc {
> int fd;
> struct file *file;
> }

in fact "struct file" can be (ab)used for this, no need for new
structures or new fields. Eg. file->f_flags contains the cached
descriptor-information. file->f_list is used for the current->lazy_files
ringlist.

this way there is no additional allocation overhead in the worst-case.
(unless i'm missing something obvious.)

Ingo
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] 2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar 2001-01-16 12:09 ` Ingo Molnar @ 2001-01-16 12:13 ` Peter Samuelson 2001-01-16 12:33 ` Ingo Molnar 2001-01-16 12:34 ` Andi Kleen 2001-01-16 13:00 ` Mitchell Blank Jr 3 siblings, 1 reply; 70+ messages in thread From: Peter Samuelson @ 2001-01-16 12:13 UTC (permalink / raw) To: Ingo Molnar; +Cc: Linux Kernel List [Ingo Molnar] > - probably the most radical solution is what i suggested, to > completely avoid the unique-mapping of file structures to an integer > range, and use the address of the file structure (and some cookies) > as an identification. Careful, these must cast to non-negative integers, without clashing. > fd = open(...,O_ANY); I like this idea, but call it O_ALLOCANYFD. Peter - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] 2001-01-16 12:13 ` Peter Samuelson @ 2001-01-16 12:33 ` Ingo Molnar 2001-01-16 14:40 ` Felix von Leitner 0 siblings, 1 reply; 70+ messages in thread From: Ingo Molnar @ 2001-01-16 12:33 UTC (permalink / raw) To: Peter Samuelson; +Cc: Linux Kernel List On Tue, 16 Jan 2001, Peter Samuelson wrote: > [Ingo Molnar] > > - probably the most radical solution is what i suggested, to > > completely avoid the unique-mapping of file structures to an integer > > range, and use the address of the file structure (and some cookies) > > as an identification. > > Careful, these must cast to non-negative integers, without clashing. if you read my (radical) proposal, the identification is based on a kernel pointer and a 256-bit random integer. So non-negative integers are not needed. (file-IO system-calls would be modified to detect if 'Unix file descriptors' or pointers to 'native file descriptors' are passed to them, so this is truly radical.) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 12:33 ` Ingo Molnar
@ 2001-01-16 14:40 ` Felix von Leitner
0 siblings, 0 replies; 70+ messages in thread
From: Felix von Leitner @ 2001-01-16 14:40 UTC (permalink / raw)
To: Linux Kernel List

Thus spake Ingo Molnar (mingo@elte.hu):
> if you read my (radical) proposal, the identification is based on a kernel
> pointer and a 256-bit random integer. So non-negative integers are not
> needed. (file-IO system-calls would be modified to detect if 'Unix file
> descriptors' or pointers to 'native file descriptors' are passed to them,
> so this is truly radical.)

Yuck, don't pass pointers in kernel space to user space!
NT does it and look what kernel call argument verification havoc it
wrought over them!

Felix
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] 2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar 2001-01-16 12:09 ` Ingo Molnar 2001-01-16 12:13 ` Peter Samuelson @ 2001-01-16 12:34 ` Andi Kleen 2001-01-16 13:00 ` Mitchell Blank Jr 3 siblings, 0 replies; 70+ messages in thread From: Andi Kleen @ 2001-01-16 12:34 UTC (permalink / raw) To: Ingo Molnar Cc: Andi Kleen, Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray On Tue, Jan 16, 2001 at 01:04:22PM +0100, Ingo Molnar wrote: > - a less radical solution would be to still map file structures to an > integer range (file descriptors) and usage-maintain files per processes, > but relax the 'allocate first non-allocated integer in the range' rule. > I'm not sure exactly how simple this is, but something like this should > work: on close()-ing file descriptors the freed file descriptors would be > cached in a list (this needs a new, separate structure which must be > allocated/freed as well). Something like: > > struct lazy_filedesc { > int fd; > struct file *file; > } More generic file -> fd mapping would be useful to speed up poll() too, because the event trigger could directly modify the poll table without a second slow walk over the whole table. So you could add another bit that tells if the fd is open or closed and share it with poll. Also in that table you could just keep a linked ordered free list and not use GFP_ANY, because getting the lowest would be rather cheap. Disadvantage is that it would need more cache and more overhead than the current scheme. [in a way it is a ugly duck like pte<->vma links] > - Best-case overhead saves us a get_unused_fd() call, which can be *very* > expensive (in terms of CPU time and cache footprint) if thousands of > files are used. If O_ANY is used mostly, then the best-case is always > triggered. Really? Does the open_fds bitmap get that big ? 
Maybe it just needs a faster find_next_zero_bit() @) -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar
` (2 preceding siblings ...)
2001-01-16 12:34 ` Andi Kleen
@ 2001-01-16 13:00 ` Mitchell Blank Jr
3 siblings, 0 replies; 70+ messages in thread
From: Mitchell Blank Jr @ 2001-01-16 13:00 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linux Kernel List

Ingo Molnar wrote:
> - probably the most radical solution is what i suggested, to completely
> avoid the unique-mapping of file structures to an integer range, and use
> the address of the file structure (and some cookies) as an identification.

IMO... gross. We do pretty much this exact thing in the ATM code (for
the signalling daemon and the kernel exchanging status on VCCs) and it's
pretty disgusting. I want to make it go away.

> - a less radical solution would be to still map file structures to an
> integer range (file descriptors) and usage-maintain files per processes,
> but relax the 'allocate first non-allocated integer in the range' rule.
[...]
> fd = open(...,O_ANY);

Yeah, this gets talked about, but I don't think a new flag for open is
a good way to do this, because open() isn't the only thing that returns
a new fd. What about socket()? pipe()? Maybe we could have a new
prctl() control that turns this behavior on and off. Then you'd just
have to be careful to turn it back off before calling any library
functions that require ordering (like popen).

Other than that, I think it'd be a good idea, especially if it could be
implemented clean enough to make it CONFIG_'urable. That can't really
be fairly judged until someone produces the code.

-Mitch
* Re: 'native files', 'object fingerprints' [was: sendpath()] 2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar 2000-01-01 2:02 ` Pavel Machek 2001-01-16 11:13 ` Andi Kleen @ 2001-01-16 13:57 ` Jamie Lokier 2001-01-16 14:27 ` Felix von Leitner ` (2 subsequent siblings) 5 siblings, 0 replies; 70+ messages in thread From: Jamie Lokier @ 2001-01-16 13:57 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray Ingo Molnar wrote: > struct native_file { > unsigned long master_fingerprint[8]; > unsigned long file_fingerprint[8]; > struct file file; > }; > > 'fingerprints' are 256 bit, true random numbers. master_fingerprint is > global to the kernel and is generated once per boot. It validates the > pointer of the structure. The master fingerprint is never known to > user-space. > > file_fingerprint is a 256-bit identifier generated for this native file. > The file fingerprint and the (kernel) pointer to the native file is > returned to user-space. The cryptographical safety of these 256-bit random > numbers guarantees that no breach can occur in a reasonable period of > time. It's in essence an 'encrypted' communication between kernel and > user-space. Sounds similar to the Hurd... -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
` (2 preceding siblings ...)
2001-01-16 13:57 ` 'native files', 'object fingerprints' [was: sendpath()] Jamie Lokier
@ 2001-01-16 14:27 ` Felix von Leitner
2001-01-16 17:47 ` Linus Torvalds
2001-01-17 4:39 ` dean gaudet
5 siblings, 0 replies; 70+ messages in thread
From: Felix von Leitner @ 2001-01-16 14:27 UTC (permalink / raw)
To: Linux Kernel List
Thus spake Ingo Molnar (mingo@elte.hu):
> But even user-space code could use 'native files', via the following, safe
> mechanism:
[something reminiscent of a token from a capability system]
> (this 'fingerprint' mechanism can be used for any object, not only files.)
One good thing about tokens is that file handles can be implemented on
top of them in user space. On the other hand, there already are
mechanisms to pass file descriptors around and so on, so you don't gain
anything tangible from your effort.
I would advise reading some text books about capability systems, there
is a lot to be learned here. But retrofitting something like this on an
existing kernel is probably not a very good idea. Experience shows that
you can't "un-bloat" a piece of software by introducing a few elegant
concepts. The compatibility stuff eats most of the benefits.
Felix
* Re: 'native files', 'object fingerprints' [was: sendpath()] 2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar ` (3 preceding siblings ...) 2001-01-16 14:27 ` Felix von Leitner @ 2001-01-16 17:47 ` Linus Torvalds 2001-01-17 4:39 ` dean gaudet 5 siblings, 0 replies; 70+ messages in thread From: Linus Torvalds @ 2001-01-16 17:47 UTC (permalink / raw) To: Ingo Molnar; +Cc: dean gaudet, Linux Kernel List, Jonathan Thackray On Tue, 16 Jan 2001, Ingo Molnar wrote: > > yep, correct. But take a look at the trick it does with file descriptors, > i believe it could be a useful way of doing things. It basically > privatizes a struct file, without inserting it into the enumerated file > descriptors. This shows that 'native files' are possible: file struct > without file descriptor integers mapped to them. That's nothing new: the exec() code does exactly the same. In fact, there's a function for it: filp_open() and filp_close(). Which do a better job of it than your private implementation did, I suspect. I don't think your object fingerprints are anything more generic than the current file descriptors. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
` (4 preceding siblings ...)
2001-01-16 17:47 ` Linus Torvalds
@ 2001-01-17 4:39 ` dean gaudet
5 siblings, 0 replies; 70+ messages in thread
From: dean gaudet @ 2001-01-17 4:39 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel List, Jonathan Thackray
On Tue, 16 Jan 2001, Ingo Molnar wrote:
> But even user-space code could use 'native files', via the following, safe
> mechanism:
so here's an alternative to ingo's proposal which i think solves some of
the other objections raised. it's something i've proposed in the past
under the name "extended file handles".
struct extended_file_permission {
    int refcount;
    /* some form of mutex to protect refcount */
    /* some list structure head */
};
struct extended_file {
    struct file *file;
    struct extended_file_permission *perm;
    /* whatever list foo is needed to link with extended_file_perm above */
};
if you allocate a few huge arrays of struct extended_file, then you can
verify if a pointer passed from user space fits into one of those arrays
pretty quickly.
struct task has a struct extended_file_permission * added to it to
indicate which perm struct that task is associated with.
so you just compare the f->perm to current->extended_file_perm and you
know if the task is allowed to use it or not.
clone() allows you to create tasks sharing the same
extended_file_permissions.
fork()/exec() would create new extended_file_perms -- which implicitly
causes all those files to be closed. this gives you pretty light cgi
fork()/exec() off a main "process" which is handling thousands of sockets.
i also proposed various methods of doing O_foo flag inheritance... but the
above is more interesting.
-dean
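The pointer check described in the "extended file handles" proposal might look roughly like the sketch below. It is only an illustration under stated assumptions: a single fixed-size table (`ef_table`, `EF_TABLE_SIZE`) stands in for the "few huge arrays", `ef_lookup()` is a hypothetical name, and the mutex and list members are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct file;                            /* opaque stand-in for the kernel's struct file */

struct extended_file_permission {
    int refcount;                       /* tasks sharing this permission set */
};

struct extended_file {
    struct file *file;
    struct extended_file_permission *perm;
};

#define EF_TABLE_SIZE 1024              /* hypothetical size, for illustration only */
static struct extended_file ef_table[EF_TABLE_SIZE];

/*
 * Validate a handle passed in from user space: it must point exactly at a
 * slot of ef_table, and that slot must belong to the caller's permission set.
 * Returns the entry on success, NULL otherwise.
 */
static struct extended_file *
ef_lookup(uintptr_t user_ptr, const struct extended_file_permission *task_perm)
{
    uintptr_t base = (uintptr_t)ef_table;
    uintptr_t end = base + sizeof(ef_table);

    if (user_ptr < base || user_ptr >= end)
        return NULL;                    /* outside the table */
    if ((user_ptr - base) % sizeof(struct extended_file) != 0)
        return NULL;                    /* not aligned to a slot */

    struct extended_file *f = (struct extended_file *)user_ptr;
    if (f->perm == NULL || f->perm != task_perm)
        return NULL;                    /* unused slot, or owned by another task group */
    return f;
}
```

The range and alignment tests are both constant-time, which is the property the proposal relies on; the permission comparison corresponds to the f->perm versus current->extended_file_perm check.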
* Re: [patch] sendpath() support, 2.4.0-test3/-ac9 2001-01-16 4:51 ` dean gaudet 2001-01-16 4:59 ` Linus Torvalds @ 2001-01-16 9:19 ` Ingo Molnar 2001-01-17 0:03 ` dean gaudet 1 sibling, 1 reply; 70+ messages in thread From: Ingo Molnar @ 2001-01-16 9:19 UTC (permalink / raw) To: dean gaudet; +Cc: Linus Torvalds, Linux Kernel List, Jonathan Thackray On Mon, 15 Jan 2001, dean gaudet wrote: > > just for kicks i've implemented sendpath() support. > > > > _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size) > > hey so how do you implement transmit timeouts with sendpath() ? > (i.e. drop the client after 30 seconds of no progress.) well this problem is not unique to sendpath(), sendfile() has it as well. in TUX i've added per-socket connection timers, and i believe something like this should be done in Apache as well - timers are IMO not a good enough excuse for avoiding event-based IO models and using select() or poll(). Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [patch] sendpath() support, 2.4.0-test3/-ac9 2001-01-16 9:19 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar @ 2001-01-17 0:03 ` dean gaudet 0 siblings, 0 replies; 70+ messages in thread From: dean gaudet @ 2001-01-17 0:03 UTC (permalink / raw) To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel List, Jonathan Thackray On Tue, 16 Jan 2001, Ingo Molnar wrote: > > On Mon, 15 Jan 2001, dean gaudet wrote: > > > > just for kicks i've implemented sendpath() support. > > > > > > _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size) > > > > hey so how do you implement transmit timeouts with sendpath() ? > > (i.e. drop the client after 30 seconds of no progress.) > > well this problem is not unique to sendpath(), sendfile() has it as well. hrm? with sendfile() i just send 32k or 64k at a time and use alarm() or non-blocking/select() to implement timeouts. with sendpath() i can do the same thing but i'm gonna pay a path lookup each time... and there's no guarantee that i'm getting the same file each time. > in TUX i've added per-socket connection timers, and i believe something > like this should be done in Apache as well - timers are IMO not a good > enough excuse for avoiding event-based IO models and using select() or > poll(). i wasn't suggesting avoiding sendfile/sendpath -- i just couldn't see how to use sendpath() effectively. explain per-socket connection timers. are they available to the userland? at least with the apache-2.0 i/o stuff i should be able to support kernel-based timers. apache-2.0 uses non-blocking/poll() to implement timeouts -- does write() or sendfile() until there's an EWOULDBLOCK then it calls poll() waiting for write/timeout. with kernel supported timeouts i could just block in the write() and that'd be fine by me. 1.2 used alarm() ... 1.3 communicates each child's activity to the parent through the scoreboard and the parent occasionally wakes up and sends SIGALRM to children that are past their timeout. 
(that let me get rid of a few syscalls.)
-dean
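The non-blocking/poll() timeout pattern mentioned for apache-2.0 (write until EWOULDBLOCK, then poll() waiting for writability or a timeout) can be sketched as follows. This is not Apache's code; `set_nonblocking()` and `write_with_timeout()` are hypothetical helper names, and the timeout is applied per stall rather than to the whole transfer.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Write all of buf to a non-blocking fd. Whenever the kernel buffer is full,
 * wait up to timeout_ms for it to drain. Returns 0 on success, -1 on error;
 * errno is ETIMEDOUT if the peer made no progress within the timeout.
 */
int write_with_timeout(int fd, const char *buf, size_t len, int timeout_ms)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n > 0) {
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  /* real error */

        struct pollfd p = { .fd = fd, .events = POLLOUT };
        int r = poll(&p, 1, timeout_ms);
        if (r == 0) {
            errno = ETIMEDOUT;          /* no progress: drop the client */
            return -1;
        }
        if (r < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

The same loop shape works with sendfile() in place of write(), sending a bounded chunk per call as described for the 32k/64k approach.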
* Re: Is sendfile all that sexy? 2001-01-15 18:34 ` Jonathan Thackray 2001-01-15 18:46 ` Linus Torvalds @ 2001-01-15 18:58 ` dean gaudet 1 sibling, 0 replies; 70+ messages in thread From: dean gaudet @ 2001-01-15 18:58 UTC (permalink / raw) To: Jonathan Thackray; +Cc: linux-kernel On Mon, 15 Jan 2001, Jonathan Thackray wrote: > > TCP_CORK is useful for FAR more than just sendfile() headers and > > footers. it's arguably the most correct way to write server code. > > Agreed -- the hard-coded Nagle algorithm makes no sense these days. hey, actually a little more thinking this morning made me think nagle *may* have a place. i don't like any of the solutions i've come up with though for this. the problem specifically is how do you implement an efficient HTTP/ng server which supports WebMUX and parallel processing of multiple responses. the problem in a nutshell is that multiple threads may be working on responses which are multiplexed onto a single socket -- there's some extra mux header info used to separate each of the response streams. like what if the response stream is a few hundred HEADs (for cache validation) some of which are static files and others which require some dynamic code. the static responses will finish really fast, and you want to fill up network packets with them. but you don't know when the dynamic responses will finish so you can't be sure when to start sending the packets. i don't know NFSv3 very much, but i imagine it's got similar problems -- any multiplexed request/response protocol allowing out-of-order responses would have this problem. any gurus got suggestions? -dean - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
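For readers unfamiliar with the TCP_CORK usage being referred to, a minimal Linux-specific sketch follows. It assumes a connected TCP socket; `set_cork()` and `send_corked()` are hypothetical helper names, and short writes are simply treated as errors. It does not address the multiplexing question raised in the message, where several threads share one socket.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Turn TCP_CORK on (1) or off (0); returns 0 on success, -1 on error. */
int set_cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/*
 * Send a header and a body so that the kernel can pack them into full
 * frames: cork, queue both pieces, then uncork to flush whatever is left.
 */
int send_corked(int sock, const void *hdr, size_t hdr_len,
                const void *body, size_t body_len)
{
    int rc = 0;

    if (set_cork(sock, 1) < 0)
        return -1;
    if (write(sock, hdr, hdr_len) != (ssize_t)hdr_len ||
        write(sock, body, body_len) != (ssize_t)body_len)
        rc = -1;                        /* short write counted as failure in this sketch */
    if (set_cork(sock, 0) < 0)          /* uncorking pushes out any partial frame */
        rc = -1;
    return rc;
}
```

The body write could equally be a sendfile() call; the cork only controls when partial frames may leave the socket.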
* Re: Is sendfile all that sexy? 2001-01-15 15:24 ` Jonathan Thackray 2001-01-15 15:36 ` Matti Aarnio 2001-01-15 16:05 ` dean gaudet @ 2001-01-15 19:41 ` Ingo Molnar 2001-01-15 20:33 ` Albert D. Cahalan 2 siblings, 1 reply; 70+ messages in thread From: Ingo Molnar @ 2001-01-15 19:41 UTC (permalink / raw) To: Jonathan Thackray; +Cc: Linux Kernel List On Mon, 15 Jan 2001, Jonathan Thackray wrote: > It's a very useful system call and makes file serving much more > scalable, and I'm glad that most Un*xes now have support for it > (Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to > Linux is sendpath(), which does the open() before the sendfile() all > combined into one system call. i believe the right model for a user-space webserver is to cache open file descriptors, and directly hash URLs to open files. This way you can do pure sendfile() without any open(). Not that open() is too expensive in Linux: m:~/lm/lmbench-2alpha9/bin/i686-linux> ./lat_syscall open Simple open/close: 7.5756 microseconds m:~/lm/lmbench-2alpha9/bin/i686-linux> ./lat_syscall stat Simple stat: 5.4864 microseconds m:~/lm/lmbench-2alpha9/bin/i686-linux> ./lat_syscall write Simple write: 0.9614 microseconds m:~/lm/lmbench-2alpha9/bin/i686-linux> ./lat_syscall read Simple read: 1.1420 microseconds m:~/lm/lmbench-2alpha9/bin/i686-linux> ./lat_syscall null Simple syscall: 0.6349 microseconds (note that lmbench opens a nontrivial path, it can be cheaper than this.) nevertheless saving the lookup can be win. [ TUX uses dentries directly so there is no file opening cost - it's pretty equivalent to sendpath(), with the difference that TUX can do security evaluation on the (held) file prior sending it - while sendpath() is pretty much a shot into the dark. 
]
Ingo
* Re: Is sendfile all that sexy? 2001-01-15 19:41 ` Ingo Molnar @ 2001-01-15 20:33 ` Albert D. Cahalan 2001-01-15 21:00 ` Linus Torvalds 2001-01-16 10:40 ` Felix von Leitner 0 siblings, 2 replies; 70+ messages in thread From: Albert D. Cahalan @ 2001-01-15 20:33 UTC (permalink / raw) To: mingo; +Cc: Jonathan Thackray, Linux Kernel List Ingo Molnar writes: > On Mon, 15 Jan 2001, Jonathan Thackray wrote: >> It's a very useful system call and makes file serving much more >> scalable, and I'm glad that most Un*xes now have support for it >> (Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to >> Linux is sendpath(), which does the open() before the sendfile() all >> combined into one system call. Ingo Molnar's data in a nice table: open/close 7.5756 microseconds stat 5.4864 microseconds write 0.9614 microseconds read 1.1420 microseconds syscall 0.6349 microseconds Rather than combining open() with sendfile(), it could be combined with stat(). Since the syscall would be new anyway, it could skip the normal requirement about returning the next free file descriptor in favor of returning whatever can be most quickly found. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy? 2001-01-15 20:33 ` Albert D. Cahalan @ 2001-01-15 21:00 ` Linus Torvalds 2001-01-16 10:40 ` Felix von Leitner 1 sibling, 0 replies; 70+ messages in thread From: Linus Torvalds @ 2001-01-15 21:00 UTC (permalink / raw) To: linux-kernel In article <200101152033.f0FKXpv250839@saturn.cs.uml.edu>, Albert D. Cahalan <acahalan@cs.uml.edu> wrote: >Ingo Molnar writes: >> On Mon, 15 Jan 2001, Jonathan Thackray wrote: > >>> It's a very useful system call and makes file serving much more >>> scalable, and I'm glad that most Un*xes now have support for it >>> (Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to >>> Linux is sendpath(), which does the open() before the sendfile() all >>> combined into one system call. > >Ingo Molnar's data in a nice table: > >open/close 7.5756 microseconds >stat 5.4864 microseconds >write 0.9614 microseconds >read 1.1420 microseconds >syscall 0.6349 microseconds > >Rather than combining open() with sendfile(), it could be combined >with stat(). Note that "fstat()" is fairly low-overhead (unlike "stat()" it obviously doesn't have to parse the name again), so "open+fstat" is quite fine as-is. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
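The "open+fstat" combination referred to above is the ordinary two-call sequence; a small sketch (with `open_and_stat()` as a hypothetical helper name) shows that the path is parsed only once, by open(), and the metadata is then read from the descriptor.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open path read-only and fill *st from the open descriptor.
 * Returns the descriptor, or -1 on failure (nothing is left open).
 */
int open_and_stat(const char *path, struct stat *st)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, st) < 0) {            /* no second path lookup here */
        close(fd);
        return -1;
    }
    return fd;
}
```

A file server would typically use st->st_size from this call as the length argument for a following sendfile().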
* Re: Is sendfile all that sexy? 2001-01-15 20:33 ` Albert D. Cahalan 2001-01-15 21:00 ` Linus Torvalds @ 2001-01-16 10:40 ` Felix von Leitner 2001-01-16 11:56 ` Peter Samuelson ` (2 more replies) 1 sibling, 3 replies; 70+ messages in thread From: Felix von Leitner @ 2001-01-16 10:40 UTC (permalink / raw) To: Linux Kernel List Thus spake Albert D. Cahalan (acahalan@cs.uml.edu): > Rather than combining open() with sendfile(), it could be combined > with stat(). Since the syscall would be new anyway, it could skip > the normal requirement about returning the next free file descriptor > in favor of returning whatever can be most quickly found. I don't know how Linux does it, but returning the first free file descriptor can be implemented as O(1) operation. Felix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy? 2001-01-16 10:40 ` Felix von Leitner @ 2001-01-16 11:56 ` Peter Samuelson 2001-01-16 12:37 ` Ingo Molnar 2001-01-16 12:42 ` Ingo Molnar 2 siblings, 0 replies; 70+ messages in thread From: Peter Samuelson @ 2001-01-16 11:56 UTC (permalink / raw) To: Linux Kernel List [Felix von Leitner] > I don't know how Linux does it, but returning the first free file > descriptor can be implemented as O(1) operation. How exactly? Maybe I'm being dense today. Having used up the lowest available fd, how do you find the next-lowest one, the next open()? I can't think of anything that isn't O(n). (Sure you can amortize it different ways by keeping lists of fds, etc.) Peter - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 10:40 ` Felix von Leitner
2001-01-16 11:56 ` Peter Samuelson
@ 2001-01-16 12:37 ` Ingo Molnar
2001-01-16 12:42 ` Ingo Molnar
2 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 12:37 UTC (permalink / raw)
To: Felix von Leitner; +Cc: Linux Kernel List
On Tue, 16 Jan 2001, Felix von Leitner wrote:
> I don't know how Linux does it, but returning the first free file
> descriptor can be implemented as O(1) operation.
only if special allocation patterns are assumed. Otherwise it cannot be a
generic O(1) solution. The first-free rule adds an implicit ordering to
the file descriptor space, and this order cannot be maintained in an O(1)
way. Linux can allocate up to a million file descriptors.
Ingo
* Re: Is sendfile all that sexy? 2001-01-16 10:40 ` Felix von Leitner 2001-01-16 11:56 ` Peter Samuelson 2001-01-16 12:37 ` Ingo Molnar @ 2001-01-16 12:42 ` Ingo Molnar 2001-01-16 12:47 ` Felix von Leitner 2 siblings, 1 reply; 70+ messages in thread From: Ingo Molnar @ 2001-01-16 12:42 UTC (permalink / raw) To: Felix von Leitner; +Cc: Linux Kernel List On Tue, 16 Jan 2001, Felix von Leitner wrote: > I don't know how Linux does it, but returning the first free file > descriptor can be implemented as O(1) operation. to put it more accurately: the requirement is to be able to open(), use and close() an unlimited number of file descriptors with O(1) overhead, under any allocation pattern, with only RAM limiting the number of files. Both of my proposals attempt to provide this. It's possible to open() O(1) but do a O(log(N)) close(), but that is of no practical value IMO. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy? 2001-01-16 12:42 ` Ingo Molnar @ 2001-01-16 12:47 ` Felix von Leitner 2001-01-16 13:48 ` Jamie Lokier 0 siblings, 1 reply; 70+ messages in thread From: Felix von Leitner @ 2001-01-16 12:47 UTC (permalink / raw) To: Linux Kernel List Thus spake Ingo Molnar (mingo@elte.hu): > > I don't know how Linux does it, but returning the first free file > > descriptor can be implemented as O(1) operation. > to put it more accurately: the requirement is to be able to open(), use > and close() an unlimited number of file descriptors with O(1) overhead, > under any allocation pattern, with only RAM limiting the number of files. > Both of my proposals attempt to provide this. It's possible to open() O(1) > but do a O(log(N)) close(), but that is of no practical value IMO. I cheated. I was only talking about open(). close() is of course more expensive then. Other than that: where does the requirement come from? Can't we just use a free list where we prepend closed fds and always use the first one on open()? That would even increase spatial locality and be good for the CPU caches. Felix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 12:47 ` Felix von Leitner
@ 2001-01-16 13:48 ` Jamie Lokier
2001-01-16 14:20 ` Felix von Leitner
0 siblings, 1 reply; 70+ messages in thread
From: Jamie Lokier @ 2001-01-16 13:48 UTC (permalink / raw)
To: Linux Kernel List
Felix von Leitner wrote:
> I cheated. I was only talking about open().
> close() is of course more expensive then.
>
> Other than that: where does the requirement come from?
> Can't we just use a free list where we prepend closed fds and always use
> the first one on open()? That would even increase spatial locality and
> be good for the CPU caches.
You would need to use a new open() flag: O_ANYFD.
The requirement comes from things like this:
    close (0);
    close (1);
    close (2);
    open ("/dev/console", O_RDWR);
    dup (0);
    dup (0);
-- Jamie
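The close/open/dup idiom above only works because open() and dup() return the lowest-numbered free descriptor. A small check of that rule, written so that it leaves descriptors 0 to 2 alone, could look like the sketch below; `lowest_fd_is_reused()` is a hypothetical name and the code assumes a single-threaded process.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open three descriptors, close the two lowest, and check that the next
 * open() and dup() hand back exactly those two numbers, lowest first.
 * Returns 1 if they do, 0 if not, -1 if the setup itself failed.
 */
int lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    int c = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0 || c < 0)
        return -1;

    close(b);
    close(a);                           /* a and b are now the two lowest free slots */

    int x = open("/dev/null", O_RDONLY);    /* expected to land on a */
    int y = dup(x);                         /* expected to land on b */
    int ok = (x == a && y == b);

    close(x);
    close(y);
    close(c);
    return ok;
}
```

Jamie's sequence is the same mechanism applied to descriptors 0, 1 and 2, which is how a daemon re-attaches its standard streams.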
* Re: Is sendfile all that sexy? 2001-01-16 13:48 ` Jamie Lokier @ 2001-01-16 14:20 ` Felix von Leitner 2001-01-16 15:05 ` David L. Parsley 0 siblings, 1 reply; 70+ messages in thread From: Felix von Leitner @ 2001-01-16 14:20 UTC (permalink / raw) To: Linux Kernel List Thus spake Jamie Lokier (lk@tantalophile.demon.co.uk): > You would need to use a new open() flag: O_ANYFD. > The requirement comes from this like this: > close (0); > close (1); > close (2); > open ("/dev/console", O_RDWR); > dup (); > dup (); So it's not actually part of POSIX, it's just to get around fixing legacy code? ;-) Felix - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy? 2001-01-16 14:20 ` Felix von Leitner @ 2001-01-16 15:05 ` David L. Parsley 2001-01-16 15:05 ` Jakub Jelinek 2001-01-17 19:27 ` dean gaudet 0 siblings, 2 replies; 70+ messages in thread From: David L. Parsley @ 2001-01-16 15:05 UTC (permalink / raw) To: Felix von Leitner, linux-kernel, mingo Felix von Leitner wrote: > > close (0); > > close (1); > > close (2); > > open ("/dev/console", O_RDWR); > > dup (); > > dup (); > > So it's not actually part of POSIX, it's just to get around fixing > legacy code? ;-) This makes me wonder... If the kernel only kept a queue of the three smallest unused fd's, and when the queue emptied handed out whatever it liked, how many things would break? I suspect this would cover a lot of bases... <dons flameproof underwear> regards, David -- David L. Parsley Network Administrator Roanoke College - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy? 2001-01-16 15:05 ` David L. Parsley @ 2001-01-16 15:05 ` Jakub Jelinek 2001-01-16 15:46 ` David L. Parsley 2001-01-17 19:27 ` dean gaudet 1 sibling, 1 reply; 70+ messages in thread From: Jakub Jelinek @ 2001-01-16 15:05 UTC (permalink / raw) To: David L. Parsley; +Cc: Felix von Leitner, linux-kernel, mingo On Tue, Jan 16, 2001 at 10:05:06AM -0500, David L. Parsley wrote: > Felix von Leitner wrote: > > > close (0); > > > close (1); > > > close (2); > > > open ("/dev/console", O_RDWR); > > > dup (); > > > dup (); > > > > So it's not actually part of POSIX, it's just to get around fixing > > legacy code? ;-) > > This makes me wonder... > > If the kernel only kept a queue of the three smallest unused fd's, and > when the queue emptied handed out whatever it liked, how many things > would break? I suspect this would cover a lot of bases... First it would break Unix98 and other standards: The Single UNIX (R) Specification, Version 2 Copyright (c) 1997 The Open Group ... int open(const char *path, int oflag, ... ); ... The open() function will return a file descriptor for the named file that is the lowest file descriptor not currently open for that process. The open file description is new, and therefore the file descriptor does not share it with any other process in the system. The FD_CLOEXEC file descriptor flag associated with the new file descriptor will be cleared. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 15:05 ` Jakub Jelinek
@ 2001-01-16 15:46 ` David L. Parsley
2001-01-18 14:00 ` Laramie Leavitt
0 siblings, 1 reply; 70+ messages in thread
From: David L. Parsley @ 2001-01-16 15:46 UTC (permalink / raw)
To: Jakub Jelinek, linux-kernel, leitner, mingo
Jakub Jelinek wrote:
> > This makes me wonder...
> >
> > If the kernel only kept a queue of the three smallest unused fd's, and
> > when the queue emptied handed out whatever it liked, how many things
> > would break? I suspect this would cover a lot of bases...
>
> First it would break Unix98 and other standards:
[snip]
Yeah, I realized it would violate at least POSIX. The discussion was
just bandying about ways to avoid an expensive 'open()' without breaking
lots of utilities and glibc stuff. This might be something that could
be configured for specific server environments, where performance is
more important than POSIX/Unix98, but you still don't want to completely
break the system. Just a thought, brain-damaged as it might be. ;-)
regards,
David
--
David L. Parsley
Network Administrator
Roanoke College
* RE: Is sendfile all that sexy?
2001-01-16 15:46 ` David L. Parsley
@ 2001-01-18 14:00 ` Laramie Leavitt
0 siblings, 0 replies; 70+ messages in thread
From: Laramie Leavitt @ 2001-01-18 14:00 UTC (permalink / raw)
To: linux-kernel
> Jakub Jelinek wrote:
> > > This makes me wonder...
> > >
> > > If the kernel only kept a queue of the three smallest unused fd's, and
> > > when the queue emptied handed out whatever it liked, how many things
> > > would break? I suspect this would cover a lot of bases...
> >
> > First it would break Unix98 and other standards:
> [snip]
>
> Yeah, I realized it would violate at least POSIX. The discussion was
> just bandying about ways to avoid an expensive 'open()' without breaking
> lots of utilities and glibc stuff. This might be something that could
> be configured for specific server environments, where performance is
> more important than POSIX/Unix98, but you still don't want to completely
> break the system. Just a thought, brain-damaged as it might be. ;-)
>
Merely following the discussion a thought occurred to me of how
to make fd allocation fairly efficient (and simple) even if it retains
the O(n) structure worst case. I don't know how it is currently
implemented so this may be how it is done, or I may be way off base.
First, keep a table of FDs in sorted order ( mark deleted entries )
that you can access quickly. O(1) lookup.
Then, maintain this struct like
struct {
    int lowest_fd;
    int highest_fd;
};
open:
    if( lowest_fd == highest_fd ) {
        fd = lowest_fd;
        lowest_fd = ++highest_fd;
    } else if( flags == IGNORE_UNIX98 ) {
        fd = highest_fd++;
    } else {
        fd = lowest_fd;
        lowest_fd = linear_search( lowest_fd+1, highest_fd );
    }
close:
    if( fd < lowest_fd ) {
        lowest_fd = fd;
    } else if( fd == highest_fd - 1 ) {
        if( highest_fd == lowest_fd ) {
            lowest_fd = --highest_fd;
        } else {
            --highest_fd;
        }
    }
For common cases this would be fairly quick. It would be very easy to
implement an O(1) allocation if you want it to be fast ( at the expense
of a growing file handle table. )
Just thinking about it.
Laramie.
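A compilable variant of the idea in the pseudocode above, offered only as a sketch and not as Laramie's design or the kernel's implementation: it keeps a "lowest free" hint plus a high-water mark, serves the standard lowest-free rule with a linear scan from the hint, and offers an O(1) path for callers that accept any descriptor. The names `fd_table`, `fd_alloc()` and `fd_free()` and the fixed `MAX_FDS` are assumptions made for the example, and unlike the pseudocode it never shrinks the high-water mark on close.

```c
#include <stdbool.h>

#define MAX_FDS 64                      /* small fixed table, for illustration */

struct fd_table {
    bool used[MAX_FDS];
    int lowest_free;                    /* hint: no free slot exists below this index */
    int next_unused;                    /* every slot at or above this index is free */
};

/*
 * Allocate a descriptor. With any_fd set, take the slot at the high-water
 * mark in O(1) and ignore lower holes; otherwise return the lowest free
 * slot, scanning upward from the hint. Returns -1 if the table is full.
 */
int fd_alloc(struct fd_table *t, bool any_fd)
{
    int fd;

    if (any_fd && t->next_unused < MAX_FDS) {
        fd = t->next_unused;
    } else {
        fd = t->lowest_free;
        while (fd < MAX_FDS && t->used[fd])
            fd++;
        if (fd == MAX_FDS)
            return -1;
        t->lowest_free = fd + 1;        /* everything below is now in use */
    }
    t->used[fd] = true;
    if (fd >= t->next_unused)
        t->next_unused = fd + 1;
    return fd;
}

/* Release a descriptor and pull the hint down if a lower hole opened up. */
void fd_free(struct fd_table *t, int fd)
{
    if (fd < 0 || fd >= MAX_FDS || !t->used[fd])
        return;
    t->used[fd] = false;
    if (fd < t->lowest_free)
        t->lowest_free = fd;
}
```

The scan is still O(n) in the worst case, as the message concedes; the hint only makes the common open-after-close pattern cheap.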
* Re: Is sendfile all that sexy?
2001-01-16 15:05 ` David L. Parsley
2001-01-16 15:05 ` Jakub Jelinek
@ 2001-01-17 19:27 ` dean gaudet
1 sibling, 0 replies; 70+ messages in thread
From: dean gaudet @ 2001-01-17 19:27 UTC (permalink / raw)
To: David L. Parsley; +Cc: Felix von Leitner, linux-kernel, mingo

On Tue, 16 Jan 2001, David L. Parsley wrote:

> Felix von Leitner wrote:
> > > close (0);
> > > close (1);
> > > close (2);
> > > open ("/dev/console", O_RDWR);
> > > dup ();
> > > dup ();
> >
> > So it's not actually part of POSIX, it's just to get around fixing
> > legacy code? ;-)

it's part of POSIX.

> This makes me wonder...
>
> If the kernel only kept a queue of the three smallest unused fd's, and
> when the queue emptied handed out whatever it liked, how many things
> would break? I suspect this would cover a lot of bases...

apache-1.3 relies on the open-lowest-numbered-free-fd behaviour... but
only as a band-aid to work around other broken behaviours surrounding
FD_SETSIZE.

when opening the log files, and listening sockets apache uses
fcntl(F_DUPFD) to push them all higher than fd 15. (see ap_slack) some
sites are configured in a way that there's thousands of log files or
listening fds (both are bogus configs in my opinion, but hey, let the
admin shoot themself).

this generally leaves a handful of low numbered fds available. this
pretty much protects apache from broken libraries compiled with small
FD_SETSIZE, or which otherwise can't handle big fds. libc used to be just
such a library because it used select() in the DNS resolver code. (a libc
guru can tell you when this was fixed.)

it also ensures that the client fd will be low numbered, and lets us be
lazy and just use select() rather than do all the config tests to figure
out which OSs support poll().

it's all pretty gross... but then select() is pretty gross and it's
essentially the bug that necessitated this.

(solaris also has a stupid FILE * limitation that it can't use fds > 255
in a FILE * ... which breaks even more libraries than fds >= FD_SETSIZE.)

-dean
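[Editorial sketch] The fcntl(F_DUPFD) trick described above can be illustrated roughly as follows. This is not Apache's actual ap_slack code; `push_fd_above` is a hypothetical helper, and the threshold of 16 in the usage note simply mirrors the "higher than fd 15" remark.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Move fd to a descriptor numbered >= min_fd, freeing the low slot.
 * Returns the (possibly new) descriptor, or -1 on failure, in which
 * case the original fd is left open. Hypothetical helper. */
int push_fd_above(int fd, int min_fd)
{
    int newfd;

    assert(min_fd >= 0);
    if (fd >= min_fd)
        return fd;              /* already high enough */

    /* F_DUPFD returns the lowest free descriptor >= min_fd. */
    newfd = fcntl(fd, F_DUPFD, min_fd);
    if (newfd == -1)
        return -1;

    close(fd);                  /* give the low number back */
    return newfd;
}
```

A server would call something like `log_fd = push_fd_above(log_fd, 16);` right after opening each long-lived log file or listening socket, so that the low numbers stay free for libraries that cannot cope with large descriptors.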
* Re: Is sendfile all that sexy?
2001-01-14 20:22 ` Linus Torvalds
` (2 preceding siblings ...)
2001-01-15 15:24 ` Jonathan Thackray
@ 2001-01-24 0:58 ` Sasi Peter
2001-01-24 8:44 ` James Sutherland
2001-01-25 10:20 ` Anton Blanchard
3 siblings, 2 replies; 70+ messages in thread
From: Sasi Peter @ 2001-01-24 0:58 UTC (permalink / raw)
To: linux-kernel

On 14 Jan 2001, Linus Torvalds wrote:

> The only obvious use for it is file serving, and as high-performance
> file serving tends to end up as a kernel module in the end anyway (the
> only hold-out is samba, and that's been discussed too), "sendfile()"
> really is more a proof of concept than anything else.

No plans for samba to use sendfile? Even better make it a tux-like module?
(that would enable Netware-Linux like performance with the standard
kernel... would be cool after all ;)

--
SaPE - Peter, Sasi - mailto:sape@sch.hu - http://sape.iq.rulez.org/
* Re: Is sendfile all that sexy?
2001-01-24 0:58 ` Sasi Peter
@ 2001-01-24 8:44 ` James Sutherland
2001-01-25 10:20 ` Anton Blanchard
1 sibling, 0 replies; 70+ messages in thread
From: James Sutherland @ 2001-01-24 8:44 UTC (permalink / raw)
To: Sasi Peter; +Cc: linux-kernel

On Wed, 24 Jan 2001, Sasi Peter wrote:

> On 14 Jan 2001, Linus Torvalds wrote:
>
> > The only obvious use for it is file serving, and as high-performance
> > file serving tends to end up as a kernel module in the end anyway (the
> > only hold-out is samba, and that's been discussed too), "sendfile()"
> > really is more a proof of concept than anything else.
>
> No plans for samba to use sendfile? Even better make it a tux-like module?
> (that would enable Netware-Linux like performance with the standard
> kernel... would be cool after all ;)

AIUI, Jeff Merkey was working on loading "userspace" apps into the kernel
to tackle this sort of problem generically. I don't know if he's tried it
with Samba - the forking would probably be a problem...

James.
* Re: Is sendfile all that sexy?
2001-01-24 0:58 ` Sasi Peter
2001-01-24 8:44 ` James Sutherland
@ 2001-01-25 10:20 ` Anton Blanchard
2001-01-25 10:58 ` Sasi Peter
1 sibling, 1 reply; 70+ messages in thread
From: Anton Blanchard @ 2001-01-25 10:20 UTC (permalink / raw)
To: Sasi Peter; +Cc: linux-kernel

> No plans for samba to use sendfile? Even better make it a tux-like module?
> (that would enable Netware-Linux like performance with the standard
> kernel... would be cool after all ;)

I have patches for samba to do sendfile. Making a tux module does not make
sense to me, especially since we are nowhere near the limits of samba in
userspace. Once userspace samba can run no faster, then we should think
about other options.

Anton
* Re: Is sendfile all that sexy?
2001-01-25 10:20 ` Anton Blanchard
@ 2001-01-25 10:58 ` Sasi Peter
2001-01-26 6:10 ` Anton Blanchard
0 siblings, 1 reply; 70+ messages in thread
From: Sasi Peter @ 2001-01-25 10:58 UTC (permalink / raw)
To: Anton Blanchard; +Cc: linux-kernel

On Thu, 25 Jan 2001, Anton Blanchard wrote:

> I have patches for samba to do sendfile. Making a tux module does not make
> sense to me, especially since we are nowhere near the limits of samba in
> userspace. Once userspace samba can run no faster, then we should think
> about other options.

Do you have it at a URL?

--
SaPE - Peter, Sasi - mailto:sape@sch.hu - http://sape.iq.rulez.org/
* Re: Is sendfile all that sexy?
2001-01-25 10:58 ` Sasi Peter
@ 2001-01-26 6:10 ` Anton Blanchard
2001-01-26 11:46 ` David S. Miller
0 siblings, 1 reply; 70+ messages in thread
From: Anton Blanchard @ 2001-01-26 6:10 UTC (permalink / raw)
To: Sasi Peter; +Cc: linux-kernel

> Do you have it at a URL?

The patch is small so I have attached it to this email. It should apply
to the samba CVS tree. Remember this is still a hack and I need to add
code to ensure the file is not truncated and we sendfile() less than we
promised. (After talking to tridge and davem, this should be fixed shortly.)

There is a lot more going on than in the web serving case, so sendfile+zero
copy is not going to help us as much as it did for the tux guys. For example
currently on 2.4.0 + zero copy patches:

anton@drongo:~/dbench$ ~anton/samba/source/bin/smbtorture //otherhost/netbench -U% -N 15 NBW95

read/write:
Throughput 16.5478 MB/sec (NB=20.6848 MB/sec 165.478 MBit/sec)

sendfile:
Throughput 17.0128 MB/sec (NB=21.266 MB/sec 170.128 MBit/sec)

Of course there is still lots to be done :)

Cheers,
Anton

diff -u -u -r1.195 includes.h
--- source/include/includes.h	2000/12/06 00:05:14	1.195
+++ source/include/includes.h	2001/01/26 05:38:51
@@ -871,7 +871,8 @@
 
 /* default socket options. Dave Miller thinks we should default to TCP_NODELAY
    given the socket IO pattern that Samba uses */
-#ifdef TCP_NODELAY
+
+#if 0
 #define DEFAULT_SOCKET_OPTIONS "TCP_NODELAY"
 #else
 #define DEFAULT_SOCKET_OPTIONS ""
diff -u -u -r1.257 reply.c
--- source/smbd/reply.c	2001/01/24 19:34:53	1.257
+++ source/smbd/reply.c	2001/01/26 05:38:53
@@ -2383,6 +2391,51 @@
 		END_PROFILE(SMBreadX);
 		return(ERROR(ERRDOS,ERRlock));
 	}
+
+#if 1
+	/* We can use sendfile if it is not chained */
+	if (CVAL(inbuf,smb_vwv0) == 0xFF) {
+		off_t tmpoffset;
+		struct stat buf;
+		int flags = 0;
+
+		nread = smb_maxcnt;
+
+		fstat(fsp->fd, &buf);
+		if (startpos > buf.st_size)
+			return(UNIXERROR(ERRDOS,ERRnoaccess));
+		if (nread > (buf.st_size - startpos))
+			nread = (buf.st_size - startpos);
+
+		SSVAL(outbuf,smb_vwv5,nread);
+		SSVAL(outbuf,smb_vwv6,smb_offset(data,outbuf));
+		SSVAL(smb_buf(outbuf),-2,nread);
+		CVAL(outbuf,smb_vwv0) = 0xFF;
+		set_message(outbuf,12,nread,False);
+
+#define MSG_MORE 0x8000
+		if (nread > 0)
+			flags = MSG_MORE;
+		if (send(smbd_server_fd(), outbuf, data - outbuf, flags) == -1)
+			DEBUG(0,("reply_read_and_X: send ERROR!\n"));
+
+		tmpoffset = startpos;
+		while(nread) {
+			int nwritten;
+			nwritten = sendfile(smbd_server_fd(), fsp->fd, &tmpoffset, nread);
+			if (nwritten == -1)
+				DEBUG(0,("reply_read_and_X: sendfile ERROR!\n"));
+
+			if (!nwritten)
+				break;
+
+			nread -= nwritten;
+		}
+
+		return -1;
+	}
+#endif
+
 	nread = read_file(fsp,data,startpos,smb_maxcnt);
 	if (nread < 0) {
* Re: Is sendfile all that sexy?
2001-01-26 6:10 ` Anton Blanchard
@ 2001-01-26 11:46 ` David S. Miller
2001-01-26 14:12 ` Anton Blanchard
0 siblings, 1 reply; 70+ messages in thread
From: David S. Miller @ 2001-01-26 11:46 UTC (permalink / raw)
To: Anton Blanchard; +Cc: Sasi Peter, linux-kernel

Anton Blanchard writes:
> diff -u -u -r1.257 reply.c
> --- source/smbd/reply.c 2001/01/24 19:34:53 1.257
> +++ source/smbd/reply.c 2001/01/26 05:38:53
> @@ -2383,6 +2391,51 @@
 ...
> + while(nread) {
> + int nwritten;
> + nwritten = sendfile(smbd_server_fd(), fsp->fd, &tmpoffset, nread);
> + if (nwritten == -1)
> + DEBUG(0,("reply_read_and_X: sendfile ERROR!\n"));
> +
> + if (!nwritten)
> + break;
> +
> + nread -= nwritten;
> + }
> +
> + return -1;

Anton, why are you always returning -1 (which means error for the
smb_message[] array functions) when using sendfile?

Aren't you supposed to return the number of bytes output or
something like this?

I'm probably missing something subtle here, so just let me know what
I missed. Thanks.

Later,
David S. Miller
davem@redhat.com
* Re: Is sendfile all that sexy?
2001-01-26 11:46 ` David S. Miller
@ 2001-01-26 14:12 ` Anton Blanchard
0 siblings, 0 replies; 70+ messages in thread
From: Anton Blanchard @ 2001-01-26 14:12 UTC (permalink / raw)
To: David S. Miller; +Cc: Sasi Peter, linux-kernel

Hi Dave,

How are the VB withdrawal symptoms going? :)

> Anton, why are you always returning -1 (which means error for the
> smb_message[] array functions) when using sendfile?

Returning -1 tells the higher level code that we actually sent the bytes
out ourselves and not to bother doing it.

> Aren't you supposed to return the number of bytes output or
> something like this?

Only if you want the code to do a send() on outbuf, which we don't here.

Cheers,
Anton
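[Editorial sketch] The return convention described above (a handler returns -1 to say "I already sent the reply myself") could look roughly like this on the caller's side. Samba's real dispatcher is not shown in the thread, so every name here (`REPLY_ALREADY_SENT`, `dispatch`, the two sample handlers and `counting_send`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REPLY_ALREADY_SENT (-1)   /* hypothetical name for the -1 case */

typedef int (*handler_fn)(char *outbuf, size_t outsize);
typedef int (*send_fn)(const char *buf, size_t len);

/* Run a handler; transmit outbuf only if the handler did not already
 * put the reply on the wire. Returns the bytes sent by the dispatcher. */
int dispatch(handler_fn handler, char *outbuf, size_t outsize,
             send_fn send_reply)
{
    int len = handler(outbuf, outsize);

    if (len == REPLY_ALREADY_SENT)
        return 0;                 /* e.g. the handler used sendfile() */

    assert(len >= 0 && (size_t)len <= outsize);
    return send_reply(outbuf, (size_t)len);
}

/* Sample handler that fills outbuf and lets the dispatcher send it. */
int buffered_handler(char *outbuf, size_t outsize)
{
    assert(outsize >= 2);
    memcpy(outbuf, "ok", 2);
    return 2;
}

/* Sample handler that pretends it pushed the data out by itself. */
int self_sending_handler(char *outbuf, size_t outsize)
{
    (void)outbuf;
    (void)outsize;
    return REPLY_ALREADY_SENT;
}

/* Stand-in for send(): counts calls instead of touching a socket. */
int send_calls = 0;

int counting_send(const char *buf, size_t len)
{
    (void)buf;
    send_calls++;
    return (int)len;
}
```

The sketch gives -1 a single meaning. David Miller's question suggests the same value also signals an error elsewhere in that code, which this sketch does not try to model.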
* Re: Is sendfile all that sexy?
2001-01-14 18:29 Is sendfile all that sexy? jamal
2001-01-14 18:50 ` Ingo Molnar
2001-01-14 20:22 ` Linus Torvalds
@ 2001-01-15 23:16 ` Pavel Machek
2001-01-16 13:47 ` jamal
2 siblings, 1 reply; 70+ messages in thread
From: Pavel Machek @ 2001-01-15 23:16 UTC (permalink / raw)
To: jamal, linux-kernel, netdev

Hi!

> TWO observations:
> - Given Linux's non-pre-emptability of the kernel i get the feeling that
> sendfile could starve other user space programs. Imagine trying to send a
> 1Gig file on 10Mbps pipe in one shot.

Hehe, try sigkilling process doing that transfer. Last time I tried it
it did not work.

Pavel
--
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org
* Re: Is sendfile all that sexy?
2001-01-15 23:16 ` Pavel Machek
@ 2001-01-16 13:47 ` jamal
2001-01-16 14:41 ` Pavel Machek
0 siblings, 1 reply; 70+ messages in thread
From: jamal @ 2001-01-16 13:47 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-kernel, netdev

On Tue, 16 Jan 2001, Pavel Machek wrote:

> > TWO observations:
> > - Given Linux's non-pre-emptability of the kernel i get the feeling that
> > sendfile could starve other user space programs. Imagine trying to send a
> > 1Gig file on 10Mbps pipe in one shot.
>
> Hehe, try sigkilling process doing that transfer. Last time I tried it
> it did not work.

From Alexey's response: it does get descheduled possibly every sndbuf
send. So you should be able to sneak that sigkill.

cheers,
jamal
* Re: Is sendfile all that sexy?
2001-01-16 13:47 ` jamal
@ 2001-01-16 14:41 ` Pavel Machek
0 siblings, 0 replies; 70+ messages in thread
From: Pavel Machek @ 2001-01-16 14:41 UTC (permalink / raw)
To: jamal; +Cc: linux-kernel, netdev

Hi!

> > > TWO observations:
> > > - Given Linux's non-pre-emptability of the kernel i get the feeling that
> > > sendfile could starve other user space programs. Imagine trying to send a
> > > 1Gig file on 10Mbps pipe in one shot.
> >
> > Hehe, try sigkilling process doing that transfer. Last time I tried it
> > it did not work.
>
> From Alexey's response: it does get descheduled possibly every sndbuf
> send. So you should be able to sneak that sigkill.

Did you actually try it? Last time I did the test, SIGKILL did not
make it in. sendfile did not actually check for signals...

(And you could do something like send 100MB from cache into /dev/null. I
do not see where SIGKILL could sneak in in this case.)

Pavel
--
The best software in life is free (not shareware)! Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+