* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
@ 2000-01-01 2:02 ` Pavel Machek
2001-01-16 11:13 ` Andi Kleen
` (4 subsequent siblings)
5 siblings, 0 replies; 70+ messages in thread
From: Pavel Machek @ 2000-01-01 2:02 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray
Hi!
> struct safe_kpointer {
> 	void *kaddr;
> 	unsigned long fingerprint[4];
> };
>
> the kernel can validate kaddr by 1) validating the pointer via the master
> fingerprint (every valid kernel pointer must point to a structure that
> starts with the master fingerprint's copy). Then usage-permissions are
> validated by checking the file fingerprint (the per-object fingerprint).
>
> this is a safe, very fast [ O(1) ] object-permission model. (it's a
> variation of a former idea of yours.) A process can pass object
> fingerprints and kernel pointers to other processes too - thus the other
> process can access the object too. Threads will 'naturally' share objects,
> because fingerprints are typically stored in memory.
I do not know if I'd trust this.
First,
(fd < current->fdlimit && current->fdlist[fd])
is O(1), too. Sure, passing those is slightly hard, but we can do that already.
With your proposal, all hopes for fuser and revoke are out.
Ouch; you say a process can pass it to another process. How will the kernel
know not to free the fd until _both_ have freed it?
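For readers comparing the two schemes, here is a minimal user-space sketch of the descriptor-table check quoted above, with a reference count added to show how a table-based design can answer the lifetime question. The names `struct task`, `struct file`, `fd_share` and `fd_close` are illustrative assumptions built around the `fdlimit`/`fdlist` expression in this message; they are not the kernel's real data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel file object. */
struct file {
    int refcount;           /* number of table slots that point here */
};

/* Hypothetical per-process state, named after the expression above. */
struct task {
    unsigned int fdlimit;   /* number of slots in fdlist */
    struct file **fdlist;   /* fdlist[fd] is NULL when fd is not open */
};

/* O(1) validation: one bounds check and one array load. */
struct file *fd_lookup(const struct task *t, unsigned int fd)
{
    assert(t != NULL);
    if (fd < t->fdlimit && t->fdlist[fd])
        return t->fdlist[fd];
    return NULL;
}

/* Passing an object to another task installs it and takes a reference. */
int fd_share(struct task *to, unsigned int slot, struct file *f)
{
    if (slot >= to->fdlimit || to->fdlist[slot])
        return -1;          /* slot out of range or already in use */
    to->fdlist[slot] = f;
    f->refcount++;
    return 0;
}

/* Closing drops a reference; returns 1 once nobody holds the object. */
int fd_close(struct task *t, unsigned int fd)
{
    struct file *f = fd_lookup(t, fd);

    if (!f)
        return -1;
    t->fdlist[fd] = NULL;
    return --f->refcount == 0;
}
```

Because every holder goes through a table slot, the object is released only when the last slot is cleared. A bare pointer plus fingerprint carries no such record of who still holds it.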
Plus, you are playing tricks with random numbers. Up to now, only ssh and
similar depended on random numbers. Now kernel relies on them during boot.
Notice that the most important one, the "master fingerprint", is generated first.
At that time you might not have enough entropy in your pools.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
* Is sendfile all that sexy?
@ 2001-01-14 18:29 jamal
2001-01-14 18:50 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 70+ messages in thread
From: jamal @ 2001-01-14 18:29 UTC (permalink / raw)
To: linux-kernel, netdev
I thought i'd run some tests on the new zerocopy patches
(this is using a hacked ttcp which knows how to do sendfile
and does MSG_TRUNC for true zero-copy receive, if you know what i mean
;-> ).
2 back to back SMP 2*PII-450Mhz hooked up via 1M acenics (gigE).
MTU 9K.
Before getting excited i had the courage to give plain 2.4.0-pre3 a whirl
and some things bothered me.
test1:
------
regular ttcp, no ZC and no sendfile. send as much as you can in 15secs;
actually 8192 byte chunks, 2048 of them at a time. Repeat until 15 secs is
complete.
Repeat the test 5 times to narrow experimental deviation.
Throughput: ~99MB/sec (for those obsessed with Mbps ~810Mbps)
CPU abuse: server side 87% client side 22% (the CPU measurement could do
with some work and proper measure for SMP).
test2:
------
sendfile server.
created a file which is 8192*2048 bytes. Again the same 15 second
exercise as test1 (and the 5-set thing):
- throughput: 86MB/sec
- CPU: server 100%, client 17%
So i figured, no problem i'll re-run it with a file 10 times larger.
**I was disappointed to see no improvement.**
Looking at the system calls being made:
With the non-sendfile version, approximately 182K write-to-socket system
calls were made, each writing 8192 bytes. Each call lasted on average
0.08ms.
With sendfile test2: 78 calls were made, each sending the file
size 8192*2048 bytes; each lasted about 199 msecs
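As a sanity check, the call counts above can be turned back into throughput and total time spent in system calls. This is only an arithmetic sketch: the helper names are made up for illustration, and it assumes 1 MB = 10^6 bytes and a 15 second run, as described for both tests.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in MB/sec (10^6 bytes) for `calls` calls of `bytes_per_call` each. */
double throughput_mb_per_sec(uint64_t calls, uint64_t bytes_per_call, double seconds)
{
    assert(seconds > 0.0);
    return (double)(calls * bytes_per_call) / seconds / 1e6;
}

/* Total wall time spent inside the calls, in seconds. */
double time_in_calls_sec(uint64_t calls, double ms_per_call)
{
    return (double)calls * ms_per_call / 1000.0;
}
```

With the reported figures, `throughput_mb_per_sec(182000, 8192, 15.0)` comes out near 99.4 and `throughput_mb_per_sec(78, 8192 * 2048, 15.0)` near 87.2, which matches the measured ~99MB/sec and 86MB/sec. The per-call timings account for roughly 14.6 and 15.5 seconds, so in both tests nearly the whole run is spent inside the system calls.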
TWO observations:
- Given Linux's non-pre-emptability of the kernel i get the feeling that
sendfile could starve other user space programs. Imagine trying to send a
1Gig file on 10Mbps pipe in one shot.
- It doesn't matter if you break down the file into chunks for
self-pre-emption; sendfile is still a pig.
I have a feeling i am missing some very serious shit. So enlighten me.
Has anyone done similar tests?
Anyways, the struggle continues next with zc patches.
cheers,
jamal
* Re: Is sendfile all that sexy?
2001-01-14 18:29 Is sendfile all that sexy? jamal
@ 2001-01-14 18:50 ` Ingo Molnar
2001-01-14 19:02 ` jamal
2001-01-14 20:22 ` Linus Torvalds
2001-01-15 23:16 ` Pavel Machek
2 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2001-01-14 18:50 UTC (permalink / raw)
To: jamal; +Cc: linux-kernel, netdev
On Sun, 14 Jan 2001, jamal wrote:
> regular ttcp, no ZC and no sendfile. [...]
> Throughput: ~99MB/sec (for those obsessed with Mbps ~810Mbps)
> CPU abuse: server side 87% client side 22% [...]
> sendfile server.
> - throughput: 86MB/sec
> - CPU: server 100%, client 17%
i believe what you are seeing here is the overhead of the pagecache. When
using sendmsg() only, you do not read() the file every time, right? Is
ttcp using multiple threads? In that case the sendfile() could be using the
*same* file all the time, creating SMP locking overhead.
if this is the case, what result do you get if you use a separate,
isolated file per process? (And i bet that with DaveM's pagecache
scalability patch the situation would also get much better - the global
pagecache_lock hurts.)
Ingo
* Re: Is sendfile all that sexy?
2001-01-14 18:50 ` Ingo Molnar
@ 2001-01-14 19:02 ` jamal
2001-01-14 19:09 ` Ingo Molnar
0 siblings, 1 reply; 70+ messages in thread
From: jamal @ 2001-01-14 19:02 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, netdev
On Sun, 14 Jan 2001, Ingo Molnar wrote:
>
> i believe what you are seeing here is the overhead of the pagecache. When
> using sendmsg() only, you do not read() the file every time, right? Is
In that case just a user space buffer is sent i.e no file association.
> ttcp using multiple threads?
Only a single thread, single flow setup. Very primitive but simple.
> In that case if the sendfile() is using the
> *same* file all the time, creating SMP locking overhead.
>
> if this is the case, what result do you get if you use a separate,
> isolated file per process? (And i bet that with DaveM's pagecache
> scalability patch the situation would also get much better - the global
> pagecache_lock hurts.)
>
Already doing the single file, single process. However, i do run by time
which means i could read the file from the beginning (offset 0) to the end
then re-do it for as many times as 15secs would allow. Does this affect
it? I tried one 1.5 GB file, it was oopsing and given my setup right now i
can't trace it. So i am using about 170M which is read about 8 times in
the 15 secs.
cheers,
jamal
* Re: Is sendfile all that sexy?
2001-01-14 19:02 ` jamal
@ 2001-01-14 19:09 ` Ingo Molnar
2001-01-14 19:18 ` jamal
0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2001-01-14 19:09 UTC (permalink / raw)
To: jamal; +Cc: linux-kernel, netdev
On Sun, 14 Jan 2001, jamal wrote:
> Already doing the single file, single process. [...]
in this case there could still be valid performance differences, as
copying from user-space is cheaper than copying from the pagecache. To
rule out SMP interactions, you could try a UP-IOAPIC kernel on that box.
(I'm also curious what kind of numbers you'll get with the zerocopy
patch.)
> However, i do run by time which means i could read the file from the
> beginning (offset 0) to the end then re-do it for as many times as
> 15secs would allow. Does this affect it? [...]
no, in the case of a single thread this should have minimum impact. But
i'd suggest to increase the /proc/sys/net/tcp*mem* values (to 1MB or
more).
Ingo
* Re: Is sendfile all that sexy?
2001-01-14 19:09 ` Ingo Molnar
@ 2001-01-14 19:18 ` jamal
0 siblings, 0 replies; 70+ messages in thread
From: jamal @ 2001-01-14 19:18 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, netdev
On Sun, 14 Jan 2001, Ingo Molnar wrote:
>
> in this case there could still be valid performance differences, as
> copying from user-space is cheaper than copying from the pagecache. To
> rule out SMP interactions, you could try a UP-IOAPIC kernel on that box.
>
Let me complete this with the ZC patches first. then i'll do that.
There are a few retransmits; maybe receiver IRQ affinity might help some.
> (I'm also curious what kind of numbers you'll get with the zerocopy
> patch.)
Working on it.
> no, in the case of a single thread this should have minimum impact. But
> i'd suggest to increase the /proc/sys/net/tcp*mem* values (to 1MB or
> more).
The upper thresholds to 1000000 ?
I should have mentioned that i set /proc/sys/net/core/*mem*
to currently 262144.
cheers,
jamal
* Re: Is sendfile all that sexy?
2001-01-14 18:29 Is sendfile all that sexy? jamal
2001-01-14 18:50 ` Ingo Molnar
@ 2001-01-14 20:22 ` Linus Torvalds
2001-01-14 20:38 ` Ingo Molnar
` (3 more replies)
2001-01-15 23:16 ` Pavel Machek
2 siblings, 4 replies; 70+ messages in thread
From: Linus Torvalds @ 2001-01-14 20:22 UTC (permalink / raw)
To: linux-kernel
In article <Pine.GSO.4.30.0101141237020.12354-100000@shell.cyberus.ca>,
jamal <hadi@cyberus.ca> wrote:
>
>Before getting excited i had the courage to give plain 2.4.0-pre3 a whirl
>and some things bothered me.
Note that "sendfile(fd, file, len)" is never going to be faster than
"write(fd, userdata, len)".
That's not the point of sendfile(). The point of sendfile() is to be
faster than the _combination_ of:
	addr = mmap(file, ...len...);
	write(fd, addr, len);

or

	read(file, userdata, len);
	write(fd, userdata, len);
and in your case you're not comparing sendfile() against this
combination. You're just comparing sendfile() against a simple
"write()".
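To make the comparison concrete, here is a minimal sketch of the two paths as they might look on a current Linux system. The function names and the 8192-byte `CHUNK` (borrowed from the ttcp test earlier in the thread) are assumptions for illustration, and the call uses today's four-argument `sendfile(out_fd, in_fd, &offset, count)` rather than the three-argument shorthand in the message. The mmap() variant is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 8192  /* assumed chunk size, same as the ttcp test */

/* Write all of buf to fd, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Path A: read() into a user buffer, then write() it out again. */
ssize_t copy_with_read_write(int file_fd, int out_fd, size_t len)
{
    char *userdata = malloc(CHUNK);
    size_t done = 0;

    assert(file_fd >= 0 && out_fd >= 0);
    if (userdata == NULL)
        return -1;
    while (done < len) {
        size_t want = len - done < CHUNK ? len - done : CHUNK;
        ssize_t n = read(file_fd, userdata, want);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;                      /* error or end of file */
        if (write_all(out_fd, userdata, (size_t)n) < 0)
            break;
        done += (size_t)n;
    }
    free(userdata);
    return (ssize_t)done;
}

/* Path B: the kernel moves the data; no user-space buffer is involved. */
ssize_t copy_with_sendfile(int file_fd, int out_fd, size_t len)
{
    off_t offset = 0;                   /* start of file; the kernel advances it */
    size_t done = 0;

    assert(file_fd >= 0 && out_fd >= 0);
    while (done < len) {
        ssize_t n = sendfile(out_fd, file_fd, &offset, len - done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A fair benchmark of sendfile() would time `copy_with_sendfile` against `copy_with_read_write` on the same file, not against a loop that only calls write() on a buffer already in memory.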
And no, I don't actually think that sendfile() is all that hot. It was
_very_ easy to implement, and can be considered a 5-minute hack to give
a feature that fit very well in the MM architecture, and that the Apache
folks had already been using on other architectures.
The only obvious use for it is file serving, and as high-performance
file serving tends to end up as a kernel module in the end anyway (the
only hold-out is samba, and that's been discussed too), "sendfile()"
really is more a proof of concept than anything else.
Does anybody but apache actually use it?
Linus
PS. I still _like_ sendfile(), even if the above sounds negative. It's
basically a "cool feature" that has zero negative impact on the design
of the system. It uses the same "do_generic_file_read()" that is used
for normal "read()", and is also used by the loop device and by
in-kernel fileserving. But it's not really "important".
* Re: Is sendfile all that sexy?
2001-01-14 20:22 ` Linus Torvalds
@ 2001-01-14 20:38 ` Ingo Molnar
2001-01-14 21:44 ` Linus Torvalds
2001-01-14 21:54 ` Gerhard Mack
2001-01-15 1:14 ` Dan Hollis
` (2 subsequent siblings)
3 siblings, 2 replies; 70+ messages in thread
From: Ingo Molnar @ 2001-01-14 20:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux Kernel List
On 14 Jan 2001, Linus Torvalds wrote:
> Does anybody but apache actually use it?
There is a Samba patch as well that makes it sendfile() based. Various
other projects use it too (phttpd for example), some FTP servers i
believe, and khttpd and TUX.
Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-14 20:38 ` Ingo Molnar
@ 2001-01-14 21:44 ` Linus Torvalds
2001-01-14 21:49 ` Ingo Molnar
2001-01-14 21:54 ` Gerhard Mack
1 sibling, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2001-01-14 21:44 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linux Kernel List
On Sun, 14 Jan 2001, Ingo Molnar wrote:
>
> There is a Samba patch as well that makes it sendfile() based. Various
> other projects use it too (phttpd for example), some FTP servers i
> believe, and khttpd and TUX.
At least khttpd uses "do_generic_file_read()", not sendfile per se. I
assume TUX does too. Sendfile itself is mainly only useful from user
space..
Linus
* Re: Is sendfile all that sexy?
2001-01-14 21:44 ` Linus Torvalds
@ 2001-01-14 21:49 ` Ingo Molnar
0 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2001-01-14 21:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux Kernel List
On Sun, 14 Jan 2001, Linus Torvalds wrote:
> > There is a Samba patch as well that makes it sendfile() based. Various
> > other projects use it too (phttpd for example), some FTP servers i
> > believe, and khttpd and TUX.
>
> At least khttpd uses "do_generic_file_read()", not sendfile per se. I
> assume TUX does too. Sendfile itself is mainly only useful from user
> space..
yes, you are right. TUX does it mainly to avoid some of the user-space
interfacing overhead present in sys_sendfile(), and to be able to control
packet boundaries. (ie. to have or not have the MSG_MORE flag). So TUX is
using its own sock_send_actor and own read_descriptor.
Ingo
* Re: Is sendfile all that sexy?
2001-01-14 20:38 ` Ingo Molnar
2001-01-14 21:44 ` Linus Torvalds
@ 2001-01-14 21:54 ` Gerhard Mack
2001-01-14 22:40 ` Linus Torvalds
2001-01-15 13:02 ` Florian Weimer
1 sibling, 2 replies; 70+ messages in thread
From: Gerhard Mack @ 2001-01-14 21:54 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel List
On Sun, 14 Jan 2001, Ingo Molnar wrote:
>
> On 14 Jan 2001, Linus Torvalds wrote:
>
> > Does anybody but apache actually use it?
>
> There is a Samba patch as well that makes it sendfile() based. Various
> other projects use it too (phttpd for example), some FTP servers i
> believe, and khttpd and TUX.
Proftpd to name one ftp server, nice little daemon uses linux-privs too.
Gerhard
PS I wish someone would explain to me why distros insist on using WU
instead given its horrid security record.
--
Gerhard Mack
gmack@innerfire.net
<>< As a computer I find your faith in technology amusing.
* Re: Is sendfile all that sexy?
2001-01-14 21:54 ` Gerhard Mack
@ 2001-01-14 22:40 ` Linus Torvalds
2001-01-14 22:45 ` J Sloan
2001-01-15 3:43 ` Michael Peddemors
2001-01-15 13:02 ` Florian Weimer
1 sibling, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2001-01-14 22:40 UTC (permalink / raw)
To: Gerhard Mack; +Cc: Ingo Molnar, Linux Kernel List
On Sun, 14 Jan 2001, Gerhard Mack wrote:
>
> PS I wish someone would explain to me why distros insist on using WU
> instead given its horrid security record.
I think it's a case of "better the devil you know..".
Think of all the security scares sendmail has historically had. But it's a
pretty secure piece of work now - and people know it backwards and
forward. Few people advocate switching from sendmail these days (sure,
they do exist, but what I'm saying is that a long track record that
includes security issues isn't necessarily bad, if it has gotten fixed).
Of course, you may be right on wuftpd. It obviously wasn't designed with
security in mind, other alternatives may be better.
Linus
* Re: Is sendfile all that sexy?
2001-01-14 22:40 ` Linus Torvalds
@ 2001-01-14 22:45 ` J Sloan
2001-01-15 20:15 ` H. Peter Anvin
2001-01-15 3:43 ` Michael Peddemors
1 sibling, 1 reply; 70+ messages in thread
From: J Sloan @ 2001-01-14 22:45 UTC (permalink / raw)
To: Kernel Mailing List
Linus Torvalds wrote:
> Of course, you may be right on wuftpd. It obviously wasn't designed with
> security in mind, other alternatives may be better.
I run proftpd on all my ftp servers - it's fast, configurable
and can do all the tricks I need - even red hat seems to
agree that proftpd is the way to go.
Visit any red hat ftp site and they are running proftpd -
So, why do they keep shipping us wu-ftpd instead?
That really frosts me.
jjs
* Re: Is sendfile all that sexy?
2001-01-14 20:22 ` Linus Torvalds
2001-01-14 20:38 ` Ingo Molnar
@ 2001-01-15 1:14 ` Dan Hollis
2001-01-15 15:24 ` Jonathan Thackray
2001-01-24 0:58 ` Sasi Peter
3 siblings, 0 replies; 70+ messages in thread
From: Dan Hollis @ 2001-01-15 1:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
On 14 Jan 2001, Linus Torvalds wrote:
> That's not the point of sendfile(). The point of sendfile() is to be
> faster than the _combination_ of:
> addr = mmap(file, ...len...);
> write(fd, addr, len);
> or
> read(file, userdata, len);
> write(fd, userdata, len);
And boy is it ever. It blows both away by more than double.
Not only that the mmap one grinds my box into the ground with swapping,
while in the sendfile() case you can't even tell it's running except that the
drive is going like mad.
> Does anybody but apache actually use it?
I wonder why samba doesn't use it.
-Dan
* Re: Is sendfile all that sexy?
2001-01-14 22:40 ` Linus Torvalds
2001-01-14 22:45 ` J Sloan
@ 2001-01-15 3:43 ` Michael Peddemors
1 sibling, 0 replies; 70+ messages in thread
From: Michael Peddemors @ 2001-01-15 3:43 UTC (permalink / raw)
To: Gerhard Mack; +Cc: Ingo Molnar, Linux Kernel List
The two things I change every time are sendmail->qmail and wuftpd->proftpd.
But remember, security bugs are caught because more people use one vs the
other.. Bugs in Proftpd weren't caught until more people started changing
from wu-ftpd...
Often, all it means when one product has more bugs than another, is that more
people tried to find bugs in one than another...
(Yes, a plug to get everyone to test 2.4 here)
On Sun, 14 Jan 2001, Linus Torvalds wrote:
> On Sun, 14 Jan 2001, Gerhard Mack wrote:
> > PS I wish someone would explain to me why distros insist on using WU
> > instead given its horrid security record.
>
> Of course, you may be right on wuftpd. It obviously wasn't designed with
> security in mind, other alternatives may be better.
>
> Linus
--
--------------------------------------------------------
Michael Peddemors - Senior Consultant
Unix Administration - WebSite Hosting
Network Services - Programming
Wizard Internet Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604) 589-0037 Beautiful British Columbia, Canada
--------------------------------------------------------
* Re: Is sendfile all that sexy?
2001-01-14 21:54 ` Gerhard Mack
2001-01-14 22:40 ` Linus Torvalds
@ 2001-01-15 13:02 ` Florian Weimer
2001-01-15 13:45 ` Tristan Greaves
1 sibling, 1 reply; 70+ messages in thread
From: Florian Weimer @ 2001-01-15 13:02 UTC (permalink / raw)
To: Gerhard Mack; +Cc: Linux Kernel List
Gerhard Mack <gmack@innerfire.net> writes:
> PS I wish someone would explain to me why distros insist on using WU
> instead given its horrid security record.
The security record of Proftpd is not horrid, but embarrassing. They
once claimed to have fixed a vulnerability, but in fact introduced
another one...
* RE: Is sendfile all that sexy?
2001-01-15 13:02 ` Florian Weimer
@ 2001-01-15 13:45 ` Tristan Greaves
0 siblings, 0 replies; 70+ messages in thread
From: Tristan Greaves @ 2001-01-15 13:45 UTC (permalink / raw)
To: 'Linux Kernel List'
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Florian Weimer
> Sent: 15 January 2001 13:02
> To: Gerhard Mack
> Cc: Linux Kernel List
> Subject: Re: Is sendfile all that sexy?
>
> The security record of Proftpd is not horrid, but embarrassing. They
> once claimed to have fixed a vulnerability, but in fact introduced
> another one...
Oh, come on, this is a classic event in bug fixing. All Software Has
Bugs [TM]. Nothing Is Completely Secure [TM].
As long as the vulnerabilities are fixed as they happen (where possible),
we should be happy.
Tris.
* Re: Is sendfile all that sexy?
2001-01-14 20:22 ` Linus Torvalds
2001-01-14 20:38 ` Ingo Molnar
2001-01-15 1:14 ` Dan Hollis
@ 2001-01-15 15:24 ` Jonathan Thackray
2001-01-15 15:36 ` Matti Aarnio
` (2 more replies)
2001-01-24 0:58 ` Sasi Peter
3 siblings, 3 replies; 70+ messages in thread
From: Jonathan Thackray @ 2001-01-15 15:24 UTC (permalink / raw)
To: linux-kernel
> Does anybody but apache actually use it?
Zeus uses it! (it was HP who added it to HP-UX first at our request :-)
> PS. I still _like_ sendfile(), even if the above sounds negative. It's
> basically a "cool feature" that has zero negative impact on the design
> of the system. It uses the same "do_generic_file_read()" that is used
> for normal "read()", and is also used by the loop device and by
> in-kernel fileserving. But it's not really "important".
It's a very useful system call and makes file serving much more
scalable, and I'm glad that most Un*xes now have support for it
(Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to
Linux is sendpath(), which does the open() before the sendfile()
all combined into one system call.
Ugh, I hear you all scream :-)
Jon.
--
Jonathan Thackray Zeus House, Cowley Road, Cambridge CB4 OZT, UK
Software Engineer +44 1223 525000, fax +44 1223 525100
Zeus Technology http://www.zeus.com/
* Re: Is sendfile all that sexy?
2001-01-15 15:24 ` Jonathan Thackray
@ 2001-01-15 15:36 ` Matti Aarnio
2001-01-15 20:17 ` H. Peter Anvin
2001-01-15 16:05 ` dean gaudet
2001-01-15 19:41 ` Ingo Molnar
2 siblings, 1 reply; 70+ messages in thread
From: Matti Aarnio @ 2001-01-15 15:36 UTC (permalink / raw)
To: Jonathan Thackray; +Cc: linux-kernel
On Mon, Jan 15, 2001 at 03:24:55PM +0000, Jonathan Thackray wrote:
> It's a very useful system call and makes file serving much more
> scalable, and I'm glad that most Un*xes now have support for it
> (Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to
> Linux is sendpath(), which does the open() before the sendfile()
> all combined into one system call.
One thing about 'sendfile' (and likely 'sendpath') is that
current (hammered into running binaries -> unchangeable)
syscalls support only up to 2GB files on 32-bit systems.
Glibc 2.2(9) at RedHat <sys/sendfile.h>:
#ifdef __USE_FILE_OFFSET64
# error "<sendfile.h> cannot be used with _FILE_OFFSET_BITS=64"
#endif
I do admit that doing sendfile() on some extremely large
file is unlikely, but still...
> Ugh, I hear you all scream :-)
> Jon.
> --
> Jonathan Thackray Zeus House, Cowley Road, Cambridge CB4 OZT, UK
> Zeus Technology http://www.zeus.com/
/Matti Aarnio
* Re: Is sendfile all that sexy?
2001-01-15 15:24 ` Jonathan Thackray
2001-01-15 15:36 ` Matti Aarnio
@ 2001-01-15 16:05 ` dean gaudet
2001-01-15 18:34 ` Jonathan Thackray
2001-01-15 19:41 ` Ingo Molnar
2 siblings, 1 reply; 70+ messages in thread
From: dean gaudet @ 2001-01-15 16:05 UTC (permalink / raw)
To: Jonathan Thackray; +Cc: linux-kernel
On Mon, 15 Jan 2001, Jonathan Thackray wrote:
> (Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to
> Linux is sendpath(), which does the open() before the sendfile()
> all combined into one system call.
how would sendpath() construct the Content-Length in the HTTP header?
it's totally unfortunate that the other unixes chose to combine writev()
into sendfile() rather than implementing TCP_CORK. TCP_CORK is useful for
FAR more than just sendfile() headers and footers. it's arguably the most
correct way to write server code. nagle/no-nagle in the default BSD API
both suck -- nagle because it delays packets which need to be sent;
no-nagle because it can send incomplete packets.
i'm completely happy that linus, davem and ingo refused to combine
writev() into sendfile() and suggested CORK when i pointed out the
header/trailer problem.
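The cork-then-send pattern argued for here can be sketched roughly as follows. This is an illustration under assumptions, not code from Apache or any other server: `send_corked` and its parameters are hypothetical names, `TCP_CORK` is the Linux-specific socket option from `<netinet/tcp.h>`, and a short write of the header is simply treated as failure to keep the sketch small.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Turn TCP_CORK on or off for a TCP socket. Returns 0 or -1. */
static int set_cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}

/*
 * Send a header followed by file contents so that the kernel can pack
 * them into full packets. Returns 0 on success, -1 on any failure.
 */
int send_corked(int sock, const char *hdr, size_t hdr_len,
                int file_fd, size_t file_len)
{
    off_t offset = 0;
    int rc = -1;

    assert(hdr != NULL);
    if (set_cork(sock, 1) < 0)          /* hold back partial packets */
        return -1;
    if (write(sock, hdr, hdr_len) == (ssize_t)hdr_len) {
        rc = 0;
        while ((size_t)offset < file_len) {
            ssize_t n = sendfile(sock, file_fd, &offset,
                                 file_len - (size_t)offset);
            if (n <= 0) {               /* error or unexpected end of file */
                rc = -1;
                break;
            }
        }
    }
    if (set_cork(sock, 0) < 0)          /* uncork: flush the final partial packet */
        rc = -1;
    return rc;
}
```

The same two `set_cork` calls could bracket any sequence of write() and sendfile() calls, which is the sense in which the option is more general than a sendfile() that takes header and trailer vectors.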
imnsho if you want to optimise static file serving then it's pretty
pointless to continue working in userland. nobody is going to catch up
with all the kernel-side implementations in linux, NT, and solaris.
-dean
p.s. linus, apache-1.3 does *not* use sendfile(). it's in apache-2.0,
which unfortunately is now performing like crap because they didn't listen
to some of my advice well over a year ago. a case of "let's make a pretty
API and hope performance works out"... where i told them "i've already
written code using the API you suggest, and it *doesn't* work." </rant>
thankfully linux now has TUX.
* Re: Is sendfile all that sexy?
2001-01-15 16:05 ` dean gaudet
@ 2001-01-15 18:34 ` Jonathan Thackray
2001-01-15 18:46 ` Linus Torvalds
2001-01-15 18:58 ` Is sendfile all that sexy? dean gaudet
0 siblings, 2 replies; 70+ messages in thread
From: Jonathan Thackray @ 2001-01-15 18:34 UTC (permalink / raw)
To: dean gaudet; +Cc: linux-kernel
> how would sendpath() construct the Content-Length in the HTTP header?
You'd still stat() the file to decide whether to use sendpath() to
send it or not, if it was Last-Modified: etc. Of course, you'd cache
stat() calls too for a few seconds. The main thing is that you save
a valuable fd and open() is expensive, even more so than stat().
> TCP_CORK is useful for FAR more than just sendfile() headers and
> footers. it's arguably the most correct way to write server code.
Agreed -- the hard-coded Nagle algorithm makes no sense these days.
> imnsho if you want to optimise static file serving then it's pretty
> pointless to continue working in userland. nobody is going to catch up
> with all the kernel-side implementations in linux, NT, and solaris.
Hmmm, there's a place for userland httpds that are within a few
percent of kernel ones (like Zeus is, when I last looked). But I
agree, hybrid approaches will become more common, although the trend
towards server-side dynamic pages negates this. A kernel approach is a
definite win if you're used to using a limited-scalability userland
httpd like Apache.
Jon.
--
Jonathan Thackray Zeus House, Cowley Road, Cambridge CB4 OZT, UK
Software Engineer +44 1223 525000, fax +44 1223 525100
Zeus Technology http://www.zeus.com/
* Re: Is sendfile all that sexy?
2001-01-15 18:34 ` Jonathan Thackray
@ 2001-01-15 18:46 ` Linus Torvalds
2001-01-15 20:47 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar
2001-01-15 18:58 ` Is sendfile all that sexy? dean gaudet
1 sibling, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2001-01-15 18:46 UTC (permalink / raw)
To: linux-kernel
In article <14947.17050.127502.936533@leda.cam.zeus.com>,
Jonathan Thackray <jthackray@zeus.com> wrote:
>
>> how would sendpath() construct the Content-Length in the HTTP header?
>
>You'd still stat() the file to decide whether to use sendpath() to
>send it or not, if it was Last-Modified: etc. Of course, you'd cache
>stat() calls too for a few seconds. The main thing is that you save
>a valuable fd and open() is expensive, even more so than stat().
"open" expensive?
Maybe on HP-UX and other platforms. But give me numbers: I seriously
doubt that
	int fd = open(..);
	fstat(fd..);
	sendfile(fd..);
	close(fd);

is any slower than

	.. cache stat() in user space based on name ..
	sendpath(name, ..);
on any real load.
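Spelled out, the four-call sequence above might look like the sketch below. The wrapper name `send_path_userspace` is hypothetical, chosen only to show that the proposed sendpath() behaviour can be had from user space with existing calls; it uses the current Linux `sendfile(out_fd, in_fd, &offset, count)` signature.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Send the whole file at `path` to out_fd. Returns bytes sent, or -1. */
ssize_t send_path_userspace(int out_fd, const char *path)
{
    struct stat st;
    off_t offset = 0;
    ssize_t total = -1;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0) {
        /* st.st_size is the value a server would use for Content-Length. */
        total = 0;
        while (offset < st.st_size) {
            ssize_t n = sendfile(out_fd, fd, &offset,
                                 (size_t)(st.st_size - offset));
            if (n < 0) {
                total = -1;
                break;
            }
            if (n == 0)
                break;                  /* file shrank underneath us */
            total += n;
        }
    }
    close(fd);
    return total;
}
```

Note that fstat() on the already-open descriptor also yields the file size, so the Content-Length question raised earlier in the thread is answered without a separate stat() by name.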
>> TCP_CORK is useful for FAR more than just sendfile() headers and
>> footers. it's arguably the most correct way to write server code.
>
>Agreed -- the hard-coded Nagle algorithm makes no sense these days.
The fact I dislike about the HP-UX implementation is that it is so
_obviously_ stupid.
And I have to say that I absolutely despise the BSD people. They did
sendfile() after both Linux and HP-UX had done it, and they must have
known about both implementations. And they chose the HP-UX braindamage,
and even brag about the fact that they were stupid and didn't understand
TCP_CORK (they don't say so in those exact words, of course - they just
show that they were stupid and clueless by the things they brag about).
Oh, well. Not everybody can be as goodlooking as me. It's a curse.
Linus
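
A minimal sketch of the open()/fstat()/sendfile()/close() sequence from the
message above, assuming a Linux system with <sys/sendfile.h>. The name
send_whole_file() is hypothetical and not part of any patch in this thread;
st.st_size is where a server would take its Content-Length from.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send the whole file at `path` to `out_fd` (normally a connected socket).
 * Returns the number of bytes sent, or -1 on error.
 * send_whole_file() is an illustrative name, not an existing API.
 */
ssize_t send_whole_file(int out_fd, const char *path)
{
    struct stat st;
    off_t offset = 0;
    ssize_t total = 0;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* fstat() works on the open file, so the name is not parsed again. */
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    /* sendfile() may transfer less than requested, so loop until done. */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;              /* file shrank while we were sending */
        total += n;
    }

    close(fd);
    return total;
}
```

The path is resolved once by open(); every later step works on the
descriptor, which is the property the rest of the thread keeps coming back
to.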
* Re: Is sendfile all that sexy?
2001-01-15 18:34 ` Jonathan Thackray
2001-01-15 18:46 ` Linus Torvalds
@ 2001-01-15 18:58 ` dean gaudet
1 sibling, 0 replies; 70+ messages in thread
From: dean gaudet @ 2001-01-15 18:58 UTC (permalink / raw)
To: Jonathan Thackray; +Cc: linux-kernel
On Mon, 15 Jan 2001, Jonathan Thackray wrote:
> > TCP_CORK is useful for FAR more than just sendfile() headers and
> > footers. it's arguably the most correct way to write server code.
>
> Agreed -- the hard-coded Nagle algorithm makes no sense these days.
hey, actually a little more thinking this morning made me think nagle
*may* have a place. i don't like any of the solutions i've come up with
though for this. the problem specifically is how do you implement an
efficient HTTP/ng server which supports WebMUX and parallel processing of
multiple responses.
the problem in a nutshell is that multiple threads may be working on
responses which are multiplexed onto a single socket -- there's some extra
mux header info used to separate each of the response streams.
like what if the response stream is a few hundred HEADs (for cache
validation) some of which are static files and others which require some
dynamic code. the static responses will finish really fast, and you want
to fill up network packets with them. but you don't know when the dynamic
responses will finish so you can't be sure when to start sending the
packets.
i don't know NFSv3 very much, but i imagine it's got similar problems --
any multiplexed request/response protocol allowing out-of-order responses
would have this problem. any gurus got suggestions?
-dean
* Re: Is sendfile all that sexy?
2001-01-15 15:24 ` Jonathan Thackray
2001-01-15 15:36 ` Matti Aarnio
2001-01-15 16:05 ` dean gaudet
@ 2001-01-15 19:41 ` Ingo Molnar
2001-01-15 20:33 ` Albert D. Cahalan
2 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2001-01-15 19:41 UTC (permalink / raw)
To: Jonathan Thackray; +Cc: Linux Kernel List
On Mon, 15 Jan 2001, Jonathan Thackray wrote:
> It's a very useful system call and makes file serving much more
> scalable, and I'm glad that most Un*xes now have support for it
> (Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to
> Linux is sendpath(), which does the open() before the sendfile() all
> combined into one system call.
i believe the right model for a user-space webserver is to cache open file
descriptors, and directly hash URLs to open files. This way you can do
pure sendfile() without any open(). Not that open() is too expensive in
Linux:
m:~/lm/lmbench-2alpha9/bin/i686-linux> ./lat_syscall open
Simple open/close: 7.5756 microseconds
m:~/lm/lmbench-2alpha9/bin/i686-linux> ./lat_syscall stat
Simple stat: 5.4864 microseconds
m:~/lm/lmbench-2alpha9/bin/i686-linux> ./lat_syscall write
Simple write: 0.9614 microseconds
m:~/lm/lmbench-2alpha9/bin/i686-linux> ./lat_syscall read
Simple read: 1.1420 microseconds
m:~/lm/lmbench-2alpha9/bin/i686-linux> ./lat_syscall null
Simple syscall: 0.6349 microseconds
(note that lmbench opens a nontrivial path; open() can be cheaper than this.)
nevertheless, saving the lookup can be a win.
[ TUX uses dentries directly so there is no file opening cost - it's
pretty equivalent to sendpath(), with the difference that TUX can do
security evaluation on the (held) file prior to sending it - while sendpath()
is pretty much a shot into the dark. ]
Ingo
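
A rough user-space sketch of the "hash URLs to open files" idea, assuming a
small direct-mapped cache keyed by path. Everything here (fd_cache_get(),
FD_CACHE_SLOTS, the FNV-1a hash, evict-on-collision) is a hypothetical
illustration and not how TUX or any particular server does it.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define FD_CACHE_SLOTS 64           /* hypothetical size, power of two */

struct fd_cache_entry {
    char path[256];                 /* empty string marks a free slot */
    int fd;
};

static struct fd_cache_entry fd_cache[FD_CACHE_SLOTS];

/* FNV-1a string hash, reduced to a slot index. */
static unsigned fd_cache_hash(const char *s)
{
    unsigned h = 2166136261u;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h & (FD_CACHE_SLOTS - 1);
}

/*
 * Return a cached read-only descriptor for `path`, opening it on a miss.
 * A colliding entry is evicted and its descriptor closed.
 * Returns -1 if the path is too long or cannot be opened.
 * The caller must not close() the returned descriptor.
 */
int fd_cache_get(const char *path)
{
    struct fd_cache_entry *e;
    int fd;

    assert(path != NULL);
    if (strlen(path) >= sizeof(fd_cache[0].path))
        return -1;

    e = &fd_cache[fd_cache_hash(path)];
    if (e->path[0] != '\0' && strcmp(e->path, path) == 0)
        return e->fd;               /* hit: no open(), no path lookup */

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (e->path[0] != '\0')
        close(e->fd);               /* evict the previous occupant */
    strcpy(e->path, path);
    e->fd = fd;
    return fd;
}
```

Because the cached descriptor is shared, callers would pass an explicit
offset to sendfile() rather than rely on the file position. A real cache
would also need to notice files that were replaced on disk, which this
sketch ignores.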
* Re: Is sendfile all that sexy?
2001-01-14 22:45 ` J Sloan
@ 2001-01-15 20:15 ` H. Peter Anvin
0 siblings, 0 replies; 70+ messages in thread
From: H. Peter Anvin @ 2001-01-15 20:15 UTC (permalink / raw)
To: linux-kernel
Followup to: <3A622C25.766F3BCE@pobox.com>
By author: J Sloan <jjs@pobox.com>
In newsgroup: linux.dev.kernel
>
> Linus Torvalds wrote:
>
> > Of course, you may be right on wuftpd. It obviously wasn't designed with
> > security in mind, other alternatives may be better.
>
> I run proftpd on all my ftp servers - it's fast, configurable
> and can do all the tricks I need - even red hat seems to
> agree that proftpd is the way to go.
>
> Visit any red hat ftp site and they are running proftpd -
>
> So, why do they keep shipping us wu-ftpd instead?
>
> That really frosts me.
>
proftpd is not what you want for an FTP server whose main function is
*non-*anonymous access. It is very much written for the sole purpose
of being a great FTP server for a large anonymous FTP site. If you're
running a site large enough to matter, you can replace an RPM or two.
-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
* Re: Is sendfile all that sexy?
2001-01-15 15:36 ` Matti Aarnio
@ 2001-01-15 20:17 ` H. Peter Anvin
0 siblings, 0 replies; 70+ messages in thread
From: H. Peter Anvin @ 2001-01-15 20:17 UTC (permalink / raw)
To: linux-kernel
Followup to: <20010115173607.S25659@mea-ext.zmailer.org>
By author: Matti Aarnio <matti.aarnio@zmailer.org>
In newsgroup: linux.dev.kernel
>
> One thing about 'sendfile' (and likely 'sendpath') is that
> current (hammered into running binaries -> unchangeable)
> syscalls support only up to 2GB files at 32 bit systems.
>
> Glibc 2.2(9) at RedHat <sys/sendfile.h>:
>
> #ifdef __USE_FILE_OFFSET64
> # error "<sendfile.h> cannot be used with _FILE_OFFSET_BITS=64"
> #endif
>
> I do admit that doing sendfile() on some extremely large
> file is unlikely, but still...
>
2 GB isn't really that extremely large these days. This is an
unpleasant limitation.
-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
* Re: Is sendfile all that sexy?
2001-01-15 19:41 ` Ingo Molnar
@ 2001-01-15 20:33 ` Albert D. Cahalan
2001-01-15 21:00 ` Linus Torvalds
2001-01-16 10:40 ` Felix von Leitner
0 siblings, 2 replies; 70+ messages in thread
From: Albert D. Cahalan @ 2001-01-15 20:33 UTC (permalink / raw)
To: mingo; +Cc: Jonathan Thackray, Linux Kernel List
Ingo Molnar writes:
> On Mon, 15 Jan 2001, Jonathan Thackray wrote:
>> It's a very useful system call and makes file serving much more
>> scalable, and I'm glad that most Un*xes now have support for it
>> (Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to
>> Linux is sendpath(), which does the open() before the sendfile() all
>> combined into one system call.
Ingo Molnar's data in a nice table:
open/close 7.5756 microseconds
stat 5.4864 microseconds
write 0.9614 microseconds
read 1.1420 microseconds
syscall 0.6349 microseconds
Rather than combining open() with sendfile(), it could be combined
with stat(). Since the syscall would be new anyway, it could skip
the normal requirement about returning the next free file descriptor
in favor of returning whatever can be most quickly found.
* [patch] sendpath() support, 2.4.0-test3/-ac9
2001-01-15 18:46 ` Linus Torvalds
@ 2001-01-15 20:47 ` Ingo Molnar
2001-01-16 4:51 ` dean gaudet
0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2001-01-15 20:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Linux Kernel List, Jonathan Thackray
[-- Attachment #1: Type: TEXT/PLAIN, Size: 596 bytes --]
On 15 Jan 2001, Linus Torvalds wrote:
> int fd = open(..)
> fstat(fd..);
> sendfile(fd..);
> close(fd);
>
> is any slower than
>
> .. cache stat() in user space based on name ..
> sendpath(name, ..);
>
> on any real load.
just for kicks i've implemented sendpath() support. (patch against
2.4.0-test and sample code attached) It appears to work just fine here.
With a bit of reorganization in mm/filemap.c it was quite straightforward
to do.
Jonathan, is this what Zeus needs? If yes, it could be interesting to run
a simple benchmark to compare sendpath() to open()+sendfile()?
Ingo
[-- Attachment #2: Type: TEXT/PLAIN, Size: 4020 bytes --]
--- linux/mm/filemap.c.orig Mon Jan 15 22:43:21 2001
+++ linux/mm/filemap.c Mon Jan 15 23:09:55 2001
@@ -39,6 +39,8 @@
* page-cache, 21.05.1999, Ingo Molnar <mingo@redhat.com>
*
* SMP-threaded pagemap-LRU 1999, Andrea Arcangeli <andrea@suse.de>
+ *
+ * Started sendpath() support, (C) 2000 Ingo Molnar <mingo@redhat.com>
*/
atomic_t page_cache_size = ATOMIC_INIT(0);
@@ -1450,15 +1452,15 @@
return written;
}
-asmlinkage ssize_t sys_sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
+/*
+ * Get input file, and verify that it is ok..
+ */
+static struct file * get_verify_in_file (int in_fd, size_t count)
{
- ssize_t retval;
- struct file * in_file, * out_file;
- struct inode * in_inode, * out_inode;
+ struct inode * in_inode;
+ struct file * in_file;
+ int retval;
- /*
- * Get input file, and verify that it is ok..
- */
retval = -EBADF;
in_file = fget(in_fd);
if (!in_file)
@@ -1474,10 +1476,21 @@
retval = locks_verify_area(FLOCK_VERIFY_READ, in_inode, in_file, in_file->f_pos, count);
if (retval)
goto fput_in;
+ return in_file;
+fput_in:
+ fput(in_file);
+out:
+ return ERR_PTR(retval);
+}
+/*
+ * Get output file, and verify that it is ok..
+ */
+static struct file * get_verify_out_file (int out_fd, size_t count)
+{
+ struct file *out_file;
+ struct inode *out_inode;
+ int retval;
- /*
- * Get output file, and verify that it is ok..
- */
retval = -EBADF;
out_file = fget(out_fd);
if (!out_file)
@@ -1491,6 +1504,29 @@
retval = locks_verify_area(FLOCK_VERIFY_WRITE, out_inode, out_file, out_file->f_pos, count);
if (retval)
goto fput_out;
+ return out_file;
+
+fput_out:
+ fput(out_file);
+fput_in:
+ return ERR_PTR(retval);
+}
+
+asmlinkage ssize_t sys_sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
+{
+ ssize_t retval;
+ struct file * in_file, *out_file;
+
+ in_file = get_verify_in_file(in_fd, count);
+ if (IS_ERR(in_file)) {
+ retval = PTR_ERR(in_file);
+ goto out;
+ }
+ out_file = get_verify_out_file(out_fd, count);
+ if (IS_ERR(out_file)) {
+ retval = PTR_ERR(out_file);
+ goto fput_in;
+ }
retval = 0;
if (count) {
@@ -1524,6 +1560,56 @@
fput(in_file);
out:
return retval;
+}
+
+asmlinkage ssize_t sys_sendpath(int out_fd, char *path, off_t *offset, size_t count)
+{
+ struct file in_file, *out_file;
+ read_descriptor_t desc;
+ loff_t pos = 0, *ppos;
+ struct nameidata nd;
+ int ret;
+
+ out_file = get_verify_out_file(out_fd, count);
+ if (IS_ERR(out_file)) {
+ ret = PTR_ERR(out_file);
+ goto err;
+ }
+ ret = user_path_walk(path, &nd);
+ if (ret)
+ goto put_out;
+ ret = -EINVAL;
+ if (!nd.dentry || !nd.dentry->d_inode)
+ goto put_in_out;
+
+ memset(&in_file, 0, sizeof(in_file));
+ in_file.f_dentry = nd.dentry;
+ in_file.f_op = nd.dentry->d_inode->i_fop;
+
+ ppos = &in_file.f_pos;
+ if (offset) {
+ if (get_user(pos, offset))
+ goto put_in_out;
+ ppos = &pos;
+ }
+ desc.written = 0;
+ desc.count = count;
+ desc.buf = (char *) out_file;
+ desc.error = 0;
+ do_generic_file_read(&in_file, ppos, &desc, file_send_actor, 0);
+
+ ret = desc.written;
+ if (!ret)
+ ret = desc.error;
+ if (offset)
+ put_user(pos, offset);
+
+put_in_out:
+ path_release(&nd);
+put_out:
+ fput(out_file);
+err:
+ return ret;
}
/*
--- linux/arch/i386/kernel/entry.S.orig Mon Jan 15 22:42:47 2001
+++ linux/arch/i386/kernel/entry.S Mon Jan 15 22:43:12 2001
@@ -646,6 +646,7 @@
.long SYMBOL_NAME(sys_getdents64) /* 220 */
.long SYMBOL_NAME(sys_fcntl64)
.long SYMBOL_NAME(sys_ni_syscall) /* reserved for TUX */
+ .long SYMBOL_NAME(sys_sendpath)
/*
* NOTE!! This doesn't have to be exact - we just have
@@ -653,6 +654,6 @@
* entries. Don't panic if you notice that this hasn't
* been shrunk every time we add a new system call.
*/
- .rept NR_syscalls-221
+ .rept NR_syscalls-223
.long SYMBOL_NAME(sys_ni_syscall)
.endr
[-- Attachment #3: Type: TEXT/PLAIN, Size: 593 bytes --]
/*
* Sample sendpath() code. It should mainly be used for sockets.
*/
#include <linux/unistd.h>
#include <sys/sendfile.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#define __NR_sendpath 223
_syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size)
int main (int argc, char **argv)
{
int out_fd;
int ret;
out_fd = open("./tmpfile", O_RDWR|O_CREAT|O_TRUNC, 0700);
ret = sendpath(out_fd, "/usr/include/unistd.h", NULL, 300);
printf("sendpath wrote %d bytes into ./tmpfile.\n", ret);
return 0;
}
* Re: Is sendfile all that sexy?
2001-01-15 20:33 ` Albert D. Cahalan
@ 2001-01-15 21:00 ` Linus Torvalds
2001-01-16 10:40 ` Felix von Leitner
1 sibling, 0 replies; 70+ messages in thread
From: Linus Torvalds @ 2001-01-15 21:00 UTC (permalink / raw)
To: linux-kernel
In article <200101152033.f0FKXpv250839@saturn.cs.uml.edu>,
Albert D. Cahalan <acahalan@cs.uml.edu> wrote:
>Ingo Molnar writes:
>> On Mon, 15 Jan 2001, Jonathan Thackray wrote:
>
>>> It's a very useful system call and makes file serving much more
>>> scalable, and I'm glad that most Un*xes now have support for it
>>> (Linux, FreeBSD, HP-UX, AIX, Tru64). The next cool feature to add to
>>> Linux is sendpath(), which does the open() before the sendfile() all
>>> combined into one system call.
>
>Ingo Molnar's data in a nice table:
>
>open/close 7.5756 microseconds
>stat 5.4864 microseconds
>write 0.9614 microseconds
>read 1.1420 microseconds
>syscall 0.6349 microseconds
>
>Rather than combining open() with sendfile(), it could be combined
>with stat().
Note that "fstat()" is fairly low-overhead (unlike "stat()" it obviously
doesn't have to parse the name again), so "open+fstat" is quite fine
as-is.
Linus
* Re: Is sendfile all that sexy?
2001-01-14 18:29 Is sendfile all that sexy? jamal
2001-01-14 18:50 ` Ingo Molnar
2001-01-14 20:22 ` Linus Torvalds
@ 2001-01-15 23:16 ` Pavel Machek
2001-01-16 13:47 ` jamal
2 siblings, 1 reply; 70+ messages in thread
From: Pavel Machek @ 2001-01-15 23:16 UTC (permalink / raw)
To: jamal, linux-kernel, netdev
Hi!
> TWO observations:
> - Given Linux's non-pre-emptability of the kernel i get the feeling that
> sendfile could starve other user space programs. Imagine trying to send a
> 1Gig file on 10Mbps pipe in one shot.
Hehe, try sigkilling process doing that transfer. Last time I tried it
it did not work.
Pavel
--
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org
* Re: [patch] sendpath() support, 2.4.0-test3/-ac9
2001-01-15 20:47 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar
@ 2001-01-16 4:51 ` dean gaudet
2001-01-16 4:59 ` Linus Torvalds
2001-01-16 9:19 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar
0 siblings, 2 replies; 70+ messages in thread
From: dean gaudet @ 2001-01-16 4:51 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel List, Jonathan Thackray
On Mon, 15 Jan 2001, Ingo Molnar wrote:
> just for kicks i've implemented sendpath() support.
>
> _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size)
hey so how do you implement transmit timeouts with sendpath() ? (i.e.
drop the client after 30 seconds of no progress.)
-dean
* Re: [patch] sendpath() support, 2.4.0-test3/-ac9
2001-01-16 4:51 ` dean gaudet
@ 2001-01-16 4:59 ` Linus Torvalds
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
2001-01-16 9:19 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar
1 sibling, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2001-01-16 4:59 UTC (permalink / raw)
To: dean gaudet; +Cc: Ingo Molnar, Linux Kernel List, Jonathan Thackray
On Mon, 15 Jan 2001, dean gaudet wrote:
> On Mon, 15 Jan 2001, Ingo Molnar wrote:
>
> > just for kicks i've implemented sendpath() support.
> >
> > _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size)
>
> hey so how do you implement transmit timeouts with sendpath() ? (i.e.
> drop the client after 30 seconds of no progress.)
The whole "sendpath()" idea is just stupid.
You want to do a non-blocking send, so that you don't block on the socket,
and do some simple multiplexing in your server.
And "sendpath()" cannot do that without having to look up the name again,
and again, and again. Which makes the performance "optimization" a
horrible pessimisation.
Basically, sendpath() seems to be only useful for blocking and
uninterruptible file sending.
Bad design. I'm not touching it with a ten-foot pole.
Linus
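
A minimal sketch of the non-blocking send described above, assuming Linux
sendfile() and poll(). The helper names send_with_timeout() and
set_nonblocking() are hypothetical. The point is that the open descriptor
plus the saved offset form the continuation state, so the send can resume
after EAGAIN without another name lookup, which a path-based call cannot
offer.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Send `count` bytes of `in_fd`, starting at *offset, to the non-blocking
 * descriptor `sock`. Gives up when `sock` stays unwritable for `timeout_ms`.
 * Returns 0 when all bytes were sent or end of file was reached, and -1 on
 * error or timeout (errno is ETIMEDOUT for a timeout).
 * *offset always reflects how far the transfer got.
 */
int send_with_timeout(int sock, int in_fd, off_t *offset, size_t count,
                      int timeout_ms)
{
    assert(offset != NULL);

    while (count > 0) {
        struct pollfd pfd = { .fd = sock, .events = POLLOUT };
        ssize_t n = sendfile(sock, in_fd, offset, count);
        int r;

        if (n > 0) {
            count -= (size_t)n;     /* progress; *offset already advanced */
            continue;
        }
        if (n == 0)
            return 0;               /* end of file before `count` bytes */
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Peer is not draining: wait, but only for timeout_ms. */
        r = poll(&pfd, 1, timeout_ms);
        if (r == 0) {
            errno = ETIMEDOUT;      /* no progress: drop the client */
            return -1;
        }
        if (r < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

A real server would fold the poll() into its main event loop and keep
(in_fd, offset, count) per connection instead of waiting inside one call.
The per-call wait here only keeps the sketch short.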
* Re: [patch] sendpath() support, 2.4.0-test3/-ac9
2001-01-16 4:51 ` dean gaudet
2001-01-16 4:59 ` Linus Torvalds
@ 2001-01-16 9:19 ` Ingo Molnar
2001-01-17 0:03 ` dean gaudet
1 sibling, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 9:19 UTC (permalink / raw)
To: dean gaudet; +Cc: Linus Torvalds, Linux Kernel List, Jonathan Thackray
On Mon, 15 Jan 2001, dean gaudet wrote:
> > just for kicks i've implemented sendpath() support.
> >
> > _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size)
>
> hey so how do you implement transmit timeouts with sendpath() ?
> (i.e. drop the client after 30 seconds of no progress.)
well this problem is not unique to sendpath(), sendfile() has it as well.
in TUX i've added per-socket connection timers, and i believe something
like this should be done in Apache as well - timers are IMO not a good
enough excuse for avoiding event-based IO models and using select() or
poll().
Ingo
* 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 4:59 ` Linus Torvalds
@ 2001-01-16 9:48 ` Ingo Molnar
2000-01-01 2:02 ` Pavel Machek
` (5 more replies)
0 siblings, 6 replies; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 9:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: dean gaudet, Linux Kernel List, Jonathan Thackray
On Mon, 15 Jan 2001, Linus Torvalds wrote:
> > _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size)
> You want to do a non-blocking send, so that you don't block on the
> socket, and do some simple multiplexing in your server.
>
> And "sendpath()" cannot do that without having to look up the name
> again, and again, and again. Which makes the performance
> "optimization" a horrible pessimisation.
yep, correct. But take a look at the trick it does with file descriptors,
i believe it could be a useful way of doing things. It basically
privatizes a struct file, without inserting it into the enumerated file
descriptors. This shows that 'native files' are possible: file struct
without file descriptor integers mapped to them.
ob'plug: this privatized file descriptor mechanism is used in TUX [TUX
privatizes files by putting them into the HTTP request structure - ie.
timeouts and continuation/nonblocking logic can be done with them]. But
TUX is trusted code, and it can pass a struct file to the VFS without
having to validate it, and TUX will also free such file descriptors.
But even user-space code could use 'native files', via the following, safe
mechanism:
1) current->native_files list, freed at exit_files() time.
2) "struct native_file" which embeds "struct file". It has the following
fields:
struct native_file {
unsigned long master_fingerprint[8];
unsigned long file_fingerprint[8];
struct file file;
};
'fingerprints' are 256 bit, true random numbers. master_fingerprint is
global to the kernel and is generated once per boot. It validates the
pointer of the structure. The master fingerprint is never known to
user-space.
file_fingerprint is a 256-bit identifier generated for this native file.
The file fingerprint and the (kernel) pointer to the native file is
returned to user-space. The cryptographical safety of these 256-bit random
numbers guarantees that no breach can occur in a reasonable period of
time. It's in essence an 'encrypted' communication between kernel and
user-space.
user-space thus can pass a pointer to the following structure:
struct safe_kpointer {
void *kaddr;
unsigned long fingerprint[4];
};
the kernel can validate kaddr by 1) validating the pointer via the master
fingerprint (every valid kernel pointer must point to a structure that
starts with the master fingerprint's copy). Then usage-permissions are
validated by checking the file fingerprint (the per-object fingerprint).
this is a safe, very fast [ O(1) ] object-permission model. (it's a
variation of a former idea of yours.) A process can pass object
fingerprints and kernel pointers to other processes too - thus the other
process can access the object too. Threads will 'naturally' share objects,
because fingerprints are typically stored in memory.
3) on closing a native file the fingerprint is destroyed (first byte of
the master fingerprint copy is overwritten).
what do you think about this? I believe most of the file APIs can be /
should be reworked to use native files, and 'Unix files' would just be a
compatibility layer parallel to them. Then various applications could
convert to 'native file' usage - i believe file servers which have lots of
file descriptors would do this first.
(this 'fingerprint' mechanism can be used for any object, not only files.)
Ingo
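
A user-space model of the two-step validation described above, offered only
as a sketch. It assumes 32-byte (256-bit) fingerprints, glibc's getrandom(),
and an int payload standing in for struct file. The function names are
hypothetical. It also assumes kaddr always points at readable memory, which
a real kernel could not take for granted when handed an arbitrary pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>

#define FP_BYTES 32                         /* 256-bit fingerprints */

static unsigned char master_fp[FP_BYTES];   /* per boot, never exported */

struct native_file {
    unsigned char master_fingerprint[FP_BYTES];
    unsigned char file_fingerprint[FP_BYTES];
    int payload;                            /* stands in for struct file */
};

struct safe_kpointer {
    void *kaddr;
    unsigned char fingerprint[FP_BYTES];
};

/* Fill `buf` with random bytes. Returns 0 on success, -1 on failure. */
static int fill_random(void *buf, size_t len)
{
    return getrandom(buf, len, 0) == (ssize_t)len ? 0 : -1;
}

/* Compare all bytes without an early exit, so timing leaks less. */
static int fp_equal(const unsigned char *a, const unsigned char *b)
{
    unsigned char diff = 0;

    for (size_t i = 0; i < FP_BYTES; i++)
        diff |= a[i] ^ b[i];
    return diff == 0;
}

/* Generate the master fingerprint once. Returns 0 on success. */
int fp_init(void)
{
    return fill_random(master_fp, sizeof(master_fp));
}

/* Create an object and fill in the handle that user space would keep. */
struct native_file *native_open(int payload, struct safe_kpointer *out)
{
    struct native_file *nf;

    assert(out != NULL);
    nf = malloc(sizeof(*nf));
    if (nf == NULL)
        return NULL;
    if (fill_random(nf->file_fingerprint, FP_BYTES) < 0) {
        free(nf);
        return NULL;
    }
    memcpy(nf->master_fingerprint, master_fp, FP_BYTES);
    nf->payload = payload;
    out->kaddr = nf;
    memcpy(out->fingerprint, nf->file_fingerprint, FP_BYTES);
    return nf;
}

/* Step 1: the master copy validates the pointer.
 * Step 2: the per-object fingerprint validates the right to use it. */
struct native_file *native_validate(const struct safe_kpointer *kp)
{
    struct native_file *nf;

    assert(kp != NULL);
    nf = kp->kaddr;
    if (nf == NULL || !fp_equal(nf->master_fingerprint, master_fp))
        return NULL;
    if (!fp_equal(nf->file_fingerprint, kp->fingerprint))
        return NULL;
    return nf;
}

/* Closing flips the first byte of the master copy, so old handles fail.
 * The memory is deliberately not freed in this sketch. */
void native_close(struct native_file *nf)
{
    assert(nf != NULL);
    nf->master_fingerprint[0] ^= 0xff;
}
```

The sketch never frees a closed object, since a stale handle would otherwise
be validated against reused memory. How a kernel would handle that, and how
a shared object's lifetime would be tracked, is left open here just as it is
in the proposal.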
* Re: Is sendfile all that sexy?
2001-01-15 20:33 ` Albert D. Cahalan
2001-01-15 21:00 ` Linus Torvalds
@ 2001-01-16 10:40 ` Felix von Leitner
2001-01-16 11:56 ` Peter Samuelson
` (2 more replies)
1 sibling, 3 replies; 70+ messages in thread
From: Felix von Leitner @ 2001-01-16 10:40 UTC (permalink / raw)
To: Linux Kernel List
Thus spake Albert D. Cahalan (acahalan@cs.uml.edu):
> Rather than combining open() with sendfile(), it could be combined
> with stat(). Since the syscall would be new anyway, it could skip
> the normal requirement about returning the next free file descriptor
> in favor of returning whatever can be most quickly found.
I don't know how Linux does it, but returning the first free file
descriptor can be implemented as O(1) operation.
Felix
* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
2000-01-01 2:02 ` Pavel Machek
@ 2001-01-16 11:13 ` Andi Kleen
2001-01-16 11:26 ` Ingo Molnar
2001-01-16 13:57 ` 'native files', 'object fingerprints' [was: sendpath()] Jamie Lokier
` (3 subsequent siblings)
5 siblings, 1 reply; 70+ messages in thread
From: Andi Kleen @ 2001-01-16 11:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray
On Tue, Jan 16, 2001 at 10:48:34AM +0100, Ingo Molnar wrote:
> this is a safe, very fast [ O(1) ] object-permission model. (it's a
> variation of a former idea of yours.) A process can pass object
> fingerprints and kernel pointers to other processes too - thus the other
> process can access the object too. Threads will 'naturally' share objects,
>...
Just setuid etc. doesn't work with that because access cannot be easily
revoked without disturbing other clients.
To handle that you would probably need a "relookup if needed" mechanism
similar to what NFSv4 has, so that you can force other users to relookup
after you revoked a key. That complicates the use a lot though.
Also the model depends on good secure random numbers, which is questionable
in many environments (e.g. a diskless box where the random device effectively
gets no new input)
-Andi
* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 11:13 ` Andi Kleen
@ 2001-01-16 11:26 ` Ingo Molnar
2001-01-16 11:37 ` Andi Kleen
0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 11:26 UTC (permalink / raw)
To: Andi Kleen
Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray
On Tue, 16 Jan 2001, Andi Kleen wrote:
> On Tue, Jan 16, 2001 at 10:48:34AM +0100, Ingo Molnar wrote:
> > this is a safe, very fast [ O(1) ] object-permission model. (it's a
> > variation of a former idea of yours.) A process can pass object
> > fingerprints and kernel pointers to other processes too - thus the other
> > process can access the object too. Threads will 'naturally' share objects,
> >...
>
> Just setuid etc. doesn't work with that because access cannot be
> easily revoked without disturbing other clients.
well, you cannot easily close() an already shared file descriptor in
another process's context either. Is revocation so important? Why is
setuid() a problem? A native file is just like a normal file, with the
difference that not an integer but a fingerprint identifies it, and that
access and usage counts are not automatically inherited across some
explicit sharing interface.
perhaps we could get most of the advantages by allowing the relaxation of
the 'allocate first free file descriptor number' rule for normal Unix
files?
> Also the model depends on good secure random numbers, which is
> questionable in many environments (e.g. a diskless box where the
> random device effectively gets no new input)
true, although newer chipsets include hardware random generators. But
indeed, object fingerprints (tokens? ids?) make the random generator a
much more central thing.
Ingo
* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 11:26 ` Ingo Molnar
@ 2001-01-16 11:37 ` Andi Kleen
2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar
0 siblings, 1 reply; 70+ messages in thread
From: Andi Kleen @ 2001-01-16 11:37 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andi Kleen, Linus Torvalds, dean gaudet, Linux Kernel List,
Jonathan Thackray
On Tue, Jan 16, 2001 at 12:26:12PM +0100, Ingo Molnar wrote:
>
> On Tue, 16 Jan 2001, Andi Kleen wrote:
>
> > On Tue, Jan 16, 2001 at 10:48:34AM +0100, Ingo Molnar wrote:
> > > this is a safe, very fast [ O(1) ] object-permission model. (it's a
> > > variation of a former idea of yours.) A process can pass object
> > > fingerprints and kernel pointers to other processes too - thus the other
> > > process can access the object too. Threads will 'naturally' share objects,
> > >...
> >
> > Just setuid etc. doesn't work with that because access cannot be
> > easily revoked without disturbing other clients.
>
> well, you cannot easily close() an already shared file descriptor in
> another process's context either. Is revocation so important? Why is
> setuid() a problem? A native file is just like a normal file, with the
> difference that not an integer but a fingerprint identifies it, and that
> access and usage counts are not automatically inherited across some
> explicit sharing interface.
Actually on second thought exec() is more of a problem than setuid(), because
it requires closing file descriptors.
So if you could devise a security model that doesn't depend on exec giving
you a clean plate -- then it could work, but would probably not be very
unixy.
I'm amazed how non flamed you can present radical API ideas though, I even
get flamed for much smaller things (like using text errors to replace
the hundreds of EINVALs in the rtnetlink message interface) ;);)
>
> perhaps we could get most of the advantages by allowing the relaxation of
> the 'allocate first free file descriptor number' rule for normal Unix
> files?
Not sure I follow. You mean dup2() ?
-Andi
* Re: Is sendfile all that sexy?
2001-01-16 10:40 ` Felix von Leitner
@ 2001-01-16 11:56 ` Peter Samuelson
2001-01-16 12:37 ` Ingo Molnar
2001-01-16 12:42 ` Ingo Molnar
2 siblings, 0 replies; 70+ messages in thread
From: Peter Samuelson @ 2001-01-16 11:56 UTC (permalink / raw)
To: Linux Kernel List
[Felix von Leitner]
> I don't know how Linux does it, but returning the first free file
> descriptor can be implemented as O(1) operation.
How exactly? Maybe I'm being dense today. Having used up the lowest
available fd, how do you find the next-lowest one, the next open()? I
can't think of anything that isn't O(n). (Sure you can amortize it
different ways by keeping lists of fds, etc.)
Peter
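
One way to keep the lowest-free-descriptor search cheap, sketched under the
assumption of a fixed-size bitmap plus a "nothing free below here" hint. It
is not O(1) in the worst case: a full word costs one test and skips 32 or 64
descriptors at a time. The names fd_table, fd_alloc() and fd_free() are
hypothetical, and __builtin_ctzl() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_FDS       1024
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS        (MAX_FDS / BITS_PER_WORD)

struct fd_table {
    unsigned long open_bits[NWORDS];    /* bit set = descriptor in use */
    int next_fd;                        /* no free descriptor below this */
};

/* Allocate the lowest free descriptor, or return -1 if the table is full. */
int fd_alloc(struct fd_table *t)
{
    assert(t != NULL);

    for (size_t w = (size_t)t->next_fd / BITS_PER_WORD; w < NWORDS; w++) {
        unsigned long free_bits = ~t->open_bits[w];
        int bit, fd;

        if (free_bits == 0)
            continue;                   /* whole word in use, skip it */

        bit = __builtin_ctzl(free_bits); /* lowest clear bit of the word */
        fd = (int)(w * BITS_PER_WORD) + bit;
        t->open_bits[w] |= 1UL << bit;
        t->next_fd = fd + 1;            /* everything below is now in use */
        return fd;
    }
    return -1;
}

/* Release a descriptor and pull the hint back if a lower hole opened. */
void fd_free(struct fd_table *t, int fd)
{
    assert(t != NULL && fd >= 0 && fd < MAX_FDS);

    t->open_bits[(size_t)fd / BITS_PER_WORD] &=
        ~(1UL << ((size_t)fd % BITS_PER_WORD));
    if (fd < t->next_fd)
        t->next_fd = fd;
}
```

With a zero-initialised table, allocating three descriptors yields 0, 1 and
2. Freeing 1 and allocating again returns 1, and the next allocation
returns 3.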
* O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 11:37 ` Andi Kleen
@ 2001-01-16 12:04 ` Ingo Molnar
2001-01-16 12:09 ` Ingo Molnar
` (3 more replies)
0 siblings, 4 replies; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 12:04 UTC (permalink / raw)
To: Andi Kleen
Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray
On Tue, 16 Jan 2001, Andi Kleen wrote:
> > the 'allocate first free file descriptor number' rule for normal Unix
> > files?
> Not sure I follow. You mean dup2() ?
I'm sure you know this: when there are thousands of files open already,
much of the overhead of opening a new file comes from the mandatory POSIX
requirement of allocating the first not yet allocated file descriptor
integer to this file. Eg. if files 0, 1, 2, 10, 11 are already open, the
kernel must allocate file descriptor 3. Many utilities rely on this, and
the rule makes sense in a select() environment, because it compresses the
'file descriptor spectrum'. But in a non-select(), event-driven environment
it becomes unnecessary overhead.
- probably the most radical solution is what i suggested, to completely
avoid the unique-mapping of file structures to an integer range, and use
the address of the file structure (and some cookies) as an identification.
- a less radical solution would be to still map file structures to an
integer range (file descriptors) and usage-maintain files per processes,
but relax the 'allocate first non-allocated integer in the range' rule.
I'm not sure exactly how simple this is, but something like this should
work: on close()-ing file descriptors the freed file descriptors would be
cached in a list (this needs a new, separate structure which must be
allocated/freed as well). Something like:
struct lazy_filedesc {
        int fd;
        struct file *file;
};
struct task {
        ...
        struct lazy_filedesc *lazy_files;
        ...
};
the actual filedescriptor bit of a 'lazy file' would be cleared for real
on close(), and the '*file' field is not a real file - it's NULL if at
close() time this process wasn't the last user of the file, or contains a
pointer to an allocated (but otherwise invalid) file structure. This must
happen to ensure the first-free-desc rule, and to optimize
freeing/allocation of file structures. Now, if the new code does a:
fd = open(...,O_ANY);
then the kernel looks at the current->lazy_files list, and tries to set
the file descriptor bit in the current->files file table. If successful
then open() uses desc->fd and desc->file (if available) for opening the
new file, and unlinks+frees the lazy descriptor. If unsuccessful then
open() frees desc->file, frees and unlinks the descriptor and goes on to
look at the next descriptor.
- worst case overhead is the extra allocation overhead of the (very small)
lazy file descriptor. Worst-case happens only if O_ANY allocation is
mixed in a special way with normal open()s.
- Best-case overhead saves us a get_unused_fd() call, which can be *very*
expensive (in terms of CPU time and cache footprint) if thousands of
files are used. If O_ANY is used mostly, then the best-case is always
triggered.
- (the number of lazy files must be limited to some sane value)
at exit_files() time the current->lazy_files list must be processed. On
exec() it does not get inherited.
current->lazy_files has no effect on task state or semantics otherwise,
it's only an isolated 'information cache'.
Have i missed something important?
Ingo
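[Annotation, not part of the original mail: a user-space model of the
lazy-descriptor idea described above, as a sketch only. It assumes a
small fixed-size table and models just the descriptor number, not the
cached struct file. All names (fd_table, fd_alloc_any, ...) are
hypothetical.]

```c
#include <stdbool.h>

#define MAX_FDS  64
#define MAX_LAZY 8    /* "the number of lazy files must be limited" */

struct fd_table {
    int  nlazy;               /* number of cached hints */
    int  lazy[MAX_LAZY];      /* recently closed descriptors, LIFO */
    bool open[MAX_FDS];       /* stands in for the open-fds bitmap */
};

/* Normal open(): lowest free descriptor, found by a linear scan. */
static int fd_alloc_lowest(struct fd_table *t)
{
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (!t->open[fd]) {
            t->open[fd] = true;
            return fd;
        }
    }
    return -1;
}

/* close(): clear the bit for real, then remember the number as a hint. */
static void fd_close(struct fd_table *t, int fd)
{
    if (fd < 0 || fd >= MAX_FDS || !t->open[fd])
        return;
    t->open[fd] = false;
    if (t->nlazy < MAX_LAZY)
        t->lazy[t->nlazy++] = fd;
}

/* open(..., O_ANY): pop hints until one is still free, else fall back. */
static int fd_alloc_any(struct fd_table *t)
{
    while (t->nlazy > 0) {
        int fd = t->lazy[--t->nlazy];

        if (!t->open[fd]) {       /* not taken by a normal open() since */
            t->open[fd] = true;
            return fd;
        }
        /* stale hint: drop it and look at the next one */
    }
    return fd_alloc_lowest(t);
}
```

Because close() always clears the bit, a normal open() still sees the
lowest free number; the hint list only ever short-cuts the O_ANY path.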
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar
@ 2001-01-16 12:09 ` Ingo Molnar
2001-01-16 12:13 ` Peter Samuelson
` (2 subsequent siblings)
3 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 12:09 UTC (permalink / raw)
To: Andi Kleen
Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray
On Tue, 16 Jan 2001, Ingo Molnar wrote:
> struct lazy_filedesc {
> int fd;
> struct file *file;
> }
in fact "struct file" can be (ab)used for this, no need for new structures or
new fields. Eg. file->f_flags contains the cached descriptor-information.
file->f_list is used for the current->lazy_files ringlist.
this way there is no additional allocation overhead in the worst-case.
(unless i'm missing something obvious.)
Ingo
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar
2001-01-16 12:09 ` Ingo Molnar
@ 2001-01-16 12:13 ` Peter Samuelson
2001-01-16 12:33 ` Ingo Molnar
2001-01-16 12:34 ` Andi Kleen
2001-01-16 13:00 ` Mitchell Blank Jr
3 siblings, 1 reply; 70+ messages in thread
From: Peter Samuelson @ 2001-01-16 12:13 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linux Kernel List
[Ingo Molnar]
> - probably the most radical solution is what i suggested, to
> completely avoid the unique-mapping of file structures to an integer
> range, and use the address of the file structure (and some cookies)
> as an identification.
Careful, these must cast to non-negative integers, without clashing.
> fd = open(...,O_ANY);
I like this idea, but call it O_ALLOCANYFD.
Peter
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 12:13 ` Peter Samuelson
@ 2001-01-16 12:33 ` Ingo Molnar
2001-01-16 14:40 ` Felix von Leitner
0 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 12:33 UTC (permalink / raw)
To: Peter Samuelson; +Cc: Linux Kernel List
On Tue, 16 Jan 2001, Peter Samuelson wrote:
> [Ingo Molnar]
> > - probably the most radical solution is what i suggested, to
> > completely avoid the unique-mapping of file structures to an integer
> > range, and use the address of the file structure (and some cookies)
> > as an identification.
>
> Careful, these must cast to non-negative integers, without clashing.
if you read my (radical) proposal, the identification is based on a kernel
pointer and a 256-bit random integer. So non-negative integers are not
needed. (file-IO system-calls would be modified to detect if 'Unix file
descriptors' or pointers to 'native file descriptors' are passed to them,
so this is truly radical.)
Ingo
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar
2001-01-16 12:09 ` Ingo Molnar
2001-01-16 12:13 ` Peter Samuelson
@ 2001-01-16 12:34 ` Andi Kleen
2001-01-16 13:00 ` Mitchell Blank Jr
3 siblings, 0 replies; 70+ messages in thread
From: Andi Kleen @ 2001-01-16 12:34 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andi Kleen, Linus Torvalds, dean gaudet, Linux Kernel List,
Jonathan Thackray
On Tue, Jan 16, 2001 at 01:04:22PM +0100, Ingo Molnar wrote:
> - a less radical solution would be to still map file structures to an
> integer range (file descriptors) and usage-maintain files per processes,
> but relax the 'allocate first non-allocated integer in the range' rule.
> I'm not sure exactly how simple this is, but something like this should
> work: on close()-ing file descriptors the freed file descriptors would be
> cached in a list (this needs a new, separate structure which must be
> allocated/freed as well). Something like:
>
> struct lazy_filedesc {
> int fd;
> struct file *file;
> }
More generic file -> fd mapping would be useful to speed up poll() too,
because the event trigger could directly modify the poll table without
a second slow walk over the whole table.
So you could add another bit that tells if the fd is open or closed
and share it with poll.
Also in that table you could just keep a linked ordered free list
and not use GFP_ANY, because getting the lowest would be rather cheap.
Disadvantage is that it would need more cache and more overhead than
the current scheme.
[in a way it is an ugly duck like pte<->vma links]
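[Annotation, not part of the original mail: one possible reading of the
"linked ordered free list" suggestion, as a sketch under assumptions. The
list is threaded through an array indexed by descriptor number; the names
fd_freelist, freelist_alloc and freelist_release are hypothetical.]

```c
#define NFDS 16
#define NIL  (-1)

struct fd_freelist {
    int head;          /* lowest free fd, or NIL if none */
    int next[NFDS];    /* next[fd] = next higher free fd, or NIL */
};

/* Start with every descriptor free, linked in ascending order. */
static void freelist_init(struct fd_freelist *fl)
{
    fl->head = 0;
    for (int fd = 0; fd < NFDS; fd++)
        fl->next[fd] = (fd + 1 < NFDS) ? fd + 1 : NIL;
}

/* open(): the head is always the lowest free fd, so this is O(1). */
static int freelist_alloc(struct fd_freelist *fl)
{
    int fd = fl->head;

    if (fd != NIL)
        fl->head = fl->next[fd];
    return fd;
}

/*
 * close(): re-insert fd at its sorted position. The walk is what keeps
 * the list ordered, and it makes close() O(n) in the worst case.
 * The caller must not release a descriptor that is already free.
 */
static void freelist_release(struct fd_freelist *fl, int fd)
{
    int *link = &fl->head;

    while (*link != NIL && *link < fd)
        link = &fl->next[*link];
    fl->next[fd] = *link;
    *link = fd;
}
```

This moves the cost from open() to close() rather than removing it, and
the extra array is the additional cache footprint mentioned above.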
> - Best-case overhead saves us a get_unused_fd() call, which can be *very*
> expensive (in terms of CPU time and cache footprint) if thousands of
> files are used. If O_ANY is used mostly, then the best-case is always
> triggered.
Really? Does the open_fds bitmap get that big ?
Maybe it just needs a faster find_next_zero_bit() @)
-Andi
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 10:40 ` Felix von Leitner
2001-01-16 11:56 ` Peter Samuelson
@ 2001-01-16 12:37 ` Ingo Molnar
2001-01-16 12:42 ` Ingo Molnar
2 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 12:37 UTC (permalink / raw)
To: Felix von Leitner; +Cc: Linux Kernel List
On Tue, 16 Jan 2001, Felix von Leitner wrote:
> I don't know how Linux does it, but returning the first free file
> descriptor can be implemented as O(1) operation.
only if special allocation patterns are assumed. Otherwise it cannot be a
generic O(1) solution. The first-free rule adds an implicit ordering to
the file descriptor space, and this order cannot be maintained in an O(1)
way. Linux can allocate up to a million file descriptors.
Ingo
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 10:40 ` Felix von Leitner
2001-01-16 11:56 ` Peter Samuelson
2001-01-16 12:37 ` Ingo Molnar
@ 2001-01-16 12:42 ` Ingo Molnar
2001-01-16 12:47 ` Felix von Leitner
2 siblings, 1 reply; 70+ messages in thread
From: Ingo Molnar @ 2001-01-16 12:42 UTC (permalink / raw)
To: Felix von Leitner; +Cc: Linux Kernel List
On Tue, 16 Jan 2001, Felix von Leitner wrote:
> I don't know how Linux does it, but returning the first free file
> descriptor can be implemented as O(1) operation.
to put it more accurately: the requirement is to be able to open(), use
and close() an unlimited number of file descriptors with O(1) overhead,
under any allocation pattern, with only RAM limiting the number of files.
Both of my proposals attempt to provide this. It's possible to open() O(1)
but do a O(log(N)) close(), but that is of no practical value IMO.
Ingo
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 12:42 ` Ingo Molnar
@ 2001-01-16 12:47 ` Felix von Leitner
2001-01-16 13:48 ` Jamie Lokier
0 siblings, 1 reply; 70+ messages in thread
From: Felix von Leitner @ 2001-01-16 12:47 UTC (permalink / raw)
To: Linux Kernel List
Thus spake Ingo Molnar (mingo@elte.hu):
> > I don't know how Linux does it, but returning the first free file
> > descriptor can be implemented as O(1) operation.
> to put it more accurately: the requirement is to be able to open(), use
> and close() an unlimited number of file descriptors with O(1) overhead,
> under any allocation pattern, with only RAM limiting the number of files.
> Both of my proposals attempt to provide this. It's possible to open() O(1)
> but do a O(log(N)) close(), but that is of no practical value IMO.
I cheated. I was only talking about open().
close() is of course more expensive then.
Other than that: where does the requirement come from?
Can't we just use a free list where we prepend closed fds and always use
the first one on open()? That would even increase spatial locality and
be good for the CPU caches.
Felix
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar
` (2 preceding siblings ...)
2001-01-16 12:34 ` Andi Kleen
@ 2001-01-16 13:00 ` Mitchell Blank Jr
3 siblings, 0 replies; 70+ messages in thread
From: Mitchell Blank Jr @ 2001-01-16 13:00 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linux Kernel List
Ingo Molnar wrote:
> - probably the most radical solution is what i suggested, to completely
> avoid the unique-mapping of file structures to an integer range, and use
> the address of the file structure (and some cookies) as an identification.
IMO... gross. We do pretty much this exact thing in the ATM code (for
the signalling daemon and the kernel exchanging status on VCCs) and it's
pretty disgusting. I want to make it go away.
> - a less radical solution would be to still map file structures to an
> integer range (file descriptors) and usage-maintain files per processes,
> but relax the 'allocate first non-allocated integer in the range' rule.
[...]
> fd = open(...,O_ANY);
Yeah, this gets talked about, but I don't think a new flag for open is a
good way to do this, because open() isn't the only thing that returns
a new fd. What about socket()? pipe()?
Maybe we could have a new prctl() control that turns this behavior
on and off. Then you'd just have to be careful to turn it back off
before calling any library functions that require ordering (like popen).
Other than that, I think it'd be a good idea, especially if it could
be implemented cleanly enough to make it CONFIG_'urable. That can't
really be fairly judged until someone produces the code.
-Mitch
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-15 23:16 ` Pavel Machek
@ 2001-01-16 13:47 ` jamal
2001-01-16 14:41 ` Pavel Machek
0 siblings, 1 reply; 70+ messages in thread
From: jamal @ 2001-01-16 13:47 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-kernel, netdev
On Tue, 16 Jan 2001, Pavel Machek wrote:
> > TWO observations:
> > - Given Linux's non-pre-emptability of the kernel i get the feeling that
> > sendfile could starve other user space programs. Imagine trying to send a
> > 1Gig file on 10Mbps pipe in one shot.
>
> Hehe, try sigkilling process doing that transfer. Last time I tried it
> it did not work.
From Alexey's response: it does get descheduled possibly every sndbuf
send. So you should be able to sneak that sigkill.
cheers,
jamal
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 12:47 ` Felix von Leitner
@ 2001-01-16 13:48 ` Jamie Lokier
2001-01-16 14:20 ` Felix von Leitner
0 siblings, 1 reply; 70+ messages in thread
From: Jamie Lokier @ 2001-01-16 13:48 UTC (permalink / raw)
To: Linux Kernel List
Felix von Leitner wrote:
> I cheated. I was only talking about open().
> close() is of course more expensive then.
>
> Other than that: where does the requirement come from?
> Can't we just use a free list where we prepend closed fds and always use
> the first one on open()? That would even increase spatial locality and
> be good for the CPU caches.
You would need to use a new open() flag: O_ANYFD.
The requirement comes from things like this:
  close (0);
  close (1);
  close (2);
  open ("/dev/console", O_RDWR);
  dup (0);
  dup (0);
-- Jamie
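[Annotation, not part of the original mail: a small sketch that checks
the lowest-free-descriptor rule the idiom above depends on. It assumes a
single-threaded caller and a readable /dev/null, and it uses fresh
descriptors instead of 0..2 so that stdio is left alone. The function
name is hypothetical.]

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open two descriptors, close the lower one, and open again.
 * Returns 1 if the new descriptor reuses the number just closed,
 * 0 if it does not, and -1 if the setup failed.
 */
static int reopen_gets_lowest_fd(void)
{
    int a = open("/dev/null", O_RDONLY);   /* lowest free number */
    int b = open("/dev/null", O_RDONLY);   /* next lowest */
    int c, ok;

    if (a < 0 || b < 0) {
        if (a >= 0)
            close(a);
        if (b >= 0)
            close(b);
        return -1;
    }

    close(a);                              /* leaves a hole below b */
    c = open("/dev/null", O_RDONLY);
    ok = (c == a);

    if (c >= 0)
        close(c);
    close(b);
    return ok;
}
```

With a relaxed allocation rule the function could legitimately return 0,
which is the kind of breakage the console idiom would hit.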
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
2000-01-01 2:02 ` Pavel Machek
2001-01-16 11:13 ` Andi Kleen
@ 2001-01-16 13:57 ` Jamie Lokier
2001-01-16 14:27 ` Felix von Leitner
` (2 subsequent siblings)
5 siblings, 0 replies; 70+ messages in thread
From: Jamie Lokier @ 2001-01-16 13:57 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, dean gaudet, Linux Kernel List, Jonathan Thackray
Ingo Molnar wrote:
> struct native_file {
> unsigned long master_fingerprint[8];
> unsigned long file_fingerprint[8];
> struct file file;
> };
>
> 'fingerprints' are 256 bit, true random numbers. master_fingerprint is
> global to the kernel and is generated once per boot. It validates the
> pointer of the structure. The master fingerprint is never known to
> user-space.
>
> file_fingerprint is a 256-bit identifier generated for this native file.
> The file fingerprint and the (kernel) pointer to the native file is
> returned to user-space. The cryptographical safety of these 256-bit random
> numbers guarantees that no breach can occur in a reasonable period of
> time. It's in essence an 'encrypted' communication between kernel and
> user-space.
Sounds similar to the Hurd...
-- Jamie
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 13:48 ` Jamie Lokier
@ 2001-01-16 14:20 ` Felix von Leitner
2001-01-16 15:05 ` David L. Parsley
0 siblings, 1 reply; 70+ messages in thread
From: Felix von Leitner @ 2001-01-16 14:20 UTC (permalink / raw)
To: Linux Kernel List
Thus spake Jamie Lokier (lk@tantalophile.demon.co.uk):
> You would need to use a new open() flag: O_ANYFD.
> The requirement comes from things like this:
> close (0);
> close (1);
> close (2);
> open ("/dev/console", O_RDWR);
> dup ();
> dup ();
So it's not actually part of POSIX, it's just to get around fixing
legacy code? ;-)
Felix
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
` (2 preceding siblings ...)
2001-01-16 13:57 ` 'native files', 'object fingerprints' [was: sendpath()] Jamie Lokier
@ 2001-01-16 14:27 ` Felix von Leitner
2001-01-16 17:47 ` Linus Torvalds
2001-01-17 4:39 ` dean gaudet
5 siblings, 0 replies; 70+ messages in thread
From: Felix von Leitner @ 2001-01-16 14:27 UTC (permalink / raw)
To: Linux Kernel List
Thus spake Ingo Molnar (mingo@elte.hu):
> But even user-space code could use 'native files', via the following, safe
> mechanism:
[something reminiscent of a token from a capability system]
> (this 'fingerprint' mechanism can be used for any object, not only files.)
One good thing about tokens is that file handles can be implemented on
top of them in user space.
On the other hand, there already are mechanisms to pass file descriptors
around and so on, so you don't gain anything tangible from your effort.
I would advise reading some text books about capability systems, there
is a lot to be learned here. But retrofitting something like this on an
existing kernel is probably not a very good idea. Experience shows that
you can't "un-bloat" a piece of software by introducing a few elegant
concepts. The compatibility stuff eats most of the benefits.
Felix
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]]
2001-01-16 12:33 ` Ingo Molnar
@ 2001-01-16 14:40 ` Felix von Leitner
0 siblings, 0 replies; 70+ messages in thread
From: Felix von Leitner @ 2001-01-16 14:40 UTC (permalink / raw)
To: Linux Kernel List
Thus spake Ingo Molnar (mingo@elte.hu):
> if you read my (radical) proposal, the identification is based on a kernel
> pointer and a 256-bit random integer. So non-negative integers are not
> needed. (file-IO system-calls would be modified to detect if 'Unix file
> descriptors' or pointers to 'native file descriptors' are passed to them,
> so this is truly radical.)
Yuck, don't pass pointers in kernel space to user space!
NT does it and look what kernel call argument verification havoc it
wrought over them!
Felix
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 13:47 ` jamal
@ 2001-01-16 14:41 ` Pavel Machek
0 siblings, 0 replies; 70+ messages in thread
From: Pavel Machek @ 2001-01-16 14:41 UTC (permalink / raw)
To: jamal; +Cc: linux-kernel, netdev
Hi!
> > > TWO observations:
> > > - Given Linux's non-pre-emptability of the kernel i get the feeling that
> > > sendfile could starve other user space programs. Imagine trying to send a
> > > 1Gig file on 10Mbps pipe in one shot.
> >
> > Hehe, try sigkilling process doing that transfer. Last time I tried it
> > it did not work.
>
> From Alexey's response: it does get descheduled possibly every sndbuf
> send. So you should be able to sneak that sigkill.
Did you actually try it? Last time I did the test, SIGKILL did not
make it in. sendfile did not actually check for signals...
(And you could do something like send 100MB from cache into dev
null. I do not see where sigkill could sneak in in this case.)
Pavel
--
The best software in life is free (not shareware)! Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 14:20 ` Felix von Leitner
@ 2001-01-16 15:05 ` David L. Parsley
2001-01-16 15:05 ` Jakub Jelinek
2001-01-17 19:27 ` dean gaudet
0 siblings, 2 replies; 70+ messages in thread
From: David L. Parsley @ 2001-01-16 15:05 UTC (permalink / raw)
To: Felix von Leitner, linux-kernel, mingo
Felix von Leitner wrote:
> > close (0);
> > close (1);
> > close (2);
> > open ("/dev/console", O_RDWR);
> > dup ();
> > dup ();
>
> So it's not actually part of POSIX, it's just to get around fixing
> legacy code? ;-)
This makes me wonder...
If the kernel only kept a queue of the three smallest unused fd's, and
when the queue emptied handed out whatever it liked, how many things
would break? I suspect this would cover a lot of bases...
<dons flameproof underwear>
regards,
David
--
David L. Parsley
Network Administrator
Roanoke College
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 15:05 ` David L. Parsley
@ 2001-01-16 15:05 ` Jakub Jelinek
2001-01-16 15:46 ` David L. Parsley
2001-01-17 19:27 ` dean gaudet
1 sibling, 1 reply; 70+ messages in thread
From: Jakub Jelinek @ 2001-01-16 15:05 UTC (permalink / raw)
To: David L. Parsley; +Cc: Felix von Leitner, linux-kernel, mingo
On Tue, Jan 16, 2001 at 10:05:06AM -0500, David L. Parsley wrote:
> Felix von Leitner wrote:
> > > close (0);
> > > close (1);
> > > close (2);
> > > open ("/dev/console", O_RDWR);
> > > dup ();
> > > dup ();
> >
> > So it's not actually part of POSIX, it's just to get around fixing
> > legacy code? ;-)
>
> This makes me wonder...
>
> If the kernel only kept a queue of the three smallest unused fd's, and
> when the queue emptied handed out whatever it liked, how many things
> would break? I suspect this would cover a lot of bases...
First it would break Unix98 and other standards:
The Single UNIX (R) Specification, Version 2
Copyright (c) 1997 The Open Group
...
int open(const char *path, int oflag, ... );
...
The open() function will return a file descriptor for the named file that is the lowest file descriptor not currently
open for that process. The open file description is new, and therefore the file descriptor does not share it with any
other process in the system. The FD_CLOEXEC file descriptor flag associated with the new file descriptor will be
cleared.
Jakub
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 15:05 ` Jakub Jelinek
@ 2001-01-16 15:46 ` David L. Parsley
2001-01-18 14:00 ` Laramie Leavitt
0 siblings, 1 reply; 70+ messages in thread
From: David L. Parsley @ 2001-01-16 15:46 UTC (permalink / raw)
To: Jakub Jelinek, linux-kernel, leitner, mingo
Jakub Jelinek wrote:
> > This makes me wonder...
> >
> > If the kernel only kept a queue of the three smallest unused fd's, and
> > when the queue emptied handed out whatever it liked, how many things
> > would break? I suspect this would cover a lot of bases...
>
> First it would break Unix98 and other standards:
[snip]
Yeah, I realized it would violate at least POSIX. The discussion was
just bandying about ways to avoid an expensive 'open()' without breaking
lots of utilities and glibc stuff. This might be something that could
be configured for specific server environments, where performance is
more important than POSIX/Unix98, but you still don't want to completely
break the system. Just a thought, brain-damaged as it might be. ;-)
regards,
David
--
David L. Parsley
Network Administrator
Roanoke College
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
` (3 preceding siblings ...)
2001-01-16 14:27 ` Felix von Leitner
@ 2001-01-16 17:47 ` Linus Torvalds
2001-01-17 4:39 ` dean gaudet
5 siblings, 0 replies; 70+ messages in thread
From: Linus Torvalds @ 2001-01-16 17:47 UTC (permalink / raw)
To: Ingo Molnar; +Cc: dean gaudet, Linux Kernel List, Jonathan Thackray
On Tue, 16 Jan 2001, Ingo Molnar wrote:
>
> yep, correct. But take a look at the trick it does with file descriptors,
> i believe it could be a useful way of doing things. It basically
> privatizes a struct file, without inserting it into the enumerated file
> descriptors. This shows that 'native files' are possible: file struct
> without file descriptor integers mapped to them.
That's nothing new: the exec() code does exactly the same.
In fact, there's a function for it: filp_open() and filp_close(). Which do
a better job of it than your private implementation did, I suspect.
I don't think your object fingerprints are anything more generic than the
current file descriptors.
Linus
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [patch] sendpath() support, 2.4.0-test3/-ac9
2001-01-16 9:19 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar
@ 2001-01-17 0:03 ` dean gaudet
0 siblings, 0 replies; 70+ messages in thread
From: dean gaudet @ 2001-01-17 0:03 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel List, Jonathan Thackray
On Tue, 16 Jan 2001, Ingo Molnar wrote:
>
> On Mon, 15 Jan 2001, dean gaudet wrote:
>
> > > just for kicks i've implemented sendpath() support.
> > >
> > > _syscall4 (int, sendpath, int, out_fd, char *, path, off_t *, off, size_t, size)
> >
> > hey so how do you implement transmit timeouts with sendpath() ?
> > (i.e. drop the client after 30 seconds of no progress.)
>
> well this problem is not unique to sendpath(), sendfile() has it as well.
hrm? with sendfile() i just send 32k or 64k at a time and use alarm()
or non-blocking/select() to implement timeouts.
with sendpath() i can do the same thing but i'm gonna pay a path lookup
each time... and there's no guarantee that i'm getting the same file each
time.
> in TUX i've added per-socket connection timers, and i believe something
> like this should be done in Apache as well - timers are IMO not a good
> enough excuse for avoiding event-based IO models and using select() or
> poll().
i wasn't suggesting avoiding sendfile/sendpath -- i just couldn't see how
to use sendpath() effectively.
explain per-socket connection timers. are they available to the userland?
at least with the apache-2.0 i/o stuff i should be able to support
kernel-based timers. apache-2.0 uses non-blocking/poll() to implement
timeouts -- does write() or sendfile() until there's an EWOULDBLOCK then
it calls poll() waiting for write/timeout. with kernel supported
timeouts i could just block in the write() and that'd be fine by me.
1.2 used alarm() ... 1.3 communicates each child's activity to the parent
through the scoreboard and the parent occasionally wakes up and sends
SIGALRM to children that are past their timeout. (that let me get rid of
a few syscalls.)
-dean
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: 'native files', 'object fingerprints' [was: sendpath()]
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
` (4 preceding siblings ...)
2001-01-16 17:47 ` Linus Torvalds
@ 2001-01-17 4:39 ` dean gaudet
5 siblings, 0 replies; 70+ messages in thread
From: dean gaudet @ 2001-01-17 4:39 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, Linux Kernel List, Jonathan Thackray
On Tue, 16 Jan 2001, Ingo Molnar wrote:
> But even user-space code could use 'native files', via the following, safe
> mechanism:
so here's an alternative to ingo's proposal which i think solves some of
the other objections raised. it's something i've proposed in the past
under the name "extended file handles".
struct extended_file_permission {
int refcount;
some form of mutex to protect refcount;
some list structure head;
};
struct extended_file {
struct file *file;
struct extended_file_permission *perm;
whatever list foo is needed to link with extended_file_perm above;
};
if you allocate a few huge arrays of struct extended_file, then you can
verify if a pointer passed from user space fits into one of those arrays
pretty quickly.
struct task has a struct extended_file_permission * added to it to
indicate which perm struct that task is associated with.
so you just compare the f->perm to current->extended_file_perm and you
know if the task is allowed to use it or not.
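A minimal user-space sketch of the two checks described above (the pointer falls inside a known array, and the entry's perm matches the caller's), assuming a single fixed-size array. The names ef_table, EF_TABLE_SIZE and ef_lookup are hypothetical, and the mutex and list members are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct file;                            /* opaque in this sketch */

struct extended_file_permission {
    int refcount;
};

struct extended_file {
    struct file *file;
    struct extended_file_permission *perm;
};

#define EF_TABLE_SIZE 1024              /* assumed size, for illustration */
static struct extended_file ef_table[EF_TABLE_SIZE];

/*
 * Validate a handle passed in from user space. Returns the entry if
 * uptr points exactly at a slot of ef_table that is in use and owned by
 * task_perm, otherwise NULL. Every step is O(1).
 */
static struct extended_file *
ef_lookup(const void *uptr, const struct extended_file_permission *task_perm)
{
    uintptr_t p = (uintptr_t)uptr;
    uintptr_t base = (uintptr_t)ef_table;
    uintptr_t end = base + sizeof(ef_table);

    if (task_perm == NULL || p < base || p >= end)
        return NULL;                    /* not inside the array */
    if ((p - base) % sizeof(struct extended_file) != 0)
        return NULL;                    /* not on a slot boundary */

    struct extended_file *f =
        &ef_table[(p - base) / sizeof(struct extended_file)];

    if (f->file == NULL || f->perm != task_perm)
        return NULL;                    /* unused slot, or another task's */
    return f;
}
```

With "a few huge arrays" instead of one, the range check would simply be repeated per array.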
clone() allows you to create tasks sharing the same
extended_file_permissions.
fork()/exec() would create new extended_file_perms -- which implicitly
causes all those files to be closed. this gives you pretty light cgi
fork()/exec() off a main "process" which is handling thousands of sockets.
i also proposed various methods of doing O_foo flag inheritance... but the
above is more interesting.
-dean
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-16 15:05 ` David L. Parsley
2001-01-16 15:05 ` Jakub Jelinek
@ 2001-01-17 19:27 ` dean gaudet
1 sibling, 0 replies; 70+ messages in thread
From: dean gaudet @ 2001-01-17 19:27 UTC (permalink / raw)
To: David L. Parsley; +Cc: Felix von Leitner, linux-kernel, mingo
On Tue, 16 Jan 2001, David L. Parsley wrote:
> Felix von Leitner wrote:
> > > close (0);
> > > close (1);
> > > close (2);
> > > open ("/dev/console", O_RDWR);
> > > dup ();
> > > dup ();
> >
> > So it's not actually part of POSIX, it's just to get around fixing
> > legacy code? ;-)
it's part of POSIX.
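A small sketch of the rule in question: POSIX requires open(), dup() and pipe() to return the lowest-numbered descriptor not currently open in the process. The function name lowest_fd_is_reused is made up for illustration, and it assumes a single-threaded caller.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Open a pipe, close the lower of its two descriptors, then dup() the
 * other one. Under the lowest-free-fd rule the duplicate must land in
 * the hole that was just created.
 * Returns 1 if it did, 0 if not, -1 if a system call failed.
 */
static int lowest_fd_is_reused(void)
{
    int p[2];

    if (pipe(p) != 0)
        return -1;

    int low = p[0] < p[1] ? p[0] : p[1];
    int high = p[0] < p[1] ? p[1] : p[0];

    close(low);                 /* create a hole below 'high' */
    int c = dup(high);          /* must pick the lowest free fd */
    if (c < 0) {
        close(high);
        return -1;
    }

    int reused = (c == low);
    close(high);
    close(c);
    return reused;
}
```

This is the same property the quoted close()/open()/dup() sequence depends on to end up with fds 0, 1 and 2.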
> This makes me wonder...
>
> If the kernel only kept a queue of the three smallest unused fd's, and
> when the queue emptied handed out whatever it liked, how many things
> would break? I suspect this would cover a lot of bases...
apache-1.3 relies on the open-lowest-numbered-free-fd behaviour... but
only as a band-aid to work around other broken behaviours surrounding
FD_SETSIZE.
when opening the log files and listening sockets, apache uses
fcntl(F_DUPFD) to push them all higher than fd 15. (see ap_slack) some
sites are configured in a way that there are thousands of log files or
listening fds (both are bogus configs in my opinion, but hey, let the
admins shoot themselves).
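A minimal sketch of the fcntl(F_DUPFD) trick, assuming the cut-off of 16 implied by "higher than fd 15". The helper push_fd_high and the constant LOW_SLACK are hypothetical names; this is not the ap_slack source.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define LOW_SLACK 16    /* assumed: keep fds 0..15 free for libraries */

/*
 * Hypothetical helper: if fd is below LOW_SLACK, duplicate it onto the
 * lowest free descriptor >= LOW_SLACK and close the original.
 * Returns the descriptor the caller should use from now on. If the
 * duplication fails, the original low fd is returned unchanged.
 */
static int push_fd_high(int fd)
{
    if (fd < 0 || fd >= LOW_SLACK)
        return fd;              /* invalid, or already high enough */

    int high = fcntl(fd, F_DUPFD, LOW_SLACK);
    if (high < 0)
        return fd;              /* out of fds: keep using the low one */

    close(fd);                  /* free the low-numbered slot */
    return high;
}
```

Calling this on each log file and listening socket right after it is opened leaves the low-numbered descriptors free for code that cannot handle large fds.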
this generally leaves a handful of low numbered fds available. this
pretty much protects apache from broken libraries compiled with small
FD_SETSIZE, or which otherwise can't handle big fds. libc used to be just
such a library because it used select() in the DNS resolver code. (a libc
guru can tell you when this was fixed.)
it also ensures that the client fd will be low numbered, and lets us be
lazy and just use select() rather than do all the config tests to figure
out which OSs support poll().
it's all pretty gross... but then select() is pretty gross and it's
essentially the bug that necessitated this.
(solaris also has a stupid FILE * limitation that it can't use fds > 255
in a FILE * ... which breaks even more libraries than fds >= FD_SETSIZE.)
-dean
^ permalink raw reply [flat|nested] 70+ messages in thread
* RE: Is sendfile all that sexy?
2001-01-16 15:46 ` David L. Parsley
@ 2001-01-18 14:00 ` Laramie Leavitt
0 siblings, 0 replies; 70+ messages in thread
From: Laramie Leavitt @ 2001-01-18 14:00 UTC (permalink / raw)
To: linux-kernel
> Jakub Jelinek wrote:
>
> > > This makes me wonder...
> > >
> > > If the kernel only kept a queue of the three smallest unused fd's, and
> > > when the queue emptied handed out whatever it liked, how many things
> > > would break? I suspect this would cover a lot of bases...
> >
> > First it would break Unix98 and other standards:
> [snip]
>
> Yeah, I realized it would violate at least POSIX. The discussion was
> just bandying about ways to avoid an expensive 'open()' without breaking
> lots of utilities and glibc stuff. This might be something that could
> be configured for specific server environments, where performance is
> more important than POSIX/Unix98, but you still don't want to completely
> break the system. Just a thought, brain-damaged as it might be. ;-)
>
Merely following the discussion a thought occurred to me of how
to make fd allocation fairly efficient (and simple) even if it retains
the O(n) structure worst case. I don't know how it is currently implemented
so this may be how it is done, or I may be way off base.
First, keep a table of FDs in sorted order ( mark deleted entries )
that you can access quickly. O(1) lookup.
Then, maintain a struct like
struct
{
    int lowest_fd;
    int highest_fd;
};
open:
    if( lowest_fd == highest_fd )
    {
        fd = lowest_fd;
        lowest_fd = ++highest_fd;
    }
    else if( flags == IGNORE_UNIX98 )
    {
        fd = highest_fd++;
    }
    else
    {
        fd = lowest_fd;
        lowest_fd = linear_search( lowest_fd+1, highest_fd );
    }
close:
    if( fd < lowest_fd )
    {
        lowest_fd = fd;
    }
    else if( fd == highest_fd - 1 )
    {
        if( highest_fd == lowest_fd )
        {
            lowest_fd = --highest_fd;
        }
        else
        {
            --highest_fd;
        }
    }
For common cases this would be fairly quick. It would be very easy to
implement an O(1) allocation if you want it to be fast ( at the expense
of a growing file handle table. )
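A runnable user-space sketch of this scheme, assuming lowest_fd means "lowest free fd" and highest_fd means "one past the highest fd in use". The table size, the bool array and the names fd_alloc/fd_free are illustrative assumptions. One small departure from the pseudocode: the close path shrinks highest_fd past every trailing free slot rather than by at most one.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FDS 64              /* assumed table size for the sketch */

struct fd_table {
    bool used[MAX_FDS];
    int lowest_fd;              /* lowest free slot */
    int highest_fd;             /* one past the highest slot in use */
};

/* Allocate an fd; returns -1 when the table is full. */
static int fd_alloc(struct fd_table *t, bool ignore_unix98)
{
    int fd;

    if (t->lowest_fd == t->highest_fd || ignore_unix98) {
        /* No holes, or the caller does not care: append in O(1). */
        if (t->highest_fd >= MAX_FDS)
            return -1;
        bool no_holes = (t->lowest_fd == t->highest_fd);
        fd = t->highest_fd++;
        if (no_holes)
            t->lowest_fd = t->highest_fd;
    } else {
        /* POSIX behaviour: hand out the lowest hole, then search
         * linearly for the next one (stops at highest_fd if none). */
        fd = t->lowest_fd;
        int next = fd + 1;
        while (next < t->highest_fd && t->used[next])
            next++;
        t->lowest_fd = next;
    }
    t->used[fd] = true;
    return fd;
}

/* Release an fd that is currently in use. */
static void fd_free(struct fd_table *t, int fd)
{
    assert(fd >= 0 && fd < t->highest_fd && t->used[fd]);

    t->used[fd] = false;
    if (fd < t->lowest_fd)
        t->lowest_fd = fd;
    /* Drop trailing free slots so appends stay compact. */
    while (t->highest_fd > 0 && !t->used[t->highest_fd - 1])
        t->highest_fd--;
}
```

Only the POSIX path pays for the linear search; the ignore_unix98 path is O(1) but leaves holes behind, which is the "growing file handle table" trade-off mentioned above.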
Just thinking about it.
Laramie.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-14 20:22 ` Linus Torvalds
` (2 preceding siblings ...)
2001-01-15 15:24 ` Jonathan Thackray
@ 2001-01-24 0:58 ` Sasi Peter
2001-01-24 8:44 ` James Sutherland
2001-01-25 10:20 ` Anton Blanchard
3 siblings, 2 replies; 70+ messages in thread
From: Sasi Peter @ 2001-01-24 0:58 UTC (permalink / raw)
To: linux-kernel
On 14 Jan 2001, Linus Torvalds wrote:
> The only obvious use for it is file serving, and as high-performance
> file serving tends to end up as a kernel module in the end anyway (the
> only hold-out is samba, and that's been discussed too), "sendfile()"
> really is more a proof of concept than anything else.
No plans for samba to use sendfile? Even better make it a tux-like module?
(that would enable Netware-Linux like performance with the standard
kernel... would be cool after all ;)
--
SaPE - Peter, Sasi - mailto:sape@sch.hu - http://sape.iq.rulez.org/
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-24 0:58 ` Sasi Peter
@ 2001-01-24 8:44 ` James Sutherland
2001-01-25 10:20 ` Anton Blanchard
1 sibling, 0 replies; 70+ messages in thread
From: James Sutherland @ 2001-01-24 8:44 UTC (permalink / raw)
To: Sasi Peter; +Cc: linux-kernel
On Wed, 24 Jan 2001, Sasi Peter wrote:
> On 14 Jan 2001, Linus Torvalds wrote:
>
> > The only obvious use for it is file serving, and as high-performance
> > file serving tends to end up as a kernel module in the end anyway (the
> > only hold-out is samba, and that's been discussed too), "sendfile()"
> > really is more a proof of concept than anything else.
>
> No plans for samba to use sendfile? Even better make it a tux-like module?
> (that would enable Netware-Linux like performance with the standard
> kernel... would be cool after all ;)
AIUI, Jeff Merkey was working on loading "userspace" apps into the kernel
to tackle this sort of problem generically. I don't know if he's tried it
with Samba - the forking would probably be a problem...
James.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-24 0:58 ` Sasi Peter
2001-01-24 8:44 ` James Sutherland
@ 2001-01-25 10:20 ` Anton Blanchard
2001-01-25 10:58 ` Sasi Peter
1 sibling, 1 reply; 70+ messages in thread
From: Anton Blanchard @ 2001-01-25 10:20 UTC (permalink / raw)
To: Sasi Peter; +Cc: linux-kernel
> No plans for samba to use sendfile? Even better make it a tux-like module?
> (that would enable Netware-Linux like performance with the standard
> kernel... would be cool after all ;)
I have patches for samba to do sendfile. Making a tux module does not make
sense to me, especially since we are nowhere near the limits of samba in
userspace. Once userspace samba can run no faster, then we should think
about other options.
Anton
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-25 10:20 ` Anton Blanchard
@ 2001-01-25 10:58 ` Sasi Peter
2001-01-26 6:10 ` Anton Blanchard
0 siblings, 1 reply; 70+ messages in thread
From: Sasi Peter @ 2001-01-25 10:58 UTC (permalink / raw)
To: Anton Blanchard; +Cc: linux-kernel
On Thu, 25 Jan 2001, Anton Blanchard wrote:
> I have patches for samba to do sendfile. Making a tux module does not make
> sense to me, especially since we are nowhere near the limits of samba in
> userspace. Once userspace samba can run no faster, then we should think
> about other options.
Do you have it at a URL?
--
SaPE - Peter, Sasi - mailto:sape@sch.hu - http://sape.iq.rulez.org/
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-25 10:58 ` Sasi Peter
@ 2001-01-26 6:10 ` Anton Blanchard
2001-01-26 11:46 ` David S. Miller
0 siblings, 1 reply; 70+ messages in thread
From: Anton Blanchard @ 2001-01-26 6:10 UTC (permalink / raw)
To: Sasi Peter; +Cc: linux-kernel
> Do you have it at a URL?
The patch is small so I have attached it to this email. It should apply
to the samba CVS tree. Remember this is still a hack and I need to add
code to ensure the file is not truncated and we sendfile() less than we
promised. (After talking to tridge and davem, this should be fixed shortly.)
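A hedged sketch of how the short-send case could be detected, assuming Linux sendfile() and a blocking output descriptor. The helper sendfile_all is hypothetical and is not part of the attached patch. It reports how many bytes actually went out so the caller can compare that with the length already promised in the reply header.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send count bytes of in_fd, starting at offset,
 * to out_fd using Linux sendfile().
 * Returns the number of bytes actually sent, which is less than count
 * if sendfile() hit end-of-file early (for example because the file
 * was truncated after its size was checked), or -1 on error.
 */
static ssize_t sendfile_all(int out_fd, int in_fd, off_t offset, size_t count)
{
    size_t left = count;

    assert(offset >= 0);

    while (left > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, left);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before sending, retry */
            return -1;
        }
        if (n == 0)
            break;              /* EOF: file is shorter than promised */
        left -= (size_t)n;      /* offset was advanced by the kernel */
    }
    return (ssize_t)(count - left);
}
```

What to do when the returned count is short (pad the reply, or drop the connection) is a protocol decision that this sketch does not attempt to make.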
There is a lot more going on than in the web serving case, so
sendfile+zero copy is not going to help us as much as it did for the tux
guys. For example currently on 2.4.0 + zero copy patches:
anton@drongo:~/dbench$ ~anton/samba/source/bin/smbtorture //otherhost/netbench -U% -N 15 NBW95
read/write:
Throughput 16.5478 MB/sec (NB=20.6848 MB/sec 165.478 MBit/sec)
sendfile:
Throughput 17.0128 MB/sec (NB=21.266 MB/sec 170.128 MBit/sec)
Of course there is still lots to be done :)
Cheers,
Anton
diff -u -u -r1.195 includes.h
--- source/include/includes.h 2000/12/06 00:05:14 1.195
+++ source/include/includes.h 2001/01/26 05:38:51
@@ -871,7 +871,8 @@
/* default socket options. Dave Miller thinks we should default to TCP_NODELAY
given the socket IO pattern that Samba uses */
-#ifdef TCP_NODELAY
+
+#if 0
#define DEFAULT_SOCKET_OPTIONS "TCP_NODELAY"
#else
#define DEFAULT_SOCKET_OPTIONS ""
diff -u -u -r1.257 reply.c
--- source/smbd/reply.c 2001/01/24 19:34:53 1.257
+++ source/smbd/reply.c 2001/01/26 05:38:53
@@ -2383,6 +2391,51 @@
END_PROFILE(SMBreadX);
return(ERROR(ERRDOS,ERRlock));
}
+
+#if 1
+ /* We can use sendfile if it is not chained */
+ if (CVAL(inbuf,smb_vwv0) == 0xFF) {
+ off_t tmpoffset;
+ struct stat buf;
+ int flags = 0;
+
+ nread = smb_maxcnt;
+
+ fstat(fsp->fd, &buf);
+ if (startpos > buf.st_size)
+ return(UNIXERROR(ERRDOS,ERRnoaccess));
+ if (nread > (buf.st_size - startpos))
+ nread = (buf.st_size - startpos);
+
+ SSVAL(outbuf,smb_vwv5,nread);
+ SSVAL(outbuf,smb_vwv6,smb_offset(data,outbuf));
+ SSVAL(smb_buf(outbuf),-2,nread);
+ CVAL(outbuf,smb_vwv0) = 0xFF;
+ set_message(outbuf,12,nread,False);
+
+#define MSG_MORE 0x8000
+ if (nread > 0)
+ flags = MSG_MORE;
+ if (send(smbd_server_fd(), outbuf, data - outbuf, flags) == -1)
+ DEBUG(0,("reply_read_and_X: send ERROR!\n"));
+
+ tmpoffset = startpos;
+ while(nread) {
+ int nwritten;
+ nwritten = sendfile(smbd_server_fd(), fsp->fd, &tmpoffset, nread);
+ if (nwritten == -1)
+ DEBUG(0,("reply_read_and_X: sendfile ERROR!\n"));
+
+ if (!nwritten)
+ break;
+
+ nread -= nwritten;
+ }
+
+ return -1;
+ }
+#endif
+
nread = read_file(fsp,data,startpos,smb_maxcnt);
if (nread < 0) {
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-26 6:10 ` Anton Blanchard
@ 2001-01-26 11:46 ` David S. Miller
2001-01-26 14:12 ` Anton Blanchard
0 siblings, 1 reply; 70+ messages in thread
From: David S. Miller @ 2001-01-26 11:46 UTC (permalink / raw)
To: Anton Blanchard; +Cc: Sasi Peter, linux-kernel
Anton Blanchard writes:
> diff -u -u -r1.257 reply.c
> --- source/smbd/reply.c 2001/01/24 19:34:53 1.257
> +++ source/smbd/reply.c 2001/01/26 05:38:53
> @@ -2383,6 +2391,51 @@
...
> + while(nread) {
> + int nwritten;
> + nwritten = sendfile(smbd_server_fd(), fsp->fd, &tmpoffset, nread);
> + if (nwritten == -1)
> + DEBUG(0,("reply_read_and_X: sendfile ERROR!\n"));
> +
> + if (!nwritten)
> + break;
> +
> + nread -= nwritten;
> + }
> +
> + return -1;
Anton, why are you always returning -1 (which means error for the
smb_message[] array functions) when using sendfile?
Aren't you supposed to return the number of bytes output or
something like this?
I'm probably missing something subtle here, so just let me
know what I missed.
Thanks.
Later,
David S. Miller
davem@redhat.com
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: Is sendfile all that sexy?
2001-01-26 11:46 ` David S. Miller
@ 2001-01-26 14:12 ` Anton Blanchard
0 siblings, 0 replies; 70+ messages in thread
From: Anton Blanchard @ 2001-01-26 14:12 UTC (permalink / raw)
To: David S. Miller; +Cc: Sasi Peter, linux-kernel
Hi Dave,
How are the VB withdrawal symptoms going? :)
> Anton, why are you always returning -1 (which means error for the
> smb_message[] array functions) when using sendfile?
Returning -1 tells the higher level code that we actually sent the bytes
out ourselves and not to bother doing it.
> Aren't you supposed to return the number of bytes output or
> something like this?
Only if you want the code to do a send() on outbuf, which we don't here.
Cheers,
Anton
^ permalink raw reply [flat|nested] 70+ messages in thread
end of thread, other threads:[~2001-01-26 14:16 UTC | newest]
Thread overview: 70+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-01-14 18:29 Is sendfile all that sexy? jamal
2001-01-14 18:50 ` Ingo Molnar
2001-01-14 19:02 ` jamal
2001-01-14 19:09 ` Ingo Molnar
2001-01-14 19:18 ` jamal
2001-01-14 20:22 ` Linus Torvalds
2001-01-14 20:38 ` Ingo Molnar
2001-01-14 21:44 ` Linus Torvalds
2001-01-14 21:49 ` Ingo Molnar
2001-01-14 21:54 ` Gerhard Mack
2001-01-14 22:40 ` Linus Torvalds
2001-01-14 22:45 ` J Sloan
2001-01-15 20:15 ` H. Peter Anvin
2001-01-15 3:43 ` Michael Peddemors
2001-01-15 13:02 ` Florian Weimer
2001-01-15 13:45 ` Tristan Greaves
2001-01-15 1:14 ` Dan Hollis
2001-01-15 15:24 ` Jonathan Thackray
2001-01-15 15:36 ` Matti Aarnio
2001-01-15 20:17 ` H. Peter Anvin
2001-01-15 16:05 ` dean gaudet
2001-01-15 18:34 ` Jonathan Thackray
2001-01-15 18:46 ` Linus Torvalds
2001-01-15 20:47 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar
2001-01-16 4:51 ` dean gaudet
2001-01-16 4:59 ` Linus Torvalds
2001-01-16 9:48 ` 'native files', 'object fingerprints' [was: sendpath()] Ingo Molnar
2000-01-01 2:02 ` Pavel Machek
2001-01-16 11:13 ` Andi Kleen
2001-01-16 11:26 ` Ingo Molnar
2001-01-16 11:37 ` Andi Kleen
2001-01-16 12:04 ` O_ANY [was: Re: 'native files', 'object fingerprints' [was: sendpath()]] Ingo Molnar
2001-01-16 12:09 ` Ingo Molnar
2001-01-16 12:13 ` Peter Samuelson
2001-01-16 12:33 ` Ingo Molnar
2001-01-16 14:40 ` Felix von Leitner
2001-01-16 12:34 ` Andi Kleen
2001-01-16 13:00 ` Mitchell Blank Jr
2001-01-16 13:57 ` 'native files', 'object fingerprints' [was: sendpath()] Jamie Lokier
2001-01-16 14:27 ` Felix von Leitner
2001-01-16 17:47 ` Linus Torvalds
2001-01-17 4:39 ` dean gaudet
2001-01-16 9:19 ` [patch] sendpath() support, 2.4.0-test3/-ac9 Ingo Molnar
2001-01-17 0:03 ` dean gaudet
2001-01-15 18:58 ` Is sendfile all that sexy? dean gaudet
2001-01-15 19:41 ` Ingo Molnar
2001-01-15 20:33 ` Albert D. Cahalan
2001-01-15 21:00 ` Linus Torvalds
2001-01-16 10:40 ` Felix von Leitner
2001-01-16 11:56 ` Peter Samuelson
2001-01-16 12:37 ` Ingo Molnar
2001-01-16 12:42 ` Ingo Molnar
2001-01-16 12:47 ` Felix von Leitner
2001-01-16 13:48 ` Jamie Lokier
2001-01-16 14:20 ` Felix von Leitner
2001-01-16 15:05 ` David L. Parsley
2001-01-16 15:05 ` Jakub Jelinek
2001-01-16 15:46 ` David L. Parsley
2001-01-18 14:00 ` Laramie Leavitt
2001-01-17 19:27 ` dean gaudet
2001-01-24 0:58 ` Sasi Peter
2001-01-24 8:44 ` James Sutherland
2001-01-25 10:20 ` Anton Blanchard
2001-01-25 10:58 ` Sasi Peter
2001-01-26 6:10 ` Anton Blanchard
2001-01-26 11:46 ` David S. Miller
2001-01-26 14:12 ` Anton Blanchard
2001-01-15 23:16 ` Pavel Machek
2001-01-16 13:47 ` jamal
2001-01-16 14:41 ` Pavel Machek