* sendfile to nonblocking socket
@ 2007-04-23 21:13 voron
2007-04-23 21:59 ` David Miller
2007-04-23 22:52 ` David Schwartz
0 siblings, 2 replies; 10+ messages in thread
From: voron @ 2007-04-23 21:13 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1554 bytes --]
Hello
I'm testing the nginx web server for sharing films on my LAN, and I've got
some interesting results. When I tried to download a film or another big
file over a gigabit link, sendfile blocked even though the socket is
nonblocking. The strace log is attached. Some comments:
#enabling nonblock on fd 3
20:51:20 ioctl(3, FIONBIO, [1]) = 0
#normal nonblocking sendfile, asked 2147480274 bytes, sent 236164
bytes, so nonblocking works
20:51:20 sendfile(3, 8, [847150], 2147480274) = 236164
#sendfile 390 M, 6 seconds
20:51:22 sendfile(3, 8, [102578266], 2147481510) = 390115144
#sendfile 1000 M, 15 seconds
20:51:40 sendfile(3, 8, [1303409692], 2147482596) = 1008100764
#sendfile ~2G, 30 seconds
20:51:55 sendfile(3, 8, [2312288408], 1982678888) = 1982678888
As I see it, nonblocking mode is enabled: sendfile sends less than asked.
But 2G in a single 30-second sendfile call is a blocking call. How
can I avoid that? I prefer sendfile as the fastest way to send file
content to a network socket. sendfile blocks on a nonblocking socket
only when the network connection is faster than my hard disk, for
example a gigabit NIC or localhost. With a 100Mbit NIC, which is slower
than my hard disk, I get small and fast sendfile calls without blocking.
My kernel is Linux 2.6.20.3-grsec x86_64. I also verified this on
2.6.18, with the same results. Please advise.
PS: I ran the same tests with lighttpd's sendfile and got the same results:
it blocks on a nonblocking socket when the network is faster than the disk.
Thank you,
Alex
[-- Attachment #2: strace.log --]
[-- Type: text/plain, Size: 5129 bytes --]
strace -tp 10305
Process 10305 attached - interrupt to quit
20:51:17 write(13, "2007/04/23 20:51:17 [info] 10305"..., 85) = 85
20:51:17 epoll_wait(7, {{EPOLLIN, {u32=1054142480, u64=50471214837776}}}, 512, 4294967295) = 1
20:51:20 accept(10, {sa_family=AF_INET, sin_port=htons(55446), sin_addr=inet_addr("192.168.78.1")}, [5056729052170158096]) = 3
20:51:20 ioctl(3, FIONBIO, [1]) = 0
20:51:20 epoll_ctl(7, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLET, {u32=1054142784, u64=50471214838080}}) = 0
20:51:20 epoll_wait(7, {{EPOLLIN, {u32=1054142784, u64=50471214838080}}}, 512, 600000) = 1
20:51:20 recvfrom(3, "GET /3.tmp HTTP/1.0\r\nUser-Agent:"..., 1024, 0, NULL, NULL) = 147
20:51:20 open("/var/www/cacti/htdocs/3.tmp", O_RDONLY) = 8
20:51:20 fstat(8, {st_mode=S_IFREG|0644, st_size=4294967296, ...}) = 0
20:51:20 setsockopt(3, SOL_TCP, TCP_CORK, [1], 4) = 0
20:51:20 writev(3, [{"HTTP/1.1 200 OK\r\nServer: nginx/0"..., 262}], 1) = 262
20:51:20 sendfile(3, 8, [0], 2147479552) = 20576
20:51:20 epoll_ctl(7, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLOUT|EPOLLET, {u32=1054142784, u64=50471214838080}}) = 0
20:51:20 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 600000) = 1
20:51:20 sendfile(3, 8, [20576], 2147483552) = 55568
20:51:20 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599981) = 1
20:51:20 sendfile(3, 8, [76144], 2147481232) = 27784
20:51:20 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599979) = 1
20:51:20 sendfile(3, 8, [103928], 2147482120) = 90298
20:51:20 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599979) = 1
20:51:20 sendfile(3, 8, [194226], 2147481934) = 652924
20:51:20 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599978) = 1
20:51:20 sendfile(3, 8, [847150], 2147480274) = 236164
20:51:20 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599970) = 1
20:51:21 sendfile(3, 8, [1083314], 2147481678) = 4549630
20:51:21 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599960) = 1
20:51:21 sendfile(3, 8, [5632944], 2147482704) = 38529462
20:51:21 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599894) = 1
20:51:21 sendfile(3, 8, [44162406], 2147480218) = 33451936
20:51:22 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 600000) = 1
20:51:22 sendfile(3, 8, [77614342], 2147480314) = 20393456
20:51:22 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 600000) = 1
20:51:22 sendfile(3, 8, [98007798], 2147480842) = 583464
20:51:22 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 600000) = 1
20:51:22 sendfile(3, 8, [98591262], 2147483106) = 687654
20:51:22 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599959) = 1
20:51:22 sendfile(3, 8, [99278916], 2147483580) = 798790
20:51:22 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599945) = 1
20:51:22 sendfile(3, 8, [100077706], 2147483510) = 2500560
20:51:22 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599931) = 1
20:51:22 sendfile(3, 8, [102578266], 2147481510) = 390115144
20:51:28 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599910) = 1
20:51:28 sendfile(3, 8, [492693410], 2147481694) = 1007170
20:51:28 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 600000) = 1
20:51:28 sendfile(3, 8, [493700580], 2147482140) = 1028008
20:51:28 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599984) = 1
20:51:28 sendfile(3, 8, [494728588], 2147482228) = 804430152
20:51:40 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599971) = 1
20:51:40 sendfile(3, 8, [1299158740], 2147481900) = 1069684
20:51:40 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 600000) = 1
20:51:40 sendfile(3, 8, [1300228424], 2147481272) = 1062738
20:51:40 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599976) = 1
20:51:40 sendfile(3, 8, [1301291162], 2147483494) = 1007170
20:51:40 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599954) = 1
20:51:40 sendfile(3, 8, [1302298332], 2147479844) = 1111360
20:51:40 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599931) = 1
20:51:40 sendfile(3, 8, [1303409692], 2147482596) = 1008100764
20:51:55 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 599905) = 1
20:51:55 sendfile(3, 8, [2311510456], 1983456840) = 777952
20:51:55 epoll_wait(7, {{EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 600000) = 1
20:51:55 sendfile(3, 8, [2312288408], 1982678888) = 1982678888
20:52:25 write(16, "192.168.78.1 - voron [23/Apr/200"..., 114) = 114
20:52:25 close(8) = 0
20:52:25 setsockopt(3, SOL_TCP, TCP_CORK, [0], 4) = 0
20:52:25 recvfrom(3, 0x57d880, 1024, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
20:52:25 epoll_wait(7, {{EPOLLIN|EPOLLOUT, {u32=1054142784, u64=50471214838080}}}, 512, 75000) = 1
20:52:25 recvfrom(3, "", 1024, 0, NULL, NULL) = 0
20:52:25 close(3)
* Re: sendfile to nonblocking socket
2007-04-23 21:13 sendfile to nonblocking socket voron
@ 2007-04-23 21:59 ` David Miller
2007-04-24 4:42 ` Alex Vorona
2007-04-23 22:52 ` David Schwartz
1 sibling, 1 reply; 10+ messages in thread
From: David Miller @ 2007-04-23 21:59 UTC (permalink / raw)
To: voron; +Cc: linux-kernel
From: voron <voron@amhost.net>
Date: Tue, 24 Apr 2007 00:13:27 +0300
> As I see, nonblocking mode is enabled - sendfile sends less than asked.
The socket is marked as non-blocking, but the disk I/O is not.
It's blocking on the disk I/O not the socket part of the operation.
* RE: sendfile to nonblocking socket
2007-04-23 21:13 sendfile to nonblocking socket voron
2007-04-23 21:59 ` David Miller
@ 2007-04-23 22:52 ` David Schwartz
2007-04-24 4:54 ` Alex Vorona
1 sibling, 1 reply; 10+ messages in thread
From: David Schwartz @ 2007-04-23 22:52 UTC (permalink / raw)
To: linux-kernel
> As I see, nonblocking mode is enabled - sendfile sends less than asked.
> But 2G via single 30 seconds sendfile call - this is blocking call. How
> can I avoid that? I prefer sendfile as fastest way to send file
> content to network socket. The problem with sendfile block on
> nonblocking socket has place only when I'm using network connection,
> that is faster than my hard disk, for example gigabit NIC or localhost.
> When I use 100Mbit NIC, which is slower, than my hard disk, I got small
> and fast sendfile calls without blocking.
>
> My kernel is Linux 2.6.20.3-grsec x86_64. I verified that also on
> 2.6.18 - same results. Please advise.
You have a misunderstanding about the semantics of 'sendfile'. The 'sendfile' function is just a more efficient version of a read followed by a write. If you did a read followed by a write, it would block as well (in the read).
DS
* Re: sendfile to nonblocking socket
2007-04-23 21:59 ` David Miller
@ 2007-04-24 4:42 ` Alex Vorona
0 siblings, 0 replies; 10+ messages in thread
From: Alex Vorona @ 2007-04-24 4:42 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 838 bytes --]
David Miller wrote:
> From: voron <voron@amhost.net>
> Date: Tue, 24 Apr 2007 00:13:27 +0300
>
>
>> As I see, nonblocking mode is enabled - sendfile sends less than asked.
>>
>
> The socket is marked as non-blocking, but the disk I/O is not.
>
> It's blocking on the disk I/O not the socket part of the operation.
>
>
>
How can I tell the kernel not to block on disk I/O? I tried enabling
non-blocking mode on the disk fd, but it seems to be ignored. A strace with
both the disk fd and the socket fd set to non-blocking is attached.
#non-blocking on socket fd enabled
04:34:07 ioctl(9, FIONBIO, [1]) = 0
#non-blocking on disk fd enabled
04:34:11 ioctl(12, FIONBIO, [1]) = 0
#normal sendfile
04:34:07 sendfile(9, 12, [444282], 2147477638) = 812682
#32 seconds sendfile
04:34:11 sendfile(9, 12, [261474962], 2147476846) = 2144612230
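The behaviour in the excerpt above can be reproduced with a small sketch. `set_nonblocking` is a hypothetical helper name used only for illustration (it is not taken from nginx); it issues the same `ioctl(fd, FIONBIO, ...)` call seen in the strace and reports whether O_NONBLOCK ended up set on the descriptor. On Linux the flag is accepted on a regular file, but an ordinary read() on that file does not start returning EAGAIN because of it, which matches the ioctl returning 0 while sendfile still stalls.

```c
#include <fcntl.h>
#include <sys/ioctl.h>

/*
 * Enable non-blocking mode the same way the strace shows (ioctl FIONBIO),
 * then check the descriptor's status flags.
 * Returns 1 if O_NONBLOCK is now set, 0 if it is not, -1 on error.
 * Note: having the flag set on a regular file does not make disk reads
 * asynchronous; it only changes the descriptor's status flags.
 */
static int set_nonblocking(int fd)
{
    int on = 1;

    if (ioctl(fd, FIONBIO, &on) < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;

    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

Calling this on the file descriptor and then reading from it returns data as usual, so the flag by itself gives no way to learn that the pages are not yet in the page cache.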
Thank you,
Alex
[-- Attachment #2: strace.log --]
[-- Type: text/plain, Size: 5680 bytes --]
04:34:04 write(3, "2007/04/24 04:34:04 [info] 32390"..., 85) = 85
04:34:04 epoll_wait(11, {{EPOLLIN, {u32=3618562064, u64=61614924423184}}}, 512, 4294967295) = 1
04:34:07 accept(8, {sa_family=AF_INET, sin_port=htons(52673), sin_addr=inet_addr("192.168.78.1")}, [5056848310527066128]) = 9
04:34:07 ioctl(9, FIONBIO, [1]) = 0
04:34:07 epoll_ctl(11, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLET, {u32=3618562369, u64=61614924423489}}) = 0
04:34:07 epoll_wait(11, {{EPOLLIN, {u32=3618562369, u64=61614924423489}}}, 512, 600000) = 1
04:34:07 recvfrom(9, "GET /3.tmp HTTP/1.0\r\nUser-Agent:"..., 1024, 0, NULL, NULL) = 147
04:34:07 open("/var/www/cacti/htdocs/3.tmp", O_RDONLY) = 12
04:34:07 fstat(12, {st_mode=S_IFREG|0644, st_size=4294967296, ...}) = 0
04:34:07 setsockopt(9, SOL_TCP, TCP_CORK, [1], 4) = 0
04:34:07 writev(9, [{"HTTP/1.1 200 OK\r\nServer: nginx/0"..., 262}], 1) = 262
04:34:07 ioctl(12, FIONBIO, [1]) = 0
04:34:07 sendfile(9, 12, [0], 2147475456) = 20576
04:34:07 epoll_ctl(11, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLOUT|EPOLLET, {u32=3618562369, u64=61614924423489}}) = 0
04:34:07 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 600000) = 1
04:34:07 ioctl(12, FIONBIO, [1]) = 0
04:34:07 sendfile(9, 12, [20576], 2147479456) = 55568
04:34:07 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599975) = 1
04:34:07 ioctl(12, FIONBIO, [1]) = 0
04:34:07 sendfile(9, 12, [76144], 2147477136) = 41676
04:34:07 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599973) = 1
04:34:07 ioctl(12, FIONBIO, [1]) = 0
04:34:07 sendfile(9, 12, [117820], 2147476420) = 83352
04:34:07 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599972) = 1
04:34:07 ioctl(12, FIONBIO, [1]) = 0
04:34:07 sendfile(9, 12, [201172], 2147479084) = 243110
04:34:07 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599971) = 1
04:34:07 ioctl(12, FIONBIO, [1]) = 0
04:34:07 sendfile(9, 12, [444282], 2147477638) = 812682
04:34:07 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599968) = 1
04:34:07 ioctl(12, FIONBIO, [1]) = 0
04:34:07 sendfile(9, 12, [1256964], 2147475964) = 7501680
04:34:07 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599950) = 1
04:34:07 ioctl(12, FIONBIO, [1]) = 0
04:34:07 sendfile(9, 12, [8758644], 2147478156) = 234274688
04:34:10 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599846) = 1
04:34:10 ioctl(12, FIONBIO, [1]) = 0
04:34:10 sendfile(9, 12, [243033332], 2147478284) = 583464
04:34:10 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 600000) = 1
04:34:10 ioctl(12, FIONBIO, [1]) = 0
04:34:10 sendfile(9, 12, [243616796], 2147476452) = 16712076
04:34:11 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599990) = 1
04:34:11 ioctl(12, FIONBIO, [1]) = 0
04:34:11 sendfile(9, 12, [260328872], 2147476056) = 1146090
04:34:11 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599766) = 1
04:34:11 ioctl(12, FIONBIO, [1]) = 0
04:34:11 sendfile(9, 12, [261474962], 2147476846) = 2144612230
04:34:43 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599754) = 1
04:34:43 ioctl(12, FIONBIO, [1]) = 0
04:34:43 sendfile(9, 12, [2406087192], 1888880104) = 611248
04:34:43 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 600000) = 1
04:34:43 ioctl(12, FIONBIO, [1]) = 0
04:34:43 sendfile(9, 12, [2406698440], 1888268856) = 1000890816
04:34:57 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599991) = 1
04:34:57 ioctl(12, FIONBIO, [1]) = 0
04:34:57 sendfile(9, 12, [3407589256], 887378040) = 972440
04:34:58 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 600000) = 1
04:34:58 ioctl(12, FIONBIO, [1]) = 0
04:34:58 sendfile(9, 12, [3408561696], 886405600) = 861304
04:34:58 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599977) = 1
04:34:58 ioctl(12, FIONBIO, [1]) = 0
04:34:58 sendfile(9, 12, [3409423000], 885544296) = 993278
04:34:58 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599954) = 1
04:34:58 ioctl(12, FIONBIO, [1]) = 0
04:34:58 sendfile(9, 12, [3410416278], 884551018) = 847412
04:34:58 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599930) = 1
04:34:58 ioctl(12, FIONBIO, [1]) = 0
04:34:58 sendfile(9, 12, [3411263690], 883703606) = 4021734
04:34:58 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599908) = 1
04:34:58 ioctl(12, FIONBIO, [1]) = 0
04:34:58 sendfile(9, 12, [3415285424], 879681872) = 492166998
04:35:07 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 599874) = 1
04:35:07 ioctl(12, FIONBIO, [1]) = 0
04:35:07 sendfile(9, 12, [3907452422], 387514874) = 104002458
04:35:09 epoll_wait(11, {{EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 600000) = 1
04:35:09 ioctl(12, FIONBIO, [1]) = 0
04:35:09 sendfile(9, 12, [4011454880], 283512416) = 283512416
04:35:15 write(6, "192.168.78.1 - voron [24/Apr/200"..., 114) = 114
04:35:15 close(12) = 0
04:35:15 setsockopt(9, SOL_TCP, TCP_CORK, [0], 4) = 0
04:35:15 recvfrom(9, 0x5804d0, 1024, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
04:35:15 epoll_wait(11, {{EPOLLIN|EPOLLOUT, {u32=3618562369, u64=61614924423489}}}, 512, 75000) = 1
04:35:15 recvfrom(9, "", 1024, 0, NULL, NULL) = 0
04:35:15 close(9) = 0
04:35:15 epoll_wait(11,
* Re: sendfile to nonblocking socket
2007-04-23 22:52 ` David Schwartz
@ 2007-04-24 4:54 ` Alex Vorona
2007-04-24 9:19 ` David Schwartz
0 siblings, 1 reply; 10+ messages in thread
From: Alex Vorona @ 2007-04-24 4:54 UTC (permalink / raw)
To: linux-kernel
David Schwartz wrote:
> You have a misunderstanding about the semantics of 'sendfile'. The 'sendfile' function is just a more efficient version of a read followed by a write. If you did a read followed by a write, it would block as well (in the read).
>
> DS
>
The sendfile function is not just a more efficient version of a read
followed by a write. It reads from one fd and writes to another at the
same time. Try reading 2G and then writing 2G: consider how much
memory you will need and how much time you will lose while reading
2G from disk without writing it to the socket. If you know a more
efficient method to transfer a file from disk to network, please advise.
All I want is a truly non-blocking sendfile. Currently sendfile is
non-blocking on the network side, but not on disk I/O. And when the
network is faster than the disk, it blocks.
Thank you,
Alex
* RE: sendfile to nonblocking socket
2007-04-24 4:54 ` Alex Vorona
@ 2007-04-24 9:19 ` David Schwartz
2007-04-24 10:33 ` Re[2]: " Alex Vorona
0 siblings, 1 reply; 10+ messages in thread
From: David Schwartz @ 2007-04-24 9:19 UTC (permalink / raw)
To: linux-kernel
> David Schwartz wrote:
> > You have a misunderstanding about the semantics of 'sendfile'.
> The 'sendfile' function is just a more efficient version of a
> read followed by a write. If you did a read followed by a write,
> it would block as well (in the read).
> >
> > DS
> The sendfile function is not just a more efficient version of a read
> followed by a write. It reads from one fd and writes to another at the
> same time. Try reading 2G and then writing 2G: consider how much
> memory you will need and how much time you will lose while reading
> 2G from disk without writing it to the socket.
You are correct. What I meant to say was that it's just a more efficient version of 'mmap'ing a file and then 'write'ing from the 'mmap'. The 'write' to a non-blocking socket can still 'block' on disk I/O.
> If you know more
> efficient method to transfer file from disk to network - please advise.
> Now all I want is really non-blocking sendfile. Currently sendfile is
> non-blocking on network, but not on disk i/o. And when I have network
> faster than disk - I get block.
There are many different techniques and which is correct depends on what direction you want to go. POSIX asynchronous I/O is one possibility. Threads plus epoll is another. It really depends upon how much performance you need, how much complexity you can tolerate, and how portable you need to be.
DS
* Re[2]: sendfile to nonblocking socket
2007-04-24 9:19 ` David Schwartz
@ 2007-04-24 10:33 ` Alex Vorona
2007-04-24 12:48 ` Eric Dumazet
2007-04-24 19:19 ` David Schwartz
0 siblings, 2 replies; 10+ messages in thread
From: Alex Vorona @ 2007-04-24 10:33 UTC (permalink / raw)
To: linux-kernel
Hello David,
Tuesday, April 24, 2007, 1:19:49 PM, you wrote:
>> The sendfile function is not just a more efficient version of a read
>> followed by a write. It reads from one fd and writes to another at the
>> same time. Try reading 2G and then writing 2G: consider how much
>> memory you will need and how much time you will lose while reading
>> 2G from disk without writing it to the socket.
DS> You are correct. What I meant to say was that it's just a
DS> more efficient version of 'mmap'ing a file and then 'write'ing
DS> from the 'mmap'. The 'write' to a non-blocking socket can still
DS> 'block' on disk I/O.
How can I avoid that blocking? Or, to put the question another way: how can I
deliver data from disk to network with the fewest copy operations?
>> If you know more
>> efficient method to transfer file from disk to network - please advise.
>> Now all I want is really non-blocking sendfile. Currently sendfile is
>> non-blocking on network, but not on disk i/o. And when I have network
>> faster than disk - I get block.
DS> There are many different techniques and which is correct
DS> depends on what direction you want to go.
A _very_ fast frontend web server :) using as many kernel features as possible.
DS> POSIX asynchronous I/O is one possibility.
aio does not support direct file->socket transfer. Workarounds such as
reading via aio into shared memory and then calling sendfile from that shared
memory, as lighttpd does, are not what I want. My tests show
that the aio-sendfile implementation in lighttpd is slower than plain sendfile.
I think sendfile uses fewer copy operations (maybe even zero-copy) than the
current aio implementations in the kernel.
DS> Threads plus epoll is another.
20k threads, and maybe more, is too much :). See the "Architecture and
scalability" section at http://nginx.net/ for example.
DS> It really depends upon how much performance you need
As much as the hardware can take and sustain :)
--
Best regards,
Alex mailto:voron@amhost.net
* Re: Re[2]: sendfile to nonblocking socket
2007-04-24 10:33 ` Re[2]: " Alex Vorona
@ 2007-04-24 12:48 ` Eric Dumazet
2007-04-24 19:19 ` David Schwartz
1 sibling, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2007-04-24 12:48 UTC (permalink / raw)
To: Alex Vorona; +Cc: linux-kernel
On Tue, 24 Apr 2007 14:33:48 +0400
Alex Vorona <voron@amhost.net> wrote:
> Hello David,
>
> Tuesday, April 24, 2007, 1:19:49 PM, you wrote:
>
> >> The sendfile function is not just a more efficient version of a read
> >> followed by a write. It reads from one fd and writes to another at the
> >> same time. Try reading 2G and then writing 2G: consider how much
> >> memory you will need and how much time you will lose while reading
> >> 2G from disk without writing it to the socket.
>
> DS> You are correct. What I meant to say was that it's just a
> DS> more efficient version of 'mmap'ing a file and then 'write'ing
> DS> from the 'mmap'. The 'write' to a non-blocking socket can still
> DS> 'block' on disk I/O.
>
> How can I avoid that blocking? Or maybe another question - how can I
> deliver data from disk to network with minimal copy operations, etc.
>
I believe the modern way would be to use the splice() system call:
loop
    splice(from disk, to pipe) with SPLICE_F_NONBLOCK
    wait_event_from_epoll(pipe ready for reading)
    splice(from pipe, to socket)
    wait_event_from_epoll(socket ready for writing)
But:
1) I am not sure epoll can get an event when page(s) is/are ready in pipe (ie disk delivered the page(s) to cache)
2) I am not sure splice() to socket is actually implemented with 0-copy
3) I am not sure pipe capacity (16 pages) would be enough to get a good readahead.
Eric
* RE: Re[2]: sendfile to nonblocking socket
2007-04-24 10:33 ` Re[2]: " Alex Vorona
2007-04-24 12:48 ` Eric Dumazet
@ 2007-04-24 19:19 ` David Schwartz
1 sibling, 0 replies; 10+ messages in thread
From: David Schwartz @ 2007-04-24 19:19 UTC (permalink / raw)
To: Alex Vorona, linux-kernel
> DS> Threads plus epoll is another.
> 20k threads, and maybe more, is too much :). See the "Architecture and
> scalability" section at http://nginx.net/ for example.
> DS> It really depends upon how much performance you need
> all, that hardware can take and hold :)
Why would you want 20k threads? You aren't seriously suggesting that you
need to have 20,000 outstanding disk operations, are you? Surely you don't
think that would be efficient. If the disk is the limiting factor, it may
get slightly faster as you pend more concurrent requests, but surely 20,000
is not the best number! (256 is probably closer to the optimal value, and it
may be less.)
Your application has to manage the outstanding disk read requests. I don't
know of any way to foist this task on the kernel. Perhaps a pool of disk
read threads?
I would keep a flag for each connection to track whether the last write got
a 'would block' or was incomplete. So long as this flag is clear, let the
disk read thread attempt the socket 'write'. If the disk read thread gets a
partial write (or a would block indication), set the flag on the socket and
let the socket I/O threads take over the connection (based on 'epoll'
notification). When a write completes and you need more disk data, clear the
flag and let the disk read threads take over the connection until a write
blocks again. (The disk read threads can use 'sendfile' or 'splice' so long
as they don't block on the socket.)
Perhaps the disk read threads should be using 'mmap' with MAP_POPULATE.
There are certainly many possible approaches.
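As a rough illustration of the 'mmap' with MAP_POPULATE idea, a disk read thread might map one window of the file like this. `map_prefaulted` is a hypothetical name, and the use of MAP_SHARED with a read-only mapping is my assumption, not something stated in this thread. The mmap() call itself may wait on the disk, which is the point: that wait happens in the disk read thread instead of the epoll loop.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Map len bytes of file_fd starting at offset (offset must be a multiple of
 * the page size) read-only, asking the kernel to fault the pages in up
 * front via MAP_POPULATE. Returns the mapping, or NULL on failure.
 * The caller releases the window with munmap(ptr, len) when done.
 */
static void *map_prefaulted(int file_fd, off_t offset, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED | MAP_POPULATE,
                   file_fd, offset);

    return p == MAP_FAILED ? NULL : p;
}
```

Once the window is mapped, the socket side could write() from it; whether that ends up faster than plain sendfile would need measuring.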
DS
* sendfile to nonblocking socket
@ 2007-04-25 11:41 A.D.F.
0 siblings, 0 replies; 10+ messages in thread
From: A.D.F. @ 2007-04-25 11:41 UTC (permalink / raw)
To: linux-kernel
Answer to Alex Vorona (I'm not subscribed to linux kernel list so CC me).
Serious answers to an almost troll question (no offence here :-).
1) The possibility that sendfile blocks
when it has to wait for disk reads / pages
has been repeatedly mentioned (and thus known) for ages by everybody
(including Linus) since kernel 2.2.x;
if you search the archives (mailing lists, etc.)
or just google "sendfile blocking" you'll find a lot of material about this
issue.
2) A possible answer to your implicit request to add real asynchronous
support to sendfile is to wait for the general asynchronous support for all
blocking syscalls (search the kernel archives for the last 3 months).
3) The simplest solution (that works with every OS) is to just pass
a small (*) amount of data to each sendfile() call,
(*) i.e. between 64 KB (very slow disk with DMA disabled)
and 1024 KB (very fast disk).
Of course, if the files to be sent are not in the page cache and
there are more than 10 - 100 parallel downloads of different files,
you'll see noticeable latencies (because of disk seeks, etc.);
in any case you may want to add a parameter to this nginx thing
to allow tuning of the chunk size.
4) As it has already been mentioned by others, there are many good solutions
that can be used right now to allow the required scalability target:
- using threads;
- using asynchronous I/O;
- etc.
5) If you want to use some other Web Server that just works,
look for the alternatives (including lighttpd or even Apache).
--
Nick Name: A.D.F.
E-Mail: <adefacc () tin ! it>
--