* SO_SNDTIMEO: 2.4 kernel bugs @ 2001-02-17 1:35 Chris Evans 2001-02-17 20:29 ` kuznet 0 siblings, 1 reply; 15+ messages in thread From: Chris Evans @ 2001-02-17 1:35 UTC (permalink / raw) To: linux-kernel; +Cc: kuznet, davem Hi, I was glad to see Linux gain SO_SNDTIMEO in kernel 2.4. It is a very use feature which can avoid complexity and pain in userspace programs. Unfortunately, it seems to be very buggy. Here are two buggy scenarios. 1) Create a socketpair(), PF_UNIX, SOCK_STREAM. Set a 5 second SO_SNDTIMEO on the socket. write() 100k down the socket in one write(), i.e. enough to cause the write to have to block. --> BUG!!! The call blocks indefinitely instead of returning after 5 seconds (Note that the same test but with SO_RCVTIMEO and a read() works as expected - I get EAGAIN after 5 seconds). 2) Create a localhost listening socket - AF_INET, SOCK_STREAM. Connect to the listening port Set a 5 second SO_SNDTIMEO on the socket. write() 1Mb down the socket in one write(), i.e. enough to cause it to have to block -> The write() will return after 5 seconds with a partial write count. GOOD! Repeat the write() - send another 1Mb. --> BUG!! The call blocks indefinitely instead of returning with EAGAIN after 5s. I hope this is detailled enough. I'm trying to gain access to a FreeBSD box to compare results.. Cheers Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-17 1:35 SO_SNDTIMEO: 2.4 kernel bugs Chris Evans @ 2001-02-17 20:29 ` kuznet 2001-02-17 21:56 ` Chris Evans ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: kuznet @ 2001-02-17 20:29 UTC (permalink / raw) To: Chris Evans; +Cc: linux-kernel, davem Hello! > Unfortunately, it seems to be very buggy. Here are two buggy scenarios. --- ../vger3-010210/linux/net/ipv4/tcp.c Sat Feb 10 23:16:51 2001 +++ linux/net/ipv4/tcp.c Sat Feb 17 23:27:43 2001 @@ -691,6 +691,8 @@ set_current_state(TASK_INTERRUPTIBLE); + if (!timeo) + break; if (signal_pending(current)) break; if (tcp_memory_free(sk) && !vm_wait) --- ../vger3-010210/linux/net/core/sock.c Tue Jan 30 21:20:16 2001 +++ linux/net/core/sock.c Sat Feb 17 23:27:44 2001 @@ -727,6 +727,8 @@ clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags); add_wait_queue(sk->sleep, &wait); for (;;) { + if (!timeo) + break; if (signal_pending(current)) break; set_bit(SOCK_NOSPACE, &sk->socket->flags); ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-17 20:29 ` kuznet @ 2001-02-17 21:56 ` Chris Evans 2001-02-17 22:46 ` Chris Evans 2001-02-18 1:50 ` Chris Evans 2 siblings, 0 replies; 15+ messages in thread From: Chris Evans @ 2001-02-17 21:56 UTC (permalink / raw) To: kuznet; +Cc: linux-kernel, davem Alexey, Damn you are quick! :) Testing immediately Cheers Chris On Sat, 17 Feb 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > Unfortunately, it seems to be very buggy. Here are two buggy scenarios. > > > --- ../vger3-010210/linux/net/ipv4/tcp.c Sat Feb 10 23:16:51 2001 > +++ linux/net/ipv4/tcp.c Sat Feb 17 23:27:43 2001 > @@ -691,6 +691,8 @@ > > set_current_state(TASK_INTERRUPTIBLE); > > + if (!timeo) > + break; > if (signal_pending(current)) > break; > if (tcp_memory_free(sk) && !vm_wait) > --- ../vger3-010210/linux/net/core/sock.c Tue Jan 30 21:20:16 2001 > +++ linux/net/core/sock.c Sat Feb 17 23:27:44 2001 > @@ -727,6 +727,8 @@ > clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags); > add_wait_queue(sk->sleep, &wait); > for (;;) { > + if (!timeo) > + break; > if (signal_pending(current)) > break; > set_bit(SOCK_NOSPACE, &sk->socket->flags); > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-17 20:29 ` kuznet 2001-02-17 21:56 ` Chris Evans @ 2001-02-17 22:46 ` Chris Evans 2001-02-18 17:24 ` kuznet 2001-02-18 1:50 ` Chris Evans 2 siblings, 1 reply; 15+ messages in thread From: Chris Evans @ 2001-02-17 22:46 UTC (permalink / raw) To: kuznet; +Cc: linux-kernel, davem Hi Alexey, This patch fixes my simple read()/write() tests, nice one. The behaviour also now matches BSD (someone kindly donated me a FreeBSD shell for testing). Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile(): - Connect an AF_INET, SOCK_STREAM socket to a local listening socket. - Set 5 seconds SO_SNDTIMEO on the connected socket - Do a sendfile() from a big file down the connected socket. Make sure the size is big (e.g. 1Mb) so the call blocks. --> BUG!! The call blocks indefinitely rather than being interrupted after 5 seconds. Cheers Chris On Sat, 17 Feb 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > Unfortunately, it seems to be very buggy. Here are two buggy scenarios. > > > --- ../vger3-010210/linux/net/ipv4/tcp.c Sat Feb 10 23:16:51 2001 > +++ linux/net/ipv4/tcp.c Sat Feb 17 23:27:43 2001 > @@ -691,6 +691,8 @@ > > set_current_state(TASK_INTERRUPTIBLE); > > + if (!timeo) > + break; > if (signal_pending(current)) > break; > if (tcp_memory_free(sk) && !vm_wait) > --- ../vger3-010210/linux/net/core/sock.c Tue Jan 30 21:20:16 2001 > +++ linux/net/core/sock.c Sat Feb 17 23:27:44 2001 > @@ -727,6 +727,8 @@ > clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags); > add_wait_queue(sk->sleep, &wait); > for (;;) { > + if (!timeo) > + break; > if (signal_pending(current)) > break; > set_bit(SOCK_NOSPACE, &sk->socket->flags); > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-17 22:46 ` Chris Evans @ 2001-02-18 17:24 ` kuznet 2001-02-18 19:25 ` Chris Evans 2001-02-19 1:04 ` Chris Evans 0 siblings, 2 replies; 15+ messages in thread From: kuznet @ 2001-02-18 17:24 UTC (permalink / raw) To: Chris Evans; +Cc: linux-kernel, davem Hello! > Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile(): None of the options apply to sendfile(). It is not socket level operation. You have to use alarm for it. BTW, if you have enough fast network, you probably can observe that sendfile() is even not interrupted by signals. 8) But this is possible to fix at least. BTW the same fix will repair SO_*TIMEO partially, i.e. it will timeout after n*timeo, where n is an arbitrary number not exceeding size/sndbuf. Alexey ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-18 17:24 ` kuznet @ 2001-02-18 19:25 ` Chris Evans 2001-02-18 19:33 ` kuznet 2001-02-19 1:04 ` Chris Evans 1 sibling, 1 reply; 15+ messages in thread From: Chris Evans @ 2001-02-18 19:25 UTC (permalink / raw) To: kuznet; +Cc: linux-kernel, davem On Sun, 18 Feb 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile(): > > None of the options apply to sendfile(). It is not socket level > operation. You have to use alarm for it. Hi Alexey, Actually sendfile() _does_ timeout using SO_SNDTIMEO. It just takes longer to timeout because the kernel sendfile() page loop will (usually) need to timeout a short write, and then timeout a 0 byte write. So the actual timeout would be 2 * SO_SNDTIMEO. Unfortunately, I'm seeing timeout at (I think) 3 * SO_SNDTIMEO, which I can't account for. Cheers Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-18 19:25 ` Chris Evans @ 2001-02-18 19:33 ` kuznet 2001-02-18 19:37 ` Chris Evans 0 siblings, 1 reply; 15+ messages in thread From: kuznet @ 2001-02-18 19:33 UTC (permalink / raw) To: Chris Evans; +Cc: linux-kernel, davem Hello! > So the actual timeout would be 2 * SO_SNDTIMEO. It will timeout if write of some page blocks for SO_SNDTIMEO. If transmission of any page never takes more than SO_SNDTIMEO it never times out. You can think about sendfile() as subroutine doing: for (;;) { read(4K from fdin); write(4K to fdout); } All the options apply to each read()/write() separetely, so that... alas. Alexey ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-18 19:33 ` kuznet @ 2001-02-18 19:37 ` Chris Evans 2001-02-18 19:48 ` kuznet 0 siblings, 1 reply; 15+ messages in thread From: Chris Evans @ 2001-02-18 19:37 UTC (permalink / raw) To: kuznet; +Cc: linux-kernel, davem On Sun, 18 Feb 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > So the actual timeout would be 2 * SO_SNDTIMEO. > > It will timeout if write of some page blocks for SO_SNDTIMEO. .. unless that page was partially written, in which case a short write count is returned (rather than a timeout error), and the loop goes around again. > If transmission of any page never takes more than SO_SNDTIMEO it never > times out. Which is good, because SO_SNDTIMEO is an inactivity monitor. Cheers Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-18 19:37 ` Chris Evans @ 2001-02-18 19:48 ` kuznet 0 siblings, 0 replies; 15+ messages in thread From: kuznet @ 2001-02-18 19:48 UTC (permalink / raw) To: Chris Evans; +Cc: linux-kernel, davem Hello! > .. unless that page was partially written, in which case a short write > count is returned (rather than a timeout error), and the loop goes around > again. sendfile() does not return on partial write and tries to push more until error. On fast link it most likely succeeds, so that it is unkillable even with SIGKILL. > Which is good, because SO_SNDTIMEO is an inactivity monitor. Then why did you blame? 8)8) I do not think so. It is rather scheduling breaker. If connection is idle 99% of time, but wakes each sndtimeo-1usec, it must yuild, otherwise thread is lost for production. Alexey ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-18 17:24 ` kuznet 2001-02-18 19:25 ` Chris Evans @ 2001-02-19 1:04 ` Chris Evans 2001-02-19 1:43 ` sendfile() breakage was " Chris Evans 2001-02-19 18:03 ` kuznet 1 sibling, 2 replies; 15+ messages in thread From: Chris Evans @ 2001-02-19 1:04 UTC (permalink / raw) To: kuznet; +Cc: linux-kernel, davem On Sun, 18 Feb 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile(): > > None of the options apply to sendfile(). It is not socket level > operation. You have to use alarm for it. > > BTW, if you have enough fast network, you probably can observe > that sendfile() is even not interrupted by signals. 8) But this > is possible to fix at least. BTW the same fix will repair SO_*TIMEO > partially, i.e. it will timeout after n*timeo, where n is an arbitrary > number not exceeding size/sndbuf. Hi Alexey, You are right - our sendfile() implementation is broken. I have fixed it (patch at end of mail). However, I believe something is still wrong in the networking layer, even with my fix applied. Before I go into details, I want to step back and describe things from a _users_ perspective. That is most important after all. Take two different operations: write() to a socket and sendfile() down a socket. In both cases, the socket has a send timeout of 10 seconds. From a users' point of view, these are two socket write operations. The source of data is different (a buffer or a file descriptor), but that is irrelevant. The user has the right to expect a timeout after 10 seconds of no progress, on both operations. I have tried this on FreeBSD, and this is what happens: both sendfile() and write() timeout in the same way. On Linux, this is not the case => bug. I fixed a small sendfile() issue, which did not recognise partial writes as an interruption, but as I said above, the bug still remains. Investigation shows that the Linux network layer is behaving oddly. It seems that we are writing 4096 bytes to a socket. This proceeds in 4096 byte chunks until the send buffer on the socket is full, and a 4096 byte write blocks. This blocking write is eventually interrupted by the timeout, and the write call returns.. wait for it.. 4096! This suggests there was socket space after all, and the call should not have blocked. I wonder what is going on? I'd like to get this fixed. I think the FreeBSD behaviour is definitely correct and we want it on Linux. Cheers Chris --- filemap.c.old Sun Feb 18 23:35:06 2001 +++ filemap.c Mon Feb 19 00:13:38 2001 @@ -1062,7 +1062,7 @@ for (;;) { struct page *page, **hash; - unsigned long end_index, nr; + unsigned long end_index, nr, actor_ret; end_index = inode->i_size >> PAGE_CACHE_SHIFT; if (index > end_index) @@ -1110,13 +1110,13 @@ * "pos" here (the actor routine has to update the user buffer * pointers and the remaining count). */ - nr = actor(desc, page, offset, nr); - offset += nr; + actor_ret = actor(desc, page, offset, nr); + offset += actor_ret; index += offset >> PAGE_CACHE_SHIFT; offset &= ~PAGE_CACHE_MASK; page_cache_release(page); - if (nr && desc->count) + if (actor_ret == nr && desc->count) continue; break; ^ permalink raw reply [flat|nested] 15+ messages in thread
* sendfile() breakage was Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-19 1:04 ` Chris Evans @ 2001-02-19 1:43 ` Chris Evans 2001-02-19 18:03 ` kuznet 1 sibling, 0 replies; 15+ messages in thread From: Chris Evans @ 2001-02-19 1:43 UTC (permalink / raw) To: linux-kernel On Mon, 19 Feb 2001, Chris Evans wrote: > > BTW, if you have enough fast network, you probably can observe > > that sendfile() is even not interrupted by signals. 8) But this > > is possible to fix at least. BTW the same fix will repair SO_*TIMEO > > partially, i.e. it will timeout after n*timeo, where n is an arbitrary > > number not exceeding size/sndbuf. > > Hi Alexey, > > You are right - our sendfile() implementation is broken. I have fixed it > (patch at end of mail). Actually the whole mess stems from our broken internal ->write() and ->read() APIs. The _single_ return value is trying to convery _two_ pieces of information - always a bad move. They are: 1) Success/failure (and error code if it's a failure) 2) Amount of bytes read or written This bogon does not allow for the following information to be returned (assume I asked for 8192 bytes to be written): "4096 bytes were written, and the operation was aborted due to EINTR" Cheers Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-19 1:04 ` Chris Evans 2001-02-19 1:43 ` sendfile() breakage was " Chris Evans @ 2001-02-19 18:03 ` kuznet 2001-02-19 21:56 ` Chris Evans 1 sibling, 1 reply; 15+ messages in thread From: kuznet @ 2001-02-19 18:03 UTC (permalink / raw) To: Chris Evans; +Cc: linux-kernel, davem Hello! > You are right - our sendfile() implementation is broken. I have fixed it Thank you! > Investigation shows that the Linux network layer is behaving oddly. It > seems that we are writing 4096 bytes to a socket. This proceeds in 4096 > byte chunks until the send buffer on the socket is full, and a 4096 byte > write blocks. This blocking write is eventually interrupted by the > timeout, and the write call returns.. wait for it.. 4096! This suggests > there was socket space after all, and the call should not have blocked. Wakeup does not happen until _enough_ (1/3 of snbuf) of space in sndbuf is released, otherwise you will overschedule. So, as soon as write() goes to sleep, it will sleep waiting until 1/3 is released. If it is interrupted, it use all the released space immediately before exit. Again, to make more for in this context. This can be even wrong and, probably, we should return instantly with -EAGAIN/-EINTR/partial count, but it is most likely suboptimal (though I have already changed this to instant return). But this does not look essential from caller's viewpoint, except for sendfile() of course. 8) Alexey ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-19 18:03 ` kuznet @ 2001-02-19 21:56 ` Chris Evans 2001-02-20 15:09 ` David Woodhouse 0 siblings, 1 reply; 15+ messages in thread From: Chris Evans @ 2001-02-19 21:56 UTC (permalink / raw) To: kuznet; +Cc: linux-kernel, davem On Mon, 19 Feb 2001 kuznet@ms2.inr.ac.ru wrote: > Wakeup does not happen until _enough_ (1/3 of snbuf) of space in sndbuf > is released, otherwise you will overschedule. So, as soon as > write() goes to sleep, it will sleep waiting until 1/3 is released. Of course. Thank you. > If it is interrupted, it use all the released space immediately before > exit. Again, to make more for in this context. This can be even wrong > and, probably, we should return instantly with -EAGAIN/-EINTR/partial > count, but it is most likely suboptimal (though I have already changed > this to instant return). But this does not look essential from > caller's viewpoint, except for sendfile() of course. 8) Cool. I think the proper fix, long term, is to fix our internal I/O routine APIs so that they are capable of returning a byte count _and_ an error. One day, that might be a useful thing to export to userspace. Cheers Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-19 21:56 ` Chris Evans @ 2001-02-20 15:09 ` David Woodhouse 0 siblings, 0 replies; 15+ messages in thread From: David Woodhouse @ 2001-02-20 15:09 UTC (permalink / raw) To: Chris Evans; +Cc: kuznet, linux-kernel, davem chris@scary.beasts.org said: > I think the proper fix, long term, is to fix our internal I/O routine > APIs so that they are capable of returning a byte count _and_ an > error. One day, that might be a useful thing to export to userspace. The MTD code already does this. -- dwmw2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: SO_SNDTIMEO: 2.4 kernel bugs 2001-02-17 20:29 ` kuznet 2001-02-17 21:56 ` Chris Evans 2001-02-17 22:46 ` Chris Evans @ 2001-02-18 1:50 ` Chris Evans 2 siblings, 0 replies; 15+ messages in thread From: Chris Evans @ 2001-02-18 1:50 UTC (permalink / raw) To: kuznet; +Cc: linux-kernel Hi, By the way - I tested SO_RCVLOWAT, another 2.4 addition. Good news this time - seems to work fine. Cheers Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2001-02-20 15:10 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2001-02-17 1:35 SO_SNDTIMEO: 2.4 kernel bugs Chris Evans 2001-02-17 20:29 ` kuznet 2001-02-17 21:56 ` Chris Evans 2001-02-17 22:46 ` Chris Evans 2001-02-18 17:24 ` kuznet 2001-02-18 19:25 ` Chris Evans 2001-02-18 19:33 ` kuznet 2001-02-18 19:37 ` Chris Evans 2001-02-18 19:48 ` kuznet 2001-02-19 1:04 ` Chris Evans 2001-02-19 1:43 ` sendfile() breakage was " Chris Evans 2001-02-19 18:03 ` kuznet 2001-02-19 21:56 ` Chris Evans 2001-02-20 15:09 ` David Woodhouse 2001-02-18 1:50 ` Chris Evans
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox