public inbox for linux-kernel@vger.kernel.org
* UDP recvmsg blocks after select(), 2.6 bug?
@ 2004-10-06 14:52 Joris van Rantwijk
  2004-10-06 15:01 ` David S. Miller
                   ` (4 more replies)
  0 siblings, 5 replies; 191+ messages in thread
From: Joris van Rantwijk @ 2004-10-06 14:52 UTC
  To: linux-kernel

Hello,

I have a problem where the sequence of events is as follows:
 - application does select() on a UDP socket descriptor
 - select returns success with descriptor ready for reading
 - application does recvfrom() on this descriptor and this recvfrom()
   blocks forever

My understanding of POSIX is limited, but it seems to me that a read call
must never block after select just said that it's ok to read from the
descriptor. So any such behaviour would be a kernel bug.

This problem occurs repeatedly, but only once per week on average so it is
hard to debug but definitely a real problem. I know for a fact that the
sequence of events is as described above from strace output. My kernel
version is 2.6.7.

From a brief look at the kernel UDP code, I suspect a problem in
net/ipv4/udp.c, udp_recvmsg(): it reads the first available datagram
from the queue, then checks the UDP checksum. If the UDP checksum fails at
this point, the datagram is discarded and the process blocks until the next
datagram arrives.
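
The suspected behaviour can be illustrated with a toy model. This is a sketch only, not kernel code: the class `ToyUdpSocket` and its method names are hypothetical, and the model assumes that readiness is decided purely by whether the queue is non-empty while checksum verification is deferred until the read.

```python
from collections import deque


class ToyUdpSocket:
    """Toy model only: readiness looks at the queue, validation happens on read."""

    def __init__(self):
        self._queue = deque()

    def deliver(self, payload, checksum_ok):
        # The datagram is queued without its checksum having been verified.
        self._queue.append((payload, checksum_ok))

    def select_readable(self):
        # The readiness check only asks whether anything is queued.
        return len(self._queue) > 0

    def recv_nonblocking(self):
        # The checksum is verified at read time; bad datagrams are dropped.
        while self._queue:
            payload, checksum_ok = self._queue.popleft()
            if checksum_ok:
                return payload
        # Queue drained without a valid datagram. A blocking socket would
        # sleep here; this model reports the condition by returning None.
        return None
```

With a single corrupt datagram queued, `select_readable()` reports readiness but the read finds nothing to return, which is the point at which a blocking read would go to sleep.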

Could someone please help me track this problem?
Am I correct in my reasoning that the select() -> recvmsg() sequence must
never block?
If yes, is it possible that this problem is triggered by a failed UDP
checksum in the udp_recvmsg() function?
If yes, can we do something to fix this?
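
One defensive pattern on the application side is to keep the socket non-blocking and treat a failed read after select() as a spurious wakeup. The following is a minimal sketch under that assumption; the helper name `recv_when_ready` and its default buffer size and timeout are hypothetical choices for illustration.

```python
import select
import socket


def recv_when_ready(sock, bufsize=65535, timeout=1.0):
    """Wait for readability, then read without ever sleeping in recvfrom().

    Returns (data, address), or None if nothing was delivered: either the
    wait timed out, or the wakeup was spurious (for example, the datagram
    was discarded after select() had reported the socket readable).
    """
    sock.setblocking(False)  # recvfrom() now raises instead of blocking
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    try:
        return sock.recvfrom(bufsize)
    except BlockingIOError:  # EAGAIN / EWOULDBLOCK
        return None
```

A caller would typically loop on this helper and simply retry when it returns None.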

Thanks,
  Joris.

* Re: UDP recvmsg blocks after select(), 2.6 bug?
@ 2004-10-06 15:30 Dan Kegel
  0 siblings, 0 replies; 191+ messages in thread
From: Dan Kegel @ 2004-10-06 15:30 UTC
  To: Linux Kernel Mailing List, davem

David S. Miller wrote:
> Incorrect UDP checksums could cause the read data to
> be discarded.  We do the copy into userspace and checksum
> computation in parallel.  This is totally legal and we've
> been doing it since 2.4.x first got released.

Might there be a similar effect for packets with bad IP or TCP checksums?
(http://citeseer.ist.psu.edu/stone00when.html)

And as Bert says, Stevens mentions that with TCP accepts,
the other side might close before you call accept.

BTW this is why I insisted in JSR-51 that Java's NIO not allow
the use of Selector with blocking sockets:

http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/SelectableChannel.html
"A channel must be placed into non-blocking mode before being
registered with a selector, and may not be returned to blocking mode until it has been deregistered."

So at least Java programs that use NIO are immune to this
particular user error (unless your implementation of NIO
relaxes this sanity check, tsk).
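
The same sanity check can be sketched outside Java. Python's `selectors` module does not enforce it, so the wrapper below is an assumption-laden illustration rather than any library's API: `register_nonblocking` is a hypothetical helper that refuses to register a socket that is still in blocking mode.

```python
import selectors
import socket


def register_nonblocking(selector, sock, events, data=None):
    """Register sock with selector only if it is already non-blocking."""
    if sock.getblocking():
        # Mirrors the rule quoted above: non-blocking mode comes first.
        raise ValueError("socket must be non-blocking before registration")
    return selector.register(sock, events, data)
```

Calling `sock.setblocking(False)` before `register_nonblocking(...)` satisfies the check; a socket left in its default blocking mode is rejected.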

- Dan

-- 
My technical stuff: http://kegel.com
My politics: see http://www.misleader.org for examples of why I'm for regime change

* Re: UDP recvmsg blocks after select(), 2.6 bug?
@ 2004-10-07  4:50 Dan Kegel
  2004-10-07  8:04 ` bert hubert
  0 siblings, 1 reply; 191+ messages in thread
From: Dan Kegel @ 2004-10-07  4:50 UTC
  To: Linux Kernel Mailing List, joris

Joris wrote:
> On Wed, 6 Oct 2004, Andries Brouwer wrote:
>> Does this really happen?
> 
> Yes. Finally got my raw-udp-with-wrong-checksum sending program to work
> over localhost and it hangs recvfrom pretty good.
> 
>> All kernel versions?
> 
> Quick guess: probably since late 2.4. Source of 2.4.27 udp.c is similar to
> 2.6.9, but 2.4.17 returns EAGAIN even for blocking sockets, apparently
> this was "fixed" later on.

It would be nice to know how other operating systems behave
when receiving UDP packets with bad checksums.  Can someone
try BSD and Solaris?

- Dan

-- 
My technical stuff: http://kegel.com
My politics: see http://www.misleader.org for examples of why I'm for regime change

* Re: UDP recvmsg blocks after select(), 2.6 bug?
@ 2004-10-07 18:16 linux
  2004-10-09 12:07 ` Colin Phipps
  0 siblings, 1 reply; 191+ messages in thread
From: linux @ 2004-10-07 18:16 UTC
  To: linux-kernel, mark

> There is one claim I'd like to question - the claim that select()
> would be slowed down unnecessarily, even if the behaviour was changed
> for both O_NONBLOCK enabled. Isn't it more expensive to allow the
> application to be woken up, and poll using read(), than to just do a
> quick check in the kernel and not tell the application there is data,
> when there really isn't?

It depends on how often the second check fails.

The 99.9+% case is that the checksum is good, and in that case, you have
to pay the wakeup cost anyway.  (So sending bad-checksum packets isn't
even a useful DoS; good-checksum packets still steal more cycles.)

You only get the benefit of saving the wakeup if the check fails.
But every time it passes, you get the benefit of making the check cheaper.
You have to multiply by the relative probabilities to see the net effect.

For something like UDP checksums on packets which have already passed the
ethernet checksum, that's a pretty overwhelming ratio.
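
The trade-off can be written as simple expected-value arithmetic. This is a sketch with made-up units: the function name and the parameters `p_bad` (probability that a checksum fails), `wakeup_cost` (cost of one wasted wakeup) and `extra_check_cost` (extra cost of checking before the wakeup instead of during the copy) are hypothetical, not measurements.

```python
def net_saving_per_packet(p_bad, wakeup_cost, extra_check_cost):
    """Expected saving per packet from verifying the checksum before the wakeup.

    Early checking saves one wakeup for each bad packet, but pays the extra
    checking cost on every good packet. A negative result means the deferred
    check is cheaper on average.
    """
    saved_on_bad = p_bad * wakeup_cost
    paid_on_good = (1.0 - p_bad) * extra_check_cost
    return saved_on_bad - paid_on_good
```

Under these assumptions, early checking only pays off once `p_bad` exceeds `extra_check_cost / (wakeup_cost + extra_check_cost)`, which is far above a failure rate of 0.1% for any plausible cost ratio.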

[parent not found: <fa.haprsoi.8k8kbk@ifi.uio.no>]
* Re: UDP recvmsg blocks after select(), 2.6 bug?
@ 2004-10-19  1:21 John Pearson
  2004-10-19 13:50 ` Colin Phipps
  0 siblings, 1 reply; 191+ messages in thread
From: John Pearson @ 2004-10-19  1:21 UTC
  To: linux-kernel



This has probably been said before, but just in case I missed
it; there are many viewpoints in this thread, but the two main camps
appear to be arguing at cross-purposes.

As far as I can see:

  - YES, Linux select 'lies' and violates POSIX wrt checksums:
    a call to recvmsg() might well have blocked when select()
    said data was ready, as a result of a corrupt UDP packet;

  - NO, 'fixing' select() won't guarantee that recvmsg()
    will not block/return EAGAIN, because select() only
    guarantees that a call to recvmsg() would not have blocked
    at that time - as others have observed, it cannot guarantee
    that 'valid' data won't subsequently be discarded; any
    subsequent call to recvmsg() is only 'immediate' in a fuzzy,
    imprecise and inadequate sense.

Can we get back to arguing about something less repetitive
(or at least, make the circle larger and more scenic)?



John.


On Sun, Oct 17, 2004 at 07:22:44PM +0200, Lars Marowsky-Bree wrote:
> On 2004-10-17T16:17:06, Buddy Lucas <buddy.lucas@gmail.com> wrote:
> 
> > > The SuV spec is actually quite detailed about the options here:
> > > 
> > >         A descriptor shall be considered ready for reading when a call
> > >         to an input function with O_NONBLOCK clear would not block,
> > >         whether or not the function would transfer data successfully.
> > >         (The function might return data, an end-of-file indication, or
> > >         an error other than one indicating that it is blocked, and in
> > >         each of these cases the descriptor shall be considered ready for
> > >         reading.)
> > But it says nowhere that the select()/recvmsg() operation is atomic, right?
> 
> See, Buddy, the point here is that Linux _does_ violate the
> specification. You can try weaseling out of it, but it's not going to
> work.
> 
> This isn't per se the same as saying that it's not a sensible violation,
> but very clearly the specs disagree with the current Linux behaviour.
> 
> It's impossible to claim that you are allowed by the spec to block on a
> recvmsg directly following a successful select. You are not. You could
> claim that, but you'd be wrong.
> 
> If the packet has been dropped in between, which _could_ have happened
> because UDP is allowed to be dropped basically anywhere, EIO may be
> returned. But blocking or returning EAGAIN/EWOULDBLOCK is verboten. The
> spec is very clear on that.
> 
> (Now I'd claim that returning EIO after a successful select is also
> slightly suboptimal - the performance optimizations should be turned off
> for blocking sockets, IMHO, and the data which caused the select() to
> return should be considered committed - but it would be allowed.)
> 
> I'm not so sure what's so hard to accept about that. It may well be that
> Linux is following the de-facto industry standard (or even setting it)
> here, and I'd agree that if you don't want blocking use O_NONBLOCK, but
> in no way can Linux claim POSIX/SuV spec compliance for this behaviour.
> 
> I'm not getting why people argue so much to try and weasel the words so
> that it comes out as compliant. It's not. It may make sense due to
> practical reasons, but it's not compliant.
> 
> 
> Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de>
> 
> -- 
> High Availability & Clustering
> SUSE Labs, Research and Development
> SUSE LINUX AG - A Novell company
> 

-- 
Voice: +61 8 8202 9040
Email: jpearson@sa.pracom.com.au

Pracom Ltd
288 Glen Osmond Road
Fullarton, South Australia 5063

Ph: + 61 8 82029000
Fax: +61 8 82029001




end of thread, other threads:[~2004-10-21 15:22 UTC | newest]

Thread overview: 191+ messages
2004-10-06 14:52 UDP recvmsg blocks after select(), 2.6 bug? Joris van Rantwijk
2004-10-06 15:01 ` David S. Miller
2004-10-06 15:13   ` Chris Friesen
2004-10-06 15:15   ` Richard B. Johnson
2004-10-06 15:21     ` David S. Miller
2004-10-06 15:29       ` Richard B. Johnson
2004-10-06 15:42         ` David S. Miller
2004-10-06 15:57           ` Chris Friesen
2004-10-06 15:44         ` Lars Marowsky-Bree
2004-10-07  1:16         ` Paul Jakma
2004-10-07  7:10           ` Chris Friesen
2004-10-07 11:53             ` Paul Jakma
2004-10-07 13:32               ` Martijn Sipkema
2004-10-07 12:53                 ` Paul Jakma
2004-10-07 13:12                   ` Richard B. Johnson
2004-10-07 14:07                   ` Martijn Sipkema
2004-10-07 13:19                     ` Paul Jakma
2004-10-07 13:36                     ` Paul Jakma
2004-10-07 15:01                       ` Jean-Sebastien Trottier
2004-10-07 16:20                         ` Chris Friesen
2004-10-07 18:20                           ` Hua Zhong
2004-10-07 18:33                             ` Chris Friesen
2004-10-07 22:41                               ` Martijn Sipkema
2004-10-07 21:49                                 ` Chris Friesen
2004-10-07 22:00                                   ` David S. Miller
2004-10-07 22:24                                     ` Chris Friesen
2004-10-07 22:26                                       ` David S. Miller
2004-10-07 22:39                                         ` Chris Friesen
2004-10-07 22:42                                           ` David S. Miller
2004-10-07 23:27                                             ` Chris Friesen
2004-10-08  0:04                                               ` Ben Greear
2004-10-08  2:51                                             ` Mark Mielke
2004-10-08  3:39                                               ` David S. Miller
2004-10-08  3:48                                                 ` Mark Mielke
2004-10-08  3:59                                                   ` David S. Miller
2004-10-07 23:19                                     ` Martijn Sipkema
2004-10-07 22:24                                       ` David S. Miller
2004-10-07 22:33                                         ` Alan Curry
2004-10-07 22:42                                         ` Mark Mielke
2004-10-07 22:47                                           ` David S. Miller
2004-10-07 23:00                                             ` Mark Mielke
2004-10-07 23:07                                               ` David S. Miller
2004-10-08  6:10                                               ` Theodore Ts'o
2004-10-08 15:20                                                 ` Mark Mielke
2004-10-08  0:37                                             ` Lee Revell
2004-10-07 22:46                                         ` Hua Zhong
2004-10-07 22:48                                           ` David S. Miller
2004-10-07 23:17                                   ` Martijn Sipkema
2004-10-07 13:45                     ` Alan Cox
2004-10-07 16:32                       ` Martijn Sipkema
2004-10-07 14:50                         ` Alan Cox
2004-10-07 21:58                           ` mmap specification - was: ... select specification Andries Brouwer
2004-10-07 22:17                             ` Chris Wedgwood
2004-10-07 22:34                               ` Andries Brouwer
2004-10-07 22:32                             ` Kyle Moffett
2004-10-07 22:46                               ` Andries Brouwer
2004-10-07 23:30                                 ` Kyle Moffett
2004-10-08  9:19                                   ` Andries Brouwer
2004-10-09 21:10                                     ` Martijn Sipkema
2004-10-07 13:48                     ` UDP recvmsg blocks after select(), 2.6 bug? Alan Cox
2004-10-07 14:57                       ` Richard B. Johnson
2004-10-07 15:18                       ` Adam Heath
2004-10-07 16:39                         ` Martijn Sipkema
2004-10-07 16:09                           ` Mark Mielke
2004-10-07 17:18                             ` Chris Friesen
2004-10-06 15:31       ` Chris Friesen
2004-10-06 15:41         ` David S. Miller
2004-10-06 16:07           ` Richard B. Johnson
2004-10-06 16:57           ` Neil Horman
2004-10-06 15:59       ` Paul Jackson
2004-10-06 16:35       ` Martijn Sipkema
2004-10-06 15:30     ` Chris Friesen
2004-10-06 15:09 ` Richard B. Johnson
2004-10-06 15:18 ` bert hubert
2004-10-06 16:41 ` Alan Cox
2004-10-06 18:04   ` Joris van Rantwijk
2004-10-06 19:30     ` Andries Brouwer
2004-10-06 19:23       ` Alan Cox
2004-10-06 22:08         ` Martijn Sipkema
2004-10-06 20:25           ` Alan Cox
2004-10-06 22:15             ` Andries Brouwer
2004-10-06 22:32               ` David S. Miller
2004-10-06 23:25               ` YOSHIFUJI Hideaki / 吉藤英明
2004-10-06 23:11             ` Willy Tarreau
2004-10-06 19:43       ` Hua Zhong
2004-10-06 19:54         ` Chris Friesen
2004-10-06 19:59           ` Hua Zhong
2004-10-06 20:10             ` Chris Friesen
2004-10-06 21:45               ` Martijn Sipkema
2004-10-06 23:35                 ` David S. Miller
2004-10-06 20:06           ` David S. Miller
2004-10-06 20:18             ` Chris Friesen
2004-10-06 20:26               ` Hua Zhong
2004-10-06 20:38               ` Andries Brouwer
2004-10-06 20:58                 ` Joris van Rantwijk
2004-10-06 22:29                 ` David S. Miller
2004-10-07 16:08                 ` Adrian Phillips
2004-10-06 20:06         ` Olivier Galibert
2004-10-06 23:35           ` David S. Miller
2004-10-07  0:19             ` Olivier Galibert
2004-10-07  0:29               ` David S. Miller
2004-10-07 10:56                 ` Martijn Sipkema
2004-10-08  6:41                 ` Willy Tarreau
2004-10-08 15:27                   ` Mark Mielke
2004-10-15 22:42                   ` Robert White
2004-10-15 23:33                     ` David Schwartz
2004-10-16  0:59                       ` Chris Friesen
2004-10-16  2:35                       ` Mark Mielke
2004-10-16  4:23                         ` David Schwartz
2004-10-16  4:35                           ` Mark Mielke
2004-10-16  4:58                             ` David Schwartz
2004-10-16  6:25                               ` Mark Mielke
2004-10-16 21:44                                 ` Roland Kuhn
2004-10-17  0:06                                   ` Mark Mielke
2004-10-17  0:30                                     ` David Schwartz
2004-10-17 14:47                                       ` Mark Mielke
2004-10-17  0:28                                 ` David Schwartz
2004-10-17 13:35                                   ` Lars Marowsky-Bree
2004-10-17 14:17                                     ` Buddy Lucas
2004-10-17 15:05                                       ` Mark Mielke
2004-10-17 15:40                                         ` Buddy Lucas
2004-10-17 16:13                                           ` Lee Revell
2004-10-17 17:35                                           ` Jesper Juhl
2004-10-17 18:04                                             ` Buddy Lucas
2004-10-17 18:06                                               ` Lars Marowsky-Bree
2004-10-17 18:21                                                 ` Buddy Lucas
2004-10-17 20:04                                                 ` Martijn Sipkema
2004-10-17 20:08                                                   ` Lars Marowsky-Bree
2004-10-17 17:35                                           ` Martijn Sipkema
2004-10-17 17:33                                             ` Buddy Lucas
2004-10-17 19:58                                               ` Martijn Sipkema
2004-10-17 19:33                                                 ` Buddy Lucas
2004-10-17 20:11                                                   ` Lars Marowsky-Bree
2004-10-17 20:25                                                     ` Buddy Lucas
2004-10-17 20:42                                                   ` Martijn Sipkema
2004-10-17 20:02                                                     ` Buddy Lucas
2004-10-17 18:53                                             ` David Schwartz
2004-10-17 19:26                                               ` Hua Zhong
2004-10-17 20:32                                               ` Martijn Sipkema
2004-10-17 19:21                                           ` Hua Zhong
2004-10-17 17:22                                       ` Lars Marowsky-Bree
2004-10-17 17:54                                         ` Buddy Lucas
2004-10-17 18:05                                           ` Lars Marowsky-Bree
2004-10-17 18:06                                           ` Mark Mielke
2004-10-20 21:31                                     ` H. Peter Anvin
2004-10-20 21:58                                       ` Chris Friesen
2004-10-20 22:00                                         ` H. Peter Anvin
2004-10-20 22:12                                           ` Chris Friesen
2004-10-20 23:16                                             ` David Schwartz
2004-10-21  1:03                                               ` Chris Friesen
2004-10-21  1:38                                                 ` David Schwartz
2004-10-21  3:01                                           ` Michael Clark
2004-10-21  3:52                                             ` Michael Clark
2004-10-21  4:10                                             ` H. Peter Anvin
2004-10-21  5:06                                               ` Chris Friesen
2004-10-21  5:11                                                 ` H. Peter Anvin
2004-10-21  5:50                                                   ` Chris Friesen
2004-10-21  5:58                                                     ` H. Peter Anvin
2004-10-21 15:18                                                       ` Chris Friesen
2004-10-21  6:14                                                     ` Michael Clark
2004-10-17 14:52                                   ` Mark Mielke
2004-10-16 18:25                               ` Andries Brouwer
2004-10-17  0:28                                 ` David Schwartz
2004-10-17 12:22                                   ` Andries Brouwer
2004-10-16 10:24                     ` Willy Tarreau
2004-10-16 13:21                       ` Mark Mielke
2004-10-18 22:25                       ` Robert White
2004-10-06 20:41         ` Neil Horman
2004-10-06 22:27           ` Chris Friesen
2004-10-06 23:32             ` Neil Horman
2004-10-06 23:36             ` David S. Miller
2004-10-07 19:31 ` David Schwartz
2004-10-07 22:36   ` Martijn Sipkema
2004-10-08  0:19     ` David Schwartz
2004-10-09 19:21       ` Martijn Sipkema
2004-10-09 18:28         ` David Schwartz
2004-10-09 18:49           ` Mark Mielke
2004-10-09 21:00             ` Martijn Sipkema
2004-10-09 22:59               ` Mark Mielke
  -- strict thread matches above, loose matches on Subject: below --
2004-10-06 15:30 Dan Kegel
2004-10-07  4:50 Dan Kegel
2004-10-07  8:04 ` bert hubert
2004-10-07  8:28   ` Adam Heath
2004-10-07 10:38     ` Martijn Sipkema
2004-10-07 10:07       ` Adam Heath
2004-10-07 11:29         ` Martijn Sipkema
2004-10-07 18:16 linux
2004-10-09 12:07 ` Colin Phipps
     [not found] <fa.haprsoi.8k8kbk@ifi.uio.no>
     [not found] ` <fa.isqjio8.ok2coo@ifi.uio.no>
2004-10-09 13:24   ` Bodo Eggert
2004-10-19  1:21 John Pearson
2004-10-19 13:50 ` Colin Phipps
