netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] UDP select handling of bad checksums.
@ 2004-11-03  0:24 Stephen Hemminger
  2004-11-03  0:52 ` Mitchell Blank Jr
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2004-11-03  0:24 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-net

This patch addresses the issue of blocking usage of select() by UDP applications.
The problem is Linux optimizes the UDP receive checksum path so that checksum
validation is not performed until the application receive. This is a performance win
but can cause applications that do select with blocking file descriptors to get false
positives. There is a long running thread about this on LKML, as well as the cause
of http://bugme.osdl.org/show_bug.cgi?id=3610

This patch makes these applications work, but keeps the one-pass performance gain
for those applications smart enough to use non-blocking file descriptors with
select/poll. There is still a possibility to get a false positive if application does
select on non-blocking fd then makes it blocking before doing the receive, but that
is unlikely.

Tested by injecting bad packets with SOCK_RAW.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>


diff -Nru a/include/net/udp.h b/include/net/udp.h
--- a/include/net/udp.h	2004-11-02 16:16:29 -08:00
+++ b/include/net/udp.h	2004-11-02 16:16:29 -08:00
@@ -71,6 +71,8 @@
 extern int	udp_rcv(struct sk_buff *skb);
 extern int	udp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern int	udp_disconnect(struct sock *sk, int flags);
+extern unsigned int udp_poll(struct file *file, struct socket *sock,
+			     poll_table *wait);
 
 DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
 #define UDP_INC_STATS(field)		SNMP_INC_STATS(udp_statistics, field)
diff -Nru a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
--- a/net/ipv4/af_inet.c	2004-11-02 16:16:29 -08:00
+++ b/net/ipv4/af_inet.c	2004-11-02 16:16:29 -08:00
@@ -809,7 +809,7 @@
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	inet_getname,
-	.poll =		datagram_poll,
+	.poll =		udp_poll,
 	.ioctl =	inet_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	inet_shutdown,
diff -Nru a/net/ipv4/udp.c b/net/ipv4/udp.c
--- a/net/ipv4/udp.c	2004-11-02 16:16:29 -08:00
+++ b/net/ipv4/udp.c	2004-11-02 16:16:29 -08:00
@@ -1303,6 +1303,67 @@
   	return 0;
 }
 
+static inline int udp_recv_ready(const struct file *file, 
+			  struct sk_buff_head *rcvq)
+{
+	struct sk_buff *skb;
+
+	/* If non-blocking, can use faster single pass method. */
+	if (file->f_flags & O_NONBLOCK)
+		return !skb_queue_empty(rcvq);
+	
+	spin_lock_irq(&rcvq->lock);
+	while ((skb = skb_peek(rcvq)) != NULL) {
+		/* checksum is wrong, silently remove it. */
+		if (udp_checksum_complete(skb))
+			__skb_unlink(skb, rcvq);
+		else {
+			/* no need to rescan */
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+			break;
+		}
+	}
+	spin_unlock_irq(&rcvq->lock);
+
+	return skb != NULL;
+}
+
+/*
+ *	Wait for a UDP event.
+ *
+ *	This is same as datagram poll, except for the special case of 
+ *	blocking sockets. UDP doesn't checksum data until it is copied
+ *	to the application on receive. So handle that special case here.
+ */
+unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
+{
+	struct sock *sk = sock->sk;
+	unsigned int mask;
+
+	poll_wait(file, sk->sk_sleep, wait);
+	mask = 0;
+
+	/* exceptional events? */
+	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
+		mask |= POLLERR;
+
+	if (sk->sk_shutdown == SHUTDOWN_MASK)
+		mask |= POLLHUP;
+
+	/* readable? */
+	if ( udp_recv_ready(file, &sk->sk_receive_queue) ||
+	     (sk->sk_shutdown & RCV_SHUTDOWN))
+		mask |= POLLIN | POLLRDNORM;
+
+	/* writable? */
+	if (sock_writeable(sk))
+		mask |= POLLOUT | POLLWRNORM | POLLWRBAND;
+	else
+		set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags);
+
+	return mask;
+	
+}
 
 struct proto udp_prot = {
  	.name =		"UDP",
@@ -1516,6 +1577,7 @@
 EXPORT_SYMBOL(udp_port_rover);
 EXPORT_SYMBOL(udp_prot);
 EXPORT_SYMBOL(udp_sendmsg);
+EXPORT_SYMBOL(udp_poll);
 
 #ifdef CONFIG_PROC_FS
 EXPORT_SYMBOL(udp_proc_register);
diff -Nru a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
--- a/net/ipv6/af_inet6.c	2004-11-02 16:16:29 -08:00
+++ b/net/ipv6/af_inet6.c	2004-11-02 16:16:29 -08:00
@@ -501,7 +501,7 @@
 	.socketpair =	sock_no_socketpair,		/* a do nothing	*/
 	.accept =	sock_no_accept,			/* a do nothing	*/
 	.getname =	inet6_getname, 
-	.poll =		datagram_poll,			/* ok		*/
+	.poll =		udp_poll,			/* ok		*/
 	.ioctl =	inet6_ioctl,			/* must change  */
 	.listen =	sock_no_listen,			/* ok		*/
 	.shutdown =	inet_shutdown,			/* ok		*/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] UDP select handling of bad checksums.
  2004-11-03  0:24 [PATCH] UDP select handling of bad checksums Stephen Hemminger
@ 2004-11-03  0:52 ` Mitchell Blank Jr
  2004-11-03 21:20   ` Stephen Hemminger
  2004-11-22 18:09   ` Stephen Hemminger
  0 siblings, 2 replies; 5+ messages in thread
From: Mitchell Blank Jr @ 2004-11-03  0:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev, linux-net

Stephen Hemminger wrote:
> This patch addresses the issue of blocking usage of select() by UDP applications.

I'm glad to see someone actually putting some code forward in this debate...
Looks pretty good, but can't you implement this a bit cleaner by just
wrapping datagram_poll?  Something like:

unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
{
   unsigned int mask = datagram_poll(file, sock, wait);

  if ((mask & POLLRDNORM) != 0 && (file->f_flags & O_NONBLOCK) == 0 && 
      (sk->sk_shutdown & RCV_SHUTDOWN) == 0) {
	struct sk_buff_head *rcvq = &sk->sk_receive_queue;
	struct sk_buff *skb;
	spin_lock_irq(&rcvq->lock);
	// the skb_peek() loop from your udp_rcv_ready() goes here...
	spin_unlock_irq(&rcvq->lock);
	if (skb == NULL)	/* nope, nothing really ready */
	  mask &= ~(POLLIN | POLLRDNORM);
  }

  return mask;
}

That way you duplicate a lot less code.  It does slightly more work but
only in the broken !O_NONBLOCK case - the fast path is just as quick.

-Mitch

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] UDP select handling of bad checksums.
  2004-11-03  0:52 ` Mitchell Blank Jr
@ 2004-11-03 21:20   ` Stephen Hemminger
  2004-11-22 18:09   ` Stephen Hemminger
  1 sibling, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2004-11-03 21:20 UTC (permalink / raw)
  To: Mitchell Blank Jr; +Cc: David S. Miller, netdev, linux-net

On Tue, 2 Nov 2004 16:52:53 -0800
Mitchell Blank Jr <mitch@sfgoth.com> wrote:

> Stephen Hemminger wrote:
> > This patch addresses the issue of blocking usage of select() by UDP applications.
> 
> I'm glad to see someone actually putting some code forward in this debate...
> Looks pretty good, but can't you implement this a bit cleaner by just
> wrapping datagram_poll?  Something like:

But the wrapper was bigger and uglier than just doing it.
Especially since UDP doesn't need to connection oriented checks.
So the fast path code in UDP is actually faster than datagram_poll.

> 
> That way you duplicate a lot less code.  It does slightly more work but
> only in the broken !O_NONBLOCK case - the fast path is just as quick.
> 
> -Mitch

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] UDP select handling of bad checksums.
  2004-11-03  0:52 ` Mitchell Blank Jr
  2004-11-03 21:20   ` Stephen Hemminger
@ 2004-11-22 18:09   ` Stephen Hemminger
  2004-12-01  5:49     ` David S. Miller
  1 sibling, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2004-11-22 18:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: Mitchell Blank Jr, netdev, linux-net

Alternate workaround for blocking usage of select() by UDP applications.
The problem is Linux optimizes the UDP receive checksum path so that checksum
validation is not performed until the application read. This is a performance win
but can cause applications that do select with blocking file descriptors to get false
positives if the received message has a checksum error.
There is a long running thread about this on LKML.

This patch makes these applications work, but keeps the one-pass performance gain
for those applications smart enough to use non-blocking file descriptors with
select/poll. There is still a possibility to get a false positive if application does
select on non-blocking fd then makes it blocking before doing the receive, but that
is unlikely.

Tested by injecting bad packets with SOCK_RAW.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

diff -Nru a/include/net/udp.h b/include/net/udp.h
--- a/include/net/udp.h	2004-11-19 10:37:56 -08:00
+++ b/include/net/udp.h	2004-11-19 10:37:56 -08:00
@@ -71,6 +71,8 @@
 extern int	udp_rcv(struct sk_buff *skb);
 extern int	udp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern int	udp_disconnect(struct sock *sk, int flags);
+extern unsigned int udp_poll(struct file *file, struct socket *sock,
+			     poll_table *wait);
 
 DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
 #define UDP_INC_STATS(field)		SNMP_INC_STATS(udp_statistics, field)
diff -Nru a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
--- a/net/ipv4/af_inet.c	2004-11-19 10:37:56 -08:00
+++ b/net/ipv4/af_inet.c	2004-11-19 10:37:56 -08:00
@@ -809,7 +809,7 @@
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	inet_getname,
-	.poll =		datagram_poll,
+	.poll =		udp_poll,
 	.ioctl =	inet_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	inet_shutdown,
diff -Nru a/net/ipv4/udp.c b/net/ipv4/udp.c
--- a/net/ipv4/udp.c	2004-11-19 10:37:56 -08:00
+++ b/net/ipv4/udp.c	2004-11-19 10:37:56 -08:00
@@ -1303,6 +1303,52 @@
   	return 0;
 }
 
+/**
+ * 	udp_poll - wait for a UDP event.
+ *	@file - file struct
+ *	@sock - socket
+ *	@wait - poll table
+ *
+ *	This is same as datagram poll, except for the special case of 
+ *	blocking sockets. If application is using a blocking fd
+ *	and a packet with checksum error is in the queue;
+ *	then it could get return from select indicating data available
+ *	but then block when reading it. Add special case code
+ *	to work around these arguably broken applications.
+ */
+unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
+{
+	unsigned int mask = datagram_poll(file, sock, wait);
+	struct sock *sk = sock->sk;
+	
+	/* Check for false positives due to checksum errors */
+	if ( (mask & POLLRDNORM) &&
+	     !(file->f_flags & O_NONBLOCK) &&
+	     !(sk->sk_shutdown & RCV_SHUTDOWN)){
+		struct sk_buff_head *rcvq = &sk->sk_receive_queue;
+		struct sk_buff *skb;
+
+		spin_lock_irq(&rcvq->lock);
+		while ((skb = skb_peek(rcvq)) != NULL) {
+			if (udp_checksum_complete(skb)) {
+				UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+				__skb_unlink(skb, rcvq);
+				kfree_skb(skb);
+			} else {
+				skb->ip_summed = CHECKSUM_UNNECESSARY;
+				break;
+			}
+		}
+		spin_unlock_irq(&rcvq->lock);
+
+		/* nothing to see, move along */
+		if (skb == NULL)
+			mask &= ~(POLLIN | POLLRDNORM);
+	}
+
+	return mask;
+	
+}
 
 struct proto udp_prot = {
  	.name =		"UDP",
@@ -1516,6 +1562,7 @@
 EXPORT_SYMBOL(udp_port_rover);
 EXPORT_SYMBOL(udp_prot);
 EXPORT_SYMBOL(udp_sendmsg);
+EXPORT_SYMBOL(udp_poll);
 
 #ifdef CONFIG_PROC_FS
 EXPORT_SYMBOL(udp_proc_register);
diff -Nru a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
--- a/net/ipv6/af_inet6.c	2004-11-19 10:37:56 -08:00
+++ b/net/ipv6/af_inet6.c	2004-11-19 10:37:56 -08:00
@@ -501,7 +501,7 @@
 	.socketpair =	sock_no_socketpair,		/* a do nothing	*/
 	.accept =	sock_no_accept,			/* a do nothing	*/
 	.getname =	inet6_getname, 
-	.poll =		datagram_poll,			/* ok		*/
+	.poll =		udp_poll,			/* ok		*/
 	.ioctl =	inet6_ioctl,			/* must change  */
 	.listen =	sock_no_listen,			/* ok		*/
 	.shutdown =	inet_shutdown,			/* ok		*/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] UDP select handling of bad checksums.
  2004-11-22 18:09   ` Stephen Hemminger
@ 2004-12-01  5:49     ` David S. Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David S. Miller @ 2004-12-01  5:49 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: mitch, netdev, linux-net

On Mon, 22 Nov 2004 10:09:07 -0800
Stephen Hemminger <shemminger@osdl.org> wrote:

> Alternate workaround for blocking usage of select() by UDP applications.

Ok, I've applied this to my 2.6.x tree.

Please provide a 2.4.x backport, it needs this too.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-12-01  5:49 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-03  0:24 [PATCH] UDP select handling of bad checksums Stephen Hemminger
2004-11-03  0:52 ` Mitchell Blank Jr
2004-11-03 21:20   ` Stephen Hemminger
2004-11-22 18:09   ` Stephen Hemminger
2004-12-01  5:49     ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).