netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Olaf Kirch <okir@suse.de>
To: netdev@oss.sgi.com
Subject: [PATCH] 2.6 UDP recvmsg vs POSIX
Date: Tue, 17 Feb 2004 13:17:26 +0100	[thread overview]
Message-ID: <20040217121726.GD8554@suse.de> (raw)

[-- Attachment #1: Type: text/plain, Size: 1454 bytes --]

Hi,

I'm currently investigating a problem in udp recvmsg. In short, a test
program selects on the socket, gets woken up as a packet arrives, but
recvmsg return EAGAIN because the UDP checksum was wrong.

This broke one real-life application, and reportedly this violates POSIX,
as operations on a blocking socket must never return EAGAIN.

The quick and dirty fix would be to always do the checksum in
udp_queue_rcv_skb, but I assume this "checksum-outside-the-bottom-half"
approach is there because it improves performance, so I've tried to
come up with a different solution.

Please find attached a patch that tries to pluck bad udp packets from the
receive queue as part of poll/select handling, so that we never mark a
udp socket as POLLIN when there are only bad packets in the queue. This
is admittedly not very elegant, and it may have side effects I'm not
seeing yet. But it seems to fix things for me here, without slowing
the bottom half handlers down.

The second half of the patch deals with bad checksums in recvmsg.  If an
application calls recvmsg on a blocking socket, and we find a packet
with a bad checksum (this may happen if the application didn't select
first), we start over at the top of the function, so that we block in
skb_datagram_receive rather than returning EAGAIN.

Any comments welcome.

Olaf
-- 
Olaf Kirch     |  Stop wasting entropy - start using predictable
okir@suse.de   |  tempfile names today!
---------------+ 

[-- Attachment #2: linux-2.6.2-udpfix.patch --]
[-- Type: text/plain, Size: 3335 bytes --]

--- linux-2.6.2/include/net/udp.h.udpfix	2004-02-17 10:10:03.000000000 +0100
+++ linux-2.6.2/include/net/udp.h	2004-02-17 10:10:18.000000000 +0100
@@ -74,6 +74,10 @@
 extern int	udp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern int	udp_disconnect(struct sock *sk, int flags);
 
+extern unsigned int udp_poll(struct file *file,
+			    struct socket *sock,
+			    struct poll_table_struct *wait);
+
 DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
 #define UDP_INC_STATS(field)		SNMP_INC_STATS(udp_statistics, field)
 #define UDP_INC_STATS_BH(field)		SNMP_INC_STATS_BH(udp_statistics, field)
--- linux-2.6.2/net/ipv4/af_inet.c.udpfix	2004-02-04 04:43:12.000000000 +0100
+++ linux-2.6.2/net/ipv4/af_inet.c	2004-02-17 10:36:00.000000000 +0100
@@ -912,7 +912,7 @@
 	.socketpair =	sock_no_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	inet_getname,
-	.poll =		datagram_poll,
+	.poll =		udp_poll,
 	.ioctl =	inet_ioctl,
 	.listen =	sock_no_listen,
 	.shutdown =	inet_shutdown,
--- linux-2.6.2/net/ipv4/udp.c.udpfix	2004-02-13 16:07:30.000000000 +0100
+++ linux-2.6.2/net/ipv4/udp.c	2004-02-17 10:35:41.000000000 +0100
@@ -766,6 +766,64 @@
 }
 
 /*
+ * Packets queued on the UDP socket may have a bad checksum,
+ * and need to be discarded. On the other hand, recvmsg is not
+ * allowed to return EAGAIN on a blocking socket, especially
+ * if select() indicated that data is available.
+ *
+ * To solve this, udp_poll makes sure that the next packet in
+ * the queue has a valid checksum. This fixes the select/recvmsg
+ * case.
+ * The general case of recvmsg on a blocking socket is solved by
+ * simply looping over in udp_recvmsg when the checksum is bad. --okir
+ */
+unsigned int
+udp_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait)
+{
+	struct sock *sk = sock->sk;
+  	struct sk_buff *skb;
+	int error;
+
+	do {
+		spin_lock_irq(&sk->sk_receive_queue.lock);
+		skb = skb_peek(&sk->sk_receive_queue);
+		if (skb != NULL)
+			atomic_inc(&skb->users);
+		spin_unlock_irq(&sk->sk_receive_queue.lock);
+
+		if (skb == NULL)
+			break;
+
+		error = 0;
+		if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
+			/* All is well */
+		} else if (!(error = __udp_checksum_complete(skb))) {
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+		} else {
+			/* Bad checksum. Pluck it from there queue */
+			UDP_INC_STATS_BH(UdpInErrors);
+
+			spin_lock_irq(&sk->sk_receive_queue.lock);
+			if (skb == skb_peek(&sk->sk_receive_queue)) {
+				__skb_unlink(skb, &sk->sk_receive_queue);
+				spin_unlock_irq(&sk->sk_receive_queue.lock);
+				kfree_skb(skb);
+			} else {
+				spin_unlock_irq(&sk->sk_receive_queue.lock);
+			}
+		}
+
+		skb_free_datagram(sk, skb);
+	} while (error);
+
+	/* We have made sure there's at least one valid
+	 * UDP packet in the queue, so do the normal datagram
+	 * poll now */
+	return datagram_poll(file, sock, wait);
+}
+
+
+/*
  * 	This should be easy, if there is something there we
  * 	return it, otherwise we block.
  */
@@ -787,6 +845,7 @@
 	if (flags & MSG_ERRQUEUE)
 		return ip_recv_error(sk, msg, len);
 
+try_again:
 	skb = skb_recv_datagram(sk, flags, noblock, &err);
 	if (!skb)
 		goto out;
@@ -852,7 +911,7 @@
 
 	skb_free_datagram(sk, skb);
 
-	return -EAGAIN;	
+	goto try_again;
 }
 
 int udp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)

             reply	other threads:[~2004-02-17 12:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-02-17 12:17 Olaf Kirch [this message]
2004-02-18  4:30 ` [PATCH] 2.6 UDP recvmsg vs POSIX Andi Kleen
2004-02-18  6:17 ` David S. Miller
2004-02-19  7:07   ` Andi Kleen
2004-02-21 21:08 ` David S. Miller
2004-02-23  9:42   ` Olaf Kirch
2004-02-23 17:42     ` David S. Miller
2004-02-24  9:32       ` Olaf Kirch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040217121726.GD8554@suse.de \
    --to=okir@suse.de \
    --cc=netdev@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).